We propose a discrete transition-based reweighting analysis method (dTRAM) for analyzing configuration-space-discretized simulation trajectories produced at different thermodynamic states (temperatures, Hamiltonians, etc.). dTRAM provides maximum-likelihood estimates of stationary quantities (probabilities, free energies, expectation values) at any thermodynamic state. In contrast to the weighted histogram analysis method (WHAM), dTRAM does not require data to be sampled from global equilibrium, and can thus produce superior estimates for enhanced sampling data such as parallel/simulated tempering, replica exchange, umbrella sampling, or metadynamics. In addition, dTRAM provides optimal estimates of Markov state models (MSMs) from the discretized state-space trajectories at all thermodynamic states. Under suitable conditions, these MSMs can be used to calculate kinetic quantities (e.g., rates, timescales). In the limit of a single thermodynamic state, dTRAM estimates a maximum-likelihood reversible MSM, while in the limit of uncorrelated sampling data, dTRAM is identical to WHAM. dTRAM is thus a generalization of both estimators.

The dynamics of complex stochastic systems are often governed by rare events – examples include protein folding, macromolecular association, or phase transitions. These rare events lead to sampling problems when trying to compute expectation values from computer simulations, such as molecular dynamics (MD) or Markov chain Monte Carlo (MCMC).

One approach to alleviate such sampling problems is to increase the rate at which the rare events occur by generating and combining simulations at different thermodynamic states. For example, proteins can easily be unfolded at high temperatures, and protein-ligand complexes can dissociate at artificial Hamiltonians, where protein and ligand have reduced interaction energies. Generalized ensemble methods, such as replica exchange molecular dynamics,1 parallel tempering,2–4 and simulated tempering5 exploit this observation by coupling simulations at different temperatures or Hamiltonians within a MCMC framework. Yet another example is umbrella sampling6 which uses a set of biased Hamiltonians to ensure approximately uniform sampling along a set of pre-defined slow coordinates.

All of the aforementioned enhanced sampling methods are constructed such that, for long simulation times, the equilibrium distribution of each thermodynamic state is sampled. With that in mind, reweighting estimators make use of all simulation data by reweighting each sample from the thermodynamic state at which it was generated to the condition of interest via the Boltzmann density. The most frequently used reweighting estimators are the weighted histogram analysis method (WHAM),7,8 bin-less WHAM,9 and the multi-state Bennett acceptance ratio (MBAR) method.10 The most common use of reweighting is to obtain equilibrium expectations or free energy differences. Reweighting can also be applied to obtain dynamical information from the available contiguous trajectory pieces.11 When the probability density of trajectories can be evaluated, MBAR can be applied to trajectories instead of sample configurations, yielding estimates of dynamical expectations.12–15 Both WHAM and MBAR are statistically optimal under specific assumptions, as they can be derived from maximum-likelihood or minimum-variance principles.7,8,10,16 However, a key assumption of both estimators is that data are given as statistically independent samples of the respective equilibrium distributions. In reality, MD and MCMC simulations provide time-correlated data trajectories. Consequently, estimators exploiting the time correlation in the data can achieve significantly better results.17–20 In particular, WHAM and MBAR cannot obtain unbiased estimates from datasets for which the initial conditions of the harvested trajectories do not come from a probability density that is known a priori. Examples of such datasets are swarms of short uncoupled trajectories21–23 or metadynamics and conformational flooding during the fill-up phase.24,25

A complementary approach to address sampling problems is Markov state modeling.11,26–32 A Markov state model (MSM) transition matrix contains conditional transition probabilities between discrete configuration states at some lag time τ. Given the transition matrix, an equilibrium distribution can be computed that is unbiased if each transition event used for estimating the MSM originates from a local equilibrium distribution restricted to the respective discrete state.33 In this case, MSMs are able to reweigh trajectories that are not sampled from global equilibrium without any a priori knowledge of the transition probabilities. In order to use MSMs for computing kinetics, τ must be long enough for transition events to approximately decorrelate for a given configuration space discretization.27,32,33 Since MSM estimators are purely based on observed transition statistics, they cannot combine the information from different thermodynamic states. Therefore, the orders-of-magnitude speedup that can sometimes be achieved with enhanced sampling methods has not been accessible to MSMs as yet.

The transition-based reweighting analysis method (TRAM)18,19 aims at combining the advantages of reweighting estimators and MSMs. In Ref. 19, we have defined TRAM as a class of estimators that (1) take the statistical weights of samples at different thermodynamic states into account, in order to reweigh these samples; and (2) exploit transitions observed in the sampled trajectories, without assuming that these trajectories are sampled from equilibrium. Reference 34 introduced a statistically optimal estimator for non-equilibrium trajectories given that the statistical weight of each trajectory can be evaluated. Conceptually, an optimal TRAM estimator could be formulated from this principle. In practice, however, data are typically not stored at every integration time step, so that trajectory probability densities are not available.

In Ref. 18, we have introduced the first TRAM estimator that is applicable to practical molecular dynamics data, and showed that it can provide superior estimates of equilibrium probabilities and free energies compared with WHAM. However, the estimator in Ref. 18 is only approximately optimal and is very tedious to compute. Another TRAM estimator is presented in Ref. 20 and called the dynamic histogram analysis method (DHAM). DHAM was shown to avoid systematic errors that may occur when analyzing umbrella sampling with WHAM. DHAM uses a dynamical model (diffusion along a reaction coordinate) to relate transition matrices of simulations at different bias potentials. This assumption is helpful when a diffusion model is appropriate and there are only one or a few slow reaction coordinates (e.g., as in the case of umbrella sampling). In this way, it regularizes the solution and therefore makes the estimates statistically more stable. In Ref. 19, we introduced xTRAM, which does not assume a specific dynamical model but only exploits the fact that samples can be reweighted in order to couple different thermodynamic states. Moreover, xTRAM is as yet the only TRAM estimator that avoids the discretization of reweighting factors required in the other TRAM methods as well as in WHAM. xTRAM may provide substantial improvements compared to MBAR for time-correlated data, is shown to be asymptotically exact, and is shown to converge to MBAR when the data are not time-correlated.19 However, xTRAM was not derived from a maximum-likelihood or minimum-variance principle and is therefore probably not statistically optimal for finite data sets.

In the present paper, we provide for the first time a statistically optimal TRAM method by presenting a maximum-likelihood solution to the discrete TRAM problem formulated in Ref. 18. We derive a set of self-consistent equations whose solution yields the maximum likelihood dTRAM estimator. It is shown that the dTRAM solution is an asymptotically correct estimator, i.e., it converges to the exact equilibrium probabilities and transition probabilities with an increasing amount of simulation data. The dTRAM equations can be solved using a Newton method, as done before for WHAM and MBAR,35,36 or an easy-to-implement iterative algorithm provided here.

We show that dTRAM becomes identical to WHAM in the limit of statistically independent samples, i.e., sampled from the global equilibrium at each thermodynamic state. Moreover, dTRAM becomes identical to a reversible MSM when we have only a single thermodynamic state. A number of applications are shown to demonstrate the usefulness and versatility of dTRAM.

We assume that a set of MD or MCMC simulations have been performed, each in one of K thermodynamic states (indexed by the superscript k ∈ {1, …, K}). For simulations in which the thermodynamic state is frequently changed, such as in replica-exchange simulations, each contiguous sequence is treated as a separate trajectory at one of the K thermodynamic states. Furthermore, we assume that the data have been discretized to a configuration space partition (indexed by subscripts i, j ∈ {1, …, n}). We are primarily interested in the free energy, or equivalently, the equilibrium probability of discrete states in some unbiased or reference ensemble, $(\pi_i)_{i=1,\ldots,n}$. In addition, we might be interested in the equilibrium probability of states under all biased ensembles. If the simulation trajectories are long enough, we will also be able to compute kinetic properties, as discussed later.

We will be dealing with simulations where the unbiased, or reference, probability $\pi_i$ and the biased probability at simulation condition k, $\pi_i^{(k)}$, are related by a known and constant bias factor $\gamma_i^{(k)}$:

$\pi_i^{(k)} = f^{(k)} \gamma_i^{(k)} \pi_i,$ (1)

$f^{(k)} = \left( \sum_j \gamma_j^{(k)} \pi_j \right)^{-1},$ (2)

where f(k) is a normalization constant. Thus, the bias is multiplicative in probabilities or additive in the potential. This formalism is applicable whenever one has simulations conducted at different thermodynamic states, such as replica-exchange methods or umbrella sampling, but also direct simulations at different temperatures or Hamiltonians. In Sec. III, we will show how the reweighting factor $\gamma_i^{(k)}$ can be computed for a few selected examples.

The most common analysis method used in the present scenario is WHAM. WHAM uses the histogram counts $N_i^{(k)}$, i.e., the number of samples falling into bin i at thermodynamic state k. Although WHAM was originally derived as a minimum-variance estimator,7,8 it can be derived as a maximum-likelihood estimator16 using the likelihood

$L_{\mathrm{WHAM}} = \prod_{k} \prod_{i} \bigl( \pi_i^{(k)} \bigr)^{N_i^{(k)}},$ (3)

which simply assumes that every count $N_i^{(k)}$ is independently drawn from the biased distribution $\pi_i^{(k)}$, which is linked to the unbiased distribution π via Eq. (1).
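For concreteness, the reweighting relation can be sketched in a few lines of NumPy. This is a minimal sketch; `biased_distribution` is a hypothetical helper name, and the final renormalization plays the role of the normalization constant f(k):

```python
import numpy as np

def biased_distribution(pi, gamma_k):
    """Biased probabilities pi_i^(k) proportional to gamma_i^(k) * pi_i.

    The division by w.sum() renormalizes the weights, which corresponds
    to the normalization constant f^(k) in the text.
    """
    w = gamma_k * pi
    return w / w.sum()
```

For example, an unbiased distribution [0.2, 0.8] under a bias factor [4, 1] is mapped to the flat biased distribution [0.5, 0.5].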

Let us now turn to reversible Markov state models.31,32,37,38 The maximum likelihood Markov model is the transition matrix $\mathbf{P} = (p_{ij})$ between n discrete configuration states that maximizes the likelihood of the observed transitions between these states. The likelihood of a Markov model is well known39 and is simply a product of all transition probabilities corresponding to the observed trajectory. To obtain a reversible Markov state model, this likelihood is maximized under the constraints that the transition probabilities pij must fulfill detailed balance with respect to the equilibrium distribution π:

$L_{\mathrm{MSM}} = \prod_{i,j} p_{ij}^{\,c_{ij}},$ (4)

$\pi_i p_{ij} = \pi_j p_{ji} \quad \text{for all } i, j,$ (5)

where cij is the number of times the trajectories were observed in state i at time t and in state j at a later time t + τ, where τ is the lag time at which the Markov model is estimated. For an MSM, all simulation data producing counts cij have to be generated at the same thermodynamic state (e.g., temperature, Hamiltonian), and the estimated P is then only valid for this thermodynamic state. The reversibility of the MSM is ensured by the constraint equations (5). Estimators that maximize the likelihood (4) under the constraints (5) usually provide both P and the equilibrium distribution π.32,38

In TRAM, we combine these two approaches as follows: we avoid the WHAM assumption that every count is sampled from global equilibrium, and instead treat every trajectory at thermodynamic condition k as a Markov chain with the configuration-state transition counts

$c_{ij}^{(k)}$. However, in contrast to Markov models, we exploit the fact that equilibrium probabilities can be reweighted between different thermodynamic states via Eqs. (1) and (2). The resulting likelihood of all P(k) and π, based on simulations at all thermodynamic states, can be formulated as

$L = \prod_{k} \prod_{i,j} \bigl( p_{ij}^{(k)} \bigr)^{c_{ij}^{(k)}},$ (6)

$\pi_i^{(k)} p_{ij}^{(k)} = \pi_j^{(k)} p_{ji}^{(k)} \quad \text{for all } i, j, k.$ (7)

Here, $\mathbf{P}^{(k)} = (p_{ij}^{(k)})$ is the Markov transition matrix at thermodynamic state k, and $c_{ij}^{(k)}$ are the numbers of transitions observed at that simulation condition. π(k) is the vector of equilibrium probabilities of discrete states at each thermodynamic state. Note that all of these K equilibrium distributions are coupled through Eqs. (1) and (2). Because each Markov model P(k) must have the distribution π(k) as a stationary distribution, all Markov models are coupled too. This is what makes the maximization of the TRAM likelihood Eqs. (6) and (7) difficult; it can be achieved neither by WHAM nor by existing MSM estimators. We call Eqs. (1), (2), (6), and (7) the discrete TRAM problem. In the present paper, we will obtain the reweighting factors $\gamma_i^{(k)}$ in Eqs. (1) and (2) by a configuration space discretization or binning, such as in WHAM. For this reason, we call the present solution a discrete TRAM method, which should be distinguished from approaches where the reweighting is done for individual samples.10,19

We will seek the maximum of the likelihood Eq. (6). As is common practice, we work with the logarithm of the likelihood, because it has the same maximum point as the likelihood but can be treated more easily:

$\log L = \sum_{k} \sum_{i,j} c_{ij}^{(k)} \log p_{ij}^{(k)}.$ (8)

Moreover, we have the following constraints. Using detailed balance Eq. (7) with the reweighting equations (1) and (2) results in

$\gamma_i^{(k)} \pi_i p_{ij}^{(k)} = \gamma_j^{(k)} \pi_j p_{ji}^{(k)} \quad \text{for all } i, j, k.$ (9)

Note that the normalization factors, f(k), have cancelled. In addition, P(k) should be a transition matrix and π should be a probability vector, so we have the normalization conditions

$\sum_j p_{ij}^{(k)} = 1 \quad \text{for all } i, k,$ (10)

$\sum_i \pi_i = 1.$ (11)

The normalization of π(k) is achieved by the normalization constants in Eqs. (1) and (2).

In order to solve the discrete TRAM problem, we have to maximize the log-likelihood (8) under the constraints (9)–(11). The variables are both the unbiased equilibrium probabilities π (providing n − 1 free variables due to the normalization constraint (11)) and the biased transition matrices P(k) (each having n(n − 1)/2 remaining free variables that are not fixed by constraints (9) and (10)).

Note that changing the simulation conditions, such as bias or temperature, will modify the transition probabilities in a non-trivial way that depends on the simulation condition, the integrator and thermostat used, and the state space discretization. Therefore, we cannot relate the different P(k) without restricting the generality of our estimator. The only general connection between these Markov models is the coupling of their equilibrium distributions via Eqs. (1) and (2).

In Appendices  A 1–A 3, we use Lagrange duality theory to show that the optimal solution of the discrete TRAM problem above fulfills the following two conditions:

$\sum_j \frac{\bigl(c_{ij}^{(k)} + c_{ji}^{(k)}\bigr)\, v_i^{(k)} \gamma_j^{(k)} \pi_j}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i} = v_i^{(k)} \quad \text{for all } i, k,$ (12)

$\sum_k \sum_j \frac{\bigl(c_{ij}^{(k)} + c_{ji}^{(k)}\bigr)\, v_j^{(k)} \gamma_i^{(k)}}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i} = \frac{\sum_k \sum_j c_{ji}^{(k)}}{\pi_i} \quad \text{for all } i,$ (13)

where $v_i^{(k)}$ are unknown Lagrange multipliers. In the setting with detailed balance, we can unfortunately not give a closed expression for them, but we can optimize them along with the equilibrium distribution π. Note that the equations above do not require the transition probabilities $p_{ij}^{(k)}$ to be computed explicitly. If these are desired, they can be subsequently computed from the solution of Eqs. (12) and (13) (see Sec. II D below).

In Appendix A 4, we prove that the dTRAM equations above are asymptotically correct. This means that in the limit of a lot of simulation data – either realized by long trajectories or many short trajectories – the estimate will converge to the correct stationary distributions π and π(k).

We can rewrite the self-consistent equations (12) and (13) to derive the following iteration (fixed-point method), that can be used to numerically solve the discrete TRAM problem. First, we initialize π and v(k) by the simple guess

$\pi_i := \frac{1}{n},$ (14)

$v_i^{(k)} := N_i^{(k)},$ (15)

and then we iterate the following equations until π is converged:

$v_i^{(k)} := \sum_j \frac{\bigl(c_{ij}^{(k)} + c_{ji}^{(k)}\bigr)\, v_i^{(k)} \gamma_j^{(k)} \pi_j}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i},$ (16)

$\pi_i := \frac{\sum_k \sum_j c_{ji}^{(k)}}{\displaystyle\sum_k \sum_j \frac{\bigl(c_{ij}^{(k)} + c_{ji}^{(k)}\bigr)\, v_j^{(k)} \gamma_i^{(k)}}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i}}.$ (17)

Instead of the simple 1/n initialization for πi in Eq. (14), we could use the standard WHAM algorithm to obtain a much better guess.7,8 While a better starting point might be relevant for optimizing computational performance, we have not observed the estimation result to depend on this choice.

As an alternative to the fixed-point iteration (16) and (17), we can solve Eqs. (12) and (13) by using the multidimensional Newton method for root finding available in many numerics packages.
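For orientation, the fixed-point iteration of Eqs. (16) and (17) can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the reference implementation: `dtram_fixed_point` is a hypothetical function name, and the specific update forms (a multiplicative update of the Lagrange multipliers $v_i^{(k)}$, followed by a π update built from the symmetrized, reweighted transition counts) follow the self-consistency conditions described in the text:

```python
import numpy as np

def dtram_fixed_point(C, gamma, n_iter=20000, tol=1e-13):
    """Sketch of the dTRAM self-consistent (fixed-point) iteration.

    C     : (K, n, n) array of transition counts c_ij^(k)
    gamma : (K, n) array of reweighting factors gamma_i^(k)
    Returns the unbiased stationary distribution pi (length n).
    """
    K, n, _ = C.shape
    Csym = C + np.transpose(C, (0, 2, 1))        # c_ij^(k) + c_ji^(k)
    pi = np.full(n, 1.0 / n)                      # uniform initial guess
    v = np.maximum(C.sum(axis=2), 1e-10)          # v_i^(k) ~ outgoing counts
    n_in = C.sum(axis=0).sum(axis=0)              # sum_k sum_j c_ji^(k)
    for _ in range(n_iter):
        w = gamma * pi                            # gamma_i^(k) * pi_i
        # denominator D[k,i,j] = v_i^(k) gamma_j^(k) pi_j + v_j^(k) gamma_i^(k) pi_i
        D = v[:, :, None] * w[:, None, :] + v[:, None, :] * w[:, :, None]
        D = np.maximum(D, 1e-300)
        # multiplicative update of the Lagrange multipliers
        v_new = (Csym * v[:, :, None] * w[:, None, :] / D).sum(axis=2)
        # pi update from the reweighted, symmetrized counts
        den = (Csym * v[:, None, :] * gamma[:, :, None] / D).sum(axis=(0, 2))
        pi_new = n_in / np.maximum(den, 1e-300)
        pi_new /= pi_new.sum()
        delta = np.abs(pi_new - pi).max()
        pi, v = pi_new, np.maximum(v_new, 1e-300)
        if delta < tol:
            break
    return pi
```

With a single thermodynamic state and unit reweighting factors, this iteration reduces to a reversible MSM estimate, as discussed in Sec. II F.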

Given π and v(k) at their optimal values, the transition probabilities can be computed for any simulated thermodynamic state k by

$p_{ij}^{(k)}(\tau) = \frac{\bigl(c_{ij}^{(k)}(\tau) + c_{ji}^{(k)}(\tau)\bigr)\, \gamma_j^{(k)} \pi_j}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i}.$ (18)

See Appendix A 2 for the derivation. In Eq. (18), we have explicitly stated that the transition counts, and hence the transition probabilities, are estimated at a given lag time τ. As a consequence of the asymptotic correctness of dTRAM (Appendix A 4), the estimates of $p_{ij}^{(k)}(\tau)$ are also asymptotically correct, that is, for either long trajectories or many short trajectories, we will get an unbiased estimate of the transition probabilities.

In order to compute kinetics, such as transition rates or timescales, the transition matrices P(k) must not only be valid at the lag time τ at which they were estimated, but must be Markov models that correctly predict the kinetics at longer times. How adequate P(k) is as a Markov model should be tested by validating that the relaxation timescales computed from the eigenvalues of P(k) are approximately constant in τ27 and by checking that the Chapman-Kolmogorov relation P(k)(nτ) ≈ [P(k)(τ)]n approximately holds.32
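Both validation checks are straightforward to script. The helper names below are hypothetical; the sketch computes implied timescales from the eigenvalues and the maximal Chapman-Kolmogorov deviation:

```python
import numpy as np

def implied_timescales(P, tau):
    """Relaxation timescales t_i = -tau / ln|lambda_i| from the MSM eigenvalues,
    skipping the stationary eigenvalue lambda_1 = 1."""
    evals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return -tau / np.log(evals[1:])

def ck_deviation(P_tau, P_ntau, n):
    """Chapman-Kolmogorov test: maximal deviation between P(tau)^n and P(n*tau)."""
    return np.abs(np.linalg.matrix_power(P_tau, n) - P_ntau).max()
```

For a perfectly Markovian P, the deviation vanishes and the implied timescales are constant in τ; in practice one looks for an approximate plateau.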

The P(k) can only be used as Markov models when the contiguous simulation trajectories are long enough to support a suitable lag time τ. Generalized ensemble simulations, such as replica-exchange, parallel or simulated tempering, generally provide only very short contiguous trajectory pieces and are therefore only suitable for constructing Markov models of small systems with excellent configuration state discretizations.11,13,14

Based on umbrella sampling simulations, the construction of Markov models at the different umbrellas k is usually possible, but for the unbiased system we can only obtain the equilibrium distribution π and not the Markov model. The reason is that the transition matrices (18) can only be estimated at the different simulation conditions k, whereas the equilibrium probability π of the chosen reference ensemble is computed through reweighting, and is thus also available for thermodynamic states that were not simulated. However, the umbrella Markov models P(k) can still provide useful information. For example, comparing the longest relaxation timescale of each umbrella Markov model with the respective simulation length can serve as an indicator of convergence, and of whether some or all simulation lengths should be increased.20

Unbiased MD simulations at different thermodynamic states are most suitable for constructing Markov models, because one has the choice of running simulations long enough to accommodate a suitable lag time τ. A systematic way of constructing such simulations is the random swapping protocol.19 Note that such simulations may violate not only the sampling from global equilibrium but also the sampling from local equilibrium; it is therefore possible that the estimates of π and of all associated stationary quantities are biased for short lag times τ. Therefore, when using dTRAM to analyze unbiased MD simulations at different thermodynamic states, one should compute the estimates as a function of τ in order to ensure that a large enough τ is used to obtain unbiased estimates.

We now show that the commonly used WHAM method is obtained as a special case of dTRAM. Starting from the dTRAM estimator, we employ the WHAM assumption that each sample at thermodynamic state k is independently generated from the biased probability distribution π(k). This means that transition probabilities

$p^{(k)}_{ij}$ are equal to the probability of observing state j without knowledge of i:

$p_{ij}^{(k)} = \pi_j^{(k)}.$ (19)

In a setting where counts are generated independently, the transition counts

$c_{ij}^{(k)}$ can be modeled by splitting up the total counts ending in bin j according to the equilibrium probability that they have been in a given bin i before:

$c_{ij}^{(k)} = \pi_i^{(k)} N_j^{(k)}.$ (20)

Note that this choice reproduces the actually observed histogram counts, as $\sum_i c_{ij}^{(k)} = N_j^{(k)} \sum_i \pi_i^{(k)} = N_j^{(k)}$. Substituting $\pi_j^{(k)}$ in (19) and (20) using (1) and (2) and inserting the result into Eq. (18) yields the equalities

$f^{(k)} \gamma_j^{(k)} \pi_j = \frac{\bigl(N_j^{(k)} \gamma_i^{(k)} \pi_i + N_i^{(k)} \gamma_j^{(k)} \pi_j\bigr)\, f^{(k)} \gamma_j^{(k)} \pi_j}{v_i^{(k)} \gamma_j^{(k)} \pi_j + v_j^{(k)} \gamma_i^{(k)} \pi_i},$ (21)

which must hold for all i and k. This is exactly the case when the Lagrange multipliers become

$v_i^{(k)} = N_i^{(k)}.$ (22)

Substituting (20) and (22) into (17) gives us the solution for the unbiased stationary probabilities

$\pi_i = \frac{\sum_k N_i^{(k)}}{\sum_k N^{(k)} f^{(k)} \gamma_i^{(k)}},$ (23)

$f^{(k)} = \left( \sum_i \gamma_i^{(k)} \pi_i \right)^{-1},$ (24)

where $N^{(k)} = \sum_i N_i^{(k)}$ is the total number of samples at condition k. This is exactly the WHAM algorithm.7,8 Therefore, WHAM is a special case of dTRAM, suggesting that dTRAM should yield estimates that are at least as good as those of WHAM, and better estimates when the WHAM assumption of sampling from global equilibrium at each condition k does not hold.
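The WHAM self-consistency of Eqs. (23) and (24) can be sketched as the following alternating iteration; `wham` is a hypothetical helper name, and the sketch assumes histogram counts and bias factors stored as (K, n) arrays:

```python
import numpy as np

def wham(N, gamma, n_iter=100000, tol=1e-14):
    """Sketch of the discrete WHAM self-consistent iteration.

    N     : (K, n) histogram counts N_i^(k)
    gamma : (K, n) bias factors gamma_i^(k)
    Returns the unbiased probabilities pi_i.
    """
    K, n = N.shape
    pi = np.full(n, 1.0 / n)
    Ntot = N.sum(axis=1)             # N^(k): total samples per thermodynamic state
    Ni = N.sum(axis=0)               # per-bin counts pooled over all states
    for _ in range(n_iter):
        f = 1.0 / (gamma @ pi)       # normalization constants f^(k)
        pi_new = Ni / ((Ntot * f) @ gamma)
        pi_new /= pi_new.sum()
        delta = np.abs(pi_new - pi).max()
        pi = pi_new
        if delta < tol:
            break
    return pi
```

A quick consistency check: counts drawn exactly in proportion to two biased distributions of a common unbiased π are mapped back onto that π.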

Now we relate dTRAM to reversible Markov models. Suppose we only have a single thermodynamic state k and one or several simulation trajectories generating counts cij at this condition. In this case, we can drop the index k, all reweighting factors are unity γi ≡ 1, and Eqs. (12) and (13) become

$\sum_j \frac{\bigl(c_{ij} + c_{ji}\bigr)\, v_i \pi_j}{v_i \pi_j + v_j \pi_i} = v_i,$ (25)

$\sum_j \frac{\bigl(c_{ij} + c_{ji}\bigr)\, v_j}{v_i \pi_j + v_j \pi_i} = \frac{\sum_j c_{ji}}{\pi_i}.$ (26)

We can combine both equations to

$v_i + \sum_j c_{ji} = \sum_j \bigl(c_{ij} + c_{ji}\bigr).$ (27)

Thus, we solve for the Lagrange multipliers

$v_i = \sum_j c_{ij} = N_i.$ (28)

Substituting vi = Ni into (13) leads to the optimality condition for πi

$\pi_i = \frac{\sum_j c_{ji}}{\displaystyle\sum_j \frac{\bigl(c_{ij} + c_{ji}\bigr)\, N_j}{N_i \pi_j + N_j \pi_i}}.$ (29)

Inserting the result into (18) yields the reversible transition matrix estimator

$p_{ij} = \frac{\bigl(c_{ij} + c_{ji}\bigr)\, \pi_j}{N_i \pi_j + N_j \pi_i},$ (30)

which is identical to the known optimality condition for a reversible Markov model transition matrix and the corresponding iterative estimator.32,38 Therefore, a reversible MSM is a special case of dTRAM.
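The resulting single-ensemble estimator can be sketched as the standard fixed-point iteration for reversible MSMs; this is a sketch (with the hypothetical helper name `reversible_msm`), not the reference implementation of Refs. 32 and 38:

```python
import numpy as np

def reversible_msm(C, n_iter=100000, tol=1e-15):
    """Fixed-point estimator for a reversible MSM from a count matrix C.

    Returns (pi, P) satisfying detailed balance pi_i p_ij = pi_j p_ji.
    """
    Csym = C + C.T                    # symmetrized counts c_ij + c_ji
    N = C.sum(axis=1)                 # outgoing counts N_i
    pi = N / N.sum()                  # non-reversible initial guess
    for _ in range(n_iter):
        denom = N[:, None] / pi[:, None] + N[None, :] / pi[None, :]
        pi_new = (Csym / denom).sum(axis=1)
        pi_new /= pi_new.sum()
        delta = np.abs(pi_new - pi).max()
        pi = pi_new
        if delta < tol:
            break
    # unnormalized flows x_ij = pi_i p_ij are symmetric; row-normalize to get P
    X = Csym / (N[:, None] / pi[:, None] + N[None, :] / pi[None, :])
    P = X / X.sum(axis=1, keepdims=True)
    return pi, P
```

The symmetric flow matrix X guarantees detailed balance by construction, and the self-consistent π update is the familiar reversible-MSM iteration.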

We start with a simple example to illustrate a scenario in which the classical WHAM estimator fails because the assumption of sampling from global equilibrium is not fulfilled. Fig. 1(a) shows the energy levels (uA = 4, uTS = 8, uB = 0) of a discrete three-state system. We consider a Metropolis-Hastings jump process between neighboring states. Due to the high-energy transition state TS, the minima A and B are long-lived and escaping them is a rare event. Now we run a simulation consisting of three independent trajectories:

  1. An unbiased trajectory of length L starting in state A,

  2. An unbiased trajectory of length L starting in state B,

  3. A trajectory of length L using bias (bA = 4, bTS = 0, bB = 8) starting in state TS.

FIG. 1.

Comparison of WHAM and dTRAM estimation results using three independent trajectories sampling from three discrete states (two unbiased trajectories starting in A and B, respectively, and a biased trajectory sampling from a flat energy landscape over A, TS, and B). (a) Schematic of the energies of the three states. (b) Estimation of the energies of A, TS, and B using WHAM (solid lines) and dTRAM (dashed lines). Error bars correspond to one standard deviation of the estimate computed over 25 independent runs.


The biased trajectory 3 samples from an energy landscape that is flat over A, B, and TS. Trajectories 1 and 2 will be stuck in states A or B, respectively, and only be able to escape them when L is sufficiently long.

Trajectories 1 and 2 use the same unbiased Hamiltonian and are therefore in the same thermodynamic state k = 1. The corresponding reweighting factor is simply

$\gamma_i^{(1)} = 1$

for all states i = A, TS, B. Trajectory 3 is in a different thermodynamic state with a biased Hamiltonian that we shall call k = 2. We use the bias as a reweighting factor,

$\gamma_i^{(2)} = \mathrm{e}^{-b_i},$

for all states i = A, TS, B. In this way, we can reweigh all samples between the two thermodynamic states and produce estimates of the energies uA, uTS, uB using both WHAM with Eqs. (23) and (24) and dTRAM with Eqs. (16) and (17).

Because states A and B are long-lived, we expect that the unbiased trajectories 1 and 2 cannot sample from the global equilibrium unless the simulation time L is very long. As a result, the WHAM estimator is initially strongly biased and systematically underestimates the energy differences – see solid lines in Fig. 1(b). Only after about L = 10 000 steps does the bias of the WHAM estimator become negligible. In contrast, the dTRAM estimate is unbiased even for short simulation lengths – see dashed lines in Fig. 1(b). dTRAM does not suffer from the WHAM bias because each transition is indeed independent from the previous transition, even when the simulation does not sample from global equilibrium. Moreover, the uncertainty of the dTRAM estimate is much smaller than the uncertainty of the WHAM estimate. This is because every transition is an independent sample, and therefore dTRAM benefits from a much larger statistical efficiency than WHAM.
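This toy experiment is easy to reproduce. The sketch below simulates the discrete Metropolis-Hastings jump process with the energies and bias given above; the nearest-neighbor proposal scheme (step left or right with equal probability, rejected at the boundaries) is an assumption, as the text does not spell it out:

```python
import numpy as np

rng = np.random.default_rng(42)

U = np.array([4.0, 8.0, 0.0])      # energies u_A, u_TS, u_B
BIAS = np.array([4.0, 0.0, 8.0])   # bias b_A, b_TS, b_B; U + BIAS is flat

def metropolis_trajectory(u, start, length):
    """Metropolis-Hastings jump process on the states 0 (A), 1 (TS), 2 (B)."""
    traj = np.empty(length, dtype=int)
    s = start
    for t in range(length):
        prop = s + rng.choice((-1, 1))             # nearest-neighbor proposal
        if 0 <= prop <= 2 and rng.random() < min(1.0, np.exp(u[s] - u[prop])):
            s = prop                                # accept with min(1, e^{-du})
        traj[t] = s
    return traj

def count_matrix(traj, n=3):
    """Transition count matrix c_ij at lag time 1."""
    C = np.zeros((n, n))
    np.add.at(C, (traj[:-1], traj[1:]), 1)
    return C
```

The biased trajectory diffuses freely over all three states, while an unbiased trajectory started in B remains trapped there for most of its length, reproducing the sampling problem described above.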

While illustrative, this example is over-simplistic. In reality, we do not have discrete states and the dynamics are not a Markov chain. Let us investigate next how dTRAM performs in a more realistic setting.

In umbrella sampling, one introduces for each simulation k an additive bias potential, such that our total potential in simulation k is given by

$u_i^{(k)} = u_i + b_i^{(k)},$ (31)

where i is a bin index of one or several finely discretized coordinates in which the bias potential is acting. ui is the unbiased potential evaluated at bin i (usually at its center), and

$b_i^{(k)}$ is the value of the kth umbrella (bias) potential. All energies are dimensionless (i.e., in units of the thermal energy kBT). The biased equilibrium probabilities are proportional to

$\pi_i^{(k)} \propto \mathrm{e}^{-u_i - b_i^{(k)}},$ (32)

with the unbiased equilibrium probability and the reweighting factors given by

$\pi_i \propto \mathrm{e}^{-u_i},$ (33)

$\gamma_i^{(k)} = \mathrm{e}^{-b_i^{(k)}}.$ (34)

Umbrella sampling is typically employed using stiff constraining potentials, such that the simulations decorrelate quickly. In such a scenario, WHAM is a suitable estimator for extracting unbiased equilibrium probabilities and free energy differences. However, an analysis of umbrella sampling data can still benefit from using dTRAM. First, dTRAM uses conditional transition events, and the number of statistically independent observations thus depends on a lag time τ, which can be thought of as a local decorrelation time, i.e., a time at which transitions become statistically independent. This time is generally shorter than the global decorrelation time (or statistical inefficiency) tcorr at which samples generated by the trajectory become statistically independent. Thus, whenever tcorr is longer than the interval at which configurations are saved, dTRAM will be able to exploit these data more efficiently than WHAM. This should lead to improved estimates.

Moreover, umbrella sampling can, in conjunction with WHAM, result in systematic errors that can be avoided with a TRAM estimator: As shown in Ref. 20, using umbrellas that are too weak to stabilize the simulation at a transition state can lead to umbrella simulations that still have high internal free energy barriers and therefore do not generate samples that are drawn from the respective equilibrium distribution. Although a WHAM estimate would be correct in the limit of infinitely long simulation times, it may provide drastically biased estimates for practical simulation times.

Here, we employ umbrella sampling simulations on a one-dimensional double-well potential with a dimensionless energy

(35)

illustrated in Figs. 2(b) and 2(c). The configuration space is discretized using a one-dimensional binning: All x-values are assigned to the closest point from the set { − 5, −4.9, …, 4.9, 5}, generating up to 101 discrete states. However, in practice only about 80 states are visited and we exclude empty bins from the analysis. Umbrella sampling simulations are conducted using K = 11 different biasing potentials given by

(36)

where

$\bar{x}^{(k)} = k - 6$ is the center of the kth biasing potential. The dynamics are simulated using the Metropolis process described in Appendix B. The bias potentials are chosen to be relatively weak, such that the umbrella simulations near the transition state contain rare events. In this case, WHAM is a poor estimator because it takes relatively long to generate statistically independent samples.

FIG. 2.

Estimation results of WHAM, the present method (dTRAM), and our early approximate TRAM method (aTRAM) from Ref. 18 based on umbrella sampling simulations in a double-well potential. (a) Mean and standard deviation of the energy barrier estimation error calculated over 30 independent umbrella sampling runs with K = 11 umbrellas each. The x-axis shows the number of steps in an umbrella trajectory. (b) Mean and standard deviation of estimates of the potential u(x) generated by WHAM, dTRAM, and approximate TRAM using a trajectory length of 500. (c) Same as (b), but with a trajectory length of 10 000.


In order to apply dTRAM, we evaluate all bias potentials for each discrete state, compute the reweighting factors according to Eq. (34), and store this information in a reweighting matrix. We additionally store all state-to-state transition counts

$c_{ij}^{(k)}$ for each umbrella simulation k. Given these data, the dTRAM estimate is computed by iterating Eqs. (16) and (17) to convergence.
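This data preparation step can be sketched as follows; `umbrella_inputs` is a hypothetical helper that assembles the count matrices at the chosen lag and the reweighting matrix $\gamma_i^{(k)} = \mathrm{e}^{-b_i^{(k)}}$ from Eq. (34):

```python
import numpy as np

def umbrella_inputs(trajs, bias_energies, n_bins, lag=1):
    """Assemble dTRAM inputs from discretized umbrella trajectories.

    trajs         : list of K integer bin trajectories, one per umbrella
    bias_energies : (K, n_bins) dimensionless bias energies b_i^(k) per bin
    Returns (C, gamma) with C[k] the count matrix c_ij^(k) at the given
    lag and gamma the reweighting matrix exp(-b_i^(k)).
    """
    K = len(trajs)
    C = np.zeros((K, n_bins, n_bins))
    for k, d in enumerate(trajs):
        d = np.asarray(d)
        np.add.at(C[k], (d[:-lag], d[lag:]), 1)   # count i -> j events
    gamma = np.exp(-np.asarray(bias_energies, dtype=float))
    return C, gamma
```

The reweighting matrix only has to be evaluated once per bin and umbrella, since the bias factors are constant within each bin.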

The performances of WHAM and dTRAM are compared in terms of the mean error of the estimated energy barriers

$E = \frac{1}{2} \left( \bigl|\Delta u_{AB}^{\mathrm{approx}} - \Delta u_{AB}\bigr| + \bigl|\Delta u_{BA}^{\mathrm{approx}} - \Delta u_{BA}\bigr| \right),$ (37)

where ΔuAB and ΔuBA are the energy barriers for the A→B and B→A processes, respectively, and the superscript "approx" denotes the approximate value obtained from the estimated u(x). Fig. 2(a) compares the energy barrier estimation error using WHAM (black) and dTRAM (red) as a function of the length of the umbrella simulations. In addition, estimation results of our earlier approximate TRAM method from Ref. 18 are shown in blue. Figs. 2(b) and 2(c) show the energy profile estimated from the data using trajectory lengths of 500 and 10 000 steps per umbrella, respectively. All means and standard deviations are obtained by repeating the simulation 30 times.

It is apparent that all estimators converge to the correct energy barriers and energy profile in the limit of a large amount of simulation data. For short simulations, the dTRAM estimates are significantly better than the WHAM and approximate TRAM estimates – or, in other words, much less simulation data are needed with dTRAM to obtain estimates of the same quality as with WHAM or approximate TRAM.

In order to demonstrate the validity of dTRAM on molecular dynamics data of a large protein system, we have used it to analyze umbrella sampling simulations of the passage of Na+ ions through the transmembrane pore of the GLIC channel (Fig. 3). The data were generated in the simulations of Zhu and Hummer.36 WHAM and dTRAM provide similar free energy profiles (Fig. 3(b)), but for a given number of bins used to discretize the reaction coordinate (the membrane normal), dTRAM has a smaller systematic error. The systematic error is measured as the energy difference between the two end-states, which should be 0 as a result of the periodic boundary conditions used in the simulation (Fig. 3(c)).

FIG. 3.

Estimation results of WHAM and dTRAM based on umbrella sampling simulations of Na+ passage through a GLIC channel, using the simulations of Zhu and Hummer.36 (a) Structure of the ion channel. (b) Free energy profile computed by WHAM and dTRAM using 400 bins to discretize the reaction coordinate (membrane normal). (c) Systematic estimation error of the end-state energies, measured by their deviation from 0, as a function of the number of discretization bins.


We have derived a maximum likelihood estimator for the transition-based reweighting analysis method (TRAM). This estimator optimally combines simulation trajectories produced at different thermodynamic states. It does so by taking into account the time correlations in the trajectories via transition counts, and by exploiting the fact that the weight of every configuration in any thermodynamic state is given by the Boltzmann distribution. The present estimator operates on a configuration-space discretization, and in particular also discretizes the reweighting factors between different thermodynamic states. Hence, we call the present method discrete TRAM, or dTRAM for short.

dTRAM combines ideas from WHAM and from reversible MSMs. We have shown that dTRAM is a proper generalization of both methods, i.e., both the WHAM estimator and the reversible MSM estimator can be derived as special cases of the dTRAM equations. Consequently, dTRAM can be applied to any kind of simulation data that either WHAM or MSMs can be applied to. In particular, dTRAM is useful for obtaining improved estimates from generalized-ensemble simulations (replica-exchange molecular dynamics, parallel or simulated tempering) and from multiple biased simulations (umbrella sampling, metadynamics, conformational flooding, etc.). Like MSMs, dTRAM can produce estimates from swarms of short simulations that are not in global equilibrium, but beyond MSMs it can do so for trajectories produced under different thermodynamic conditions, such as different temperatures or Hamiltonians.

dTRAM provides estimates of both the equilibrium distribution at a thermodynamic state of interest and the kinetics at the simulated thermodynamic states. The kinetic estimates should be treated with care, as they are only useful when the data admit the choice of a lag time that is long enough to parametrize a Markov model that can predict the long-term kinetics. Such estimates should be checked for convergence as a function of the lag time, as is common practice in Markov state modeling. We will investigate the suitability of dTRAM for estimating kinetic quantities in future work. In the present paper, we have only worked with short trajectory segments, and hence have only used dTRAM to estimate equilibrium distributions and free energies. We have demonstrated that dTRAM provides superior estimates of these properties compared to WHAM when applied to biased simulations.
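The lag-time convergence check mentioned above is standard practice in Markov state modeling: one estimates transition matrices at several lag times τ and inspects the implied timescales t_i(τ) = −τ/ln λ_i(τ), which should level off at a suitable τ. A minimal sketch (our own, using simple non-reversible counting and a small pseudocount to keep all rows stochastic):

```python
import numpy as np

def implied_timescales(dtraj, n_states, lags):
    """Implied timescales t_i(tau) = -tau / ln(lambda_i(tau)) of simple
    MSM estimates at several lag times; estimates should only be trusted
    where these curves have leveled off as a function of tau."""
    result = []
    for lag in lags:
        C = np.full((n_states, n_states), 1e-8)  # pseudocount
        for i, j in zip(dtraj[:-lag], dtraj[lag:]):
            C[i, j] += 1.0
        T = C / C.sum(axis=1, keepdims=True)     # row-stochastic estimate
        evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
        result.append([-lag / np.log(l) for l in evals[1:]])
    return result
```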

The present reweighting formulation is straightforwardly applied when the configuration space can be discretized in such a way that the bias factor $\gamma_{i}^{(k)}$ is approximately constant within each bin. Although such a discretization is easily constructed for umbrella sampling that biases one or two coordinates, it is unsuitable in other cases, such as replica-exchange simulations. It has been suggested to construct a joint discretization in configuration and energy space.40 An optimal solution to this problem will involve a self-consistent evaluation of $\gamma_{i}^{(k)}$ that integrates the sampled configurations assigned to bin i with a MBAR-like approach.10
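To illustrate the kind of self-consistent machinery an MBAR-like approach (Ref. 10) involves, the standard MBAR iteration for the dimensionless state free energies f_k can be sketched as follows. This is a generic MBAR sketch with our own helper names, not the proposed self-consistent γ evaluation itself:

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mbar_free_energies(u_kn, N_k, n_iter=200):
    """Self-consistent MBAR iteration for the dimensionless free energies
    f_k of K thermodynamic states; u_kn[k, n] is the reduced energy of
    sample n evaluated in state k, N_k[k] the number of samples from k."""
    f = np.zeros(len(N_k))
    log_N = np.log(N_k)
    for _ in range(n_iter):
        # log sum_l N_l exp(f_l - u_ln), one value per sample n
        log_denom = logsumexp(log_N[:, None] + f[:, None] - u_kn, axis=0)
        f = -logsumexp(-u_kn - log_denom[None, :], axis=1)
        f -= f[0]  # fix the gauge f_0 = 0
    return f

# If state 1 is state 0 shifted by a constant c, MBAR recovers f_1 = c
rng = np.random.default_rng(1)
u0 = 0.5 * rng.standard_normal(100) ** 2
u_kn = np.vstack([u0, u0 + 2.0])
f = mbar_free_energies(u_kn, N_k=np.array([50, 50]))  # f[1] -> 2.0
```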

A Python implementation of the dTRAM method is available at https://github.com/markovmodel/pytram.

The authors thank the following colleagues for enlightening discussions: John D. Chodera (MSKCC, New York), Gerhard Hummer (MPI for Biophysics, Frankfurt), Alessandro Laio (SISSA, Trieste), Roland R. Netz (FU Berlin), and the entire CMB team at FU Berlin. We thank Fangqiang Zhu for providing us with the umbrella sampling simulation data of the GLIC ion channel. This work was funded by the following grants: Deutsche Forschungsgemeinschaft (DFG) WU 744/1-1 and DFG SFB 1114 (Wu), DFG SFB 740 and 958 (Mey), ERC grant “pcCell” (Noé).

1. Lagrange duality

For a detailed description of Lagrange duality theory, we refer to the textbooks in Refs. 41 and 42. Here, we summarize only those aspects of the theory relevant to solving the TRAM problem.

Consider a constrained minimization problem of the following form:

(A1)

The Lagrangian function can be defined by

(A2)

If (A1) is a convex problem and a technical assumption such as Slater's condition holds, it can be proven that (A1) is equivalent to the following maximization problem:

(A3)

where g(v) is the Lagrange dual function defined by

(A4)

The equivalence here means that the optimal values of the two problems are equal, i.e., the solution x* to (A1) and the solution v* to (A3) satisfy g(v*) = f(x*). Moreover,

(A5)
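As a concrete illustration of (A1)–(A5), consider the toy convex problem min x² subject to x ≥ 1 (our own example, not from the paper). Its dual function g(v) = min_x [x² − v(x − 1)] = v − v²/4 is maximized at v* = 2 with g(v*) = 1 = f(x*), and the primal solution x* = v*/2 = 1 is recovered from the dual maximizer:

```python
import numpy as np

# Toy problem: minimize f(x) = x^2  subject to  x >= 1.
# Lagrangian L(x, v) = x^2 - v*(x - 1) with multiplier v >= 0.

def g(v):
    """Dual function: minimize L(., v) over x; the minimizer is x = v/2."""
    x = v / 2.0
    return x**2 - v * (x - 1.0)

v_grid = np.linspace(0.0, 4.0, 4001)
v_star = v_grid[np.argmax(g(v_grid))]  # dual maximizer, analytically 2
x_star = v_star / 2.0                  # primal solution recovered: 1
# Strong duality: g(v_star) == f(x_star) == 1
```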
2. Single thermodynamic state

We first consider the dual Lagrange approach to solving dTRAM at a single thermodynamic state, i.e., the situation that is equivalent to a reversible Markov state model. Here, we ignore the normalization of the stationary distribution π, i.e., ∑i πi need not equal 1. The maximum likelihood estimate of the transition matrix P with a fixed equilibrium distribution π is given by the following optimization problem:

(A6)

where

(A7)

is the log-likelihood function of P. By using the Lagrange duality theory, we have the following lemma:

Lemma: The minimization problem (A6) is equivalent to the maximization problem

(A8)

and the optimal solution P* to (A6) and the optimal solution v* to (A8) satisfy

(A9)

Proof: The Lagrangian function of (A6) is defined as

(A10)

Note that

(A11)

Then the dual function of (A6) is

(A12)

and (A6) is equivalent to the maximization problem

(A13)

From (A12) we can get

(A14)

and

(A15)

Therefore,

(A16)

and the optimal value of (A6) is equal to

(A17)

and

(A18)

where

(A19)

We now consider the case that the stationary distribution is unknown. In this case, the maximum likelihood estimate of π is

(A20)
3. Multiple simulations

If we further consider multiple biased simulations with $\pi_{i}^{(k)}\propto \gamma_{i}^{(k)}\pi_{i}$, we can also conclude that the maximum likelihood estimate of π is given by the following min-max problem:

(A21)

For the sake of convenience, here we define ui = −ln πi. Then (A21) can be rewritten as

(A22)

Because h(exp ( − u), v) is a concave function of v if u is fixed and a convex function of u if v is fixed, the solution to (A22) is a saddle point and characterized by (see Sec. 10.3.4 in Ref. 43)

(A23)
(A24)

Considering that

(A25)

and

(A26)
(A27)

we can conclude that the optimal solution to (A21) must satisfy

(A28)
(A29)

This leads to the following iterative algorithm for unbiased estimation from multiple simulations:

(A30)
(A31)
4. Asymptotic correctness of dTRAM

Here, we show that dTRAM converges to the correct equilibrium distribution and transition probabilities in the limit of large statistics. In this limit, achieved through either long simulation trajectories or many short simulation trajectories, the count matrices $\mathbf{C}^{(k)}=[c_{ij}^{(k)}]$ become

(A32)

where $\bar{\mathbf{P}}^{(k)}=[\bar{p}_{ij}^{(k)}]$ is the matrix of exact transition probabilities (free of statistical error), which satisfies

(A33)

where $\bar{\bm{\pi}}=[\bar{\pi}_{i}]$ are the exact stationary probabilities of the configuration states. Inserting (A32) into the dTRAM equations (12) and (13), we obtain

(A34)
(A35)
(A36)
(A37)
(A38)

and thus the first dTRAM equation is satisfied. Furthermore, we obtain

(A39)
(A40)
(A41)
(A42)
(A43)

and thus the second dTRAM equation is satisfied as well. From the above two equations, we can conclude that in the statistical limit (achieved either by long trajectories or by many short trajectories), the solution of the dTRAM equations converges to the correct equilibrium distribution and the correct transition probabilities. Note that we have assumed that all trajectory data are in local equilibrium within each starting state i; if this is not the case, the counts cij, and thus also the estimated πi and pij, will be biased. Thus, if the data are such that local equilibrium is a concern (e.g., metadynamics or uncoupled short MD trajectories), all estimates should be computed as a function of the lag time τ.

The simulation trajectory $\lbrace x_{t}^{(k)}\rbrace_{t=0}^{M}$ is generated by a Metropolis simulation model, which is a reversible Markov chain with initial state $x_{0}^{(k)}=\bar{x}^{(k)}$, stationary distribution $\pi_{i}^{(k)}\propto \mathrm{e}^{-u_{i}-b_{i}^{(k)}}$, and transition probability

(B1)

where $q(x^{\prime}|x)$ denotes the proposal distribution, which satisfies $q(x^{\prime}|x)\propto 1_{|x^{\prime}-x|\le 0.2}$ and $\sum_{x^{\prime}}q(x^{\prime}|x)=1$.
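A Metropolis chain of this kind can be sketched on a discrete grid as follows. The function name and the `width` parameter are our own; proposals uniform over the bins within `width` of the current bin play the role of the window |x′ − x| ≤ 0.2:

```python
import numpy as np

def metropolis(energy, x0, n_steps, width=20, seed=0):
    """Metropolis chain on a 1-D grid with stationary distribution
    pi_i ∝ exp(-energy[i]), where energy[i] plays the role of
    u_i + b_i^(k); proposals are uniform over the bins within 'width'
    of the current bin."""
    rng = np.random.default_rng(seed)
    n = len(energy)
    traj = np.empty(n_steps + 1, dtype=int)
    traj[0] = x = x0
    for t in range(1, n_steps + 1):
        prop = x + rng.integers(-width, width + 1)
        # reject out-of-range proposals; accept with prob min(1, pi'/pi)
        if 0 <= prop < n and rng.random() < np.exp(energy[x] - energy[prop]):
            x = prop
        traj[t] = x
    return traj
```

Rejecting out-of-range proposals keeps the proposal symmetric between any pair of valid bins, so detailed balance with respect to π is preserved.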

1. C. J. Geyer, in Proceedings of the 23rd Symposium on the Interface of Computing Science and Statistics, Interface Foundation, Fairfax (American Statistical Association, New York, 1991), pp. 156–163.
2. K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996).
3. U. H. Hansmann, Chem. Phys. Lett. 281, 140 (1997).
4. Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 314, 141 (1999).
5. E. Marinari and G. Parisi, Europhys. Lett. 19, 451 (1992).
6. G. M. Torrie and J. P. Valleau, J. Comput. Phys. 23, 187 (1977).
7. A. M. Ferrenberg and R. H. Swendsen, Phys. Rev. Lett. 63, 1195 (1989).
8. S. Kumar, D. Bouzida, R. H. Swendsen, P. A. Kollman, and J. M. Rosenberg, J. Comput. Chem. 13, 1011 (1992).
9. M. Souaille and B. Roux, Comput. Phys. Commun. 135, 40 (2001).
10. M. R. Shirts and J. D. Chodera, J. Chem. Phys. 129, 124105 (2008).
11. S. Sriraman, I. G. Kevrekidis, and G. Hummer, J. Phys. Chem. B 109, 6479 (2005).
12. D. D. L. Minh and J. D. Chodera, J. Chem. Phys. 131, 134110 (2009).
13. J. D. Chodera, W. C. Swope, F. Noé, J.-H. Prinz, and V. S. Pande, J. Chem. Phys. 134, 244107 (2011).
14. J.-H. Prinz, J. D. Chodera, V. S. Pande, W. C. Swope, J. C. Smith, and F. Noé, J. Chem. Phys. 134, 244108 (2011).
15. W. Du and P. G. Bolhuis, J. Chem. Phys. 139, 044105 (2013).
17. S. Sakuraba and A. Kitao, J. Comput. Chem. 30, 1850 (2009).
18. H. Wu and F. Noé, SIAM Multiscale Model. Simul. 12, 25 (2014).
19. A. S. J. S. Mey, H. Wu, and F. Noé, Phys. Rev. X 4, 041018 (2014).
20. E. Rosta and G. Hummer, “Free energies from dynamic weighted histogram analysis using unbiased Markov state model,” J. Chem. Theory Comput. (submitted).
21. M. Shirts and V. S. Pande, Science 290, 1903 (2000).
22. F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich, and T. Weikl, Proc. Natl. Acad. Sci. U.S.A. 106, 19011 (2009).
23. I. Buch, T. Giorgino, and G. de Fabritiis, Proc. Natl. Acad. Sci. U.S.A. 108, 10184 (2011).
24. A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. U.S.A. 99, 12562 (2002).
25. H. Grubmüller, Phys. Rev. E 52, 2893 (1995).
26. C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, J. Comput. Phys. 151, 146 (1999).
27. W. C. Swope, J. W. Pitera, and F. Suits, J. Phys. Chem. B 108, 6571 (2004).
28. N. Singhal and V. S. Pande, J. Chem. Phys. 123, 204909 (2005).
29. F. Noé, I. Horenko, C. Schütte, and J. C. Smith, J. Chem. Phys. 126, 155102 (2007).
30. J. D. Chodera, K. A. Dill, N. Singhal, V. S. Pande, W. C. Swope, and J. W. Pitera, J. Chem. Phys. 126, 155101 (2007).
31. N. V. Buchete and G. Hummer, J. Phys. Chem. B 112, 6057 (2008).
32. J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte, and F. Noé, J. Chem. Phys. 134, 174105 (2011).
33. M. Sarich, F. Noé, and C. Schütte, SIAM Multiscale Model. Simul. 8, 1154 (2010).
34. Z. Tan, J. Stat. Plann. Inference 138, 1967 (2008).
35. Z. Tan, E. Gallicchio, M. Lapelosa, and R. M. Levy, J. Chem. Phys. 136, 144102 (2012).
36. F. Zhu and G. Hummer, J. Comput. Chem. 33, 453 (2012).
37. F. Noé, J. Chem. Phys. 128, 244103 (2008).
38. G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, J. Chem. Phys. 131, 124101 (2009).
39. J. R. Norris, Markov Chains, Cambridge Series in Statistical and Probabilistic Mathematics Vol. 2 (Cambridge University Press, Cambridge, 1998).
40. E. Gallicchio, M. Andrec, A. K. Felts, and R. M. Levy, J. Phys. Chem. B 109, 6722 (2005).
41. J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I: Fundamentals (Springer, Heidelberg, 1993).
42. J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods (Springer, Heidelberg, 1993).
43. S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, 2004).