Reversibility is a key concept in Markov models and master-equation models of molecular kinetics. The analysis and interpretation of the transition matrix encoding the kinetic properties of the model rely heavily on the reversibility property. The estimation of a reversible transition matrix from simulation data is, therefore, crucial to the successful application of the previously developed theory. In this work, we discuss methods for the maximum likelihood estimation of transition matrices from finite simulation data and present a new algorithm for the case in which reversibility with respect to a given stationary vector is desired. We also develop new methods for the Bayesian posterior inference of reversible transition matrices with and without a given stationary vector, taking into account the need for a suitable prior distribution preserving the meta-stable features of the observed process during posterior inference. All algorithms here are implemented in the PyEMMA software — http://pyemma.org — as of version 2.0.

Markov models, Markov state models (MSMs), or master-equation models are a powerful framework to reduce the great complexity of bio-molecular dynamics to a simple kinetic description that represents the underlying transitions between distinct conformations.1–7 These models allow us to analyze the longest-living (metastable) sets of structures,8 the effective transition rates between them,9,10 the kinetic relaxation processes and their relationship to equilibrium kinetics experiments,7,11–14 and the thermodynamics and kinetics over multiple thermodynamic states.15–18 A key advantage of MSMs is that they are estimated from conditional transition statistics between states, and they thus do not require the data to be in global equilibrium across all states. As a result, they are an excellent tool to integrate the data of multiple simulation trajectories that have been run independently and from different initial states into a single informative model.19–21 

A variety of complex molecular processes have been successfully described using MSMs. Examples include the folding of proteins into their native folded structure,20,22,23 the dynamics of natively unstructured proteins,24,25 and the binding of a ligand to a target protein.21,26–30

There are two key steps in the construction of an MSM. First, a suitable discretization of the continuous conformation space has to be obtained. In most cases, no good a priori discretization is known and the discretization has to be found based on the simulation data. The appropriate choice of discretization is a topic of ongoing research.24,31,32 The error incurred by the discretization and by the subsequent approximation of the jump-process as a Markov process can be systematically controlled and evaluated.7,33–35

In the second step, one estimates the transition probabilities between pairs of states based on the transition statistics. The most common approach to estimating Markov models from data is by means of a Bayesian framework. One first harvests the transition counts cij(τ) from the data, i.e., how often trajectories were found in discrete state i at some time t and in discrete state j at some later time t + τ. The parameter τ is called lag time and is crucial for the quality of the Markov model.7,33 Next, one computes the transition matrix either by maximizing the likelihood, i.e., the probability over all possible Markov model transition (or rate) matrices that may have generated the observed transition counts,7,36,37 or by sampling Markov models from the posterior distribution.38–42 A maximum likelihood estimate gives a single-point estimate, i.e., a single Markov model that is “most representative” given the data. However, if some transition events are rare compared to the total simulation length — and this is the typical case in molecular dynamics simulation — this maximum likelihood model might be very uncertain and thus far away from the model that one would converge to with an increasing amount of simulation data. The Bayesian posterior ensemble is a natural approach to quantify such statistical uncertainties and thus to make meaningful comparisons between Markov models obtained from different sets of simulations or to experimental data.

A key property of molecular dynamics at thermal equilibrium, and a necessary consequence of the second law of thermodynamics, is microscopic reversibility of the equations of motion. This property is ensured by many simulation procedures43,44 and carries over to detailed balance between discrete states, i.e., formally leads to a (time-) reversible Markov model. A reversible Markov model is not only physically desirable, it offers statistical advantages as it has only about half as many independent parameters compared to a nonreversible model, and it allows the equilibrium kinetics to be analyzed in a straightforward and meaningful manner. Furthermore, imposing detailed balance with respect to a given stationary vector can be used to aid the efficient estimation of rare-event processes from MSMs.45 

Algorithms imposing detailed balance during likelihood maximization have been discussed in Refs. 7, 16, 36, and 37. First methods for sampling the posterior distribution of reversible transition matrices have been suggested in Ref. 39 and later in Ref. 46. A method working with natural priors for reversible chains was proposed in Ref. 47. The sampling of transition matrices reversible with respect to a fixed stationary distribution was also presented in Ref. 39, while a Gibbs sampling algorithm with a significantly improved convergence rate has been developed in Ref. 42. Reference 48 discusses methods for goodness-of-fit tests for Markov chains.

This article is deliberately broad and presents new concepts, insights, and algorithms for reversible Markov model estimation in general, maximum likelihood estimators, and Bayesian estimators that mutually benefit from each other. For this reason, we first give a survey of principles and consequences of reversible Markov models. We then extend the framework of maximum likelihood estimation of transition matrices by giving a simplified maximum likelihood estimator (MLE) for reversible transition matrices and a new estimator for reversible transition matrices with a fixed equilibrium distribution. The main part of the paper comprises new algorithms for the full Bayesian analysis of the posterior ensemble of reversible Markov models. As yet, three fundamental problems have not been satisfactorily solved: (i) How can one harvest statistically uncorrelated transition counts from trajectories in which subsequent transitions are correlated, so as to give rise to meaningful uncertainty intervals? (ii) Which prior should be used in a Bayesian analysis so as to get error intervals that envelop the true value even for Markov models with many states? (iii) How can we design efficient sampling algorithms for the reversible posterior ensemble, i.e., algorithms that allow reliable error bars to be computed quickly for Markov models with many states? In this paper, we discuss (i) and give a rather complete treatment of problems (ii) and (iii). Efficient sampling algorithms are derived for reversible Markov models and reversible Markov models with fixed equilibrium distribution.

In this section, we will show that microscopic reversibility carries over to the discretized situation and discuss the desirable properties of a reversible Markov state model.

Let μ(x) denote the equilibrium distribution on the microscopic degrees of freedom x ∈ Ω, e.g., all-atom coordinates of the system of interest, and let pτ(x, y) denote the conditional transition density of the MD implementation. pτ(x, y) is the probability that the system is found in state y at time t + τ given that it has been in state x at time t. The MD implementation fulfills microscopic reversibility if the following detailed balance equation

μ(x) pτ(x, y) = μ(y) pτ(y, x)  (1)

holds for all pairs of states x, y ∈ Ω. Hence, the terms “detailed balance” and “reversible” are equivalent in our context. Since μ(x) pτ(x, y) is the unconditional probability to find the transition (x, t) → (y, t + τ), Eq. (1) means that the system is on average time-reversible — the absolute number of transitions from x to y is equal to the reverse. Microscopic detailed balance is desirable to have in any MD implementation when the aim is to perform simulations in thermodynamic equilibrium. If (1) were violated, this would imply the existence of cycles x → y → z → x along which there is a net transport. Since such cycles could be used to generate work, their existence in a system that is driven by purely thermal energy would be inconsistent with the second law of thermodynamics.

Dynamical models that are commonly employed in MD implementations fulfill detailed balance. Brownian (overdamped Langevin) dynamics fulfills detailed balance. Hamiltonian and non-overdamped Langevin dynamics fulfill generalized detailed balance with respect to momentum inversion in phase space, but when integrating over the distribution of momenta, they do fulfill ordinary detailed balance in position space.49 In practice, some finite time-stepping integrators do not obey exact detailed balance with respect to the Boltzmann distribution, but here we assume that the MD implementation has been chosen such that detailed balance is at least approximately fulfilled.

Now suppose that the state space Ω is partitioned into non-overlapping subsets S1,  …,  Sn that we shall call discrete states here. Each set has an equilibrium probability given by

πi = ∫Si dx μ(x)  (2)

and the transition density gives rise to a discrete state transition matrix P(τ) with entries

pij(τ) = ℙ[x(t + τ) ∈ Sj | x(t) ∈ Si] = (1/πi) ∫Si dx ∫Sj dy μ(x) pτ(x, y)  (3)

Using (2) and (3) and microscopic detailed balance, (1), it is straightforward to verify that

πi pij(τ) = πj pji(τ)  (4)

Note that (4) holds independently of the choice of the lag time τ. Moreover, (4) implies that π is the equilibrium probability vector of P(τ). By defining the diagonal matrix Π = diag(π1,  …,  πn), we can alternatively write (4) as a matrix equation,

Π P(τ) = P(τ)ᵀ Π,  (5)
P(τ) = Π⁻¹ P(τ)ᵀ Π.  (6)

As a result, if the microscopic dynamics are reversible, the Markov model transition matrix must also be reversible. However, a direct estimate of P from a finite amount of simulation data cannot be expected to fulfill (4) exactly. Thus, the validity of detailed balance in principle motivates us to enforce (4) in the process of estimating P.
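As a quick numerical illustration, the degree to which a directly estimated transition matrix violates (4) can be measured from the matrix of unconditional transition probabilities πi pij. The following minimal sketch (Python with NumPy; the function name and interface are illustrative, not part of PyEMMA) returns the largest absolute violation of detailed balance:

import numpy as np

def detailed_balance_violation(P, pi):
    # Matrix of unconditional transition probabilities, pi_i * p_ij.
    flux = pi[:, None] * P
    # Detailed balance (4) requires this matrix to be symmetric; return the
    # largest absolute deviation from symmetry.
    return np.max(np.abs(flux - flux.T))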

Enforcing detailed balance helps to reduce the statistical error of an estimator for P because it reduces the number of independent variables by approximately one half (see Table I). However, there are other consequences of having (4): Given detailed balance, we can compute the molecular equilibrium kinetics in a physically meaningful way and employ some useful analysis tools that are not defined for nonreversible Markov models. Moreover, we can employ more efficient and robust matrix algebra routines when exploiting that P is a reversible matrix.

TABLE I.

Number of independent variables (degrees of freedom, dof) and their approximated values for transition matrices depending on the constraints.

Constraints  dof  ≈
None  n(n − 1)  n²
Reversible  ½ n(n − 1) + n − 1  ½ n²
Reversible, fixed π  ½ n(n − 1)  ½ n²

Many methods to analyze the molecular kinetics based on a Markov model transition matrix rely on the eigenvalue decomposition of P. Using the diagonal eigenvalue matrix Λ = diag(λ1,  …,  λn), we can formulate a right eigenvalue problem with right column eigenvectors R = (r1,  …,  rn), ri ∈ ℝn, and left row eigenvectors L = (l1,  …,  ln),

P R = R Λ,  (7)
L P = Λ L,  with L = R⁻¹.  (8)

From (7), we can obtain a generalized eigenvalue problem, Π P ri = λi Π ri.

Π is symmetric positive definite and, as a result of detailed balance, ΠP is symmetric. Hence, all eigenvalues λ1, …, λn are real, and the eigenvectors are orthogonal with respect to the equilibrium distribution,50 ⟨ri, rj⟩π = 0 for i ≠ j,

where we have used the weighted scalar product ⟨u, v⟩π = Σi πi ui vi. We can make R orthonormal by scaling an arbitrarily obtained eigenvector ri by ⟨ri, ri⟩π^(−1/2).

Inserting detailed balance formulation (6) into decomposition (7) immediately gives

(Rᵀ Π) P = Λ (Rᵀ Π),

which is a left eigenvalue problem with the choice

L = Rᵀ Π.  (9)
Thus, detailed balance establishes a 1-to-1 relation between the left and the right eigenvectors. We can decompose the transition matrix into its spectral components by just using one set of eigenvectors and the equilibrium distribution, such as

P(τ) = Σi λi(τ) ri riᵀ Π.  (10)
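Because Π^(1/2) P Π^(−1/2) is symmetric for a reversible P, the decomposition (7)–(10) can be obtained with a symmetric eigensolver, which guarantees real eigenvalues and π-orthonormal eigenvectors. A minimal sketch (NumPy; function name illustrative):

import numpy as np

def reversible_eigendecomposition(P, pi):
    # Similarity transform D^{1/2} P D^{-1/2} with D = diag(pi) is symmetric
    # for a reversible P, so eigh yields real eigenvalues and orthonormal V.
    d = np.sqrt(pi)
    S = (P * d[:, None]) / d[None, :]
    eigvals, V = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, V = eigvals[order], V[:, order]
    R = V / d[:, None]                         # right eigenvectors r_i
    L = V * d[:, None]                         # left eigenvectors l_i = pi * r_i, cf. (9)
    return eigvals, R, L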

1. Example 1

Consider the following reversible 3 × 3 transition matrix:

(11)

Suppose we generate a Markov chain of length 20 starting from state 1, resulting in the count matrix at lag τ = 1,

(12)

Now we conduct a nonreversible and a reversible maximum likelihood estimation of the transition matrix given C. Eigenvalues for the exact transition matrix in (11) and both nonreversible and reversible estimates for the given count matrix in (12) are shown in Fig. 1.

FIG. 1.

Eigenvalues of 3 × 3 example system. The eigenvalues obtained from the reversible estimate (green) are a closer approximation to the true eigenvalues (red) than the eigenvalues obtained from the non-reversible estimate (blue). The unique eigenvalue λ = 1 is faithfully reproduced by both estimates.


It is seen that the nonreversible estimate contains complex eigenvalues. These generally come in complex conjugate pairs. Fig. 1 shows a much higher accuracy of the reversible estimate compared to the nonreversible estimate. In order to explore the statistical significance of this observation, we run N = 1000 chains of length L = 20 using transition matrix (11). The reversible and nonreversible estimation results, together with the true eigenvalues, are reported in Table II.

TABLE II.

Results for non-reversible and reversible transition matrix estimation using count matrix (12).

  Re{λ2}  Im{λ2}  Re{λ3}  Im{λ3}
Exact  0.42  0.0  0.18  0.0
Reversible  0.36 ± 0.31  0.0  0.18 ± 0.04  0.0
Non-reversible  0.32 ± 0.29  −0.04 ± 0.08  0.07 ± 0.21  0.04 ± 0.09

It is seen that the reversible estimates not only have the correct real-valued structure but can also have smaller uncertainties (here, especially for λ3). This is expected to be a general result due to the smaller number of degrees of freedom in the reversible estimate.

2. Example 2

Fig. 2 shows the distribution of eigenvalues from nonreversible and reversible Markov models from simulation data for the alanine-dipeptide molecule (see Sec. V B). The eigenvalues of the reversible estimate are purely real while the non-reversible estimate has eigenvalues with non-zero imaginary part.

FIG. 2.

Eigenvalues for alanine dipeptide. The cluster of dominant eigenvalues indicates that the slowest processes are faithfully reproduced by the non-reversible (blue) as well as by the reversible (green) estimate. Only the eigenvalues of the reversible estimate are purely real.


Since detailed balance is a consequence of simulating a system at thermodynamic equilibrium, it is not surprising that detailed balance in the transition matrix P is a prerequisite for analyzing the equilibrium kinetics given P. Since kinetics are related to slow processes, we will here only consider the m largest eigenvalues λ1, …, λm and assume that they are positive. Here are a few examples of equilibrium kinetics properties computed from reversible transition matrices:

  1. The dominant relaxation rates of the molecular system are
    κi = −ln λi(τ)/τ,  (13)
    where i ≥ 2 (i = 1 has a relaxation rate of zero and corresponds to the equilibrium distribution). The inverse quantities are the relaxation time scales ti = κi⁻¹ (see the code sketch after this list). These rates or time scales are of special interest because they are often detectable in kinetic experiments such as fluorescence time-correlation spectroscopy, two-dimensional IR spectroscopy, or temperature jump experiments — see Ref. 11 for a discussion.
  2. Decomposition (10) can be used to write kinetic experimental observables in an illuminating form.11,13,14 For example, the long-time scale part of the autocorrelation of a molecular observable a ∈ ℝn, e.g., containing the fluorescence values of every Markov state of a molecule, can be written as
    (14)
  3. The PCCA+ method for seeking m metastable sets of Markov states and its variants8,51 assumes m real-valued eigenvalues and eigenvectors. It is thus only reliably applicable to reversible transition matrices.

  4. Discrete transition path theory20,52,53 computes the statistics of transition pathways from a set of states A to a set of states B given a transition matrix. Discrete TPT can be used with nonreversible and reversible transition matrices. However, in the reversible case, we get that the forward committor and the backward committor are complementary,
    qi⁻ = 1 − qi⁺,
    and the net fluxes, when ordering states such that qi⁺ ≤ qj⁺, are given by
    fij⁺ = πi pij (qj⁺ − qi⁺),
    which is analogous to an electric current I = U G, where I = fij⁺ is the current, U = (qj⁺ − qi⁺) is the potential difference, and πi pij is the conductivity.54
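The following short sketch illustrates item 1 of the list above: given a reversible transition matrix P estimated at lag time τ, it returns the relaxation time scales ti = κi⁻¹ of Eq. (13) (NumPy; names illustrative):

import numpy as np

def relaxation_timescales(P, tau, m=4):
    # Eigenvalues of a reversible P are real; sort in decreasing order.
    eigvals = np.sort(np.linalg.eigvals(P).real)[::-1]
    # Skip lambda_1 = 1 (equilibrium) and keep the next m eigenvalues,
    # assumed positive as in the text.
    lambdas = eigvals[1:m + 1]
    kappas = -np.log(lambdas) / tau      # relaxation rates, Eq. (13)
    return 1.0 / kappas                  # relaxation time scales t_i = 1/kappa_i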

We restate the transition matrix likelihood and formulate the maximum likelihood estimation problem for Markov model transition matrices. We present new estimation algorithms for reversible Markov models with known or unknown equilibrium distribution π.

Suppose we have a discrete sequence S = {s1, …, sN} with si ∈ {1, …, n}. If we assume that this sequence is the realization of a Markov chain with lag time τ = 1, the probability that a transition matrix P has generated S is proportional to the product of the individual transition probabilities along the trajectory,

ℙ(S | P) ∝ ∏_{t=1}^{N−1} p_{s_t s_{t+1}}.  (15)

We have neglected the proportionality constant because we will not need it in order to maximize or sample from (15). This is very handy because one component in this constant is the probability of generating the first state s1, which is often unknown, but is constant for a fixed data set S.

Suppose we have cij transitions from i to j. Then, we can group all pij terms together and get a factor pij^cij. Doing this for all pairs results in the equivalent likelihood formulation

ℙ(C | P) ∝ ∏i,j pij^cij.  (16)

We can see that the count matrix C ∈ ℝn×n is a sufficient statistic for the Markov model likelihood ℙ(S∣P) — it generates the same likelihood although we have discarded the information about the order in which the transitions occurred.

If multiple trajectories are available, their count matrices are simply added up.

How should we count transitions for longer lag times τ > 1, or if S is not Markovian at lag time τ? Regarding the first case, if S is Markovian at lag time τ, a safe approach seems to be to subsample the trajectory at time steps of τ and then treat the subsampled trajectory as above.7,9 However, this approach is statistically inefficient: If S is also Markovian for shorter lag times than τ, then we are using less information than we could. Even if S only becomes Markovian at lag times of τ or longer, transitions such as 1 → τ + 1 and τ/2 → τ + τ/2 are usually only partially correlated, such that discarding the second transition does not fully exploit the data. In practical MD simulations, the lag times required for a Markov model to be a good approximation are often quite long (frequently in the range of nanoseconds), such that subsampling the data at τ will create severe problems with data and connectivity loss. Regarding the second case, if S is not Markovian at lag time τ, then treating every cij as an independent count is incorrect.

Both cases can, in principle, be treated with the following formalism: We always obtain the count matrix cij in a sliding window mode,7 i.e., we harvest all N − τ available transition counts from time pairs (1 → 1 + τ), (2 → 2 + τ), …, (N − τ → N). Unless S is Markovian at lag time 1, we will now harvest more transition counts than are statistically independent. We can formally correct for this by introducing a statistical inefficiency Iij(τ) for every count at a given lag time, such that cij^eff(τ) = Iij(τ) cij(τ) is the effective number of counts, resulting in the likelihood

ℙ(C | P) ∝ ∏i,j pij^{cij^eff(τ)}.  (17)

The determination of statistical inefficiencies for univariate signals is well established.55 Determining Iij(τ) for transition count matrices is an open problem. A first approach that allows consistent, although somewhat too small, uncertainty intervals to be estimated for practical MD data is discussed in Ref. 56. Note that the validity of the estimation algorithms described in the present paper is independent of the choice of the count matrix, such that future methods for estimating the effective count matrix can be adopted without changing the estimation algorithms.
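A minimal sketch of sliding-window counting at lag τ, without the statistical-inefficiency correction Iij(τ) discussed above (NumPy; names illustrative):

import numpy as np

def count_matrix(dtraj, tau, n_states):
    # Harvest all N - tau transition counts (t -> t + tau) in sliding-window mode.
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-tau], dtraj[tau:]):
        C[i, j] += 1.0
    return C

# Count matrices of multiple independent trajectories are simply added up.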

We will now assume that the effective counts are given. For better readability, we will subsequently omit the superscript eff and just use C = (cij) to indicate counts. Now we ask which transition matrix is most likely given the observation C, i.e., we seek the MLE that maximizes (15) over the set of transition matrices.

1. Non-reversible estimation

It is well known that the non-reversible MLE for the transition probability from state i to state j is simply given by the ratio of observed counts from i to j divided by the total number of outgoing transitions from state i,57 

p̂ij = cij / Σk cik.  (18)

We use the hat in order to denote an estimator. The term non-reversible implies that reversibility has not been used as a constraint in the estimation of P̂nonrev. Of course, P̂nonrev can be coincidentally reversible and will be reversible if the count matrix C is symmetric. For this reason, some early contributions in the field forced symmetry in C by counting S forward and backward. This practice is strongly discouraged as it will create a large bias unless the trajectories used are very long compared to the slowest time scales of the molecule.
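A one-line realization of estimator (18), assuming every state of the (connected) count matrix has at least one outgoing count (NumPy; name illustrative):

import numpy as np

def mle_nonreversible(C):
    # p_ij = c_ij / sum_k c_ik, Eq. (18); rows of C must have nonzero sums.
    return C / C.sum(axis=1, keepdims=True)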

2. Reversible estimation

Now we consider the problem of finding the reversible MLE P̂rev by enforcing detailed balance (4) with respect to an unknown equilibrium distribution (πi) as a constraint in the estimation procedure. Note that the count matrix used for this approach is not modified, i.e., it comes from a forward-only or nonreversible counting and is generally not symmetric. Constraints (4) can be more conveniently handled by defining the new set of variables

xij = πi pij.  (19)

Note that

Σj xij = πi and Σi,j xij = 1.  (20)

We can thus recover the transition matrix from X = (xij) by

pij = xij / Σk xik.  (21)

Inserting (21) into (16) and adding constraints for detailed balance and stochasticity leads to the reversible maximum likelihood problem,

(22)

Ignoring the inequality constraints, the optimality conditions are

(23)

with ci = Σj cij and xi = Σj xij. There is no closed form solution when including the detailed balance constraint, so that (22) has to be solved numerically. One option is to directly solve (23) for xij and turn it into a fixed-point iteration, as first proposed in Ref. 37,

(24)

where k counts the iteration number in the algorithm. For a starting iterate xij(0) fulfilling the constraints in (22), for example, xij(0) = (cij + cji) / Σi,j (cij + cji), the iterates will be symmetric and fulfill the inequality constraints for all k > 0.

If we sum over j on both sides of (24) and use (20), we can instead reduce the problem to iterative estimation of the equilibrium distribution,

(25)

The iteration is terminated when ||π(k+1) − π(k)|| < ϵ. The final estimate π̂ is then inserted into (23) to recover the reversible transition matrix estimate,

(26)

Note that the optima sought by Eqs. (25) and (26) exhibit p̂ij = 0 if cij + cji = 0. Thus, in both optimization algorithms, the sparsity structure of the matrix C + Cᵀ can be used in order to restrict all iterations to the elements that will result in a nonzero element p̂ij > 0.
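The following sketch implements the self-consistent iteration (25) with back-substitution (26). Since the explicit formulas are not reproduced above, the update xij ∝ (cij + cji)/(ci/πi + cj/πj) used here is an assumption consistent with the surrounding text and Ref. 37; it is not the PyEMMA implementation, and the state space is assumed to be restricted to a connected set with strictly positive row counts:

import numpy as np

def mle_reversible(C, tol=1e-10, max_iter=100000):
    # Assumed update (25): pi_i <- sum_j (c_ij + c_ji) / (c_i/pi_i + c_j/pi_j),
    # renormalized each iteration, followed by the assumed form of (26):
    # p_ij = (c_ij + c_ji) / (c_i + c_j * pi_i / pi_j).
    C = np.asarray(C, dtype=float)
    c_sym = C + C.T
    c_row = C.sum(axis=1)                        # row counts c_i
    pi = c_sym.sum(axis=1) / c_sym.sum()         # starting iterate
    for _ in range(max_iter):
        denom = c_row[:, None] / pi[:, None] + c_row[None, :] / pi[None, :]
        pi_new = (c_sym / denom).sum(axis=1)
        pi_new /= pi_new.sum()
        if np.max(np.abs(pi_new - pi)) < tol:
            pi = pi_new
            break
        pi = pi_new
    P = c_sym / (c_row[:, None] + c_row[None, :] * pi[:, None] / pi[None, :])
    return P, pi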

Furthermore, note that (25) and (26) are special cases of the transition-based reweighting analysis (TRAM) method — see Ref. 16, Eqs. (29) and (30) — for the special case of a single thermodynamic state. An example of the progress of the self-consistent iteration using an alanine dipeptide simulation is shown in Fig. 3.

FIG. 3.

Performance of algorithms for reversible maximum likelihood estimation. (a) Reversible transition matrix estimated using fixed-point iteration (25) from an n = 228 state count-matrix obtained from alanine-dipeptide simulation data. Convergence is shown for different total simulation lengths T. (b) Performance comparison of direct fixed-point iteration (25) and the quadratic optimizer described in Ref. 7 for reversible transition matrix estimation given the count matrix C = ((5, 2, 0), (1, 1, 1), (2, 5, 20)). Shown is the difference of the current likelihood to the optimal likelihood. (c) Same as (b), but using the 1734 × 1734 count matrix from Pin WW folding simulations used in Ref. 20.

A different method of iterative solution presented in Ref. 7 updates xij with the exact solution to the quadratic problem arising from (23) while holding all other variables xkl fixed. As shown in Fig. 3, this approach can exhibit faster convergence properties than fixed-point iteration (25).

Uniqueness of the estimator: Optimization problem (22) can be equivalently transformed into a convex optimization problem by replacing the decision variables xij with zij = log(xij) (see Ref. 17 for details), which implies the uniqueness of the maximum likelihood estimator.

3. Reversible estimation for given stationary vector

While unbiased MD simulations are useful to estimate state-to-state transition probabilities pij, enhanced sampling algorithms such as umbrella sampling and replica-exchange MD can be much more efficient for gaining insight into the equilibrium distribution π. Reference 45 demonstrates how an uncertain estimate of π can be combined with unbiased “downhill” trajectories in order to estimate rare event kinetics. A key ingredient in such a procedure is a way to estimate a reversible Markov model that is most likely given the transition counts observed from MD simulations, but at the same time has a fixed equilibrium distribution π. Here we derive a new, efficient estimation algorithm for this task.

Enforcing reversibility with respect to a given stationary vector results in the following constrained optimization problem:

(27)
(28)
(29)
(30)

π can only be the unique stationary distribution of P if P is irreducible. To ensure irreducibility, we restrict the state space to the largest (weakly) connected set of the undirected graph that is defined by the adjacency matrix C + Cᵀ. For a system with n states, Eqs. (27)–(30) are a convex minimization problem in O(n²) unknowns with O(n²) equality and inequality constraints. Solving this with a standard interior-point method requires the solution of a linear system with O(n²) unknowns to compute the search direction at each step. The resulting computational effort of O(n⁶) operations for solving the linear system quickly becomes infeasible for increasing n. Therefore, we propose a fixed-point iteration that remains feasible for large values of n.

To solve the maximization problem, we ignore inequality constraint (29) at first. Row-stochasticity constraint (28) is enforced by introducing Lagrange multipliers λi and adding penalty terms λi(Σj pij − 1) for all i = 1, …, n to the objective function. Detailed balance constraint (30) is included into the likelihood explicitly by the change of variables,

These substitutions result in the Lagrange function

(31)

that we seek to maximize. By setting the gradient of F with respect to all pij to zero and subsequently reversing the change of variables, we find the following expression for the maximum likelihood estimate:

(32)

Note the similarity of this equation with maximum likelihood result (26) where π has been self-consistently computed from the counts. The row counts ci are here replaced by the yet unknown Lagrange multipliers λi. In order to find the Lagrange multipliers, we sum Eq. (32) over j,

(33)

This does not give a closed-form expression for λi. However, based on this equation, we propose the following fixed-point iteration for the Lagrange multipliers:

(34)

Motivated by the analogy between Lagrange multipliers and row counts described above, we set the starting point to

(35)

Taking the limit λi → 0+ in (34) still leads to a consistent solution. Choosing strictly positive starting parameters according to (35) results in valid iterates from (34). In analogy to the reversible case, we iterate (34), starting from (35), until ||λ(k+1) − λ(k)|| < ϵ. An example of the progress of the self-consistent iteration using alanine dipeptide simulation data is shown in Fig. 4. Note that the convergence is nearly three orders of magnitude faster compared to the estimation with unknown equilibrium distribution (Fig. 3).

FIG. 4.

Convergence of the reversible maximum likelihood estimation with fixed stationary vector. The transition matrix is estimated from an n = 228 state count-matrix obtained from alanine-dipeptide simulation data. Convergence is shown for different total simulation lengths. The stationary distribution was obtained using simple counting estimate (83).


Given converged Lagrange multipliers, we can exploit (32) to find the maximum likelihood transition matrix P̂. For this algorithm, inequality constraints (29) are automatically fulfilled when cij ≥ 0 for all i, j. Care must be taken in two situations: (i) λi = λj = 0 — one can show that the simultaneous limit λi → 0+ and λj → 0+ can only occur for cij + cji = 0, but then we know that p̂ij = 0. (ii) A diagonal element cii is zero. Depending on the values of π, the solution λi may take the value of zero such that Equation (32) for i = j becomes p̂ii = cii/λi = 0/0, which is meaningless and is not the correct limit of pii as cii goes to zero. However, this can be fixed easily by using p̂ii = 1 − Σj≠i p̂ij for the diagonal elements of P. In summary, we use the following equation for computing P̂ from converged Lagrange multipliers:

(36)

Uniqueness of the estimator: Since the above estimation algorithm is iterative, it is fair to ask whether the estimator P̂ it converges to is unique, or whether there might be multiple local maxima that we could get stuck in. In this case, it is easy to show that the estimator is unique: Let P be an optimal transition matrix. Then pij = 0 exactly if cij + cji = 0. Let Ω = {pij | cij + cji > 0}. Then, the function f(P) = −Σi,j cij log pij is strictly convex on Ω and the constraints restrict the solution to a convex subset Ω̃ ⊆ Ω. The minimization of a strictly convex function over a convex set has a unique solution.
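A sketch of the resulting estimator follows. Since the explicit equations are not reproduced above, the forms p̂ij = (cij + cji) πj / (λi πj + λj πi) for (32) and λi ← Σj (cij + cji) λi πj / (λi πj + λj πi) for (34), started from λi = ci as in (35), are assumptions consistent with the surrounding derivation; the sketch further assumes all states carry observed outgoing counts and πi > 0 (names illustrative):

import numpy as np

def mle_reversible_fixed_pi(C, pi, tol=1e-12, max_iter=100000):
    C = np.asarray(C, dtype=float)
    c_sym = C + C.T
    lam = C.sum(axis=1)                          # starting multipliers = row counts, cf. (35)
    for _ in range(max_iter):
        denom = lam[:, None] * pi[None, :] + lam[None, :] * pi[:, None]
        lam_new = (c_sym * lam[:, None] * pi[None, :] / denom).sum(axis=1)
        if np.max(np.abs(lam_new - lam)) < tol:
            lam = lam_new
            break
        lam = lam_new
    # Off-diagonal elements from the assumed form of (32).
    P = c_sym * pi[None, :] / (lam[:, None] * pi[None, :] + lam[None, :] * pi[:, None])
    # Diagonal elements from row normalization, cf. (36).
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P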

We introduce new algorithms for sampling the full posterior probability distribution of Markov models, and, in particular, for estimating uncertainties of quantities of interest, such as relaxation time scales or mean first passage times. A key in these algorithms is the choice of a suitable prior which enforces the sampled matrices to have the same sparsity pattern as the transition count matrix, as this allows the credible intervals to lie around the true value even for large transition matrices. The relevance of the prior is first demonstrated for nonreversible Markov models, for which an efficient sampling algorithm is known. We then introduce new Gibbs sampling algorithms for reversible Markov models with and without constraints on the equilibrium distribution that vastly outperform previous algorithms for sampling reversible Markov models.

Bayes’ formula relates the likelihood of an observed effective count matrix C given a probability model P to the posterior probability of the model given the observation

ℙ(P | C) ∝ ℙ(C | P) ℙ(P).  (37)

The posterior accounts for the uncertainty coming from a finite observation. It incorporates a priori knowledge about the quantity of interest using the prior probability ℙ(P). We will see that a suitable choice of the prior is essential for the success of a Bayesian description for high-dimensional systems.

In general, if we are interested in an observable that is a function of a transition matrix, f(P), we would like to compute its posterior moments, such as the mean and the variance,

𝔼[f] = ∫ dP f(P) ℙ(P | C),  (38)
Var(f) = ∫ dP (f(P) − 𝔼[f])² ℙ(P | C).  (39)

While we usually use the maximum likelihood transition matrix P̂ to provide “best” estimates, f(P̂), the above integrals are of interest because σ(f) = √Var(f) gives us an estimate of the statistical uncertainty of f. Alternatively, we might be interested in the credible intervals which encompass the true value of f with some probability, such as 0.683 (1σ intervals) or 0.95 (2σ intervals). As integrals (38) and (39) are high-dimensional, we need to use Monte Carlo methods to approximate them.

In Monte Carlo methods, we generate a sample of transition matrices {P(k)}, k = 1, …, N, distributed according to the posterior and evaluate f at each element P(k) in the ensemble. We then approximate posterior expectation value (38) and posterior variance (39) by

Ê[f] = (1/N) Σk f(P(k)),  (40)
V̂ar(f) = (1/N) Σk (f(P(k)) − Ê[f])².  (41)
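Given any posterior sample of transition matrices, the Monte Carlo approximations (40) and (41) are straightforward to compute; a minimal sketch (the helper name is illustrative):

import numpy as np

def posterior_moments(samples, f):
    # samples: list of sampled transition matrices P(k); f: observable f(P).
    values = np.array([f(P) for P in samples])
    return values.mean(axis=0), values.var(axis=0)   # Eqs. (40) and (41)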

Obtaining good and reliable samples of the posterior ℙ(P|C) is very difficult. Previous approaches have suffered from some or all of the following difficulties, which are addressed here:

  1. Choice of the prior: Given n Markov states (typically hundreds to thousands), transition matrices have on the order of n² elements and are thus extremely high dimensional. Most priors used in the past allow all of these elements pij to be populated, including those for which no transition has been observed. Although the effect of the prior can be overcome by enough simulation data, for any practical amount of simulation data, such priors will lead to posterior distributions whose probability mass is far away from the true model. This problem was first addressed in Ref. 20 by designing a prior that equates mean and MLE for nonreversible transition matrices, leading to credible intervals that nicely envelop the true value. Here, we design corresponding priors for reversible Markov models.

  2. Uncorrelated transition counts C: As discussed in Sec. III B, the likelihood, and thus the posterior, depends on how transition counts are harvested from the discrete trajectories, which are generally time-correlated and not exactly Markovian at any particular lag time τ. While the MLE is often not or little affected by the exact way of counting C, the uncertainties will be dramatically different if C is, e.g., counted in a sliding window mode (using transitions starting at all times t = 0, 1, 2, …), or by subsampling the trajectory (using transitions starting at all times t = 0, τ, 2τ). Whereas the first approach underestimates the uncertainties, the second approach often overestimates them and is often not practical for large lag times τ. Here, we suggest to use the effective number of uncorrelated transition counts, C = Ceff, and a first approach to compute them is described in Ref. 56.

  3. Efficiency of the sampler: Finally, given a choice of prior and C, a sampling algorithm needs to explore the high-dimensional space of transition matrices in a reasonable time. This is especially problematic for reversible Markov models. The first Monte Carlo algorithm for sampling the reversible posterior, described in Ref. 39, suffers from poor mixing due to small acceptance probabilities of the individual steps. In Ref. 42, an improved sampler was proposed. Here, we propose sampling algorithms for reversible Markov models with and without fixed equilibrium distribution whose efficiencies go far beyond previous approaches.

Let us first illustrate the effect of prior choice on Bayesian estimation of nonreversible Markov models. A convenient functional form for the prior is the Dirichlet prior

ℙ(P) ∝ ∏i,j pij^bij,  (42)

where B = (bij) is a matrix of prior-counts. For this choice, the posterior is given by

ℙ(P | C) ∝ ∏i,j pij^(cij + bij).  (43)

zij = cij + bij is the matrix of posterior pseudo-counts. In the non-reversible case, we can generate independent samples from (43) by drawing rows of P(k) independently from Dirichlet distributions ∏j pij^(αij − 1) with Dirichlet parameters αij = zij + 1 = cij + bij + 1.9
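Sampling from posterior (43) is then a matter of drawing each row from a Dirichlet distribution; a minimal sketch, restricting each row to entries with positive Dirichlet parameters so that the sparse prior bij = −1 keeps unobserved transitions at exactly zero (names illustrative):

import numpy as np

def sample_nonreversible(C, n_samples, b=-1.0, rng=None):
    # Draw independent transition matrices with rows ~ Dirichlet(alpha_ij),
    # alpha_ij = c_ij + b_ij + 1; with b = -1 only observed transitions get mass.
    rng = np.random.default_rng() if rng is None else rng
    n = C.shape[0]
    alpha = C + b + 1.0
    samples = np.zeros((n_samples, n, n))
    for k in range(n_samples):
        for i in range(n):
            idx = np.flatnonzero(alpha[i] > 0.0)
            samples[k, i, idx] = rng.dirichlet(alpha[i, idx])
    return samples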

Choosing a uniform prior, bij = 0, assigns equal prior probability to all entries, pij, in the posterior ensemble. But this a priori assumption can lead to serious problems when estimating quantities for meta-stable systems.

Consider, for example, the following transition matrix for a birth-death chain consisting of two meta-stable sets A = {1, …, m} and B = {m + 2, …, n}, separated by a kinetic bottleneck in the form of a single transition state,

(44)

For barrier parameter b = 3 and sets with m = 50 and n = 101, the expected time for hitting B from state x = 1 is 2 · 10⁵ steps. Now we are interested in the Bayesian estimator for a simulation of length L = 10⁷. The true distribution can be estimated with arbitrary precision by repeating the simulation many times. Here, 10³ repetitions led to an estimate of the 90% percentile for the mean first passage time of [1.5, 2.7] · 10⁵ (see Table III).

TABLE III.

Estimates and credible intervals for the mean first passage time in a birth-death chain using uniform and sparse priors.

Method  Estimate
True  2.0 [1.5, 2.7] · 10⁵
Uniform prior bij = 0  1.95 [1.9, 2.0] · 10³
Sparse prior bij = − 1  2.0 [1.5, 2.7] · 10⁵

In practice, we cannot afford to repeat the simulation many times but would like to approximate the true value and its statistical uncertainty from the given simulation data. Sampling the nonreversible posterior given expected counts for a single chain of length L = 10⁷ with a uniform prior, bij = 0, results in non-zero transition probabilities for elements pij which are zero in the true transition matrix. As a result, artificial kinetic pathways circumventing the bottleneck appear in the posterior ensemble, which leads to a dramatic underestimate of the mean first passage time. The Bayesian estimate with 90% credible interval obtained from 10³ posterior samples is [1.9, 2.0] · 10³, and thus two orders of magnitude smaller than the true value 2 · 10⁵ (Table III).

Using the prior bij = − 1 suggested in Ref. 20 results in 90% credible intervals, [1.5, 2.7] · 10⁵, which clearly cover the true value 2 · 10⁵ (Table III). The choice bij = − 1 leads to a posterior distribution in which sampled transition matrices P have the same sparsity structure as the count matrix C, i.e., pij = 0 if cij = 0. As count matrices in the present context are generally sparse, we briefly call this prior the sparse prior. Apparently, the sparse prior leads to consistent credible intervals covering the true value.

Fig. 5 shows the convergence of the 90% credible interval for the sparse and the uniform prior. The credible interval for the sparse prior envelops the true value already for little data. Achieving consistency using the uniform prior requires simulations orders of magnitude longer than the time scale of the slowest process, thus rendering inference under this prior impractical.

FIG. 5.

Convergence of the 90% credible interval for the sparse prior bij = − 1 and the uniform prior bij = 0. The dashed line indicates the true value. The credible interval for the improper prior covers the true value orders of magnitude before the credible interval for the uniform prior.


Note that our prior induces a fixed sparsity structure. This concept should not be confused with other sparsity-inducing priors used, e.g., in the context of Bayesian compressed sensing,58 where the sparsity pattern is subject to uncertainty.

Now we will present a new method for the sampling of reversible transition matrices. In our new approach, we replace Dirichlet prior (42) by a new prior for reversible sampling.

Similarly to reversible maximum likelihood estimation, we define our reversible transition matrix sampler in the space of unconditional transition probabilities xij. For convenience, we restrict ourselves to the independent set of variables with i ≤ j (remember that xij = xji for reversible matrices), and keep them normalized to 1,

(45)
(46)

Although X is defined slightly differently than in the maximum-likelihood case, the mapping from X back to P is still given by Eq. (21). We define a prior for reversible sampling on the set of X matrices rather than on P: Choosing xij as the set of independent variables has the advantage that obeying detailed balance amounts to sampling symmetric matrices, X = Xᵀ,

(47)

The posterior for reversible sampling is then given by

(48)

Below we will first consider how to sample from (48) using general prior counts bij. Then, we will consider the specific choice bij = − 1 for all i ≤ j and show that this choice has similar properties as the sparse prior in the nonreversible case.

There is no known method to generate independent samples from the posterior under the reversibility requirement. Instead, we will use a Markov chain Monte Carlo (MCMC) method to generate samples from the posterior, ensuring that each sampled transition matrix fulfills detailed balance condition (4). Our Markov chain will generate the ensemble {X(k)}, k = 1, …, N, via a set of updates advancing the chain from X(k) to X(k+1), starting from a valid initial state X(0). We can do a simple row-normalization of the X matrices to obtain the desired ensemble {P(k)}, k = 1, …, N. Expectation values and variances will again be estimated using (40) and (41).

As in Refs. 39 and 42, we will construct our Markov chain using a Gibbs sampling procedure, where we sample a single element of X in each step while leaving the other elements unchanged. We repeat this sampling procedure for every element of X, thus completing a Gibbs sweep. As detailed in the Appendix, we can use the following general Gibbs step to sample posterior (48):

  1. Select an arbitrary element xkl. Propose a new (unscaled) matrix X → X′ by sampling this element from the proposal density q(x′kl | X),
    (49)
    Here, q(x′kl | X) is an arbitrary, scale-invariant density. Scale-invariance means that q(x′kl | X) ∝ q(c x′kl | c X) for any positive constant c.
  2. Accept X̄ as a new step in our Markov chain with probability min{1, pacc} where
    (50)
    where b0 = Σk≤l bkl and γ is the marginal density,
  3. Renormalize the matrix X′ → X̄ such that it fulfills (46),
    (51)

While this approach will work for any choice of prior counts, we will now use the sparse prior bij = − 1 for all i, j in the hope of achieving similarly good results as in the nonreversible case. For this choice, γ(x′kl | X) is scale-invariant, i.e., γ(x′kl | X) = γ(c x′kl | c X), and the Jacobian pre-factor in (50) is one. Thus, we have

(52)

and the ideal choice of the proposal density is q ∝ γ, which would guarantee that the acceptance probability is always 1. This proposal density degenerates to a point probability at zero if ckl + clk = 0, which implies that bij = − 1 encodes the a priori belief that any transition for which neither the forward direction nor the backward direction has ever been observed in the data has zero probability in the posterior ensemble. Thus, this prior enforces P to have the same sparsity structure as the count matrix, like the choice bij = − 1 for nonreversible sampling. Note that the reversible MLE has the same sparsity structure, as can be seen from update rule (24).

We will choose proposal densities q(x′kl | X) that are also scale-invariant. In this case, normalization step 3 above (X′ → X̄) can be omitted, i.e., if we accept X′, we can directly set it as our new sample X(k) and obtain P(k) by row normalization. We will now outline how to design the proposal density such that the acceptance probability is 1 or nearly 1. For k = l, sampling x′kk ∼ γ(x′kk | X) is equivalent to sampling the following transformed variable (see the Appendix):

(53)

So we can simply define q(x′kk | X) ≡ γ(x′kk | X) and generate x′kk by

(54)
For k ≠ l, there is no known way to draw independent samples, but γ(x′kl | X) can be well approximated by a gamma distribution by matching the maximum point and the second derivative at the maximum. A gamma distribution can be efficiently sampled, and we use it as proposal density and accept the resulting x′kl with probability min{1, pacc}. Specifically, our proposal step is

(55)

with the parameters

(56)
(57)

using

(58)
(59)
(60)
(61)
(62)

which matches the value and the first two derivatives of the true marginal density at the maximum (see the Appendix for derivation) and leads to acceptance probabilities close to one for most values of xkl. However, if the current value of xkl is in one of the heavy tails of the distribution γ(x′kl | X), the acceptance probability can be much less than 1 and the Markov chain can get stuck. In order to avoid this problem, we utilize a second step to generate x″kl: After we sample x′kl from proposal density (55), we sample x″kl according to

(63)

where N(x | m, s) denotes the normal distribution of x with mean m and standard deviation s. The proposal density defined by the above update is

(64)

and the corresponding acceptance probability is

(65)

In summary, the proposed Algorithm 1 is a Metropolis-within-Gibbs MCMC algorithm with adapted proposal probabilities for each Gibbs sampling step. For efficiency reasons, transition matrix elements (i, j) for which no forward or backward transition counts have been observed can be neglected in the sampling algorithm, in order to account for the effect of the sparse prior.

ALGORITHM 1.

Reversible sampling algorithm.

Input: C, X(j)
Output: X(j+1)
forall indexes (k, l) with k ≤ l and ckl + clk > 0 do
  if k = l then
    Sample xkk(j+1) from (54)
  end
  else
    Calculate α and β by (56), (57), and sample x′kl from Gamma(α, β).
    pacc = [γ(x′kl | X′) Gamma(xkl | α, β)] / [γ(xkl | X) Gamma(x′kl | α, β)]
    Accept x′kl as xkl(j+1) with probability min{1, pacc}
    Sample x″kl by log x″kl ∼ N(log x′kl, 1)
    pacc = [γ(x″kl | X″) x″kl] / [γ(x′kl | X′) x′kl]
    Accept x″kl as xkl(j+1) with probability min{1, pacc}
  end
end
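To make the structure of such a Gibbs update concrete, the sketch below implements only the simpler second-stage move of Algorithm 1 for a single off-diagonal element: a log-normal random-walk proposal as in Eq. (63), accepted with the Metropolis ratio including the Jacobian factor of the log transform, cf. Eq. (65). The explicit form of the marginal density γ is an assumption written out from posterior (48) with the sparse prior; it is not copied from the paper's Appendix, and the names are illustrative:

import numpy as np

def log_gamma_kl(x_kl, k, l, X, C, b_kl=-1.0):
    # Terms of posterior (48) that depend on the single element x_kl = x_lk
    # (assumed form, see lead-in):
    # gamma(x_kl) ~ x_kl^(c_kl + c_lk + b_kl) * x_k^(-c_k) * x_l^(-c_l).
    c_k, c_l = C[k].sum(), C[l].sum()
    xk_rest = X[k].sum() - X[k, l]        # row sum x_k without the element x_kl
    xl_rest = X[l].sum() - X[l, k]        # row sum x_l without the element x_lk
    return ((C[k, l] + C[l, k] + b_kl) * np.log(x_kl)
            - c_k * np.log(xk_rest + x_kl)
            - c_l * np.log(xl_rest + x_kl))

def lognormal_gibbs_step(X, C, k, l, rng):
    # Propose log x' ~ N(log x, 1) and accept with probability
    # min{1, gamma(x')/gamma(x) * x'/x}; the factor x'/x is the Jacobian.
    x = X[k, l]
    x_new = np.exp(rng.normal(np.log(x), 1.0))
    log_acc = (log_gamma_kl(x_new, k, l, X, C) - log_gamma_kl(x, k, l, X, C)
               + np.log(x_new) - np.log(x))
    if np.log(rng.uniform()) < min(0.0, log_acc):
        X[k, l] = X[l, k] = x_new
    return X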

As before, we will be working with variables xij = πi pij related to transition matrix entries pij via (45). In contrast to the previous algorithm, π is not a function of P but fixed. Thus, single normalization condition (46) is replaced by a condition for each row,

(66)

in order to ensure reversibility with respect to the given π.

All xkl in the lower triangle (k > l) are used as independent variables. Given a valid X matrix, an update that respects the constraints is given by

(67a)
(67b)
(67c)
(67d)

(67b) and (67d) ensure that normalization condition (66) holds for the new sample and will thus keep π constant, while (67c) restores the symmetry and thus ensures reversibility of P. We will again use prior (47) defined on the set of X matrices and sample from posterior (48). The ideal proposal density of x′kl is

(67e)

which is the conditional density given all off-diagonal elements of X (except xkl), π, and the counts C.

We have seen that a correct choice of prior parameters is essential in order to successfully apply posterior sampling to meta-stable systems. As in the reversible case, we will use bkl = − 1 for k > l to enforce xkl = 0 whenever ckl + clk = 0.

However, the choice of prior counts for the diagonal elements bkk is less straightforward. In our experience, a good choice is to determine the value of bkk based on the maximum likelihood estimate p̂kk of the kth diagonal element as

(68)

which ensures that the posterior expectation of pkk is zero if and only if p̂kk = 0, and the conditional expectation of (67e),

(69)

matches the MLE of the one-dimensional likelihood function for xkl given X if p̂kk > 0 and ckk = 0. (Note that for the MLE of P, there is at most one k which satisfies p̂kk > 0 and ckk = 0 — see proof in the Appendix.)

However, in the case that p̂kk = 0, conditional (67e) would degenerate so that xkk = 0 with probability one, and the kth row and column of X would remain fixed in the sampling process. This effect can break ergodicity of the sampled Markov chain and therefore prevent convergence of the algorithm. This problem is avoided by regularizing the prior: for p̂kk = 0, we choose the prior parameter as bkk = − 1 + ϵ such that (67e) does not degenerate, where ϵ > 0 is a small number. In addition, we need to ensure that the Markov chain is started from an initial state X(0) with xkk(0) > 0. In summary, we select the prior of X for reversible sampling with fixed π as

(70)

This choice of prior will again ensure that ckl + clk = 0 results in pkl = 0 and plk = 0 for all k < l and for all posterior samples, a property shared by the reversible MLE with fixed stationary vector. This ensures that the posterior mass is located around the maximum likelihood estimate P̂ and again prevents the occurrence of artificial kinetic pathways in the posterior ensemble.

We now investigate how to efficiently sample conditional (67e). Here, we assume without loss of generality that xkk < xll and transform x′kl ∈ (0, xkk + xkl) into a new variable v′ ∈ (0, + ∞) via

(71)

The ideal proposal density of v′ is then

(72)

with

As in the previous algorithm, we can approximate the conditional density of v by a gamma distribution as

(73)

with

(74)
(75)

and

(76)
(77)
(78)
(79)
(80)

See the  Appendix for derivation. Then, v′ can be sampled by a Metropolis sampling step with the proposal density Gamma(v′|α, β) and the acceptance ratio

where v = xkl/xkk denotes the original value of v. In addition, in order to prevent the sampler from getting stuck at an extremely small or large value of v, we also utilize the same strategy as in Section IV D to generate v′ by

The proposed Algorithm 2 for the sampling of reversible transition matrices with fixed stationary vector can again be characterized as a Metropolis-within-Gibbs MCMC algorithm with adapted proposal probabilities.

ALGORITHM 2.

Reversible sampling algorithm with fixed stationary vector.

Input: C, π, X(j)
Output: X(j+1)
forall indexes (k, l) with k > l and ckl + clk > 0 do
  if xkk < xll then
    v = xkl / xkk
    s = (xll + xkl) / (xkk + xkl)
    a1 = ckl + clk + bkl
    a2 = ckk + bkk
    a3 = cll + bll
  end
  else
    v = xkl / xll
    s = (xkk + xkl) / (xll + xkl)
    a1 = ckl + clk + bkl
    a2 = cll + bll
    a3 = ckk + bkk
  end
  Calculate α and β by (74) and (75)
  Sample v′ from Gamma(α, β).
  Let x′kl = min{xkk + xkl, xll + xkl} · v′ / (1 + v′)
  pacc = [γv(v′ | X) Gamma(v | α, β)] / [γv(v | X) Gamma(v′ | α, β)] using (67a)–(67d)
  Accept x′kl as xkl(j+1) with probability min{1, pacc}
  Sample v″ by log v″ ∼ N(log v′, 1)
  pacc = [γV(v″ | X) v″] / [γV(v′ | X) v′]
  Let x″kl = min{xkk + xkl, xll + xkl} · v″ / (1 + v″)
  Accept x″kl as xkl(j+1) with probability min{1, pacc}
end

We first demonstrate the validity of the reversible sampling algorithm for the following 2 × 2 count-matrix:

(81)

In Fig. 6, we compare the sampled histograms using Algorithm 1 with analytical values for the non-reversible posterior with Dirichlet prior counts bij = − 1. Any 2 × 2 transition matrix automatically fulfills detailed balance, and therefore the analytical and sampled densities are expected to be equal. The histogram for samples of the reversible algorithm is indeed in agreement with the analytical posterior.

FIG. 6.

Sampled histogram frequency (a) and analytical probability density (b) of reversible posterior for 2 × 2 count matrix. Sampled frequencies are in agreement with the analytical probabilities.


In Fig. 7, the sampled histogram for count matrix (81) with fixed stationary distribution π = (0.25, 0.75) using Algorithm 2 is compared with the exact posterior distribution. Detailed balance relation (4) with fixed stationary vector enforces a linear dependency between the transition matrix elements p12 and p21. The resulting posterior is, therefore, restricted to the line π1 p12 = π2 p21, such that the projection on p12 in Fig. 7 already contains the full information about the one-dimensional posterior. A comparison between histogram frequency and analytical density demonstrates the validity of the algorithm.

FIG. 7.

Sampled histograms and analytical probability density for reversible posterior with fixed stationary vector for 2 × 2 count-matrix.


To demonstrate the usefulness of the proposed algorithms, we apply them to molecular dynamics simulation data. Here, two systems are chosen to illustrate our methods: (1) the alanine-dipeptide molecule and (2) the bovine pancreatic trypsin inhibitor (BPTI) molecule.

We start by discussing the alanine dipeptide results. The system was simulated on GPU hardware using the OpenMM simulation package59 with the amber99sb-ildn force field60 and the tip3p water model.61 The cubic box of length 2.7 nm contained a total of 652 solvent molecules. We used Langevin dynamics at T = 300 K with a time step of 2 fs. A total of 10 μs of simulation data was generated. The ϕ and ψ dihedral angles were discretized using a 20 × 20 regular grid to obtain a matrix of transition counts C = (cij), here by sampling one count per lag time τ. Below we will show histograms for two important observables, the largest implied time scales ti and the expected hitting times, τ(A → B), for pairs A, B of meta-stable sets. We compute the posterior sample-mean and 90% credible intervals for 1 μs of simulation data and show that the credible intervals nicely envelop a reference value obtained from the MLE transition matrix for the total simulation data, supporting the proposed prior as a “good” choice for reversible sampling in meta-stable systems.

1. Alanine dipeptide, reversible sampling

In Fig. 8, we show histograms of implied time scales computed from a reversible posterior sample. The mean values estimated from the posterior sample are in good agreement with the reference values. Tables IV and V compare the reference values with the sample mean μ and sample standard deviation σ for each observable.

FIG. 8.

(a)-(c) Implied time scales, ti. Histograms obtained from reversible posterior sampling. Dashed lines indicate the reference value, t ˆ i , dotted lines indicate the posterior sample mean, μ(ti). MLE and posterior mean are in very good agreement for the proposed choice of prior. The 90% credible intervals are the shaded regions in gray. Expected hitting times, τ, (d)-(f). Histograms obtained from reversible posterior sampling. Dashed lines indicate the reference value, τ ˆ , dotted lines indicate the posterior sample mean, μ(τ). In all cases, the reference value obtained from a long simulation is clearly compatible with the posterior sample (credible interval).

TABLE IV.

Comparison of reference implied time scales (t̂_i) with mean μ and standard deviation σ from the reversible posterior using N = 10^5 samples. ϵ is the estimated error of the mean μ and t_corr is the autocorrelation time of the sampled quantity.

        t̂_i (ps)    μ (ps)    σ (ps)    ϵ (ps)    t_corr
t2      1462         1556      303       19.00     197
t3      71           73                  0.01      10
t4      36           43                  0.06
TABLE V.

Expected hitting times computed from the reversible posterior using N = 10^5 samples. See Table IV for the definition of the other symbols.

                  τ̂ (ns)    μ (ns)    σ (ns)    ϵ (ns)    t_corr
τ(C5 → C7ax)      60.4       56.7      12.0      0.77      202
τ(C5 → αL)        43.6       40.6      8.3       0.53      206
τ(C5 → αR)        0.253      0.250     0.005     0.0004    218

In order to gain a first impression of the efficiency of the sampling and the quality of our estimates, we compute the integrated autocorrelation time tcorr for each quantity sampled (here implied time scales and hitting times). The error of the sample mean m[f] compared to the true mean 〈f〉 can then be estimated as

(82)

where Neff = N/(1 + 2tcorr) is the effective number of samples with N the total number of samples. See Ref. 62 for a thorough discussion. tcorr and ϵ are also reported in Table IV.
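A minimal implementation of this error estimate could look as follows; we assume that Eq. (82) has the standard form ϵ = σ/√N_eff, and a simple truncation rule (summing normalized autocovariances up to the first non-positive value) is used for t_corr, which is only one of several common estimators.

```python
import numpy as np

def autocorr_time(x, max_lag=None):
    """Integrated autocorrelation time t_corr of a scalar time series.

    Normalized autocovariances are summed up to the first non-positive value,
    a simple truncation rule (one of several common choices)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    max_lag = max_lag or n // 2
    var = np.dot(x, x) / n
    t_corr = 0.0
    for t in range(1, max_lag):
        c = np.dot(x[:-t], x[t:]) / n
        if c <= 0.0:
            break
        t_corr += c / var
    return t_corr

def error_of_mean(x):
    """Standard error of the sample mean, eps = sigma / sqrt(N_eff),
    with N_eff = N / (1 + 2 t_corr) as in the text."""
    x = np.asarray(x, dtype=float)
    t_corr = autocorr_time(x)
    n_eff = len(x) / (1.0 + 2.0 * t_corr)
    return np.std(x, ddof=1) / np.sqrt(n_eff), t_corr

# Example on a correlated AR(1) series (t_corr should come out near 0.9/(1-0.9) = 9).
rng = np.random.default_rng(4)
y = np.zeros(20000)
for i in range(1, len(y)):
    y[i] = 0.9 * y[i - 1] + rng.normal()
eps, t_corr = error_of_mean(y)
print(eps, t_corr)
```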

Figs. 8(d)–8(f) show histograms of expected hitting times for the three transitions C5 → C7ax, C5 → αL, and C5 → αR between meta-stable sets in the ϕ and ψ dihedral angle plane. Again, mean values are in good agreement with the corresponding reference values. Table V summarizes the computed results. The table columns again contain the reference value τ̂, the mean value μ, the standard deviation σ, the estimated correlation time t_corr, and the error of the mean value ϵ.

2. Alanine dipeptide, reversible sampling with fixed equilibrium distribution

Below we report results for sampling with fixed stationary distribution. The stationary distribution πi was, for the sake of simplicity, computed using the relative frequencies of state occurrences,

(83)

It should be noted that a more useful and independent source of π is enhanced sampling simulations targeted at rapidly generating a good estimate of the equilibrium probabilities alone. See Ref. 45 for methods and applications that combine MD simulations and enhanced sampling simulations in order to efficiently compute rare-event kinetics.
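As a sketch, the relative-frequency estimate of Eq. (83) amounts to normalized state-visit counts; an estimate of π from enhanced sampling could be substituted at the same place.

```python
import numpy as np

def stationary_from_counts(dtraj, n_states):
    """Estimate pi_i as the relative frequency of visits to state i,
    a simple stand-in for Eq. (83); enhanced-sampling estimates of pi
    could be plugged in instead."""
    counts = np.bincount(dtraj, minlength=n_states).astype(float)
    return counts / counts.sum()

# Hypothetical usage:
dtraj = np.array([0, 0, 1, 2, 1, 0, 2, 2, 2, 1])
print(stationary_from_counts(dtraj, n_states=3))  # [0.3, 0.3, 0.4]
```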

Results are shown in Table VI and Fig. 9. The sample mean is again in good agreement with the reference value. For the computation of the reference value, we use the MLE transition matrix estimated from the full simulation data under the constraint of reversibility with respect to the stationary distribution used as input for the posterior sampling. The additional constraint imposed by fixing the stationary distribution is clearly reflected in smaller standard deviations for all shown observables compared to the reversible case.

TABLE VI.

Comparison of reference implied time scales (t̂_i) with mean μ and standard deviation σ from the reversible posterior using a fixed equilibrium distribution and N = 10^5 samples. ϵ is the estimated error of the mean μ and t_corr is the autocorrelation time of the sampled quantity.

        t̂_i (ps)    μ (ps)    σ (ps)    ϵ (ps)    t_corr
t2      1594         1520      196       0.6
t3      72           73                  0.003
t4      38           41                  0.01
FIG. 9.

(a)-(c) Implied time scales, ti. Histograms obtained from reversible posterior sampling with fixed stationary vector. Dashed lines indicate the reference value, t ˆ i , dotted lines indicate the posterior sample mean, μ(ti). The 90% credible intervals are the shaded regions in gray. (d)-(f) Expected hitting times τ. Histograms obtained from reversible posterior sampling with fixed stationary vector. Dashed lines indicate the reference value, τ ˆ , dotted lines indicate the posterior sample mean, μ(τ). The reference value obtained from a long simulation is clearly compatible with the posterior sample (credible interval).


Histograms for expected hitting times between meta-stable sets are shown in Figs. 9(d)–9(f). The sample mean is again in good agreement with the reference value. Again, we summarize our results, cf. Table VII.

TABLE VII.

Expected hitting times computed from reversible sampling with fixed stationary vector using N = 10^5 samples. See Table VI for the definition of symbols.

                  τ̂ (ns)    μ (ns)    σ (ns)    t_corr    ϵ (ns)
τ(C5 → C7ax)      56.0       54.5      5.0       0.02
τ(C5 → αL)        41.5       39.5      5.1       0.02
τ(C5 → αR)        0.249      0.251     0.003               9.7 · 10^-6

3. Bovine pancreatic trypsin inhibitor, reversible sampling

For BPTI, we used the 1 ms simulation generated on the Anton supercomputer.63 Please refer to that paper for system setup and simulation details. We prepared the data as follows: Cα atom positions were aligned to the mean structure and saved every 10 ns, resulting in about 100 000 configurations in 174 dimensions. Time-lagged independent component analysis (TICA)24,32 was applied to reduce this 174-dimensional space to the two dominant independent components (ICs), as a spectral gap was found after the second nontrivial eigenvalue. k-means clustering with k = 100 was used to discretize this space.

Effective count matrices were obtained using the method described in Ref. 56 at a range of lag times up to 2 μs. Fig. 10 shows the implied relaxation time scales obtained from a maximum likelihood estimate, with values comparable to the hidden Markov model analysis in Ref. 10. The figure also shows uncertainties computed from reversible transition matrix sampling as described above with N = 1000 samples. Only every 20th transition matrix sample was used to compute time scales in order to reduce the computational effort to 50 eigenvalue decompositions. It is seen that the MLE is nicely contained in the 2σ (95%) credible interval. The entire transition matrix sampling for Fig. 10 took about 12.5 s on a 1.7 GHz Intel Core i7. Given that 8 lag times were used for 1000 samples of 100 × 100 matrices that contained about 40% non-zeros, about 2.56 × 10^6 elements are sampled per second, and about 640 full transition matrix samples are generated per second. Below, a more systematic analysis of the computational efficiency is made.
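A workflow of this kind can be set up in a few lines with PyEMMA. The sketch below assumes the PyEMMA 2 API (pyemma.coordinates.tica, pyemma.coordinates.cluster_kmeans, pyemma.msm.bayesian_markov_model); exact signatures should be checked against the PyEMMA documentation, and the input arrays and parameters are placeholders. The 'effective' count mode is assumed to correspond to the effective counting of Ref. 56.

```python
import numpy as np
import pyemma

# Placeholder input: one trajectory of aligned C-alpha coordinates (n_frames x 174).
# Random data is used here only so that the sketch is self-contained.
feature_trajs = [np.random.randn(10000, 174)]

# TICA projection onto the two dominant independent components.
tica = pyemma.coordinates.tica(feature_trajs, lag=50, dim=2)
Y = tica.get_output()

# k-means discretization with k = 100 cluster centers.
clustering = pyemma.coordinates.cluster_kmeans(Y, k=100, max_iter=50)
dtrajs = clustering.dtrajs

# Bayesian (reversible) MSM; nsamples controls the size of the posterior sample
# used for error bars as in Fig. 10.
msm = pyemma.msm.bayesian_markov_model(dtrajs, lag=100, count_mode='effective',
                                       nsamples=100)
print(msm.timescales(5))                 # maximum likelihood implied time scales
print(msm.sample_conf('timescales', 5))  # credible interval from the posterior sample
```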

FIG. 10.

Implied time scales for a Markov model of bovine pancreatic trypsin inhibitor (BPTI). The error bars are 95% confidence intervals estimated using the reversible transition matrix sampling algorithm described here using transition counts as described in Ref. 56.


We compute acceptance probabilities of the Metropolis-Hastings steps and compare the statistical efficiency of the proposed sampling algorithm with the algorithm proposed in Ref. 39, which uses uniform proposal densities. Efficiency is measured in terms of the autocorrelation times achieved when sampling transition matrices of different sizes. As a representative observable, we choose the largest relaxation time scale t2 for the alanine dipeptide molecule and compute autocorrelation functions and autocorrelation times from a large sample of size N = 10^6. Two discretizations of different resolution were used, resulting in n × n transition matrices with n = 233 and n = 1108.

Fig. 11 shows autocorrelation functions for the second largest relaxation time scale, t2, for a reversible posterior ensemble. The autocorrelation function for the reversible sampling algorithm with posterior-adapted proposals shows a much faster decay than the autocorrelation function for the algorithm in Ref. 39. Table VIII compares acceptance probabilities and autocorrelation times. The present proposal steps lead to very high acceptance probabilities, p > 0.99, for the sampling of off-diagonal entries. The main advance, however, comes from the fact that the step for sampling diagonal transition matrix elements in Ref. 39 suffered from a very poor acceptance probability. As that step was the only step that modified the equilibrium distribution, the sampler in Ref. 39 had very poor mixing properties. In contrast, our current algorithm generates independent samples for the diagonal elements, resulting in an acceptance probability of p = 1.0.

FIG. 11.

Autocorrelation function for reversible sampling. Sampling of transition matrices with (a) n = 233 states and (b) n = 1108 states. Increasing the dimension of the state space has only a small effect on the new sampling algorithm.

TABLE VIII.

Acceptance probability and autocorrelation times for old vs. new reversible sampling algorithm. n: number of states; poffdiag, pdiag: acceptance probabilities for off-diagonal and diagonal elements, respectively. tcorr: autocorrelation time for the sampling of the slowest relaxation time scale, in number of transition matrix sampling steps.

Algorithm    n       p_offdiag    p_diag    t_corr
Old          233     0.216        0.011     1088.1
             1108    0.271        0.005     3241.9
New          233     0.994        1.0       194.7
             1108    0.995        1.0       242.6

The autocorrelation times for the new sampler are more than a factor 5 shorter for the small (233 state) matrix and more than a factor 13 shorter for the large (1108 state) matrix, indicating a much higher efficiency of the new approach. The autocorrelation time of the new algorithm increases only mildly for matrices of increased dimension, indicating that the present algorithm will be useful for very large Markov models.

Fig. 12 shows autocorrelation functions for reversible sampling with fixed stationary vector. The posterior-adapted proposals in the reversible sampling algorithm with fixed stationary distribution again result in a much faster decay of the autocorrelation function than the uniform proposals of the algorithm in Ref. 39. Table IX compares acceptance probabilities and autocorrelation times for the two algorithms. For sampling with fixed stationary vector, there is no sampling step for the diagonal elements. Although the average acceptance probabilities are only a factor of 3-4 better for our new algorithm, the autocorrelation times are decreased by a factor 35 for the small system (233 states) and over a factor 300 for the large system (1108 states). Again, there is only a mild increase in autocorrelation time when the dimension of the sampled space is increased.

FIG. 12.

Autocorrelation function for reversible sampling with fixed stationary distribution. Sampling of transition matrices with (a) n = 233 states and (b) n = 1108 states. Increasing the dimension of the state space has only a small effect on the new sampling algorithm.

TABLE IX.

Acceptance probability and autocorrelation times for old vs. new reversible sampling algorithm with fixed stationary distribution. Symbols as in Table VIII.

Algorithm    n       p_offdiag    t_corr
Old          233     0.175        72.096
             1108    0.230        >1000^a
New          233     0.752        2.893
             1108    0.706        3.157

^a Autocorrelation function not converged.

In this work, we have described and significantly extended the state of the art in reversible Markov model estimation. Reversible Markov models are expected to naturally arise from molecular dynamics implementations that fulfill microscopic reversibility. Reversibility is an essential property in order to analyze the equilibrium kinetics of a molecule. However, in order to have reversibility in a Markov model, it needs to be enforced in the estimation procedure. When done correctly, reversible estimation does not bias the model but rather reduces statistical errors as a result of a smaller number of degrees of freedom.

We have presented minor improvements to an existing self-consistent estimation algorithm for reversible Markov models. Then, we have presented a new and efficient algorithm to estimate reversible maximum likelihood Markov models given a fixed equilibrium distribution.

The main part of the presented work focuses on the long-standing problem of Bayesian estimation of the posterior ensemble of reversible transition matrices. Although several algorithms to sample reversible Markov models have been presented in the past, they have been hampered by three fundamental problems, two of which are addressed here: (i) Which prior should be chosen such that the posterior is located around the true value rather than completely elsewhere? (ii) How should transition counts be obtained when time series are correlated and not really Markovian at any given lag time τ? (iii) How can the sampling algorithm be made efficient such that also large transition matrices can be sampled in reasonable time?

To address problem (i), we develop priors that ensure that reference values and sample mean are similar. The key property of these priors is that they make the a priori assumption that transitions between states that have not been sampled in the trajectory in either direction have zero probability. This is a sparse prior, i.e., an improper prior enforcing that the sampled transition matrix has the same sparsity structure as the maximum likelihood estimate and as induced by the observation. In contrast to most other priors that have been previously suggested, these sparse priors achieve the desired property of creating error bars that nicely envelop the reference estimates.

For problem (ii), we have described the principles of how it can be addressed. We suggest that effective count matrices are obtained using the concept of statistical inefficiencies. A separate preprint56 suggests an initial solution towards this aim, which is successfully applied to simulation data of the bovine pancreatic trypsin inhibitor in the present paper. The solution of problem (ii) is still in its infancy and needs further investigation.

For problem (iii), we present highly efficient Gibbs sampling algorithms for reversible transition matrices and reversible transition matrices with fixed equilibrium distribution. Both methods are demonstrated to have acceptance probabilities close to 1 in their individual update steps. Autocorrelation times from samples of the slowest relaxation time scale are one or two orders of magnitude shorter than with a previous Gibbs sampling algorithm, indicating a high statistical efficiency of our sampler.

Implementations of all algorithms described here are available in PyEMMA64 as of version 2.0 or later — www.pyemma.org.

This work was funded by the Deutsche Forschungsgemeinschaft (DFG) Project Nos. SFB 1114/C03 (F.P.), SFB 1114/A04 (H.W.), Grant No. 825/3-1 (B.T.S.), and the European Research Council (ERC) starting grant “pcCell” (F.N., B.T.S., F.P.). The authors would like to thank the anonymous referees for their valuable comments.

1. Reversible transition matrix sampling: Derivation of marginal densities

We first pick a single element (k, l) of X (diagonal or off-diagonal) and sample it from a proposal density x′_kl ∼ q(x′_kl | X) that is scale-invariant, q(x′_kl | X) ∝ q(c x′_kl | c X) for all c > 0,

(A1)

and then renormalize the matrix such that it retains an element sum of 1,

(A2)

Since q(x′_kl | X) is a probability density function, we can obtain from its scale-invariance that

(A3)

and

(A4)

According to Theorem 13.1 in Ref. 65, the posterior distribution ℙ(X|C) is the invariant distribution of the proposed update step if we accept X̄ as the new sample with probability min{1, pacc} and

(A5)

where q_x(x̄_kl | X) denotes the proposal density of x̄_kl given X. Note that X only contains n(n + 1)/2 − 1 free variables. So we select {x_ij | i ≥ j, (i, j) ≠ (i′, j′)} as the free variable set of X, where x_i′j′ is an arbitrary element of X with i′ ≥ j′ and (i′, j′) ≠ (k, l). Let us consider each term on the right hand side of (A5).

From the definition of X ̄ , we have

(A6)

and

(A7)

The proposal density of x̄_kl given X can be expressed as

(A8)

Therefore,

(A9)

and

(A10)

The partial derivative of x̄_ij with respect to x_ij for (i, j) ≠ (k, l) can be calculated according to (A2) as

(A11)

Combining all the above results leads to

(A12)

with

(A13)

2. Reversible transition matrix sampling: Efficient proposal densities

a. Diagonals

Let us define the variable s′ = x′_kk/(x_k − x_kk + x′_kk). If x′_kk is sampled from the proposal density γ(x′_kk | X), the corresponding proposal density of s′ can be expressed as

(A14)

Note that

(A15)

Therefore,

(A16)

and

(A17)

The above equation implies that s′ follows the beta distribution with parameters c_kk and c_k − c_kk. So we can sample s′ ∼ Beta(c_kk, c_k − c_kk) and obtain x′_kk by the back-transform.
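In code, this diagonal update is a two-line Gibbs step; the sketch below assumes the definition s′ = x′_kk/(x_k − x_kk + x′_kk) given above, so that the back-transform reads x′_kk = s′(x_k − x_kk)/(1 − s′). Variable names are illustrative and degenerate count configurations (e.g., c_kk = 0) are not handled.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_diagonal(X, C, k):
    """Draw a new diagonal element x_kk of the unnormalized reversible
    parametrization X, given count matrix C (a sketch of the step above).

    Assumes s' = x'_kk / (x_k - x_kk + x'_kk) ~ Beta(c_kk, c_k - c_kk),
    so x'_kk = s' (x_k - x_kk) / (1 - s'). Degenerate counts are not handled.
    """
    x_k = X[k].sum()          # row sum of X
    c_k = C[k].sum()          # row sum of counts
    s = rng.beta(C[k, k], c_k - C[k, k])
    return s * (x_k - X[k, k]) / (1.0 - s)

# Hypothetical usage:
C = np.array([[10, 2, 1], [2, 20, 3], [1, 3, 15]], dtype=float)
X = C / C.sum()               # a valid (symmetric) starting point with element sum 1
print(sample_diagonal(X, C, k=0))
```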

b. Off-diagonals

We consider how to approximate γ(x′_kl | X) with k ≠ l and b_kl = −1 by a gamma distribution density function. Define

(A18)

The function f(x′_kl) is then given by

(A19)

We approximate f using a three parameter family of functions

(A20)

so that the corresponding approximation of γ(x′_kl | X) is

(A21)

Expression (A21) is a gamma distribution with parameters α and β, which can be sampled efficiently.66

The three parameters α, β, and f0 are obtained by matching f and f̂ up to second derivatives at the maximum point

(A22)

This leads to the following linear system:

(A23a)
(A23b)
(A23c)

with solution

(A23d)
(A23e)
(A23f)

and

The maximum point can be computed as the root of a quadratic equation in the usual way,

(A23g)

with parameters a, b, c given by

(A23h)
(A23i)
(A23j)

The second solution corresponding to (A23g), with a negative sign in front of the square root, can be safely excluded since ȳ is required to be non-negative.

3. Reversible sampling with fixed stationary distribution: Efficient proposal densities

We first prove that there exists a maximum likelihood estimate P̂ = [p̂_ij] of the transition matrix which satisfies |{k | p̂_kk > 0, c_kk = 0}| ≤ 1. Suppose that X′ = [x′_ij] is a maximum likelihood estimate of X and there are two different indices k, l which satisfy x′_ll ≥ x′_kk > 0 and c_kk = c_ll = 0. We can then construct a new matrix X″ = [x″_ij] with

It can be verified that the likelihood of X″ is not smaller than that of X′, which implies that X″ is also a maximum likelihood estimate of X. Repeating the above procedure, we finally obtain a maximum likelihood estimate X̂ of X which has at most one k with x̂_kk > 0 and c_kk = 0, and the corresponding P̂ satisfies |{k | p̂_kk > 0, c_kk = 0}| ≤ 1.

We will now investigate how to approximate the density

(A24)

with s > 1 by a gamma distribution density.

As in the reversible case, we will use the representation

(A25)

and approximate f(v′) using the three-parameter family, f̂(v′ | α, β, f0), given in (A20). The resulting approximate density, γ̂_v(v′ | X), has the same nice properties as the one from (A21).

Parameters α, β, and f0 are given by (A23d), (A23e), and (A23f). The maximum point v̄ is given by (A23g). The parameters a, b, c are

(A26a)
(A26b)
(A26c)
1. C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, J. Comput. Phys. 151, 146 (1999).
2. W. C. Swope, J. W. Pitera, and F. Suits, J. Phys. Chem. B 108, 6571 (2004).
3. N. Singhal, C. D. Snow, and V. S. Pande, J. Chem. Phys. 121, 415 (2004).
4. V. Schultheis, T. Hirschberger, H. Carstens, and P. Tavan, J. Chem. Theory Comput. 1, 515 (2005).
5. F. Noe, I. Horenko, C. Schütte, and J. C. Smith, J. Chem. Phys. 126, 155102 (2007).
6. A. Pan and B. Roux, J. Chem. Phys. 129, 064107 (2008).
7. J. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. Chodera, C. Schütte, and F. Noé, J. Chem. Phys. 134, 174105 (2011).
8. P. Deuflhard and M. Weber, in Linear Algebra and Its Applications, edited by M. Dellnitz, S. Kirkland, M. Neumann, and C. Schütte (Elsevier, New York, 2005), Vol. 398C, pp. 161–184.
9. N. Singhal and V. S. Pande, J. Chem. Phys. 123, 204909 (2005).
10. F. Noé, H. Wu, J.-H. Prinz, and N. Plattner, J. Chem. Phys. 139, 184114 (2013).
11. F. Noé, S. Doose, I. Daidone, M. Löllmann, J. D. Chodera, M. Sauer, and J. C. Smith, Proc. Natl. Acad. Sci. U. S. A. 108, 4822 (2011).
12. W. Zhuang, R. Z. Cui, D.-A. Silva, and X. Huang, J. Phys. Chem. B 115, 5415 (2011).
13. B. G. Keller, J.-H. Prinz, and F. Noé, Chem. Phys. 396, 92 (2012).
14. B. Lindner, Z. Yi, J.-H. Prinz, J. C. Smith, and F. Noé, J. Chem. Phys. 139, 175101 (2013).
15. S. Sriraman, I. G. Kevrekidis, and G. Hummer, J. Phys. Chem. B 109, 6479 (2005).
16. H. Wu, A. S. J. S. Mey, E. Rosta, and F. Noé, J. Chem. Phys. 141, 214106 (2014).
17. H. Wu and F. Noé, SIAM Multiscale Model. Simul. 12, 25 (2014).
18. E. Rosta and G. Hummer, J. Chem. Theory Comput. 11, 276 (2015).
19. S. P. Elmer, S. Park, and V. S. Pande, J. Chem. Phys. 123, 114903 (2005).
20. F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich, and T. R. Weikl, Proc. Natl. Acad. Sci. U. S. A. 106, 19011 (2009).
21. I. Buch, T. Giorgino, and G. De Fabritiis, Proc. Natl. Acad. Sci. U. S. A. 108, 10184 (2011).
22. T. J. Lane, G. R. Bowman, K. Beauchamp, V. A. Voelz, and V. S. Pande, J. Am. Chem. Soc. 133, 18413 (2011).
23. G. R. Bowman, V. A. Voelz, and V. S. Pande, J. Am. Chem. Soc. 133, 664 (2011).
24. G. Perez-Hernandez, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noe, J. Chem. Phys. 139, 015102 (2013).
25. N. Stanley, S. Esteban-Martin, and G. D. Fabritiis, Nat. Commun. 5, 6272 (2014).
26. M. Held, P. Metzner, J. Prinz, and F. Noé, Biophys. J. 100, 701 (2011).
27. D. Huang and A. Caflisch, PLoS Comput. Biol. 7, e1002002 (2011).
28. D.-A. Silva, G. R. Bowman, A. Sosa-Peinado, and X. Huang, PLoS Comput. Biol. 7, e1002054 (2011).
29. G. Bowman and P. Geissler, Proc. Natl. Acad. Sci. U. S. A. 109, 11681 (2012).
30. N. Plattner and F. Noé, Nat. Commun. 6, 7653 (2015).
31. J. Chodera, N. Singhal, V. Pande, K. Dill, and W. Swope, J. Chem. Phys. 126, 155101 (2007).
32. C. R. Schwantes and V. S. Pande, J. Chem. Theory Comput. 9, 2000 (2013).
33. M. Sarich, F. Noé, and C. Schütte, SIAM Multiscale Model. Simul. 8, 1154 (2010).
34. F. Noé and F. Nüske, Multiscale Model. Simul. 11, 635 (2013).
35. F. Nüske, B. G. Keller, G. Pérez-Hernández, A. S. J. S. Mey, and F. Noé, J. Chem. Theory Comput. 10, 1739 (2014).
36. N. V. Buchete and G. Hummer, J. Phys. Chem. B 112, 6057 (2008).
37. G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, J. Chem. Phys. 131, 124101 (2009).
38. N. S. Hinrichs and V. S. Pande, J. Chem. Phys. 126, 244101 (2007).
39. F. Noé, J. Chem. Phys. 128, 244103 (2008).
40. J. D. Chodera and F. Noé, J. Chem. Phys. 133, 105102 (2010).
41. C. Schütte, F. Noé, J. Lu, M. Sarich, and E. Vanden-Eijnden, J. Chem. Phys. 134, 204105 (2011).
42. B. Trendelkamp-Schroer and F. Noé, J. Chem. Phys. 138, 164113 (2013).
43. W. F. Van Gunsteren and H. J. C. Berendsen, Mol. Simul. 1, 173 (1988).
44. M. Tuckerman, B. J. Berne, and G. J. Martyna, J. Chem. Phys. 97, 1990 (1992).
45. B. Trendelkamp-Schroer and F. Noé, e-print arXiv:1409.6439 (2014).
46. P. Metzner, F. Noé, and C. Schütte, Phys. Rev. E 80, 021106 (2009).
47. S. Bacallado, J. D. Chodera, and V. Pande, J. Chem. Phys. 131, 045106 (2009).
48. J. Besag and D. Mondal, Biometrics 69, 488 (2013).
49. A. Bittracher, P. Koltai, and O. Junge, "Pseudo generators of spatial transfer operators," SIAM J. Appl. Dyn. Syst. 14(3), 1478 (2015).
50. B. N. Parlett, The Symmetric Eigenvalue Problem, Classics in Applied Mathematics (Society for Industrial and Applied Mathematics, Philadelphia, 1998).
51. S. Röblitz and M. Weber, Adv. Data Anal. Classif. 7, 147 (2013).
52. P. Metzner, C. Schütte, and E. Vanden-Eijnden, Multiscale Model. Simul. 7, 1192 (2009).
53. A. Berezhkovskii, G. Hummer, and A. Szabo, J. Chem. Phys. 130, 205102 (2009).
54. W. E and E. Vanden-Eijnden, J. Stat. Phys. 123, 503 (2006).
55. W. Janke, in Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, NIC Series, edited by J. Grotendorst, D. Marx, and A. Muramatsu (John von Neumann Institute for Computing, Jülich, 2002), Vol. 10, pp. 423–445.
56. F. Noé, preprint, http://publications.mi.fu-berlin.de/1699/ (2015).
57. T. W. Anderson and L. A. Goodman, Ann. Math. Stat. 28, 89 (1957).
58. S. Ji, Y. Xue, and L. Carin, IEEE Trans. Signal Process. 56, 2346 (2008).
59. P. Eastman, M. S. Friedrichs, J. D. Chodera, R. J. Radmer, C. M. Bruns, J. P. Ku, K. A. Beauchamp, T. J. Lane, L.-P. Wang, D. Shukla, T. Tye, M. Houston, T. Stich, C. Klein, M. R. Shirts, and V. S. Pande, J. Chem. Theory Comput. 9, 461 (2013).
60. K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis, R. O. Dror, and D. E. Shaw, Proteins: Struct., Funct., Bioinf. 78, 1950 (2010).
61. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys. 79, 926 (1983).
62.
63. D. E. Shaw, P. Maragakis, K. Lindorff-Larsen, S. Piana, R. Dror, M. Eastwood, J. Bank, J. Jumper, J. Salmon, Y. Shan, and W. Wriggers, Science 330, 341 (2010).
64. M. K. Scherer, B. Trendelkamp-Schroer, F. Paul, G. Perez-Hernandez, M. Hoffmann, N. Plattner, C. Wehmeyer, J.-H. Prinz, and F. Noé, "PyEMMA 2: A software package for estimation, validation, and analysis of Markov models," J. Chem. Theory Comput. (published online).
65. S. Sawyer, The Metropolis-Hastings Algorithm and Extensions (Washington University, 2006).
66. L. Devroye, Non-Uniform Random Variate Generation (Springer, 1986).