Understanding dynamics in complex systems is challenging because there are many degrees of freedom, and those that are most important for describing events of interest are often not obvious. The leading eigenfunctions of the transition operator are useful for visualization, and they can provide an efficient basis for computing statistics, such as the likelihood and average time of events (predictions). Here, we develop inexact iterative linear algebra methods for computing these eigenfunctions (spectral estimation) and making predictions from a dataset of short trajectories sampled at finite intervals. We demonstrate the methods on a low-dimensional model that facilitates visualization and a high-dimensional model of a biomolecular system. Implications for the prediction problem in reinforcement learning are discussed.

Modern observational, experimental, and computational approaches often yield high-dimensional time series data (trajectories) for complex systems. In principle, these trajectories contain rich information about dynamics and, in particular, the infrequent events that are often most consequential. In practice, however, high-dimensional trajectory data are often difficult to parse for useful insight. The need for more efficient statistical analysis tools for trajectory data is critical, especially when the goal is to understand rare events that may not be well represented in the data.

We consider dynamics that can be treated as Markov processes. A common starting point for statistical analyses of Markov processes is the transition operator, which describes the evolution of function expectations. The eigenfunctions of the transition operator characterize the most slowly decorrelating features (modes) of the system.1–5 These can be used for dimensionality reduction to obtain a qualitative understanding of the dynamics,6,7 or they can be used as the starting point for further computations.8–10 Similarly, prediction functions, which provide information about the likelihood and timing of future events as a function of the current state, are defined through linear equations of the transition operator.10,11

A straightforward numerical approach to obtaining these functions is to convert the transition operator to a matrix by projecting onto a finite basis for Galerkin approximation.1,2,10–15 The performance of such a linear approximation depends on the choice of basis,10,11,15 and previous work often resorts to a set of indicator functions on a partition of the state space (resulting in a Markov state model or MSM14) for lack of a better choice. While Galerkin approximation has yielded many insights,16,17 the limited expressivity of the basis expansion has stimulated interest in (nonlinear) alternatives.

In particular, artificial neural networks can be harnessed to learn eigenfunctions of the transition operator and prediction functions from data.5,18–25 However, existing approaches based on neural networks suffer from various drawbacks. As discussed in Ref. 5, their performance can often be very sensitive to hyperparameters, requiring extensive tuning and varying with random initialization. Many use loss functions that are estimated against the stationary distribution,25–30 so that metastable states contribute most heavily, which negatively impacts performance.24,30 Assumptions about the dynamics (e.g., microscopic reversibility) limit applicability. In Ref. 24, we introduced an approach that overcomes the issues above, but it uses multiple trajectories from each initial condition; this limits the approach to analysis of simulations and moreover requires specially prepared datasets.

The need to compute prediction functions from observed trajectory data also arises in reinforcement learning. There the goal is to optimize an expected future reward (the prediction function) over a policy (a Markov process). For a fixed Markov process, the prediction problem in reinforcement learning is often solved by temporal difference (TD) methods, which allow the use of arbitrary ensembles of trajectories without knowledge of the details of the underlying dynamics.31 TD methods have a close relationship with an inexact form of power iteration, which, as we describe, can perform poorly on rare-event related problems.

Motivated by this relationship, as well as by an inexact power iteration scheme previously proposed for approximating the stationary probability distribution of a Markov process using trajectory data,32 we propose a computational framework for spectral estimation and rare-event prediction based on inexact iterative numerical linear algebra. Our framework includes an inexact Richardson iteration for the prediction problem, as well as an extension to inexact subspace iteration for the prediction and spectral estimation problems. The theoretical properties of exact subspace iteration suggest that eigenfunctions outside the span of the approximation will contribute significantly to the error of our inexact iterative schemes.33 Consistent with this prediction, we demonstrate that learning additional eigenvalues and eigenfunctions simultaneously through inexact subspace iteration accelerates convergence dramatically relative to inexact Richardson and power iteration in the context of rare events. While we assume the dynamics can be modeled by a Markov process, we do not require knowledge of their form or a specific underlying model. The method shares a number of further advantages with the approach discussed in Ref. 24 without the need for multiple trajectories from each initial condition in the dataset. This opens the door to treating a wide range of observational, experimental, and computational datasets.

The remainder of the paper is organized as follows: In Sec. II, we describe the quantities that we seek to compute in terms of linear operators. In Secs. III and IV, we introduce an inexact subspace iteration algorithm that we use to solve for these quantities. Section V illustrates how the loss function can be tailored to the known properties of the desired quantity. Section VI summarizes the two test systems that we use to illustrate our methods: a two-dimensional potential, for which we can compute accurate reference solutions, and a molecular example that is high-dimensional but still sufficiently tractable that statistics for comparison can be computed from long trajectories. In Sec. VII, we explain the details of the invariant subspace iteration and then demonstrate its application to our two examples. Finally, Sec. VIII details how the subspace iteration can be modified to compute prediction functions and compares the effect of different loss functions, as well as the convergence properties of power iteration and subspace iteration. We conclude with implications for reinforcement learning.

We have two primary applications in mind in this article. First, we would like to estimate the dominant eigenfunctions and eigenvalues of the transition operator T^t for a Markov process X_t ∈ R^d, defined as
T^t f(x) = E_x[ f(X_t) ],    (1)
where f is an arbitrary real-valued function and the subscript x indicates the initial condition X_0 = x. The transition operator encodes how expectations of functions evolve in time. The right eigenfunctions of T^t with the largest eigenvalues characterize the most slowly decorrelating features (modes) of the Markov process.1,2,4,5
Our second application is to compute prediction functions of the general form
u(x) = E_x[ Ψ(X_T) + Σ_{t=0}^{T−1} Γ(X_t) ],    (2)
where T is the first time X_t ∉ D for some domain D, and Ψ and Γ are functions associated with the escape event (rewards in reinforcement learning). Prototypical examples of prediction functions that appear in our numerical results are the mean first passage time (MFPT) from each x,
m(x) = E_x[T],    (3)
and the committor
q_+(x) = E_x[ 1_B(X_T) ],    (4)
where A and B are disjoint sets (“reactant” and “product” states) and D = (A ∪ B)^c. The MFPT is important for estimating rates of transitions between regions of state space, while the committor can serve as a reaction coordinate29,34–36 and as a key ingredient in transition path theory statistics.24,37,38 For any τ > 0, the prediction function u(x) satisfies the linear equation
(I − S^τ) u(x) = E_x[ Σ_{t=0}^{τ∧T−1} Γ(X_t) ]    (5)
for x ∈ D, with boundary condition
u(x) = Ψ(x)    (6)
for x ∉ D. In (5), I is the identity operator and
S^τ f(x) = E_x[ f(X_{τ∧T}) ]    (7)
is the “stopped” transition operator.10 We use the notation τ∧T = min{τ, T}. The parameter τ is known as the lag time. While it is an integer corresponding to a number of integration steps, in our numerical examples, we often express it in terms of equivalent time units.
Our specific goal is to solve both the eigenproblem and the prediction problem for X_t in high dimensions and without direct access to a model governing its evolution. Instead, we have access to trajectories of X_t of a fixed, finite duration τ. A natural and generally applicable approach to finding an approximate solution to the prediction problem is to attempt to minimize the residual of (5) over parameters θ of a neural network u_θ(x) representing u(x). For example, we recently suggested an algorithm that minimizes the residual norm
‖ u_θ − S^τ u_θ − r ‖_μ²,    (8)
where r(x) is the right-hand side of (5) and μ is an arbitrary distribution of initial conditions X_0 (boundary conditions were enforced by an additional term).24 The norm ‖·‖_μ is defined by ‖f‖_μ² = ⟨f, f⟩_μ, where ⟨f, g⟩_μ = ∫ f(x) g(x) μ(dx) for arbitrary functions f and g. The gradient of the residual norm in (8) with respect to the neural-network parameters θ can be written as
∇_θ ‖ u_θ − S^τ u_θ − r ‖_μ² = 2 ⟨ ∇_θ u_θ, u_θ − S^τ u_θ − r ⟩_μ − 2 ⟨ S^τ ∇_θ u_θ, u_θ − S^τ u_θ − r ⟩_μ.    (9)
Given a dataset of initial conditions {X_j^0}_{j=1}^n drawn from μ and a single sample trajectory {X_j^t}_{t=0}^τ of X_t for each X_j^0, the first term in the gradient (9) can be approximated without bias as
(2/n) Σ_{j=1}^n ∇_θ u_θ(X_j^0) [ u_θ(X_j^0) − u_θ(X_j^{τ∧T_j}) − Σ_{t=0}^{τ∧T_j−1} Γ(X_j^t) ],    (10)
where T_j is the first time X_j^t ∉ D.

In contrast, unbiased estimation of the second term in (9) requires access to two independent trajectories of X_t for each sample initial condition since it is quadratic in S^τ.24,31 In the reinforcement learning community, TD methods were developed for the purpose of minimizing a loss of a form very similar to (8), and they perform a “semigradient” descent by following only the first term in (9).31 As in the semigradient approximation, the algorithms proposed in this paper only require access to inner products of the form ⟨f, A g⟩_μ for an operator A related to T^τ or S^τ, and they avoid terms that are nonlinear in A. As we explain, such inner products can be estimated using trajectory data. However, rather than attempting to minimize the residual directly by an approximate gradient descent, we view the eigenproblem and prediction problems through the lens of iterative numerical linear algebra schemes.

To motivate the iterative numerical linear algebra point of view, observe that the first term in (9) is the exact gradient with respect to θ′ of the loss
‖ u_{θ′} − S^τ u_θ − r ‖_μ²    (11)
evaluated at θ′ = θ. The result of many steps of gradient descent (later, “inner iterations”) on this loss with θ held fixed can then be viewed as an inexact Richardson iteration for (5), resulting in a sequence
u_{θ_{s+1}} ≈ S^τ u_{θ_s} + r,    (12)
where, for each s, u_{θ_s} is a parameterized neural-network approximation of the solution to (5). To unify our discussion of the prediction and spectral estimation problems, it is helpful to observe that (12) can be recast as an equivalent inexact power iteration
(1, u_{θ_{s+1}}) ≈ A (1, u_{θ_s}),    (13)
where A acts on a pair of a constant c and a function f according to
A (c, f) = ( c, c r + S^τ f ).    (14)
Note that (1, u) is the dominant eigenfunction of A and has eigenvalue equal to 1.
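To make this viewpoint concrete, the following minimal NumPy sketch (a toy one-dimensional random walk, not the neural-network setting of this paper) runs the exact iteration u_{s+1} = S^τ u_s + r and checks that (1, u) is a fixed point of the block operator A in (14); the state space, stopping sets, and iteration count are illustrative assumptions.

import numpy as np

n = 50                                    # states 0, ..., n-1; A = {0}, B = {n-1}
P = np.zeros((n, n))
for i in range(1, n - 1):                 # simple random walk in the interior
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[-1, -1] = 1.0                 # absorbing boundary states

D = np.arange(1, n - 1)                   # domain D = interior states
S = P[np.ix_(D, D)]                       # one-step "stopped" operator restricted to D
r = P[np.ix_(D, [n - 1])].ravel()         # reward: probability of stepping directly into B

u = np.zeros(len(D))                      # Richardson iteration for (I - S) u = r
for _ in range(5000):
    u = S @ u + r

u_exact = np.linalg.solve(np.eye(len(D)) - S, r)    # committor of the walk
print(np.max(np.abs(u - u_exact)))                  # small after enough iterations

A_block = np.block([[np.ones((1, 1)), np.zeros((1, len(D)))],
                    [r[:, None],      S]])          # block operator of Eq. (14)
v = np.concatenate(([1.0], u_exact))
print(np.max(np.abs(A_block @ v - v)))              # (1, u) has eigenvalue 1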

Reference 32 introduced an inexact power iteration to compute the stationary probability measure of T^τ, i.e., its dominant left eigenfunction. As those authors note, an inexact power update, such as (13), can be enforced using a variety of loss functions. In our setting, the L²_μ norm in (11) can be replaced by any other measure of the difference between u_{θ′} and S^τ u_θ + r, as long as an unbiased estimator of its gradient with respect to θ′ is available. This flexibility is discussed in more detail in Sec. V, and we exploit it in applications presented later in this article. For now, we focus instead on another important implication of this viewpoint: the flexibility in the form of the iteration itself.

We will see that when the spectral gap of A, the difference between its largest and second largest eigenvalues, is small, the inexact power iteration (or the equivalent Richardson iteration) described above fails to reach an accurate solution. The largest eigenvalue of A in (14) is 1, and its next largest eigenvalue is the dominant eigenvalue of S^τ. For rare-event problems, the difference between these two eigenvalues is expected to be very small. Indeed, when X_0 is drawn from the quasi-stationary distribution of X_t in D, the logarithm of the largest eigenvalue of S^τ is exactly −τ/E[T].39 Thus, when the mean exit time from D is large, we can expect the spectral gap of A in (14) to be very small. In this case, classical convergence results tell us that exact power iteration converges slowly, with the largest contributions to the error coming from the eigenfunctions of S^τ with largest-magnitude eigenvalues.33 Iterative schemes that approximate multiple dominant eigenfunctions simultaneously, such as subspace iteration and Krylov methods, can converge much more rapidly.33 In Sec. IV, we introduce an inexact form of subspace iteration to address this shortcoming. Beyond dramatically improving performance on the prediction problem for rare events when applied with A in (14), it can also be applied with A = T^τ to compute multiple dominant eigenfunctions and eigenvalues of T^τ itself.
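The following toy NumPy comparison (dense matrices with a hand-picked spectrum, not the data-driven setting) illustrates the convergence behavior invoked here: with a near-degenerate pair of leading eigenvalues, power iteration makes little progress, while subspace iteration with k = 2 isolates the dominant eigenvector quickly; the matrix size and spectrum are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
evals = np.concatenate(([1.0, 0.999], 0.5 * rng.random(198)))   # tiny gap between top two
A = Q @ np.diag(evals) @ Q.T
v_true = Q[:, 0]

def residual(U):                          # distance from the true dominant eigenvector to span(U)
    U, _ = np.linalg.qr(U)
    return np.linalg.norm(v_true - U @ (U.T @ v_true))

x = rng.standard_normal((200, 1))         # power iteration
X = rng.standard_normal((200, 2))         # subspace iteration with a 2-dimensional block
for _ in range(200):
    x = A @ x
    x /= np.linalg.norm(x)
    X, _ = np.linalg.qr(A @ X)

print("power iteration:   ", residual(x))   # error decays like (0.999)^s: still O(1)
print("subspace iteration:", residual(X))   # error decays like (0.5/0.999)^s: negligible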

Our inexact subspace iteration for the k dominant eigenfunctions/eigenvalues of A proceeds as follows. Let {φ_{θ_s}^a}_{a=1}^k be a set of k basis functions parameterized by θ_s (these can be scalar- or vector-valued functions depending on the form of A). We can represent this basis as the vector-valued function
U_{θ_s} = ( φ_{θ_s}^1, φ_{θ_s}^2, …, φ_{θ_s}^k ).    (15)
Then, we can obtain a new set of k basis functions by approximately applying A to each of the components of U_{θ_s},
U_{θ_{s+1}} K_{s+1} ≈ A U_{θ_s},    (16)
where K_{s+1} is an invertible, upper-triangular k × k matrix that does not change the span of the components of U_{θ_{s+1}} but is included to facilitate training. One way to accomplish the approximate application of A is to minimize
Σ_{a=1}^k ‖ Σ_{b=1}^k K_{ba} φ_θ^b − A φ_{θ_s}^a ‖_μ²    (17)
over θ and K with θ_s held fixed.
The eigenvalues and eigenfunctions of A are then approximated by solving the finite-dimensional generalized eigenproblem
C^τ W = C^0 W Λ,    (18)
where
C^0_{ab} = ⟨ φ_{θ_s}^a, φ_{θ_s}^b ⟩_μ,    (19)
C^τ_{ab} = ⟨ φ_{θ_s}^a, A φ_{θ_s}^b ⟩_μ,    (20)
each inner product is estimated using trajectory data, and W and Λ are k × k matrices. The matrix Λ is diagonal, and the a-th eigenvalue λ_a of A is approximated by Λ_{aa}; the corresponding eigenfunction v_a is approximated by Σ_{b=1}^k W_{ba} φ_{θ_s}^b.
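In practice, this final step amounts to estimating C^0 and C^τ from trajectory pairs and calling a generalized eigensolver; a minimal sketch is below, assuming phi0[j, a] and phit[j, a] hold the k basis functions evaluated at X_j^0 and X_j^τ (the arrays here are random placeholders).

import numpy as np
from scipy.linalg import eig

def subspace_eigenpairs(phi0, phit):
    n = phi0.shape[0]
    C0 = phi0.T @ phi0 / n             # estimate of C^0_ab in Eq. (19)
    Ct = phi0.T @ phit / n             # estimate of C^tau_ab in Eq. (20)
    lam, W = eig(Ct, C0)               # generalized eigenproblem of Eq. (18)
    order = np.argsort(-lam.real)      # sort by decreasing eigenvalue
    return lam.real[order], W.real[:, order]

rng = np.random.default_rng(1)
phi0 = rng.standard_normal((1000, 3))                          # placeholder basis values at X^0
phit = 0.9 * phi0 + 0.1 * rng.standard_normal((1000, 3))       # placeholder values at X^tau
lam, W = subspace_eigenpairs(phi0, phit)                       # eigenvalues and coefficient matrix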

Even when sampling is not required to estimate the matrices in (19) and (20), C^τ becomes numerically rank-deficient as the basis functions become increasingly aligned with the single dominant eigenfunction. To overcome this issue, we apply an orthogonalization step between iterations (or every few iterations). Just as the matrices C^0 and C^τ can be estimated using trajectory data, the orthogonalization procedure can also be implemented approximately using data.

Finally, we find it advantageous to damp large parameter fluctuations during training by mixing the operator A with a multiple of the identity; i.e., we perform our inexact subspace iteration on the operator (1 − α_s) I + α_s A in place of A itself. This new operator has the same eigenfunctions as A. In our experiments, decreasing the parameter α_s as the number of iterations increases results in better generalization properties of the final solution and helps ensure convergence of the iteration. For our numerical experiments, we use
α_s = min{1, σ/s},    (21)
where σ is a user-chosen hyperparameter that sets the number of iterations performed before damping begins.

The details, including estimators for all required inner products, are given for the eigenproblem (A = T^τ) in Sec. VII and Algorithm 1; for the prediction problem, with A as in (14), they are given in Sec. VIII and Algorithm 2.

ALGORITHM 1.

Inexact subspace iteration (with L²_μ loss) for spectral estimation.

Require: Subspace dimension k, transition data {X_j^0, X_j^τ}_{j=1}^n, batch size B, learning rate η, number of subspace iterations S, number of inner iterations M, regularization parameters γ_1 and γ_2
1: Initialize {φ_θ^a}_{a=1}^k and {φ̃_1^a}_{a=1}^k
2: for s = 1 … S do
3:   for m = 1 … M do
4:     Sample a batch of data {X_j^0, X_j^τ}_{j=1}^B
5:     L̂_1 ← (1/B) Σ_{j=1}^B Σ_{a=1}^k [ (1/2) ( Σ_{b=1}^k φ_θ^b(X_j^0) K_{ba} )² − ( Σ_{b=1}^k φ_θ^b(X_j^0) K_{ba} ) ( α_s φ̃_s^a(X_j^τ) + (1 − α_s) φ̃_s^a(X_j^0) ) ]
6:     L̂_K ← γ_1 ‖K − diag(K)‖_F²
7:     L̂_norm ← γ_2 Σ_{a=1}^k [ 2 ν_a ( (1/B) Σ_{j=1}^B φ_θ^a(X_j^0)² − 1 ) − ν_a² ]
8:     L̂ ← L̂_1 + L̂_K + L̂_norm
9:     θ ← θ − η ∇_θ L̂
10:    K ← K − η triu(∇_K L̂)
11:    ν_a ← ν_a + η ∂L̂/∂ν_a
12:  end for
13:  Compute the matrix Φ_{ia} = φ_θ^a(X_i^0)    ⊳ Φ ∈ R^{n×k}
14:  Compute the diagonal matrix N with N_{aa}² = Σ_i φ_θ^a(X_i^0)²
15:  Compute the QR decomposition Φ = QR    ⊳ Q ∈ R^{n×k} and R ∈ R^{k×k}
16:  φ̃_{s+1}^a ← Σ_{b=1}^k φ_θ^b (R^{−1} N)_{ba}
17:  K_{1:i,i} ← argmin_c (1/n) Σ_{j=1}^n ( Σ_{a=1}^i φ_θ^a(X_j^0) c_a − φ̃_{s+1}^i(X_j^τ) )² + γ_2 Σ_{a=1}^{i−1} c_a²
18: end for
19: Compute the matrices C^t_{ab} = (1/n) Σ_{j=1}^n φ̃_{S+1}^a(X_j^0) φ̃_{S+1}^b(X_j^t) for t = 0, τ    ⊳ C^t ∈ R^{k×k}
20: Solve the generalized eigenproblem C^τ W = C^0 W Λ for W and Λ
21: return eigenvalues Λ, eigenfunctions v_a = Σ_{b=1}^k W_{ba} φ̃_{S+1}^b
ALGORITHM 2.

Inexact subspace iteration (with L²_μ loss) for prediction functions.

Require: Subspace dimension k, stopped transition data {X_j^0, X_j^{τ∧T_j}}_{j=1}^n, reward data {ρ(X_j)}_{j=1}^n, batch size B, learning rate η, number of subspace iterations S, number of inner iterations M, regularization parameters γ_1 and γ_2
1: Initialize {φ_θ^a}_{a=1}^k and {φ̃_1^a}_{a=1}^k
2: for s = 1 … S do
3:   for m = 1 … M do
4:     Sample a batch of data {X_j^0, X_j^{τ∧T_j}}_{j=1}^B, {ρ(X_j)}_{j=1}^B
5:     L̂_1 ← (1/B) Σ_{j=1}^B Σ_{a=1}^k [ (1/2) ( Σ_{b=1}^k φ_θ^b(X_j^0) K_{ba} )² − α_s ( Σ_{b=1}^k φ_θ^b(X_j^0) K_{ba} ) ( φ̃_s^a(X_j^{τ∧T_j}) + ρ(X_j) I_{a1} ) ]
6:     L̂_2 ← −(1/B) Σ_{j=1}^B Σ_{a=1}^k (1 − α_s) ( Σ_{b=1}^k φ_θ^b(X_j^0) K_{ba} ) φ̃_s^a(X_j^0)
7:     L̂_K ← γ_1 ‖K − diag(K)‖_F²
8:     L̂_norm ← γ_2 Σ_{a=2}^k [ 2 ν_a ( (1/B) Σ_{j=1}^B φ_θ^a(X_j^0)² − 1 ) − ν_a² ]
9:     L̂ ← L̂_1 + L̂_2 + L̂_K + L̂_norm
10:    θ ← θ − η ∇_θ L̂
11:    K ← K − η triu(∇_K L̂)
12:    ν_a ← ν_a + η ∂L̂/∂ν_a
13:  end for
14:  if Ψ(x) = 0 then
15:    Compute the matrix Φ_{ia} = φ_θ^a(X_i^0)    ⊳ Φ ∈ R^{n×k}
16:  else
17:    Compute the matrix Φ_{ia} = φ_θ^a(X_i^0) for a > 1    ⊳ Φ ∈ R^{n×(k−1)}
18:  end if
19:  Compute the QR decomposition Φ = QR
20:  Compute the diagonal matrix N with N_{aa}² = Σ_i φ_θ^a(X_i^0)²
21:  φ̃_{s+1}^a ← Σ_{b=1}^k φ_θ^b (R^{−1} N)_{ba}    ⊳ if Ψ(x) ≠ 0, exclude a = 1
22:  K_{1:i,i} ← argmin_c (1/n) Σ_{j=1}^n ( Σ_{a=1}^i φ_θ^a(X_j^0) c_a − φ̃_{s+1}^i(X_j^{τ∧T_j}) )² + γ_2 Σ_{a=1}^{i−1} c_a²
23: end for
24: Compute the matrices C^t_{ab} = (1/n) Σ_{j=1}^n φ_θ^a(X_j^0) φ_θ^b(X_j^{t∧T_j}) for t = 0, τ    ⊳ C^t ∈ R^{k×k}
25: Compute the vector p_a = (1/n) Σ_{j=1}^n φ_θ^a(X_j^0) ρ(X_j)
26: Solve the linear system (C^0 − C^τ) w = p    ⊳ if Ψ(x) ≠ 0, enforce w_1 = 1
27: return u = Σ_{a=1}^k w_a φ_θ^a

As mentioned above, the inexact application of the operator A can be accomplished by minimizing loss functions other than (17). The key requirement for a loss function in the present study is that A appears in its gradient only through terms of the form ⟨f, A g⟩_μ for some functions f and g, so that the gradient can be estimated using trajectory data. As a result, we have flexibility in the choice of loss and, in turn, the representation of u. In particular, we consider the representation u_θ = ζ(z_θ), where ζ is an increasing function and z_θ is a function parameterized by a neural network. An advantage of doing so is that the function ζ can restrict the output values of u_θ to some range. For example, when computing a probability such as the committor, a natural choice is the sigmoid function ζ(x) = (1 + e^{−x})^{−1}.

Our goal is to train a sequence of parameter values so that u_{θ_s} approximately follows a subspace iteration, i.e., so that ζ(z_{θ_{s+1}}) ≈ A u_{θ_s}. To this end, we minimize with respect to θ the loss function
E_{X_0∼μ}[ V(z_θ(X_0)) − z_θ(X_0) A u_{θ_s}(X_0) ],    (22)
where V is an antiderivative of ζ and θ_s is fixed. The subscript X_0 ∼ μ in this expression indicates that X_0 is drawn from μ. Note that, as desired, A appears in the gradient of (22) only in an inner product of the required form, and an approximate minimizer, θ_{s+1}, of this loss satisfies ζ(z_{θ_{s+1}}) ≈ A u_{θ_s}. This general form of loss function is adapted from variational expressions for the divergence of two probability measures.32,40

The L²_μ loss in (17), which we use in several of our numerical experiments, corresponds to the choice ζ(x) = x and V(x) = x²/2. The choice ζ(x) = (1 + e^{−x})^{−1} mentioned above corresponds to V(x) = log(1 + e^x); we refer to the loss in (22) with this choice of V as the “softplus” loss. We note that, in the context of committor function estimation, in the limit of infinite τ the “softplus” loss corresponds directly to the negative log-likelihood loss (for independent Bernoulli random variables) that defines the logistic-regression estimate of the probability of reaching B before A. Logistic regression has previously been used in conjunction with datasets of trajectories integrated all the way until reaching A or B to estimate the committor function.41–46
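A minimal PyTorch sketch of this loss family is given below; it assumes that target[j] holds a single-trajectory estimate of (A u_{θ_s})(X_j^0) (for the committor, the value of the previous iterate at X_j^{τ∧T_j} with boundary values filled in), and the network architecture and batch are placeholders.

import torch
import torch.nn.functional as F

def general_loss(z, target, V):
    # Loss of Eq. (22): E_mu[ V(z_theta(X_0)) - z_theta(X_0) * (A u_theta_s)(X_0) ];
    # its minimizer satisfies zeta(z_theta) ~ A u_theta_s, with zeta = V'.
    return (V(z) - z * target).mean()

V_l2 = lambda z: 0.5 * z ** 2        # zeta(z) = z: recovers the L2_mu loss
V_softplus = F.softplus              # zeta(z) = sigmoid(z): the "softplus" loss

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
x0 = torch.randn(128, 2)             # placeholder batch of initial conditions
target = torch.rand(128, 1)          # placeholder targets in [0, 1] (committor-like)

z = net(x0)
loss = general_loss(z, target, V_softplus)   # the committor estimate is torch.sigmoid(z)
loss.backward()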

We illustrate our methods with two well-characterized systems that exemplify features of molecular transitions. In this section, we provide key details of these systems.

The first system is defined by the Müller–Brown potential47 (Fig. 1)
V(y, z) = Σ_{i=1}^4 C_i exp[ a_i (y − y_i)² + b_i (y − y_i)(z − z_i) + c_i (z − z_i)² ].    (23)
The two-dimensional nature of this model facilitates visualization. The presence of multiple minima and saddle points connected by a path that does not align with the coordinate axes makes the system challenging for both sampling and analysis methods. In Secs. VII A and VIII A, we use C_i = {−200, −100, −170, 15}, a_i = {−1, −1, −6.5, 0.7}, b_i = {0, 0, 11, 0.6}, c_i = {−10, −10, −6.5, 0.7}, y_i = {1, −0.27, −0.5, −1}, and z_i = {0, 0.5, 1.5, 1}. In Sec. VIII B, we tune the parameters to make transitions between minima rarer; there, the parameters are C_i = {−250, −150, −170, 15}, a_i = {−1, −3, −6.5, 0.7}, b_i = {0, 0, 11, 0.6}, c_i = {−10, −30, −6.5, 0.7}, y_i = {1, −0.29, −0.5, −1}, and z_i = {0, 0.5, 1.5, 1}. For prediction, we analyze transitions between the upper left minimum (A) and the lower right minimum (B) in Fig. 1; these states are defined as
(24)
FIG. 1.

Müller–Brown potential energy surface. The orange and red ovals indicate the locations of states A and B, respectively, when computing predictions. Contour lines are drawn every 1 β^{−1}.

To generate a dataset, we randomly draw 50000 initial conditions X_j^0 uniformly from the region
(25)
and, from each of these initial conditions, generate one trajectory according to the discretized overdamped Langevin dynamics
X_j^t = X_j^{t−1} − ∇V(X_j^{t−1}) δt + √(2 δt β^{−1}) ξ_j^t,    (26)
where 1 ≤ t ≤ τ, the ξ_j^t are independent standard Gaussian random variables, and the time step is δt = 0.001. We use an inverse temperature of β = 2, and we vary τ as indicated below (note that τ is an integer number of steps, but we present our results for the Müller–Brown model in terms of the dimensionless time scale used for δt immediately above). For each test, we use independent random numbers and redraw the initial conditions because it is faster to generate the trajectories than to read them from disk in this case. All error bars are computed from the empirical standard deviation over ten replicate datasets.
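The data generation for this example reduces to Euler–Maruyama steps of Eq. (26) on the potential of Eq. (23); the sketch below implements it directly, with the uniform sampling box taken to be the rectangle used for the reference grid (an assumption, since the exact region in (25) is not reproduced here).

import numpy as np

C = np.array([-200.0, -100.0, -170.0, 15.0])       # parameters for Secs. VII A and VIII A
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
y0 = np.array([1.0, -0.27, -0.5, -1.0])
z0 = np.array([0.0, 0.5, 1.5, 1.0])

def grad_V(x):                                     # gradient of Eq. (23); x has shape (n, 2)
    dy, dz = x[:, :1] - y0, x[:, 1:] - z0
    e = C * np.exp(a * dy ** 2 + b * dy * dz + c * dz ** 2)
    gy = (e * (2 * a * dy + b * dz)).sum(axis=1)
    gz = (e * (b * dy + 2 * c * dz)).sum(axis=1)
    return np.stack([gy, gz], axis=1)

def sample_trajectories(x0, n_steps, dt=1e-3, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x, traj = x0.copy(), [x0.copy()]
    for _ in range(n_steps):                       # Eq. (26)
        x = x - dt * grad_V(x) + np.sqrt(2 * dt / beta) * rng.standard_normal(x.shape)
        traj.append(x.copy())
    return np.stack(traj)                          # shape (n_steps + 1, n_walkers, 2)

rng = np.random.default_rng(1)
x_init = np.stack([rng.uniform(-1.5, 1.0, 1000), rng.uniform(-0.5, 2.0, 1000)], axis=1)
traj = sample_trajectories(x_init, n_steps=300)    # tau = 300 steps for a small demo batch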
To validate our results, we compare the neural-network results against grid-based references, computed as described in Sec. S4 of the supplementary material of Ref. 11 and the appendix of Ref. 48 (our notation here follows the former more closely). The essential idea is that the terms in the infinitesimal generator of the transition operator can be approximated on a grid to second order in the spacing by expanding both the probability for transitions between sites and a test function. To this end, we define the transition matrix for neighboring sites x = (y, z) and x′ = (y ± δx, z) or (y, z ± δx) on a grid with spacing δx,
(27)
(this definition differs from that in Ref. 11 by a factor of 1/4, and we scale P − I, where I is the identity matrix, accordingly to set the time units below). The diagonal entry P(x|x) is fixed to make the transition matrix row-stochastic. We use δx = 0.0125; the grid is a rectangle with y ∈ [−1.5, 1.0] and z ∈ [−0.5, 2.0]. The reference invariant subspaces are computed by diagonalizing the matrix 2(P − I)/(β δx²) with a sparse eigensolver; we use scipy.sparse.linalg. We obtain the reference committor q̂_+ for grid sites in (A ∪ B)^c by solving
(28)
and for grid sites in A ∪ B by setting q̂_+ = 1̂_B. Here, q̂_+ and 1̂_B are vectors corresponding to the functions evaluated at the grid sites.
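For the committor, the grid reference reduces to a sparse linear solve once the row-stochastic matrix P has been built; a sketch is below, assuming P and boolean masks in_A and in_B over the grid sites are available (the construction of P itself, Eq. (27), is not reproduced here).

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def grid_committor(P, in_A, in_B):
    n = P.shape[0]
    interior = ~(in_A | in_B)                     # grid sites in (A U B)^c
    q = np.zeros(n)
    q[in_B] = 1.0                                 # boundary values: q = 1_B on A U B
    P_DD = P[interior][:, interior]               # transitions within the interior
    rhs = np.asarray(P[interior][:, in_B].sum(axis=1)).ravel()   # one-step flux into B
    q[interior] = spla.spsolve(sp.identity(P_DD.shape[0], format="csr") - P_DD, rhs)
    return q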

The second system is a peptide of nine α-aminoisobutyric acids (AIB9; Fig. 2). Because AIB is achiral around its α-carbon atom, AIB9 can form left- and right-handed helices with equal probabilities, and we study the transitions between these two states. This transition was previously studied using MSMs and long unbiased molecular dynamics simulations.49,50 AIB9 poses a stringent test due to the presence of many metastable intermediate states.

FIG. 2.

Helix-to-helix transition of AIB9. (Left) Left- and right-handed helices, which we use as states A and B, respectively, when computing predictions. Carbon, nitrogen, and oxygen atoms are shown in yellow, blue, and red, respectively; hydrogen atoms are omitted. (Right) Potential of mean force constructed from the histogram of value pairs of the first two dihedral angle principal components; data are from the 20 trajectories of 5 µs that we use to construct reference statistics (see text). The left-handed helix corresponds to the left-most basin, and the right-handed helix corresponds to the right-most basin. Contour lines are drawn every 2 β^{−1}, corresponding to a temperature of 300 K.

The states are defined in terms of the internal ϕ and ψ dihedral angles. We classify an amino acid as being in the “left” state if its dihedral angle values lie within a circle of radius 25° centered at (41°, 43°), that is, if (ϕ − 41°)² + (ψ − 43°)² ≤ (25°)². Amino acids are classified as being in the “right” state using the same radius, but centered instead at (−41°, −43°). States A and B are defined by the amino acids at sequence positions 3–7 being all left or all right, respectively. We do not use the two residues on each end of AIB9 in defining the states, as these are typically more flexible.50 The states can be resolved by projecting onto dihedral angle principal components (dPCs; Fig. 2, right), as described previously.51

Following a procedure similar to that described in Ref. 50, we generate a dataset of short trajectories. From each of the 691 starting configurations in Ref. 50, we simulate ten trajectories of duration 20 ns with initial velocities drawn randomly from a Maxwell–Boltzmann distribution at a temperature of 300 K. The short-trajectory dataset thus contains 6910 trajectories, corresponding to a total sampling time of 138.2 µs. We use a time step of 4 fs together with a hydrogen mass repartitioning scheme,52 and configurations are saved every 40 ps. We employ the AIB parameters from Forcefield_NCAA53 and the GBNeck2 implicit-solvent model.54 Simulations are performed with the Langevin integrator in OpenMM 7.7.055 using a friction coefficient of 1 ps^{−1}. To generate a reference for comparison to our results, we randomly select 20 configurations from the dataset above and, from each, run a simulation of 5 µs with the same simulation parameters. For all following tests on AIB9, batches consist of pairs of frames separated by τ drawn randomly with replacement from the short trajectories (i.e., from all possible such pairs in the dataset). From each frame, we use only the atom positions because the momenta decorrelate within a few picoseconds, which is much shorter than the lag times that we consider. However, in principle, the momenta could impact the prediction functions56 and could be used as neural-network inputs as well.
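A hedged sketch of this simulation protocol using the OpenMM Python API is shown below; the structure file, the force-field XML name, and the repartitioned hydrogen mass are placeholders rather than the exact inputs used in this work (the GBNeck2 model corresponds to OpenMM's "implicit/gbn2.xml").

import openmm
import openmm.app as app
import openmm.unit as unit

pdb = app.PDBFile("aib9_start.pdb")                                # placeholder structure file
ff = app.ForceField("forcefield_ncaa.xml", "implicit/gbn2.xml")    # placeholder parameters + GBNeck2
system = ff.createSystem(pdb.topology, nonbondedMethod=app.NoCutoff,
                         constraints=app.HBonds,
                         hydrogenMass=2.0 * unit.amu)              # hydrogen mass repartitioning (assumed value)
integrator = openmm.LangevinIntegrator(300 * unit.kelvin,
                                       1.0 / unit.picosecond,      # friction coefficient
                                       0.004 * unit.picoseconds)   # 4 fs time step
sim = app.Simulation(pdb.topology, system, integrator)
sim.context.setPositions(pdb.positions)
sim.context.setVelocitiesToTemperature(300 * unit.kelvin)          # fresh Maxwell-Boltzmann velocities
sim.reporters.append(app.DCDReporter("short_traj.dcd", 10000))     # save every 40 ps
sim.step(5_000_000)                                                # 20 ns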

In this section, we provide some further numerical details for the application of our method to spectral estimation and demonstrate the method on the test problems. For our subspace iteration with A = T^τ, we require estimators for inner products of the form ⟨f, T^τ g⟩_μ. For example, the gradient of the loss function (17) involves inner products of the form
⟨ φ_θ^b, T^τ φ̃_s^a ⟩_μ.    (29)
For these, we use the unbiased data-driven estimator
(1/n) Σ_{j=1}^n φ_θ^b(X_j^0) φ̃_s^a(X_j^τ).    (30)
As discussed in Sec. IV, applying the operator T^τ repeatedly causes each basis function to converge to the dominant eigenfunction and leads to numerical instabilities. To avoid this, we orthogonalize the outputs of the networks with a QR decomposition at the end of each subspace iteration by constructing the matrix Φ_{ia} = φ_θ^a(X_i^0) and then computing the factorization Φ = QR, where Q is an n × k matrix with orthogonal columns and R is an upper-triangular k × k matrix. Finally, we set φ̃_{s+1}^a = Σ_{b=1}^k φ_θ^b (R^{−1} N)_{ba}, where N is a diagonal matrix with entries equal to the norms of the columns of Φ (before orthogonalization). To ensure that the networks remain well separated (i.e., the eigenvalues of C^0 remain away from zero), we penalize large off-diagonal entries of K by adding to the loss
γ_1 ‖ K − diag(K) ‖_F²,    (31)
where γ_1 allows us to tune the strength of this term relative to others, and ‖·‖_F is the Frobenius norm. We control the scale of each network output using the strategy from Ref. 32; that is, we add to the loss a term of the form
γ_2 Σ_{a=1}^k [ 2 ν_a ( ‖φ_θ^a‖_μ² − 1 ) − ν_a² ],    (32)
where we have introduced the conjugate variables ν_a, which we maximize with gradient ascent (or a similar optimization). In general, our numerical experiments suggest that it is best to keep γ_1 and γ_2 relatively small. We find that the stability of the algorithm over many subspace iterations is improved if the matrix K is set to its optimal value before each inner loop. To do this, we set
K_{1:i,i} = argmin_c (1/n) Σ_{j=1}^n ( Σ_{a=1}^i c_a φ_θ^a(X_j^0) − φ̃_{s+1}^i(X_j^τ) )² + γ_2 Σ_{a=1}^{i−1} c_a².    (33)
The above minimization can be solved with linear least squares. Finally, we note that, in practice, any optimizer can be used for the inner iteration steps, though the algorithm below implements stochastic gradient descent. In this work, we use Adam57 for all numerical tests. We summarize our procedure for spectral estimation in Algorithm 1.
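The orthogonalization between outer iterations is itself a purely data-driven step; the sketch below computes the recombination matrix R^{-1}N from the network outputs at the sampled initial conditions, assuming phi0[j, a] = φ_θ^a(X_j^0).

import numpy as np

def recombination_matrix(phi0):
    # Algorithm 1, steps 13-16: QR-factorize the n x k matrix of basis values and
    # rescale by the original column norms; the new targets are
    # phi_tilde^a(x) = sum_b phi_theta^b(x) * M[b, a].
    N = np.diag(np.linalg.norm(phi0, axis=0))
    Q, R = np.linalg.qr(phi0)
    return np.linalg.solve(R, N)          # R^{-1} N

M = recombination_matrix(np.random.default_rng(2).standard_normal((1000, 3)))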

As a first test of our method, we compute the k = 3 dominant eigenpairs for the Müller–Brown model. Since we know that the dominant eigenfunction of the transition operator is the constant function v_1(y, z) = 1 with eigenvalue λ_1 = 1, we directly include this function in the basis as a non-trainable function, i.e., φ_θ^1(y, z) = 1. To initialize φ̃_1^a for each a > 1, we choose a standard Gaussian vector (Y_a, Z_a) ∈ R² and set φ̃_1^a(y, z) = y Y_a + z Z_a. This ensures that the initial basis vectors are well separated and the first QR step is numerically stable. Here and in all subsequent Müller–Brown tests, batches of trajectories are drawn from the entire dataset with replacement. Other hyperparameters are listed in Table I.

TABLE I.

Parameter choices used in this work.

Columns (left to right): Spectral estimation (Müller–Brown, AIB9); Committor (Müller–Brown, Modified Müller–Brown, AIB9); MFPT (AIB9). Each row lists a hyperparameter followed by its value in each of these six columns.
Subspace dimension k 3 5 1 2, 1a 1 5
Input dimensionality 2 174 2 2 174 174
Hidden layers
Hidden layer width 64 128 64 64 150 150
Hidden layer nonlinearity CeLU CeLU ReLU ReLU ReLU ReLU
Output layer nonlinearity None tanh Sigmoid/none None None None
Outer iterations S 10 100 100 4 + 10a 100 300
Inner iterations M 5000 2000 200 5000 2000 1000
σ 50 50 50
Batch size B 2000 1024 5000 2000 1024 2000
Learning rate η 0.001 0.0001 0.001 0.001 0.001 0.001
γ1 0.15 0.001 ⋯ ⋯ ⋯ 0.1
γ2 0.01 0.01 ⋯ ⋯ ⋯ 10
Loss for φ_θ^1 L²_μ L²_μ L²_μ/softplus Softplus Softplus L²_μ
Loss for φ_θ^a for a > 1 L²_μ L²_μ ⋯ L²_μ ⋯ L²_μ
a Four subspace iterations with k = 2 followed by ten iterations with k = 1.

Figure 3 shows that we obtain good agreement between the estimate produced by the inexact subspace iteration in Algorithm 1 and reference eigenfunctions. Figure 4 (upper panels) shows how the corresponding eigenvalues vary with lag time; again there is good agreement with the reference. Furthermore, there is a significant gap between λ3 and λ4, indicating that a three-dimensional subspace captures the dynamics of interest for this system.

FIG. 3.

First two non-trivial eigenfunctions of the Müller–Brown model. (Top) Grid-based reference. (Bottom) Neural-network subspace after ten subspace iteration steps, computed with τ = 300 steps (i.e., 0.3 δt^{−1}).

FIG. 4.

Spectral estimation as a function of lag time (in units of δt^{−1}) for the Müller–Brown model. (Top left) Second eigenvalue. (Top right) Third and fourth eigenvalues; only the reference fourth eigenvalue is shown to illustrate the spectral gap. (Bottom left) Relative error in the first spectral gap (i.e., 1 − λ_2). (Bottom right) Subspace distance between the estimated and reference three-dimensional invariant subspaces.

We compare the subspace that we obtain from our method with that from an MSM constructed from the same amount of data by using k-means to cluster the configurations into 400 states and counting the transitions between clusters. This is a very fine discretization for this system, and the MSM is sufficiently expressive to yield eigenfunctions in good agreement with the reference. The relative error of 1 − λ_2 is comparable for the two methods (Fig. 4, lower left). To compare two finite-dimensional subspaces, U and V, we define the subspace distance as4
(34)
where P_U and P_V denote the orthogonal projections onto U and V, respectively, and ‖·‖_F is the Frobenius norm. Figure 4 (lower right) shows the subspace distances from the reference as functions of lag time. We see that the inexact subspace iteration better approximates the three-dimensional dominant eigenspace for moderate to long lag times, even though the eigenvalues are comparable.
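Because Eq. (34) is described here only in terms of the orthogonal projections and the Frobenius norm, the sketch below uses one common convention of that type (normalized to lie in [0, 1] for equal-dimensional subspaces); the exact normalization in the paper may differ. The inputs are the basis functions of each subspace evaluated on a common set of points.

import numpy as np

def subspace_distance(U, V):
    Qu, _ = np.linalg.qr(U)                        # orthonormalize each basis
    Qv, _ = np.linalg.qr(V)
    k = Qu.shape[1]
    # ||P_U - P_V||_F^2 = 2k - 2 ||Qu^T Qv||_F^2 for subspaces of equal dimension k
    cross = np.linalg.norm(Qu.T @ Qv, "fro") ** 2
    return np.sqrt(max(2 * k - 2 * cross, 0.0) / (2 * k))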

For the molecular test system, we compute the dominant five-dimensional subspace. We compare the inexact subspace iteration in Algorithm 1 with MSMs constructed on dihedral angles (“dihedral MSM”) and on Cartesian coordinates (“Cartesian MSM”). We expect the dihedral MSM to be accurate given that the dynamics of AIB9 are well-described by the backbone dihedral angles,49,50 and we thus use it as a reference. It is constructed by clustering the sine and cosine of each of the backbone dihedral angles (ϕ and ψ) for the nine residues (for a total of 2 × 2 × 9 = 36 input features) into 1000 clusters using k-means and counting transitions between clusters. The Cartesian MSM is constructed by similarly counting transitions between 1000 clusters from the k-means algorithm, but the clusters are based on the Cartesian coordinates of all non-hydrogen atoms after aligning the backbone atoms of the trajectories, for a total of 174 input features. Because of the difficulty of clustering high-dimensional data, we expect the Cartesian MSM basis to give poor results. The neural network for the inexact subspace iteration receives the same 174 Cartesian coordinates as input features. We choose to use Cartesian coordinates rather than dihedral angles as inputs because it requires the network to identify nontrivial representations for describing the dynamics.

As in the Müller–Brown example, we use φ_θ^1 = 1 and a random linear combination of coordinate functions to initialize φ̃_1^a for a > 1. Other hyperparameters are listed in Table I. With these choices, the neural-network training typically requires about 20 minutes on a single NVIDIA A40 GPU; this is much longer than the time required for diagonalization of the 1000 × 1000 MSM transition matrix, which is nearly instantaneous. However, the time for neural-network training is negligible compared with the time to generate the dataset, which is the same for both approaches.

Taking the dihedral MSM as a reference, the Cartesian MSM systematically underestimates the eigenvalues (Fig. 5). The subspace iteration is very accurate for the first four eigenvalues, but the estimates for the fifth are low and vary considerably from run to run. A very small gap between λ4 and λ5 may contribute to the difficulty in estimating λ5. In Fig. 6, we plot the first two non-trivial eigenfunctions (v2 and v3), which align with the axes of the dPC projection. The eigenfunction v2 corresponds to the transition between the left- and right-handed helices; the eigenfunction v3 is nearly orthogonal to v2 and corresponds to transitions between intermediate states. It is challenging to visualize the remaining two eigenfunctions by projecting onto the first two dPCs because v4 and v5 are orthogonal to v2 and v3. The estimates for v2 are in qualitative agreement for all lag times tested (Fig. 6 shows results for τ corresponding to 40 ps), but the subspace iteration results are less noisy for the shortest lag times. Moreover, the estimate for v3 from subspace iteration agrees more closely with that from the dihedral MSM than does the estimate for v3 from the Cartesian MSM. The subspace distance for v2 and v3 between the subspace iteration and the dihedral MSM is 0.947, compared with a value of 0.969 for the subspace distance between the two MSMs. Together, our results indicate that the neural networks are able to learn the leading eigenfunctions and eigenvalues of the transition operator (dynamical modes) of this system despite being presented with coordinates that are not the natural ones for describing the dynamics.

FIG. 5.

First five eigenvalues of the transition operator for AIB9 as a function of lag time. (Left) Comparison between eigenvalues computed using the dihedral MSM with 1000 clusters (solid lines) and the inexact subspace iteration (dashed lines). The shading indicates standard deviations over five trained networks for the subspace iteration. (Right) Comparison between a dihedral MSM (solid lines) and Cartesian MSMs with 1000 clusters (dashed lines). The standard deviations for the Cartesian MSMs over five random seeds for k-means clustering are too narrow to be seen.

FIG. 6.

First two non-trivial eigenfunctions of AIB9 projected onto the first two dPCs (i.e., averaged for bins in the two-dimensional space shown). (Top) MSM constructed on sine and cosine of dihedral angles with 1000 clusters and lag time corresponding to 40 ps. (Middle) Inexact subspace iteration using Cartesian coordinates and the same lag time. (Bottom) MSM constructed on Cartesian coordinates with 1000 clusters and the same lag time.

Inexact subspace iteration for A in (14) is equivalent to performing the inexact Richardson iteration in (12) on the first basis function φ_θ^1 and then performing an inexact subspace iteration for the operator S^τ on the rest of the basis functions. The iteration requires unbiased estimators of the forms
⟨ f, S^τ g ⟩_μ ≈ (1/n) Σ_{j=1}^n f(X_j^0) g(X_j^{τ∧T_j})    (35)
and
⟨ f, r ⟩_μ ≈ (1/n) Σ_{j=1}^n f(X_j^0) Σ_{t=0}^{τ∧T_j−1} Γ(X_j^t),    (36)
where T_j is the first time X_j^t enters D^c and r is the right-hand side of (5), as previously.

The Richardson iterate, φ_θ^1, must satisfy the boundary condition φ_θ^1(x) = Ψ(x) for x ∉ D. The other basis functions should satisfy φ_θ^a(x) = 0 for x ∉ D. In practice, we enforce the boundary conditions by explicitly setting φ_θ^1(x) = Ψ(x) and φ_θ^a(x) = 0 for a > 1 when x ∉ D.

When the boundary condition is zero, as for the MFPT, we find an approximate solution of the form
u = Σ_{a=1}^k w_a φ_θ^a    (37)
by solving the k-dimensional linear system
(C^0 − C^τ) w = p,    (38)
where, for a, b ≥ 1,
C^t_{ab} = (1/n) Σ_{j=1}^n φ_θ^a(X_j^0) φ_θ^b(X_j^{t∧T_j})    (39)
for t = 0, τ, and
p_a = (1/n) Σ_{j=1}^n φ_θ^a(X_j^0) ρ(X_j).    (40)
In (40), we introduce the notation
ρ(X_j) = Σ_{t=0}^{τ∧T_j−1} Γ(X_j^t)    (41)
for use in Algorithm 2.
When the boundary condition is non-zero, as for the committor, we restrict (38) to a (k − 1)-dimensional linear system by excluding the indices a = 1 and b = 1 in (39) and (40) and setting
w_1 = 1.    (42)
In this case, the corresponding approximate solution is
u = φ_θ^1 + Σ_{a=2}^k w_a φ_θ^a.    (43)
This approximate solution corresponds to the one given by dynamical Galerkin approximation10,11 with the basis {φ_θ^a}_{a=2}^k and a “guess” function of φ_θ^1.

When the boundary conditions are zero, the orthogonalization procedure and the matrix K are applied to all basis functions as in Sec. VII. When the boundary conditions are non-zero, the orthogonalization procedure is applied only to the basis functions {φ_θ^a}_{a=2}^k, and K_{a1} = I_{a1}, the (a, 1) element of the identity matrix. We summarize our procedure for prediction in Algorithm 2.
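The sketch below carries out the final linear solve of Eqs. (37)–(43), assuming phi0[j, a] and phit[j, a] hold the basis functions at X_j^0 and X_j^{τ∧T_j} (with boundary values already imposed) and rho[j] holds the accumulated reward of Eq. (41).

import numpy as np

def solve_prediction(phi0, phit, rho, nonzero_boundary=False):
    n = phi0.shape[0]
    C0 = phi0.T @ phi0 / n                       # Eq. (39), t = 0
    Ct = phi0.T @ phit / n                       # Eq. (39), t = tau
    p = phi0.T @ rho / n                         # Eq. (40)
    A = C0 - Ct
    if not nonzero_boundary:                     # e.g., the MFPT: full k x k system, Eq. (38)
        return np.linalg.solve(A, p)
    # e.g., the committor: fix w_1 = 1 (Eq. 42) and solve the remaining (k - 1)-dimensional
    # system, so that u = phi^1 + sum_{a>1} w_a phi^a (Eq. 43).
    w = np.ones(phi0.shape[1])
    w[1:] = np.linalg.solve(A[1:, 1:], p[1:] - A[1:, 0])
    return w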

In this section, we demonstrate the use of our method for prediction by estimating the committor for the Müller–Brown model with a shallow intermediate basin at (−0.25, 0.5) (Fig. 1). Here, the sets A and B are defined as in Eq. (24), and T is the time of first entrance to D^c = A ∪ B. In this case, a one-dimensional subspace iteration (i.e., k = 1 in Algorithm 2) appears sufficient to accurately solve the prediction problem. Figure 7 shows the largest eigenvalue of the stopped transition operator S^τ [the second largest of A in (14)] computed from our grid-based reference scheme (Sec. VI A). Richardson iteration should converge geometrically in this eigenvalue,33 and so, for moderate lag times, we can expect our method to converge in a few dozen iterations. To initialize the algorithm, we choose φ̃_1^1 = 1_B. All other hyperparameters are listed in Table I.

FIG. 7.

First eigenvalue of S^τ [second of A in (14)] for the Müller–Brown model as a function of lag time (in units of δt^{−1}). The gap between this eigenvalue and the dominant eigenvalue, which is one, determines the rate of convergence of the Richardson iteration.

We compare the estimate of the committor from our approach with that from an MSM constructed from the same amount of data by using k-means to cluster the configurations outside A and B into 400 states and counting the transitions between clusters. In addition to the root mean square error (RMSE) for the committor itself, we show the RMSE of
log[ (q(x) + ε) / (1 − q(x) + ε) ]    (44)
for points outside A and B. This function amplifies the importance of values close to zero and one. We include ε because we want to assign only a finite penalty if the procedure estimates q to be exactly zero or one; we use ε = e^{−20}.
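A small helper implementing this metric is sketched below; the placement of ε inside the logarithm follows the form written in (44) and is otherwise an assumption.

import numpy as np

def logit_rmse(q_pred, q_ref, eps=np.exp(-20.0)):
    logit = lambda q: np.log((q + eps) / (1.0 - q + eps))   # finite even at q = 0 or 1
    return np.sqrt(np.mean((logit(q_pred) - logit(q_ref)) ** 2))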

Results as a function of lag time are shown in Fig. 8. We see that the Richardson iterate is more accurate than the MSM for all but the shortest lag times. When using the L²_μ loss, the results are comparable, whereas the softplus loss allows the Richardson iterate to improve the RMSE of the logit function in (44) with no decrease in performance with respect to the RMSE of the committor. Results as a function of the size of the dataset are shown in Fig. 9 for a fixed lag time of τ = 0.1 δt^{−1}. The Richardson iterate generally does as well as or better than the MSM. Again, the differences are more apparent in the RMSE of the logit function in (44). By that measure, the Richardson iterate obtained with both loss functions is significantly more accurate than the MSM for small numbers of trajectories. The softplus loss maintains an advantage even for large numbers of trajectories.

FIG. 8.

Committor prediction for the Müller–Brown system as a function of lag time (in units of δt^{−1}). (Left) Comparison of the inexact Richardson iteration using the L²_μ loss and an MSM with 400 states. (Right) Same comparison using the softplus loss in place of the L²_μ loss.

FIG. 9.

Committor prediction for the Müller–Brown model as a function of the number of initial conditions for a fixed lag time of τ = 0.1 δt^{−1}. (Left) Comparison of the inexact Richardson iteration using the L²_μ loss and an MSM with 400 states. (Right) Same comparison using the softplus loss in place of the L²_μ loss.


As discussed in Sec. III, we expect Richardson iteration to converge slowly when the largest eigenvalue of S^τ, λ_1, is close to 1. More precisely, the number of iterations required to reach convergence should scale with −1/log λ_1 = E[T]/τ, the mean escape time from the quasi-stationary distribution to the boundary of D divided by the lag time. With this in mind, we can expect inexact Richardson iteration for the Müller–Brown model to perform poorly if we deepen the intermediate basin at (−0.25, 0.5), as in Fig. 10 (top left). Again, the sets A and B are defined as in (24), and T is the time of first entrance to D^c = A ∪ B. In this case, −1/log λ_1 is on the order of 100 for the lag times we consider and, as expected, inexact Richardson iteration converges slowly (Fig. 10, bottom left). Estimates of the committor by inexact Richardson iteration do not reach the correct values even after hundreds of iterations (Fig. 10, bottom right).

FIG. 10.

Richardson iteration for the committor converges slowly for a Müller–Brown potential with a deepened intermediate. (Top left) Potential energy surface, with states A and B indicated. Contour lines are drawn every 1 β^{−1}. (Top right) Reference committor. (Bottom left) Dominant eigenvalue as a function of lag time (in units of δt^{−1}) from an MSM with 400 states, subspace iteration, and the grid-based reference. (Bottom right) Example of the Richardson iteration after 400 iterations. Note the overfitting artifacts and lack of convergence near the intermediate state.


We now show that convergence can be accelerated dramatically by incorporating additional eigenfunctions of S^τ (i.e., k > 1 in Algorithm 2). For the Müller–Brown model with a deepened intermediate basin, the second eigenvalue of S^τ is of order 10^{−4} for a lag time of τ = 1000 steps, or 1 δt^{−1} (while the first is near one, as discussed above). We therefore choose k = 2, with φ̃_1^2 initialized as a random linear combination of coordinate functions as in previous examples. We run the subspace iteration for four iterations, compute the committor as a linear combination of the resulting functions, and then refine this result with a further ten Richardson iterations (i.e., k = 1 with the starting vector given by the output of the k = 2 subspace iteration). To combine the functions, we use a linear solve that incorporates memory (Algorithm 3).58,59 We find that the use of memory substantially improves data efficiency for poorly conditioned problems. For our tests here, we use three memory kernels, corresponding to τ_mem = ⌊τ/4⌋.

ALGORITHM 3.

Memory-corrected linear solve for predictions.

Require: Stopped transition data {X_j^0, X_j^{1∧T_j}, …, X_j^{τ∧T_j}}_{j=1}^n, guess function h, reward data {ρ(X_j)}_{j=1}^n, basis set {f^a}_{a=1}^k, lag between memory kernels τ_mem
1: for s = 0 … (τ/τ_mem) do
2:   Initialize the matrix C^s with zeros    ⊳ C^s ∈ R^{(k+1)×(k+1)}
3:   C^s_{11} ← 1
4:   for a = 2 … k do
5:     C^s_{a1} ← (1/n) Σ_{j=1}^n f^a(X_j^0) ρ(X_j^{sτ_mem ∧ T_j})
6:     for b = 2 … k do
7:       C^s_{ab} ← (1/n) Σ_{j=1}^n f^a(X_j^0) f^b(X_j^{sτ_mem ∧ T_j})
8:     end for
9:   end for
10: end for
11: A ← C^1 − C^0
12: for s = 0 … (τ/τ_mem) − 2 do
13:   M^s ← C^{s+2} − C^{s+1} − A (C^0)^{−1} C^{s+1} − Σ_{j=0}^{s−1} M^j (C^0)^{−1} C^{s−j}
14: end for
15: A_mem ← A + Σ_{s=0}^{(τ/τ_mem)−2} M^s
16: Solve A_mem w = w
17: return u = h + Σ_{a=2}^k w_a f^a

The bottom row of Fig. 11 illustrates the idea of the subspace iteration. The second eigenfunction (Fig. 11, center) is peaked at the intermediate. As a result, the two neural-network functions linearly combined by the Galerkin approach with memory can yield a good result for the committor (Fig. 11, bottom right). Figure 12 compares the RMSE for the committor and the RMSE for the logit in (44) for Algorithm 2 with k = 1 (pure Richardson iteration) and k = 2 (incorporating the first non-trivial eigenfunction), and an MSM with 400 states. We see that the Richardson iteration suffers large errors at all lag times; as noted previously, this error is mainly in the vicinity of the intermediate. The MSM cannot accurately compute the small probabilities, but does as well as the subspace iteration in terms of RMSE.

FIG. 11.

Illustration of the subspace iteration for the Müller–Brown committor. (Top left) Modified Müller–Brown potential. (Top center) Reference second eigenfunction. (Top right) Reference committor. (Bottom left) Neural-network Richardson iterate after four iterations. (Bottom center) First non-dominant eigenfunction obtained from the neural network after four iterations. (Bottom right) Committor resulting from the linear combination of the Richardson iterate and second eigenfunction. Results shown are for τ = 1000 steps (i.e., 1 δt^{−1}).

FIG. 12.

Committor for the Müller–Brown potential with deepened intermediate as a function of lag time (in units of δt^{−1}). (Left) Comparison of RMSE for subspace iteration as described above, Richardson iteration (as in Sec. VIII A but with 500 subspace iterations), and an MSM with 400 states. (Right) RMSE of the logit function in (44).


As an example of prediction in a high-dimensional system, we compute the committor for the transition between the left- and right-handed helices of AIB9 using the inexact Richardson iteration scheme (k = 1 in Algorithm 2) with the softplus loss function. Specifically, for this committor calculation, T is the time of first entrance to D^c = A ∪ B, with A and B defined in Sec. VI B. As before, we initialize φ̃_1^1 = 1_B.

To validate our results, we use the 5 µs reference trajectories to compute an empirical committor as a function of the neural-network outputs, binned into intervals of width Δs,
E[ 1_B(X_T) | u_θ(X_0) ∈ [s, s + Δs) ]    (45)
for s ∈ [0, 1 − Δs]. Here, we use Δs = 0.05. The overall error in the committor estimate is defined as
(46)
While this measure of error can only be used when the dataset contains trajectories of long enough duration to reach D^c, it has the advantage that it does not depend on the choice of projection that we use to visualize the results.
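The empirical-committor comparison amounts to binning the reference trajectories by the predicted committor at their initial frames and recording the fraction that reaches B before A in each bin; a sketch is below, with the array names assumed for illustration.

import numpy as np

def empirical_committor(q_pred, reached_B, ds=0.05):
    # q_pred[j]: neural-network committor at the initial frame of reference trajectory j;
    # reached_B[j]: 1 if that trajectory entered B before A, else 0.
    edges = np.arange(0.0, 1.0 + ds, ds)
    centers, empirical = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (q_pred >= lo) & (q_pred < hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            empirical.append(reached_B[mask].mean())
    return np.array(centers), np.array(empirical)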

Results for the full dataset with τ corresponding to 400 ps are shown in Fig. 13. The projection on the principal components is consistent with the symmetry of the system, and the predictions show good agreement with the empirical committors. As τ decreases, the results become less accurate (Fig. 14, top left); at shorter lag times we would expect further increases in the error. We also examine the dependence of the results on the size of the dataset by subsampling the short trajectories and then training neural networks on the reduced set of trajectories (Fig. 14, top right). We find that the performance steadily drops as the number of trajectories is reduced and degrades rapidly for the datasets subsampled more than 20-fold (Fig. 14, bottom right), corresponding to about 7 µs of total sampling.

FIG. 13.

AIB9 committor for the transition between left- and right-handed helices. (Left) Averages of 1B(XT) for initial conditions in bins in the first two dPCs computed from 20 long (5 µs) trajectories. (Middle) Averages of representative neural-network committors trained on the dataset of 6910 short (20 ns) trajectories; τ corresponds to 400 ps. (Right) Comparison between empirical committors [as defined in (45)] and the neural-network committors (trained as for the middle panel). Error bars indicate standard deviations over ten different initializations of the neural-network parameters.

FIG. 14.

AIB9 committor for the transition between left- and right-handed helices, as functions of lag time (in ps) and number of initial conditions. (Top left) Error in the committor as a function of lag time (in ps). Shading indicates the standard deviation over ten different initializations of the neural-network parameters. (Top right) Error in the committor as a function of the number of initial conditions with τ corresponding to 160 ps. Shading indicates the standard deviation over ten different random samples of the trajectories. (Bottom) Comparison between empirical committors and neural-network committors trained on datasets with (left) 1/2 and (right) 1/20 of the short trajectories. Error bars indicate standard deviations over ten random samples of the trajectories.


Finally, we compute the MFPT to reach the right-handed helix using the same dataset. For the MFPT calculation, T is the time of first entrance to Dᶜ = B. Note that the time of first entrance to B includes long dwell times in A and so is expected to be much larger than the time of first entrance to A ∪ B.

We compare against an empirical estimate of the MFPT defined by
(47)
for s ∈ [0, m_max − Δs], where Δs = 3 ns and m_max = 57 ns. The overall error is defined analogously to Eq. (46).
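The same binning logic carries over to the MFPT validation, now with 3 ns bins up to m_max = 57 ns and the empirical value in each bin being the mean observed time to next reach B. As above, this is a sketch of one plausible realization rather than the exact form of Eq. (47), and the names m_pred and time_to_B are illustrative.

```python
import numpy as np

def empirical_mfpt(m_pred, time_to_B, ds=3.0, m_max=57.0):
    """Bin frames by predicted MFPT (in ns) and return, for each occupied bin,
    the bin center and the mean observed time to next reach B."""
    edges = np.arange(0.0, m_max + ds, ds)
    centers, empirical = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (m_pred >= lo) & (m_pred < hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            empirical.append(time_to_B[mask].mean())
    return np.array(centers), np.array(empirical)
```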

In Fig. 15, we show the MFPT obtained from Algorithm 2 with k = 5 and the Lμ² loss function. Initially, we set φ̃₁¹ equal to an arbitrary positive function (we use 5·1_A) and φ̃ₐ¹ for a > 1 to random linear combinations of coordinate functions. In Fig. 16, we examine the convergence of the MFPT from the left-handed helix to the right-handed helix for k = 1 (pure Richardson iteration) and k = 5. To obtain the results shown, we train neural networks on the short-trajectory dataset and then average their predictions over structures in the left-handed helix state in the long reference trajectories. The horizontal line in Fig. 16 indicates an MFPT of about 56 ns estimated from the long reference trajectories. We see that the algorithm with k = 5 converges much faster (note the differing scales of the horizontal axes) and yields accurate results at all but the shortest lag time shown. The need to choose k > 1 for this MFPT calculation is again consistent with the theoretical convergence behavior of exact subspace iteration. Because the typical time of first entrance to B from points in A is very large, we expect the dominant eigenvalue of Sτ to be very close to one when D = Bᶜ. In contrast, the committor calculation benefits from the fact that the time of first entrance to A ∪ B is much shorter, implying a smaller dominant eigenvalue of Sτ when D = (A ∪ B)ᶜ.
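The convergence check behind Fig. 16 amounts to a few lines: a trained predictor is evaluated on every frame of the long reference trajectories, and the predictions are averaged over the frames assigned to the left-handed helix state A, for comparison with the empirical value of roughly 56 ns. The names below (mfpt_model, in_A, first_entrance_times) are illustrative.

```python
import numpy as np

def predicted_mfpt_from_A(mfpt_model, reference_frames, in_A):
    """Average the learned MFPT over reference frames in the left-handed helix."""
    return np.asarray(mfpt_model(reference_frames))[in_A].mean()

def reference_mfpt_from_A(first_entrance_times, in_A):
    """Empirical counterpart: average the observed time to next reach B over
    the same frames (about 56 ns for this system)."""
    return np.asarray(first_entrance_times)[in_A].mean()
```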

FIG. 15.

AIB9 MFPT to the right-handed helix. (Left) Averages of the time to next reach B for initial conditions in bins in the first two dPCs computed from 20 long (5 µs) trajectories. (Middle) Averages of representative neural-network MFPTs trained on the dataset of 6910 short (20 ns) trajectories; τ corresponds to 400 ps. (Right) Comparison between empirical MFPTs [as defined in (47)] and the neural-network MFPTs (trained as for the middle panel). Error bars indicate standard deviations over ten different initializations of the neural-network parameters.

FIG. 16.

MFPT between left- and right-handed helices for the AIB9 system. (Top left) Convergence of Richardson iteration. (Top right) Convergence of a five-dimensional subspace iteration. (Bottom left) MFPT after 100 and 200 subspace iterations as a function of lag time. Shading indicates standard deviations over ten different initializations of the neural-network parameters. (Bottom right) Overall error in MFPT. To obtain the results shown in this figure, we first use the short-trajectory dataset to train neural networks to predict the MFPT; we then use these networks with fixed parameters to evaluate the MFPT for all structures in the long reference trajectories and average the results for those structures in the left-handed helix state. The horizontal lines in the top panels are obtained from averaging the time to the right-handed helix for the same structures.


In this work, we have presented a method for spectral estimation and rare-event prediction from short-trajectory data. The key idea is to use the data as the basis for an inexact subspace iteration. For the test systems considered, the method not only outperforms high-resolution MSMs but can also be tuned, through the choice of loss function, to compute committor probabilities accurately near the reactant, transition-state, and product regions. Other than the Markov assumption, our method requires no knowledge of the underlying model and places no restrictions on its dynamics.

As discussed in prior neural-network-based prediction work,24,30 our method is sensitive to the quality and distribution of the initial sampling data. However, our work shares with Ref. 24 the major advantage of allowing the use of arbitrary inner products. This enables adaptive sampling of the state space24,60 and, together with the features described above, application to observational and experimental data, for which the stationary distribution is generally unavailable.

In the present work, we focused on statistics of transition operators, but our method can readily be extended to solve problems involving their adjoint operators as well. By this means, we can obtain the stationary distribution as well as the backward committor. The combination of forward and backward predictions allows the analysis of transition paths using transition path theory without needing to generate full transition paths37,38,48 and has been used to understand rare transition events in molecular dynamics10,13,16,17,61,62 and geophysical flows.63–67 We leave these extensions to future work.

In cases in which trajectories that reach the reactant and product states are available, it would be interesting to compare our inexact iterative schemes against schemes for committor approximation based on logistic regression and related approaches.35,41–46,68 These schemes are closely related to what is called “Monte Carlo” approximation in reinforcement learning,31 and also to the inexact Richardson iteration that we propose here with τ → ∞.

We have seen that temporal difference (TD) learning, a workhorse for prediction in reinforcement learning, is closely related to an inexact form of Richardson iteration. Variants such as TD(λ) have similar relationships to inexact iterative schemes. As we showed, subspace iteration is a natural way of addressing slow convergence. We thus anticipate that our results have implications for reinforcement learning, particularly in scenarios in which the value function depends on the occurrence of a rare event. Finally, we note that our framework can be extended to a wider range of iterative numerical linear algebra algorithms. In particular, Krylov or block Krylov subspace methods may offer further acceleration; indeed, an approach along these lines was very recently introduced for value function estimation in reinforcement learning.69
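To make the connection concrete, the sketch below contrasts a tabular TD(0) update with the exact Richardson step for the Bellman equation v = r + γPv. It is a generic illustration of the relationship noted above, not part of the method developed here; the tabular setting, γ, and α are illustrative.

```python
import numpy as np

def td0_update(v, state, reward, next_state, gamma=0.99, alpha=0.1):
    """One temporal-difference update: a stochastic, one-sample relaxation
    toward the Richardson iterate for v = r + gamma * P v."""
    target = reward + gamma * v[next_state]   # sample estimate of (r + gamma * P v)[state]
    v[state] += alpha * (target - v[state])   # relax toward that target
    return v

def richardson_step(v, r, P, gamma=0.99):
    """The corresponding exact (matrix) Richardson iteration."""
    return r + gamma * P @ v
```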

We are grateful to Arnaud Doucet for pointing our attention to TD methods and the inexact power iteration in Ref. 32, both of which were key motivations for this work. We also thank Joan Bruna for helpful conversations about reinforcement learning. This work was supported by the National Institutes of Health (Award No. R35 GM136381) and the National Science Foundation (Award No. DMS-2054306). S.C.G. acknowledges support by the National Science Foundation Graduate Research Fellowship under Grant No. 2140001. J.W. acknowledges support from the Advanced Scientific Computing Research Program within the DOE Office of Science through Award No. DE-SC0020427. This work was completed in part with resources provided by the University of Chicago Research Computing Center, and we are grateful for their assistance with the calculations. “Beagle-3: A Shared GPU Cluster for Biomolecular Sciences” was supported by the National Institutes of Health (NIH) under the High-End Instrumentation (HEI) Grant Program Award No. 1S10OD028655-0.

The authors have no conflicts to disclose.

J.S., S.C.G., A.R.D., and J.W. conceptualized research. J.S. developed the method. C.L. adapted the linear solve used for prediction from Refs. 58 and 59. J.S. and S.C.G. performed the numerical tests. A.R.D. and J.W. supervised the research. All authors wrote and edited the manuscript.

John Strahan: Conceptualization (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Spencer C. Guo: Conceptualization (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Chatipat Lorpaiboon: Methodology (equal); Writing – review & editing (equal). Aaron R. Dinner: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Jonathan Weare: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available within the article. Code implementing the algorithms and a Jupyter notebook illustrating use of the method on the Müller–Brown example are available at https://github.com/dinner-group/inexact-subspace-iteration.

1. F. Noé and F. Nüske, "A variational approach to modeling slow processes in stochastic dynamical systems," Multiscale Model. Simul. 11(2), 635–655 (2013).
2. F. Nüske, B. G. Keller, G. Pérez-Hernández, A. S. J. S. Mey, and F. Noé, "Variational approach to molecular kinetics," J. Chem. Theory Comput. 10(4), 1739–1752 (2014).
3. S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, "Data-driven model reduction and transfer operator approximation," J. Nonlinear Sci. 28, 985–1010 (2018).
4. R. J. Webber, E. H. Thiede, D. Dow, A. R. Dinner, and J. Weare, "Error bounds for dynamical spectral estimation," SIAM J. Math. Data Sci. 3(1), 225–252 (2021).
5. C. Lorpaiboon, E. H. Thiede, R. J. Webber, J. Weare, and A. R. Dinner, "Integrated variational approach to conformational dynamics: A robust strategy for identifying eigenfunctions of dynamical operators," J. Phys. Chem. B 124(42), 9354–9364 (2020).
6. R. T. McGibbon, B. E. Husic, and V. S. Pande, "Identification of simple reaction coordinates from complex dynamics," J. Chem. Phys. 146(4), 044109 (2017).
7. L. Busto-Moner, C.-J. Feng, A. Antoszewski, A. Tokmakoff, and A. R. Dinner, "Structural ensemble of the insulin monomer," Biochemistry 60(42), 3125–3136 (2021).
8. G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, "Identification of slow molecular order parameters for Markov model construction," J. Chem. Phys. 139(1), 015102 (2013).
9. C. R. Schwantes and V. S. Pande, "Improvements in Markov State Model construction reveal many non-native interactions in the folding of NTL9," J. Chem. Theory Comput. 9(4), 2000–2009 (2013).
10. J. Strahan, A. Antoszewski, C. Lorpaiboon, B. P. Vani, J. Weare, and A. R. Dinner, "Long-time-scale predictions from short-trajectory data: A benchmark analysis of the trp-cage miniprotein," J. Chem. Theory Comput. 17(5), 2948–2963 (2021).
11. E. H. Thiede, D. Giannakis, A. R. Dinner, and J. Weare, "Galerkin approximation of dynamical quantities using trajectory data," J. Chem. Phys. 150(24), 244111 (2019).
12. W. C. Swope, J. W. Pitera, and F. Suits, "Describing protein folding kinetics by molecular dynamics simulations. 1. Theory," J. Phys. Chem. B 108(21), 6571–6581 (2004).
13. F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich, and T. R. Weikl, "Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations," Proc. Natl. Acad. Sci. U. S. A. 106(45), 19011–19016 (2009).
14. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation, edited by G. R. Bowman, V. S. Pande, and F. Noé, Advances in Experimental Medicine and Biology Vol. 797 (Springer Netherlands, Dordrecht, 2014).
15. J. Finkel, R. J. Webber, D. S. Abbot, E. P. Gerber, and J. Weare, "Learning forecasts of rare stratospheric transitions from short simulations," Mon. Weather Rev. 149(11), 3647–3669 (2021).
16. A. Antoszewski, C. Lorpaiboon, J. Strahan, and A. R. Dinner, "Kinetics of phenol escape from the insulin R6 hexamer," J. Phys. Chem. B 125(42), 11637–11649 (2021).
17. S. C. Guo, R. Shen, B. Roux, and A. R. Dinner, "Dynamics of activation in the voltage-sensing domain of Ci-VSP" (2022).
18. G. Andrew, R. Arora, J. Bilmes, and K. Livescu, "Deep canonical correlation analysis," in Proceedings of the 30th International Conference on Machine Learning (PMLR, 2013), pp. 1247–1255.
19. A. Mardt, L. Pasquali, H. Wu, and F. Noé, "VAMPnets for deep learning of molecular kinetics," Nat. Commun. 9(1), 5 (2018).
20. C. Wehmeyer and F. Noé, "Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics," J. Chem. Phys. 148(24), 241703 (2018).
21. B. Lusch, J. N. Kutz, and S. L. Brunton, "Deep learning for universal linear embeddings of nonlinear dynamics," Nat. Commun. 9(1), 4950 (2018).
22. W. Chen, H. Sidky, and A. L. Ferguson, "Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets," J. Chem. Phys. 150(21), 214114 (2019).
23. A. Glielmo, B. E. Husic, A. Rodriguez, C. Clementi, F. Noé, and A. Laio, "Unsupervised learning methods for molecular simulation data," Chem. Rev. 121, 9722 (2021).
24. J. Strahan, J. Finkel, A. R. Dinner, and J. Weare, "Predicting rare events using neural networks and short-trajectory data," J. Comput. Phys. 488, 112152 (2023).
25. H. Li, Y. Khoo, Y. Ren, and L. Ying, "A semigroup method for high dimensional committor functions based on neural network," in Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference (PMLR, 2022), pp. 598–618.
26. Y. Khoo, J. Lu, and L. Ying, "Solving for high-dimensional committor functions using artificial neural networks," Res. Math. Sci. 6(1), 1 (2018).
27. Q. Li, B. Lin, and W. Ren, "Computing committor functions for the study of rare events using deep learning," J. Chem. Phys. 151(5), 054112 (2019).
28. B. Roux, "String method with swarms-of-trajectories, mean drifts, lag time, and committor," J. Phys. Chem. A 125(34), 7558–7571 (2021).
29. B. Roux, "Transition rate theory, spectral analysis, and reactive paths," J. Chem. Phys. 156(13), 134111 (2022).
30. G. M. Rotskoff, A. R. Mitchell, and E. Vanden-Eijnden, "Active importance sampling for variational objectives dominated by rare events: Consequences for optimization and generalization," in Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference (PMLR, 2022), pp. 757–780.
31. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. (The MIT Press, Cambridge, MA, 2018).
32. J. Wen, B. Dai, L. Li, and D. Schuurmans, "Batch stationary distribution estimation," in Proceedings of the 37th International Conference on Machine Learning, ICML'20 (JMLR.org, 2020), pp. 10203–10213.
33. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. (The Johns Hopkins University Press, 1996).
34. R. Du, V. S. Pande, A. Y. Grosberg, T. Tanaka, and E. S. Shakhnovich, "On the transition coordinate for protein folding," J. Chem. Phys. 108(1), 334–350 (1998).
35. A. Ma and A. R. Dinner, "Automatic method for identifying reaction coordinates in complex systems," J. Phys. Chem. B 109(14), 6769–6779 (2005).
36. S. V. Krivov, "On reaction coordinate optimality," J. Chem. Theory Comput. 9(1), 135–146 (2013).
37. W. E and E. Vanden-Eijnden, "Transition-path theory and path-finding algorithms for the study of rare events," Annu. Rev. Phys. Chem. 61, 391–420 (2010).
38. E. Vanden-Eijnden, "Transition path theory," in Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology (Springer, 2006), Vol. 1, pp. 453–493.
39. P. Collett, S. Martinez, and J. San Martin, "Quasi-stationary distributions," in Probability and its Applications (Springer Berlin Heidelberg, 2012).
40. X. Nguyen, M. J. Wainwright, and M. I. Jordan, "Estimating divergence functionals and the likelihood ratio by convex risk minimization," IEEE Trans. Inf. Theory 56(11), 5847–5861 (2010).
41. B. Peters and B. L. Trout, "Obtaining reaction coordinates by likelihood maximization," J. Chem. Phys. 125(5), 054108 (2006).
42. B. Peters, G. T. Beckham, and B. L. Trout, "Extensions to the likelihood maximization approach for finding reaction coordinates," J. Chem. Phys. 127(3), 034109 (2007).
43. H. Jung, R. Covino, and G. Hummer, "Artificial intelligence assists discovery of reaction coordinates and mechanisms from molecular dynamics simulations," arXiv:1901.04595 (2019).
44. A. Chattopadhyay, E. Nabizadeh, and P. Hassanzadeh, "Analog forecasting of extreme-causing weather patterns using deep learning," J. Adv. Model. Earth Syst. 12(2), e2019MS001958 (2020).
45. H. Jung, R. Covino, A. Arjun, C. Leitold, C. Dellago, P. G. Bolhuis, and G. Hummer, "Machine-guided path sampling to discover mechanisms of molecular self-organization," Nat. Comput. Sci. 3(4), 334–345 (2023).
46. G. Miloshevich, B. Cozian, P. Abry, P. Borgnat, and F. Bouchet, "Probabilistic forecasts of extreme heatwaves using convolutional neural networks in a regime of lack of data," Phys. Rev. Fluids 8(4), 040501 (2023).
47. K. Müller and L. D. Brown, "Location of saddle points and minimum energy paths by a constrained simplex optimization procedure," Theor. Chim. Acta 53(1), 75–93 (1979).
48. C. Lorpaiboon, J. Weare, and A. R. Dinner, "Augmented transition path theory for sequences of events," J. Chem. Phys. 157(9), 094115 (2022).
49. S. Buchenberg, N. Schaudinnus, and G. Stock, "Hierarchical biomolecular dynamics: Picosecond hydrogen bonding regulates microsecond conformational transitions," J. Chem. Theory Comput. 11(3), 1330–1336 (2015).
50. A. Perez, F. Sittel, G. Stock, and K. Dill, "MELD-path efficiently computes conformational transitions, including multiple and diverse paths," J. Chem. Theory Comput. 14(4), 2109–2116 (2018).
51. F. Sittel, T. Filk, and G. Stock, "Principal component analysis on a torus: Theory and application to protein dynamics," J. Chem. Phys. 147(24), 244101 (2017).
52. C. W. Hopkins, R. C. Walker, S. Le Grand, and A. E. Roitberg, "Long-time-step molecular dynamics through hydrogen mass repartitioning," J. Chem. Theory Comput. 11(4), 1864–1874 (2015).
53. G. A. Khoury, J. Smadbeck, P. Tamamis, A. C. Vandris, C. A. Kieslich, and C. A. Floudas, "Ab initio charge parameters to aid in the discovery and design of therapeutic proteins and peptides with unnatural amino acids and their application to complement inhibitors of the compstatin family," ACS Synth. Biol. 3(12), 855–869 (2014).
54. H. Nguyen, D. R. Roe, and C. Simmerling, "Improved generalized Born solvent model parameters for protein simulations," J. Chem. Theory Comput. 9(4), 2020–2034 (2013).
55. P. Eastman, J. Swails, J. D. Chodera, R. T. McGibbon, Y. Zhao, K. A. Beauchamp, L.-P. Wang, A. C. Simmonett, M. P. Harrigan, C. D. Stern, R. P. Wiewiora, B. R. Brooks, and V. S. Pande, "OpenMM 7: Rapid development of high performance algorithms for molecular dynamics," PLoS Comput. Biol. 13(7), e1005659 (2017).
56. P. Metzner, C. Schütte, and E. Vanden-Eijnden, "Illustration of transition path theory on a collection of simple examples," J. Chem. Phys. 125(8), 084110 (2006).
57. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 (2014).
58. E. Darve, J. Solomon, and A. Kia, "Computing generalized Langevin equations and generalized Fokker-Planck equations," Proc. Natl. Acad. Sci. U. S. A. 106(27), 10884–10889 (2009).
59. S. Cao, A. Montoya-Castillo, W. Wang, T. E. Markland, and X. Huang, "On the advantages of exploiting memory in Markov state models for biomolecular dynamics," J. Chem. Phys. 153(1), 014105 (2020).
60. D. Lucente, J. Rolland, C. Herbert, and F. Bouchet, "Coupling rare event algorithms with data-based learned committor functions using the analogue Markov chain," J. Stat. Mech.: Theory Exp. 2022(8), 083201 (2022).
61. Y. Meng, D. Shukla, V. S. Pande, and B. Roux, "Transition path theory analysis of c-Src kinase activation," Proc. Natl. Acad. Sci. U. S. A. 113(33), 9193–9198 (2016).
62. B. P. Vani, J. Weare, and A. R. Dinner, "Computing transition path theory quantities with trajectory stratification," J. Chem. Phys. 157(3), 034106 (2022).
63. J. Finkel, D. S. Abbot, and J. Weare, "Path properties of atmospheric transitions: Illustration with a low-order sudden stratospheric warming model," J. Atmos. Sci. 77(7), 2327–2347 (2020).
64. P. Miron, F. J. Beron-Vera, L. Helfmann, and P. Koltai, "Transition paths of marine debris and the stability of the garbage patches," Chaos 31(3), 033101 (2021).
65. D. Lucente, C. Herbert, and F. Bouchet, "Committor functions for climate phenomena at the predictability margin: The example of El Niño-Southern Oscillation in the Jin and Timmermann model," J. Atmos. Sci. 79(9), 2387–2400 (2022).
66. J. Finkel, E. P. Gerber, D. S. Abbot, and J. Weare, "Revealing the statistics of extreme events hidden in short weather forecast data," AGU Adv. 4(2), e2023AV000881 (2023).
67. J. Finkel, R. J. Webber, E. P. Gerber, D. S. Abbot, and J. Weare, "Data-driven transition path analysis yields a statistical understanding of sudden stratospheric warming events in an idealized model," J. Atmos. Sci. 80(2), 519–534 (2023).
68. J. Hu, A. Ma, and A. R. Dinner, "A two-step nucleotide-flipping mechanism enables kinetic discrimination of DNA lesions by AGT," Proc. Natl. Acad. Sci. U. S. A. 105(12), 4615–4620 (2008).
69. E. Xia and M. Wainwright, "Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces," in Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research Vol. 206, edited by F. Ruiz, J. Dy, and J.-W. van de Meent (PMLR, 2023), pp. 9137–9166.