Finite Markov chains, memoryless random walks on complex networks, appear commonly as models for stochastic dynamics in condensed matter physics, biophysics, ecology, epidemiology, economics, and elsewhere. Here, we review exact numerical methods for the analysis of arbitrary discrete- and continuous-time Markovian networks. We focus on numerically stable methods that are required to treat nearly reducible Markov chains, which exhibit a separation of characteristic timescales and are therefore ill-conditioned. In this metastable regime, dense linear algebra methods are afflicted by propagation of error in the finite precision arithmetic, and the kinetic Monte Carlo algorithm to simulate paths is unfeasibly inefficient. Furthermore, iterative eigendecomposition methods fail to converge without the use of nontrivial and system-specific preconditioning techniques. An alternative approach is provided by state reduction procedures, which do not require additional *a priori* knowledge of the Markov chain. Macroscopic dynamical quantities, such as moments of the first passage time distribution for a transition to an absorbing state, and microscopic properties, such as the stationary, committor, and visitation probabilities for nodes, can be computed robustly using state reduction algorithms. The related kinetic path sampling algorithm allows for efficient sampling of trajectories on a nearly reducible Markov chain. Thus, all of the information required to determine the kinetically relevant transition mechanisms, and to identify the states that have a dominant effect on the global dynamics, can be computed reliably even for computationally challenging models. Rare events are a ubiquitous feature of realistic dynamical systems, and so the methods described herein are valuable in many practical applications.

## I. INTRODUCTION

Finite Markov chains^{1–4} are commonly used to represent a variety of stochastic processes. They provide attractive coarse-grained representations of continuous-state models, such as the dynamics of many-particle systems,^{5} since the high dimensionality of the coordinate space can be preserved.^{6} Markov chains corresponding to a continuous-state system can be constructed using explicit simulation data to estimate a network model by maximum-likelihood^{7–11} or Gibbs sampling^{12–15} approaches.^{16–20} Alternatively, the energy landscape of a physical system can be mapped to a Markovian network using geometry optimization methods^{21} to locate the stationary points.^{22–29} The dynamics of the resulting Markov chain are described by a linear master equation,^{30–32} a system of coupled first-order ordinary differential equations (ODEs). Here, the Markov chain corresponds to a network for which the edges are parameterized by rates *K*_{ij} for the transitions between nodes *i* ← *j* (in the continuous-time case) or transition probabilities *T*_{ij}(*τ*) at a lag time *τ* (in the discrete-time case).^{33} Continuous-time Markov chains (CTMCs) are also frequently used to represent population dynamics processes, where the transitions correspond to discrete changes in the numbers of species. Each node of the network is then associated with a vector specifying the population distribution. Since the populations of species are unbounded, the state space of the network is countably infinite^{34,35} but can be truncated to yield a finite Markov chain with negligible error.^{36,37} Simple examples of such models include birth–death processes^{38,39} and queuing networks.^{40} More complex examples arise as representations of chemical^{41,42} and biochemical^{43–45} reaction cycles, gene regulatory networks,^{46–49} epidemic spread,^{50} and ecosystems.^{51}

In realistic models for complex dynamical processes, there often exists a separation of characteristic timescales.^{52–66} For instance, the extinction of a species in an ecosystem takes place over a long period of time, compared to short-timescale fluctuations in the size of the population arising from individual births and deaths.^{51} In molecular and condensed matter systems, the underlying energy landscape typically features a disparity in the heights of energy barriers separating regions of the state space.^{67–71} For example, the folding of a protein to its functional structure is a relatively slow process of cooperative movement that typically requires surmounting one or more large free energy barriers, whereas unproductive local fluctuations are facile.^{28,29} In each of these applications, it is the rare event that is the transition of interest.

In the present work, we review exact computational methods to analyze the dynamics of arbitrary discrete- and continuous-time finite Markov chains. In particular, we are concerned with *nearly reducible* Markov chains,^{72,73} which exhibit rare event dynamics and are consequently ill-conditioned.^{74–79} The application of conventional dense linear algebra methods to nearly reducible Markovian networks is therefore usually prohibited by the severe propagation of error arising from the limits of numerical precision.^{80} Similarly, sparse iterative methods are subject to convergence issues in this rare event regime, necessitating the use of preconditioning techniques.^{76} The kinetic Monte Carlo^{81,82} (kMC) algorithm for simulation of trajectories also becomes unfeasibly inefficient, since the paths have a tendency to “flicker” within the metastable sets of nodes.^{83–87} We therefore focus on specialized methods that are capable of treating Markov chains featuring metastability. We consider a general Markov chain comprising the set of nodes $S$. To formulate the problem of analyzing a particular transition, we consider initial and absorbing (target) sets of nodes, denoted $B$ and $A$, respectively. The nodes in $B$ are a subset of the set of transient (nonabsorbing) nodes, denoted $Q\u2261Ac$.

Following an overview of the relevant theory of Markov chains and standard linear algebra methods for their exact analysis (Sec. II), we provide a detailed review of procedures that have superior numerical stability and are therefore recommended for application to networks exhibiting metastability. Specifically, we discuss state reduction methods that are inherently robust and do not involve preconditioning techniques. The mean first passage time (MFPT) for the $A\u2190B$ transition can be computed using the graph transformation (GT) algorithm (Sec. III).^{88–93} The GT approach is closely related to uncoupling–coupling methods (Secs. IV A and IV B) and the Grassmann–Taksar–Heyman (GTH) algorithm (Sec. IV C) for computation of the stationary distribution. Further state reduction procedures exist to compute other macroscopic dynamical properties, such as the expected time to reach the equilibrium distribution, and higher moments of the first passage time distribution (Sec. IV D). State reduction algorithms are also available to compute microscopic dynamical properties, including the committor probabilities for nodes (Sec. V A 4), defined as the probability that a trajectory initialized at that node hits state $A$ before state $B$, and other quantities that characterize the $A\u2190B$ transition path ensemble (TPE),^{94–100} such as state visitation probabilities (Sec. IV E). Finally, we discuss efficient methods to simulate paths on a nearly reducible Markov chain. The kinetic path sampling^{83,84} (kPS) and Monte Carlo with absorbing Markov chain^{101–104} (MCAMC) algorithms provide an efficient alternative to standard kMC simulations^{81,82} for this purpose (Sec. V). The former method extends the GT algorithm with an iterative reverse randomization procedure to sample the time associated with a path escaping from a metastable region of the network, while the latter approach uses local eigendecompositions. By employing the specialized methods described herein, it is, in principle, possible to extract any desired dynamical information from a nearly reducible Markov chain. An overview of pertinent dynamical properties, the various methods to calculate them, and specific considerations that arise in their application, are summarized in Table I.

Property . | Methods for calculation . | Comments . | References . |
---|---|---|---|

Mean first passage time for | Matrix inversion (via A^{#}) | Irreducible chains only. Stable if using REFUND. Yields $Tij\u2200i,j\u2208S$. | Equation (12) |

transition from node j, $TAj$ | Eigendecomposition | Irreducible and reversible chains only. Yields $Tij\u2200i,j\u2208S$. | Equation (19) |

Linear solve | Yields MFPTs for transitions from all nodes to absorbing state. | Equation (39) | |

State reduction (GT) | Stable. Use recursive formulation to compute MFPTs for transitions | Section III A, Ref. 112 | |

from all nodes to the absorbing state, $TAj\u2200j\u2208S$ | |||

Variance of the first passage | Matrix inversion (via A^{#}) | Irreducible chains only. Stable if using REFUND. Yields $Vij\u2200i,j\u2208S$. | Equations (13) and (14) |

time distribution for transition | Eigendecomposition | Yields $VAj\u2200j\u2208S$. Also yields the full FPT distribution [Eq. (22)]. | Equation (24) |

from node j, $VAj$ | State reduction (Dayar-Akar) | Stable. Yields variances of FPT for transitions from all nodes to the | Reference 113 |

Stationary probability | Eigendecomposition | absorbing state, $VAj\u2200j\u2208S$ Stationary distribution is the dominant | Section II A |

distribution, π | State reduction (GTH) | right eigenvector.Stable. | Section IV C |

exact uncoupling–coupling IAD | Stable if metastable states chosen appropriately. Parallelizable. | Section IV A | |

Efficient if metastable states chosen appropriately. Parallelizable | Section IV B | ||

Committor probability for | Eigendecomposition | Can specify any number of attractors. | Equation (30) |

node j, $qj+$ (first hitting | Linear solve | Two attractors only (i.e., $A\u2190B$ committor probability). | Equation (29) |

probability of state $A$) | State reduction | Stable, can specify any number of attractors | Section V A 4 |

Expected number of | Matrix inversion | Stable if the using state reduction method with the augmented matrix. | Reference 112 |

visits to node j, θ_{j} | Linear solve | N.B. result for reactive paths, $\theta \u0303j$, follows from θ_{j} and $qj+\u2200j\u2208S$ | Reference 99 |

Reactive visitation | From $\theta \u0303j$ and $qj+\u2200j\u2208S$ | Stable when this info is computed via state reduction. Yields $rj+\u2200j\u2208S$. | Reference 112 |

probability for node j, $rj+$ | State reduction | Stable. Computes visitation probability for a single node or set thereof | Reference 114 |

Net reactive i ← j internode | $fij+$ from $qj+$ and $\pi j\u2200j\u2208S$ | Can be used in shortest paths analysis to decompose total reactive $A\u2190B$ | Equation (32) |

flux, $fij+$ (eq.) or $J\u0303ij$ (noneq.) | $J\u0303ij$ from $qj+$ and $\theta \u0303j\u2200j\u2208S$ | flux [Eq. (33)] into contributions from finite set of flux-paths (Ref. 115) | Reference 99 |

Average mixing time, ζ_{K} | Matrix inversion (via A^{#}) | Stable if using REFUND (Sec. IV D). | Equation (15) |

Eigendecomposition | Reversible chains only | Equation (18) | |

Time-dependent occupation | Eigendecomposition | Diagonalizable chains only. | Equation (21) |

probability distribution, p(t) | Solve ODEs | Use stiff ODE integrator for systems with metastability. | Equation (1) |

Matrix exponentiation | Can use sparse linear algebra methods to calculate p(t). | Equation (3), Ref. 116 | |

Simulate trajectories | (see below) | Section V | |

Simulate trajectories | Kinetic Monte Carlo | Unfeasibly inefficient when trajectories are long. | Section V |

[sample p(t)/sample | Kinetic path sampling | Stable, requires a priori knowledge of metastable states. | Section V A |

FPT distribution] | MCAMC | Requires a priori knowledge of metastable states | Section V B |

Property . | Methods for calculation . | Comments . | References . |
---|---|---|---|

Mean first passage time for | Matrix inversion (via A^{#}) | Irreducible chains only. Stable if using REFUND. Yields $Tij\u2200i,j\u2208S$. | Equation (12) |

transition from node j, $TAj$ | Eigendecomposition | Irreducible and reversible chains only. Yields $Tij\u2200i,j\u2208S$. | Equation (19) |

Linear solve | Yields MFPTs for transitions from all nodes to absorbing state. | Equation (39) | |

State reduction (GT) | Stable. Use recursive formulation to compute MFPTs for transitions | Section III A, Ref. 112 | |

from all nodes to the absorbing state, $TAj\u2200j\u2208S$ | |||

Variance of the first passage | Matrix inversion (via A^{#}) | Irreducible chains only. Stable if using REFUND. Yields $Vij\u2200i,j\u2208S$. | Equations (13) and (14) |

time distribution for transition | Eigendecomposition | Yields $VAj\u2200j\u2208S$. Also yields the full FPT distribution [Eq. (22)]. | Equation (24) |

from node j, $VAj$ | State reduction (Dayar-Akar) | Stable. Yields variances of FPT for transitions from all nodes to the | Reference 113 |

Stationary probability | Eigendecomposition | absorbing state, $VAj\u2200j\u2208S$ Stationary distribution is the dominant | Section II A |

distribution, π | State reduction (GTH) | right eigenvector.Stable. | Section IV C |

exact uncoupling–coupling IAD | Stable if metastable states chosen appropriately. Parallelizable. | Section IV A | |

Efficient if metastable states chosen appropriately. Parallelizable | Section IV B | ||

Committor probability for | Eigendecomposition | Can specify any number of attractors. | Equation (30) |

node j, $qj+$ (first hitting | Linear solve | Two attractors only (i.e., $A\u2190B$ committor probability). | Equation (29) |

probability of state $A$) | State reduction | Stable, can specify any number of attractors | Section V A 4 |

Expected number of | Matrix inversion | Stable if the using state reduction method with the augmented matrix. | Reference 112 |

visits to node j, θ_{j} | Linear solve | N.B. result for reactive paths, $\theta \u0303j$, follows from θ_{j} and $qj+\u2200j\u2208S$ | Reference 99 |

Reactive visitation | From $\theta \u0303j$ and $qj+\u2200j\u2208S$ | Stable when this info is computed via state reduction. Yields $rj+\u2200j\u2208S$. | Reference 112 |

probability for node j, $rj+$ | State reduction | Stable. Computes visitation probability for a single node or set thereof | Reference 114 |

Net reactive i ← j internode | $fij+$ from $qj+$ and $\pi j\u2200j\u2208S$ | Can be used in shortest paths analysis to decompose total reactive $A\u2190B$ | Equation (32) |

flux, $fij+$ (eq.) or $J\u0303ij$ (noneq.) | $J\u0303ij$ from $qj+$ and $\theta \u0303j\u2200j\u2208S$ | flux [Eq. (33)] into contributions from finite set of flux-paths (Ref. 115) | Reference 99 |

Average mixing time, ζ_{K} | Matrix inversion (via A^{#}) | Stable if using REFUND (Sec. IV D). | Equation (15) |

Eigendecomposition | Reversible chains only | Equation (18) | |

Time-dependent occupation | Eigendecomposition | Diagonalizable chains only. | Equation (21) |

probability distribution, p(t) | Solve ODEs | Use stiff ODE integrator for systems with metastability. | Equation (1) |

Matrix exponentiation | Can use sparse linear algebra methods to calculate p(t). | Equation (3), Ref. 116 | |

Simulate trajectories | (see below) | Section V | |

Simulate trajectories | Kinetic Monte Carlo | Unfeasibly inefficient when trajectories are long. | Section V |

[sample p(t)/sample | Kinetic path sampling | Stable, requires a priori knowledge of metastable states. | Section V A |

FPT distribution] | MCAMC | Requires a priori knowledge of metastable states | Section V B |

## II. BACKGROUND THEORY OF MARKOV CHAINS

### A. Master equation dynamics

The dynamics of a continuous-time Markov chain (CTMC), parameterized by *i* ← *j* internode transition rates *K*_{ij} and with state space $S$, is governed by the linear master equation^{30–32,105}

where *p*_{j}(*t*) is the time-dependent occupation probability of the *j*th node. Equation (1) can be written in matrix form as

where $p(t)=(p1(t),p2(t),\u2026,p|S|(t))\u22a4$ is the time-dependent occupation probability vector for the nodes of the network and **K** is the transition rate matrix. **K** has off-diagonal elements equal to the transition rates and diagonal elements such that the columns of the matrix sum to zero, i.e., *K*_{jj} = −∑_{γ≠j}*K*_{γj}. The right eigenvectors of **K** represent dynamical eigenmodes, where the magnitudes and signs of the elements reflect the extent and direction of probability flow, respectively, for the corresponding nodes.^{106} The solution of Eq. (2) is a sum of contributions that decay exponentially with rates equal to the negatives of the eigenvalues {*γ*_{k}} of **K**.^{8} This solution leads to a formal definition for metastability, namely, that the Markov chain exhibits a spectral gap in its set of eigenvalues {*γ*_{k}}, and hence, there exists a set of dynamical eigenmodes that represent comparatively slow relaxation processes.^{64} By the Perron–Frobenius theorem,^{107}^{,}**K** has a unique dominant zero eigenvalue *γ*_{1} = 0 if the Markov chain is *irreducible*,^{73} i.e., if every node of the network is reachable from all other nodes. The zero eigenvalue is associated with the stationary probability vector ** π**, and all other eigenvalues have a negative real component. The stationary distribution satisfies the global balance equation

**K**=

*π***0**, where

**0**is the column vector with all elements equal to zero.

^{73}If the stationary distribution also satisfies the detailed balance condition,

*K*

_{ij}

*π*

_{j}=

*K*

_{ji}

*π*

_{i}∀

*i*≠

*j*, then the Markov chain is said to be

*reversible*, and the eigenvalues of

**K**are real.

^{2}

The transition rate (or *infinitesimal generator*) matrix **K** is related to a transition probability matrix **T**(*τ*), which propagates the probability distribution vector **p**(*t*) at discrete-time intervals *τ* according to the Chapman–Kolmogorov equation^{2} **p**(*t* + *τ*) = **T**(*τ*)**p**(*t*) via^{50}

and

where **I** is the identity matrix. For an irreducible and aperiodic Markov chain, **T** has a single dominant eigenvalue *λ*_{1} = 1 associated with the stationary distribution ** π**, which is the occupation probability distribution in the limit of infinite time. The absolute value of all other eigenvalues,

*λ*

_{k}for $k=2,\u2026|S|$, is less than unity.

^{1}The transition rate and probability matrices share the same set of right and left eigenvectors, {

*ψ*^{(k)}} and {

*ϕ*^{(k)}}, respectively. Their eigenvalues are related via $e\gamma k\tau =\lambda k(\tau )$ [cf. Eq. (3)].

^{50}When the dynamics are reversible, the Markov chain has a complete set of orthonormal eigenvectors.

^{108}Specifically, the left and right eigenvectors satisfy the orthonormality conditions

^{8}

where $\psi j(k)$ is the *j*th element of the *k*th right eigenvector and *δ*_{kl} is the Kronecker delta. In fact, reversibility is not a requirement for the final relation of Eq. (5) to hold.

There are two possible choices of stochastic matrix for a CTMC.^{109} The first is the branching probability matrix **P**, with elements *P*_{ij} = *K*_{ij}/∑_{γ≠j}*K*_{γj}.^{91} **P** contains no self-loop transitions, and the time associated with a transition from the *j*th node is exponentially distributed with mean *τ*_{j} = 1/∑_{γ≠j}*K*_{γj},^{110} referred to as the mean waiting time for the *j*th node. The second valid stochastic matrix for a CTMC is the continuous-time linearized transition probability matrix^{83}

for which the mean waiting times are uniform for all nodes, equal to *τ*. **T**_{lin} has the same sparsity pattern as **P**, except that the linearized matrix contains self-loop transitions, whereas *P*_{jj} = 0 ∀ *j*. Provided that $\tau \u2264min{\u2212Kjj\u22121:\u2200j}$, the linearized transition matrix is column-stochastic and shares the same set of eigenvectors as **K**, but the eigenvalues are shifted.^{111} In the following exposition, we will use the notation **T** to denote any general stochastic matrix for a discrete- or continuous-time Markov chain and draw attention to separate considerations for the different formulations where necessary.

### B. Fundamental properties of irreducible Markov chains

Irreducible Markov chains have a stationary distribution ** π** that satisfies the global balance equations

**T**=

*π***and**

*π***K**=

*π***0**. The Markovian kernel

^{117}[

**I**−

**T**(

*τ*)] and the transition rate matrix

**K**, which we will collectively denote by

**A**, are singular, but there exist a class of generalized inverses

^{118–122}

**G**that satisfy

**AGA**=

**A**. Such matrices are

*fundamental*in the sense that key global dynamical properties can be expressed straightforwardly in terms of

**G**and the stationary distribution

**.**

*π*^{1,123}In the following discussion, we use

**T**(

*τ*) to refer to a discrete-time stochastic matrix parameterized at lag time

*τ*or the linearized transition matrix of a CTMC with uniform mean waiting times

*τ*.

Important examples of generalized inverses are Kemeny and Snell’s fundamental matrix,^{1,109}

where $1S$ is the $|S|$-dimensional column vector with elements equal to unity, and Meyer’s group inverse,^{124}

with elements^{125}

The fundamental matrix is a generalized inverse^{118} that also satisfies **AZ** = **ZA**,^{117,121,126,127} and the group inverse is the unique generalized inverse that additionally satisfies **A**^{#}**AA**^{#} = **A**^{#}.^{117,120} The group inverse is also the unique solution to the Bellman-type equations^{128}

with constraints **A**^{#}** π** =

**0**and $\u2211\gamma A\gamma j#=0\u2200j\u2208S$. The diagonal elements of

**A**

^{#}are strictly positive.

In practice, it is sometimes more convenient to compute and work with the group inverse rather than the fundamental matrix, since **A**^{#} can be obtained without knowledge of the stationary distribution.^{124} Furthermore, the elements of the group inverse have a probabilistic interpretation. Specifically, $Aij#$ represents the expected deviation in the number of visits to the *i*th node for the relaxation process to the stationary distribution initialized from the *j*th node, compared to the average number of visits when starting at a node chosen randomly in proportion to the stationary distribution.^{129} Formally, if $Nij(n)$ denotes the expected number of times that the *i*th node is visited on a trajectory of *n* steps initialized from the *j*th node, then^{124}

A key macroscopic quantity characterizing a particular transition is the mean first passage time^{113,124,130–140} (MFPT), defined as the expected first hitting time for trajectories to reach an absorbing (target) state given an initial condition.^{2} The matrix of MFPTs for all pairwise internode transitions can be expressed directly in terms of the fundamental matrix or the group inverse,^{124,134}

where **E** is the $|S|\xd7|S|$-dimensional matrix with all elements equal to unity, **G**_{d} is the matrix with diagonal elements of a generalized inverse **G** and off-diagonal elements equal to zero, and **D** = diag(** π**). The matrix whose elements $Vij$ are the variances of the FPT distribution for all pairwise internode transitions is given by $V=T(2)\u2212T\u25e6T$, where ◦ denotes the element-wise product and

^{120,122}

with

More complicated expressions exist for matrices with elements corresponding to the higher moments of the FPT distributions for internode transitions.^{121}

Another important property that characterizes the global dynamics of an irreducible Markov chain is the average mixing time,^{1} which can be thought of as the expected time for an initial occupation probability distribution to decay to the stationary distribution. More formally, the average mixing time is the expected time for trajectories to first hit a target node that is sampled in proportion to the stationary distribution.^{141} This quantity is independent of the initial condition^{142–144} and is known as the Kemeny constant,^{1,123} given by^{117}

Higher moments of the mixing time distribution depend on the initial state but can nonetheless be derived from a generalized inverse. Let the variance of the mixing time distribution when the relaxation to equilibrium is initialized from the *j*th node be denoted by *ν*_{j}. These variances are given by

where $L=DT=Td\u22121T$ is the mixing matrix^{142} and ** α** is the vector with elements $\alpha i=\u2211\gamma Ti\gamma \pi \gamma $. The Kemeny constant effectively quantifies the extent to which all of the nodes in the state space of a Markov chain are mutually reachable.

^{141}When the dynamics are diffusive (i.e., the elements of $T$ are comparable), the average mixing time is relatively small, and the variances of the mixing time distributions are likewise small and fairly uniform. In contrast, when there are metastable states, relaxation to the stationary distribution is a slow process, and the mixing time distributions are fat-tailed, with large means and variances, owing to the existence of rare event transitions.

### C. Eigendecomposition analysis

It is sometimes more convenient, for reasons of numerical stability or efficiency, to compute properties of a Markov chain via an eigendecomposition operation instead of performing a matrix inversion operation required to compute the group inverse (Sec. II B). Similar to the utility of generalized inverses, many dynamical quantities can be expressed in terms of the eigenspectrum of a Markovian network. For instance, the average mixing time [Eq. (15)] is given directly by the eigenvalues of an irreducible and reversible Markov chain. In the discrete-time case,^{108}

The MFPTs for internode transitions can also be written in terms of the eigenspectrum of a reversible Markov chain.^{137} For a CTMC, the *i* ← *j* MFPT is given by^{108,145}

The overall $A\u2190B$ MFPT, for a transition to a specified absorbing macrostate $A$ from a set of initial nodes $B$, is a sum of MFPTs for transitions from each of the nodes comprising $B$, where each term is weighted by the initial occupation probability of the starting node,

Although the MFPT is the usual dynamical observable,^{113,124,130–140} the complete FPT distribution contains valuable dynamical information that is not captured in the average. For instance, the width of the FPT distribution characterizes the heterogeneity of the transition path ensemble. When the transition probabilities or rates of the Markov chain depend on an external parameter, such as the temperature in physical systems,^{87} the form of the FPT distribution may change dramatically and with a relatively well-defined threshold.^{146} The existence of multiple peaks in the FPT distribution suggests the presence of competing transition mechanisms that each make a non-negligible contribution to the MFPT.^{100} To justify “lumping” a subset of nodes of a Markov chain and thus obtain a reduced Markovian representation of the dynamics,^{1,147–157} the FPT distribution for escape from the community of nodes ought to be an exponential distribution, which has the memoryless property.^{51,110}

In discrete-time, the FPT distribution can be computed from eigendecomposition of the substochastic transition probability matrix $TQQ$, comprising only transient nodes of the set $Q\u2261Ac$. The equivalent formulation in continuous-time uses the corresponding rate matrix $KQQ$, for which the diagonal elements include contributions from transitions to absorbing nodes of the set $A$ so that the columns of $KQQ$ do not necessarily sum to zero. $KQQ$ does not have a zero eigenvalue and associated stationary distribution, and instead, all eigenvalues have a negative real component. Since the dynamics within the transient set of nodes $Q$ are unchanged prior to absorption, we can still write a linear master equation [cf. Eq. (2)] for the dynamics within this subnetwork. There is a probability flux out of the macrostate $Q$, which defines the probability distribution *p*(*t*_{FPT}) for the $A\u2190Q$ first passage times *t*_{FPT}. Noting that the propagation of an initial probability distribution **p**(0) according to the linear master equation can be written in terms of the eigenspectrum of the rate matrix [cf. Eq. (3)],^{8}

where ⊗ denotes the outer product, we obtain the properly normalized FPT distribution^{52}

Here, $\gamma kQ$ is the *k*th eigenvalue of $KQQ$, with $\varphi Q(k)$ and $\psi Q(k)$ being the corresponding left and right eigenvectors, respectively, and $pQ(0)$ is the vector containing the initial occupation probabilities for the transient nodes. In writing Eq. (23) from Eq. (22), we have assumed orthonormality of the eigenvectors [Eq. (5)]. The eigendecomposition of the FPT distribution is particularly insightful, since it separates the distribution into contributions from individual dynamical eigenmodes. The *k*th orthonormal eigenmode makes a dominant contribution to the FPT distribution when the product $1Q\u22a4[\psi Q(k)\u2297\varphi Q(k)]pQ(0)$ is close to unity. The *n*th moment of the FPT distribution [Eq. (22)] is given by^{52,158}

### D. Numerical considerations for linear algebra methods

Many properties of an irreducible Markov chain can be directly expressed in terms of a fundamental matrix, obtained via a matrix inversion operation (Sec. II B). In Sec. II C, we noted that several key dynamical quantities characterizing the dynamics of a Markov chain can also be computed straightforwardly from its eigenspectrum, including MFPTs [Eq. (19)], the FPT distribution [Eq. (22)], the time-dependent occupation probability distribution [Eq. (21)], and the average mixing time [Eq. (18)]. Typical algorithms for eigendecomposition, and for matrix inversion or diagonalization, have time complexity $O(|S|3)$, which is comparable to the time complexity of the state reduction algorithms described in Secs. III and IV. However, dense linear algebra methods are afflicted by the propagation of roundoff error in the finite precision arithmetic when applied to nearly reducible Markov chains.^{89,91,159} The extent to which a metastable Markov chain is ill-conditioned is essentially independent of its dimensionality.^{144} Hence, for Markov chains featuring a rare event, numerical error can prohibit the use of conventional dense methods, such as lower–upper (LU) decomposition to solve linear systems of equations^{35} or the QR algorithm to perform an eigendecomposition,^{160} even when the network comprises just a few nodes.^{52,92}

For reversible Markov chains,^{2} marginal improvements in the numerical stability of linear algebra methods can be gained by employing the symmetrized transition rate matrix (in the continuous-time case), with elements $K\u0303ij=(KijKji)1/2$, or the symmetrized transition probability matrix (in the discrete-time case), with elements $T\u0303ij(\tau )=(Tij(\tau )Tji(\tau ))1/2$.^{52} These symmetrized matrices have the same eigenvalues as their unsymmetrized counterparts, and the elements of the right and left eigenvectors for the two formulations are related via $\psi j(k)=\pi j1/2\psi \u0303j(k)$ and $\varphi j(k)=\pi j\u22121/2\varphi \u0303j(k)$, respectively.^{8}

An alternative framework to solve linear systems of equations or to determine the eigenspectrum of a Markov chain is provided by iterative methods,^{161–164} which can be optimized to treat Markov chains featuring a spectral gap via preconditioning.^{165} Unlike direct solution methods, iterative methods preserve the sparsity of a stochastic matrix and are therefore well-suited to treating high-dimensional structured systems,^{166} such as those that arise in population dynamics models.^{37} Below, we briefly outline the basis of these iterative methods and preconditioning strategies to aid their convergence. A detailed review of both direct and iterative solution methods in the context of Markov chains can be found in Ref. 76.

As an illustrative example, we consider application of the simplest iterative solution method, namely, the power method, to compute the stationary distribution of a Markov chain, i.e., to solve **K π** =

**0**(in continuous-time) or

**T**(

*τ*)

**=**

*π***(in discrete-time). In the power method, the solution vector**

*π***x**is repeatedly updated according to

The convergence rate for Eq. (25) is directly related to *λ*_{2}.^{76} Thus, when the Markov chain is nearly reducible, the convergence **x** → ** π** is slow. To remedy this problem, we may introduce a preconditioning matrix

**M**that is readily invertible, ideally such that

**M**≈

**K**and the inverse

**M**

^{−1}yields a matrix (

**I**−

**M**

^{−1}

**K**) that has no subdominant eigenvalues close to the unique unit eigenvalue. The rate of convergence of the preconditioned version of Eq. (25) is consequently fast. In general, preconditioning refers to any method to modify a system of linear equations by premultiplication. That is,

**Ax**=

**b**⇒

**M**

^{−1}

**Ax**=

**M**

^{−1}

**b**, where

**Mc**=

**y**can be solved efficiently with any

**y**, and this transformation favorably alters the distribution of eigenvalues, thereby aiding the convergence of iterative methods.

The success of iterative solution methods applied to ill-conditioned Markov chains exhibiting rare event dynamics is clearly strongly dependent on the choice of **M**. Some iterative methods simplify the additional input required from the user by implicitly incorporating a preconditioning matrix. Successive over-relaxation^{167} (SOR) to solve the linear problem **Ax** = **b**, which can be thought of as a generalization of Gauss–Siedel iteration, splits the relevant matrix **A** as

where **D** is a diagonal matrix, **L** and **U** are strictly lower- and upper-triangular matrices, respectively, and *ω* > 0 is the relaxation factor. A solution vector **x** is then iterated according to

The matrix *ω*^{−1}(**D** − *ω***L**) effectively acts as a preconditioner in SOR.^{167} Hence, the problem of selecting an appropriate preconditioning matrix is simplified to the problem of determining a suitable value for the scalar parameter *ω*. In practice, there is very little guidance on an appropriate choice of relaxation factor for solving a given system of linear equations if the matrix **A** is unstructured, and the possible gains in efficiency may be quite limited.^{92}

Krylov subspace methods are a class of sparse iterative procedures that can be used to perform an eigendecomposition of a matrix **A**.^{159,160,168,169} The idea is to generate a sequence of *m* vectors, {**v**_{k}} ∀ *k* = 1, …, *m*, which form an orthonormal basis of the *m*th order *Krylov subspace* spanned by the set of vectors (**v**_{1}, **Ax**, …, **A**^{m−1}**v**_{1}), where **v**_{1} is an arbitrary normalized vector.^{170} The Arnoldi algorithm^{171} provides an iterative method to produce a sequence of orthonormal vectors **V**_{m} = (**v**_{1}, …, **v**_{m}) spanning the Krylov subspace.^{172,173} The procedure yields a *m* × *m*-dimensional upper Hessenberg matrix **H**_{m} that satisfies $Hm=Vm\u22a4AVm$. For a sufficiently large number of iterations *m*, the dominant eigenvalues of **H** converge to those of **A**. Additionally, if $\phi k(m)$ denotes the *k*th dominant eigenvector of **H**_{m}, then the so-called Ritz vector $Vm\phi k(m)$ approximates the *k*th eigenvector of **A**.^{174} The generalized minimal residual (GMRES) algorithm^{175} adapts this concept to solve the linear problem **Ax** = **b**.

For well-conditioned matrices **A**, the eigenvalues of **H**_{m} converge rapidly to those of **A** so that the former matrix is comparatively low-dimensional and its eigenpairs can be found efficiently by direct solution methods. For more computationally challenging problems, the required number of iterations *m* is large so that the storage requirements and operation counts of the procedure become prohibitively large. Typical implementations of the Arnoldi and related algorithms incorporate implicit restarting to improve efficiency and avoid “wasteful” memory usage.^{176,177} However, numerical problems remain pervasive for pathological systems, such as nearly reducible Markov chains.^{178} In this ill-conditioned regime, the Arnoldi and GMRES methods can be made more effective by preconditioning in the usual way,^{165} i.e., replacing the matrix **A** with **M**^{−1}**A**, where **M** is similar to **A** and **M**^{−1}**y** can be evaluated efficiently and reliably for any vector **y**.

Similar strategies can also be used to solve systems of ODEs. A numerically stable approach to solving the master equation for the time-dependent occupation probability distribution vector **p**(*t*) [Eq. (2)] uses a stiff ODE^{179,180} integrator,^{181–185} employing GMRES iterations^{175} preconditioned with an appropriate matrix to solve the relevant system of linear equations.^{186} Alternatively, the product of a matrix exponential and a vector can be computed using Krylov subspace methods,^{116,187–189} and this approach is amenable to preconditioning.^{190} Hence, it is possible to compute the vector **p**(*t*) at a given time *t* [cf. Eq. (3)] even when the transition rate matrix is high-dimensional by exploiting sparsity.

In principle, preconditioned iterative methods provide stable algorithms to analyze Markov chains that are scalable in terms of both operation counts and memory usage. However, this approach is of limited use in practice, owing to nontrivial and system-specific considerations that arise. In particular, the rate of convergence of iterative procedures is strongly dependent on the choice of preconditioning matrix and on the initial guess for the solution vector. Without appropriate preconditioning, iterative solvers may converge unfeasibly slowly for linear problems involving nearly reducible Markov chains.^{92} For arbitrary Markov chains, there is limited *a priori* information on appropriate input to sparse procedures, and strategies are not generalizable. Hence, the additional parameters must usually be explored empirically in a trial-and-error fashion.

### E. Transition path theory

The theory presented in Secs. II B and II C was concerned with global dynamical properties of Markov chains. The role of individual nodes in determining macroscopic behavior is elucidated by transition path theory (TPT).^{94–98} TPT provides a theoretical framework to analyze the transition path ensemble^{46,47,100} (TPE) of reactive paths,^{99,191} which proceed directly from an initial macrostate $B$ to an absorbing macrostate $A$, without revisiting $B$. The central object of TPT is the vector of committor probabilities^{192–195} for nodes. We denote by $qj+$ the forward $A\u2190B$ committor probability for the *j*th node. $qj+$ is the probability that a trajectory occupying node *j* will visit the absorbing macrostate $A$ before the initial macrostate $B$. It is in this sense that the committor probability represents an idealized reaction coordinate quantifying the progress of a $A\u2190B$ transition, onto which state variables can be projected.^{111}

Formally, the forward committor probability for the *j*th node is defined as^{97}

where $Pj$ denotes the probability considering all trajectories *ξ*(*t*) initialized from node *j* and the random variable $hB=inf{t>0:\xi (t)\u2208B}$ is the first hitting time for the macrostate $B$.^{2} In the context of Markovian networks, a trajectory *ξ*(*t*) is a sequence of visited nodes and associated times. For reversible Markov chains, the backward committor probabilities, for the $B\u2190A$ direction, which are defined analogously to Eq. (28) but for the time-reversed dynamics,^{97} are related to the forward committor probabilities simply by $qj\u2212=1\u2212qj+$.^{111} The committor probabilities can be written as the solution of a series of linear equations obtained by a first-step analysis,^{4}

where we have noted that, in this definition, $qa\u2208A+=1$ and $qb\u2208B+=0$. See Ref. 97 for a detailed proof. Equation (29) can be solved using a variety of linear solution methods (Sec. II D), such as Gauss–Siedel iteration,^{196,197} successive over-relaxation, and^{91,169} GMRES,^{175} or robustly by state reduction (Sec. V A 4).^{112}

The committor probabilities can also be computed from the second dominant (i.e., first nontrivial) right eigenvector of the modified stochastic matrix $T\u0304$ for which nodes of the sets $B$ and $A$ are subsumed into single nodes *b* and *a*, respectively, and where both of these supernodes are considered to be absorbing.^{111} That is, $T\u0304aj=0\u2200j\u2260a$ and $T\u0304bj=0\u2200j\u2260b$. This transition matrix has two unit eigenvalues, one of which is associated with the unit vector and the second is associated with an eigenvector that we shall denote by $\psi \u0304(2)$. The committor probability for the *j*th node is then^{111}

Obtaining the committor probabilities using Eq. (30) provides a scalable approach, since the Lanczos algorithm^{173} can be used to compute $\psi \u0304(2)$ efficiently for sparse systems.^{166} Moreover, this formulation is readily extended to the first hitting problem [cf. Eq. (28)] for an arbitrary number of target states.^{111}

Several quantities characterizing the $A\u2190B$ TPE at a nodewise level of detail can be obtained from the committor probabilities.^{94–98} One such dynamical property of interest is the probability distribution of reactive trajectories, $mjR$, defined as the probability that the *j*th node is occupied at equilibrium by a trajectory that is reactive.^{97} Intuitively, this quantity is a product of the probabilities that the trajectory last visited $B$ before $A$ and will next visit $A$ before $B$ and the stationary probability of the *j*th node. Therefore, from the definition of the committor probabilities [Eq. (28)], for reversible Markov chains, this probability is given by

The probability that any given trajectory at equilibrium is reactive is equal to the normalization factor $ZAB=\u2211j\u2208SmjR$. The normalized distribution of reactive trajectories, $mj=mjR/ZAB$, is the probability to observe a trajectory at node *j*, conditional on the trajectory being reactive.

In the continuous-time case,^{97} the net probability flux of reactive trajectories along the *i* ← *j* edge of the network at equilibrium is

In discrete-time, *T*_{ij} replaces *K*_{ij} in Eq. (32).^{111} The $A\u2190B$ steady state rate constant,^{198}^{,}$kABSS$, can be calculated by defining an isocommittor cut^{191} Σ_{α}, which partitions the network into two sets $A*\u2283A$ and $B*\u2283B$ such that $qa\u2208A*+>\alpha >qb\u2208B*+$, and summing over the net probability fluxes associated with the edges of the cut,^{199,200}

Here, we have denoted the steady state reactive $A\u2190B$ flux^{43} as $JAB$ and $\pi B=\u2211b\u2208B\pi b$. Of particular interest is the isocommittor cut Σ_{α=0.5}, which defines the edges that constitute the transition state ensemble (TSE). The TSE essentially characterizes the boundary between the effective basins of attraction associated with the $A$ and $B$ states.^{201–205} Alternative approaches to characterize the TSE are based on Bayesian path statistics^{206} and the emission-absorption cut defined in terms of the eigenvectors.^{64}

Equation (33) shows that the steady state reactive flux, $JAB$, can be exactly decomposed into additive contributions from individual edges that together comprise an $A$–$B$ cut in the network. Hence, the local states that constitute the dominant channels for the productive pathways can be readily identified, and representative paths associated with each of the bottleneck edges can be determined by an augmenting paths algorithm.^{97} Alternatively, $JAB$ can be exactly decomposed into a finite set of contributions from transition flux-paths, which proceed productively through sequential isocommittor cuts in the network and are therefore loopless.^{199} The dominant flux-paths can be determined efficiently by a *k* shortest paths algorithm, and this procedure allows for a pathwise analysis to quantify the relative importance of competing mechanisms for the transition process.^{115}

The expressions for the probability distribution of reactive trajectories for nodes [Eq. (31)], and the net reactive fluxes for edges [Eq. (32)], correspond to the *equilibrium* TPE, where the system has relaxed to a steady state. Formally, this analysis involves considering a trajectory of infinite length in time, which continually transitions between the $A$ and $B$ states.^{97} The dynamical observable for the equilibrium path ensemble is $kABSS$ [Eq. (33)]. In Ref. 99, a theory of transition paths was developed for the nonequilibrium path ensemble, i.e., considering the first hitting problem where trajectories are absorbed at $A$, for which the dynamical observable is the MFPT, $TAB$. In particular, expressions were derived for the expected numbers of times that nodes *j* are visited along reactive paths, denoted $\theta \u0303j$, and for a nonequilibrium analog of the net reactive fluxes along *i* ← *j* edges [cf. Eq. (32)], denoted $J\u0303ij$, which depend on $\theta \u0303j$ and $qj+\u2200j\u2208S$. In Ref. 112, expressions for the probabilities $rj+$ that nodes *j* are visited along reactive paths were derived for both the nonequilibrium and equilibrium cases. Together with the committor probabilities, the reactive visitation probabilities $rj+$ provide a rigorous and readily interpretable metric to identify the local states that have a dominant influence on the productive $A\u2190B$ dynamics.^{100}

## III. GRAPH TRANSFORMATION METHOD TO CALCULATE MFPTs

We now proceed to describe numerically stable *state reduction* methods to compute the properties of a Markov chain, including moments of the first passage time distribution [Eq. (24)], the average mixing time [Eq. (15)], node properties characterizing transition paths (Sec. II E), and other quantities introduced in Sec. II. The state reduction approach is exact and requires no additional knowledge of the Markov chain besides the transition probability or rate matrix. Hence, state reduction procedures are of greater general utility than sparse linear algebra algorithms in application to nearly reducible Markov chains, since convergence of the latter methods in the metastable regime is strongly dependent on the careful choice, and moreover existence, of a suitable auxiliary matrix in the preconditioning scheme (Sec. II D). In this section, we introduce the concept of a renormalized Markov chain, which is central to the state reduction methodology, and prove that renormalization can be used to robustly compute the MFPT for a transition. In Sec. IV, we describe further closely related state reduction methods for the exact analysis of Markovian network dynamics.

### A. Graph transformation algorithm

The graph transformation^{88–92} (GT) algorithm is a procedure for the iterative elimination of nodes in an arbitrary Markov chain while preserving the $A\u2190B$ mean first passage time (MFPT) for the transition from an initial node ${b}\u2261B$ to an absorbing macrostate $A$.^{93} We denote the set of intervening nodes by $I\u2261(A\u222aB)c$. The GT method uses renormalization of transition probabilities and of mean waiting times (in the continuous-time case) or lag times (in discrete-time) for transitions from nodes to preserve individual path probabilities and the average time associated with the ensemble of paths to the absorbing state.

At each step of the nodewise iterative GT algorithm, the *n*th node, $n\u2208I$, is eliminated from the network, and the probabilities for internode *i* ← *j* transitions on the remaining network are renormalized according to

The waiting or lag times of nodes *j*, which we collectively denote by *τ*_{j}, are likewise updated according to

The transition probabilities for pairs of nodes *i* and *j*, where one or both of the nodes are not directly connected to *n*, are unaffected by the renormalization [Eq. (34)]. If *i* and *j* were not directly connected in the untransformed network, but both nodes were directly connected to *n*, i.e., the sequence of direct transitions *i* ← *n* ← *j* existed prior to renormalization, then a new *i* ← *j* transition connects the pair of nodes in the renormalized network. Hence, the renormalized network at the *n*th iteration of the GT algorithm is less sparse than the network at the (*n* − 1)th iteration, and self-loop (*j* ← *j*) transitions, if not initially present, are introduced into the successive Markov chains in the course of the algorithm. The mean waiting or lag time for a transition from the *j*th node, *τ*_{j}, increases upon elimination of the *n*th node if the *n* ← *j* transition exists and remains unchanged by the renormalization otherwise [Eq. (35)]. Hence, if the *τ*_{j} are initially uniform, as is the case for the lag times of a discrete-time Markov chain (DTMC), then they become nonuniform in the renormalized network. The effect of the renormalization procedure to eliminate a single node [Eqs. (34) and (35)] is illustrated in Fig. 1.

Equation (34) conserves the probability flow out of all noneliminated nodes, i.e., ∑_{γ}*T*_{γj}′ = 1 ∀ *j* ≠ *n*.^{91} The renormalized transition probabilities *T*_{ij}′ subsume all *i* ← *j* transitions that occurred indirectly, i.e., via intervening *n*, with an arbitrary number of self-loop transitions of *n* on the original network.^{83} That is, the transition probabilities for the renormalized network not only account for direct *i* ← *j* transitions but also “round-trip” transitions *i* ← *n* ← *j*, *i* ← *n* ← *n* ← *j*, etc. The renormalized mean waiting or lag times *τ*_{j}′ [Eq. (35)] have a similar probabilistic interpretation. In particular, *τ*_{j}′ represents the expected time for a transition from the *j*th node, accounting for the average contribution from deviations via the eliminated node *n*.^{144} That is, *τ*_{j}′ includes the average time attributable to *n* ←⋯← *n* ← *j* transitions before proceeding to escape from {*j*} ∪ {*n*}.

Equations (34) and (35) exactly preserve the MFPTs from any given transient node in the network to the set of absorbing nodes $A$. The renormalization of transition probabilities [Eq. (34)] also preserves the probabilities associated with individual paths (in a renormalized representation) from transient to absorbing nodes. Hence, the elements *B*_{ab} of the absorption probability matrix^{1} **B**, where $a\u2208A$ and $b\u2208B$, are given by the renormalized transition probabilities *T*_{ab}′ of the network where all nodes of the set $(A\u222a{b})c$ have been eliminated using Eq. (34). A formal proof of these statements is the subject of Sec. III B. If the original Markov chain is irreducible, then the renormalized Markov chain is also irreducible and an expression can be derived for the stationary distribution ** π** of the transformed network.

^{73}This theorem forms the basis of state reduction methods to compute

**(Secs. IV A–IV C).**

*π*The GT method for the computation of $A\u2190B$ MFPTs is significantly more numerically stable than conventional dense linear algebra methods.^{91,92} GT can retain numerical precision even when a node *n* to be eliminated is associated with a dominant probability for the self-loop transition, *T*_{nn} → 1, because in this case, the equivalence 1 − *T*_{nn} ≡ ∑_{γ≠n}*T*_{γn} can be exploited to avoid performing the subtraction operation.^{207,208} Using this numerical trick minimizes error in the finite precision arithmetic that would otherwise be propagated^{74–79} and is liable to become prohibitively severe for nearly reducible Markov chains.^{72,80}

The GT algorithm has a time complexity that is dependent on the average degree of nodes and on the heterogeneity of the node degree distribution.^{89} Empirically, the nodewise iterative renormalization procedure has been observed to scale approximately as $O(|S|3)$ for sparse random networks and for some other classes of structured network.^{83} Proofs for the $O(|S|3)$ asymptotic time complexity of some state reduction algorithms have been formally derived.^{128,208} Since a DTMC is less sparse than the corresponding CTMC, there is no advantage to converting from a continuous-to a discrete-time formulation [Eq. (3)], as the state reduction computation will then be less efficient. The reverse operation [formally given by Eq. (4)] is highly nontrivial to perform in practice,^{144} and in any case, an equivalent continuous-time representation of a DTMC does not necessarily exist.^{10,209} The performance considerations for state reduction methods are complementary to those for matrix inversion and diagonalization methods (Sec. II D), which have time complexity $O(|S|3)$^{210} but frequently fail when the Markov chain features a spectral gap.^{80} GT is therefore the method of choice to compute MFPTs between two subsets of nodes in a Markov chain featuring a rare event, and likewise, the state reduction procedures outlined in Sec. IV allow for robust computation of further dynamical quantities that are otherwise challenging to obtain for nearly reducible Markov chains. There are numerous tricks for performance optimization of state reduction methods, for example, prioritizing the elimination of nodes with a low degree,^{91} switching from sparse to dense storage when the number of remaining nodes in the network falls below a threshold,^{89} and eliminating multiple nodes at once via a matrix inversion operation (Sec. IV A).^{93,198}

### B. Graph transformation proof

We wish to prove that the GT algorithm correctly preserves the MFPT from any given node in the network to a set of absorbing nodes $A$. Recall that the overall $A\u2190B$ MFPT, $TAB$, is an average of MFPTs $TAb$ for transitions from a node $b\u2208B$ of the initial set to any node of the absorbing set $A$ for a specified initial occupation probability distribution [Eq. (20)]. The following argument demonstrates that the individual MFPTs $TAb$, from which $TAB$ is derived, can be computed exactly using the renormalized transition probabilities [Eq. (34)] and waiting times [Eq. (35)] that are obtained after eliminating all nodes of the set $(A\u222a{b})c$, with $A$ and *b* chosen arbitrarily.

First, we derive the renormalized mean waiting or lag times [Eq. (35)] more explicitly by introducing the reweighted transition probabilities $T\u0303ij=Tije\zeta \tau j$. Consider the *n*-step discrete path *ξ* specified as a sequence of visited nodes, *ξ* = {*i*_{n} ← *i*_{n−1} ←⋯← *i*_{1}}. The probability of this path from *i*_{1} is $W\xi =\u220f(i\u2190j)\u2208\xi Tij$, where the product includes all *i* ← *j* transitions in the path *ξ*, with the correct multiplicities. The product of reweighted transition probabilities along the path, $W\u0303\xi $, is defined similarly. The reweighted transition probabilities satisfy

Hence, this derivative yields the product of the path probability and the average waiting time associated with the path. For an $a\u2208A\u2190b\u2208B$ path *ξ*^{(a,b)}, this quantity is the contribution of the path to the overall $A\u2190b$ MFPT. Therefore, to preserve the $A\u2190b$ MFPT by renormalization of the waiting times for nodes *j* that are directly connected to the *n*th (eliminated) node, each renormalized waiting time *τ*_{j}′ must be the sum of derivatives [Eq. (36)] of each of the reweighted transition probabilities for transitions from *j* to the set of nodes directly connected to *j* or *n*, excluding *n*. We will denote this set of adjacent nodes by Γ. If we use the renormalization relation for the reweighted transition probabilities [Eq. (34)] in this expression, then we recover Eq. (35),^{91}

While Eq. (34) preserves the probabilities associated with *individual* *ξ*^{(a,q)} first passage paths (in their resulting reduced representation) from any transient node $q\u2208Q$ to any absorbing node $a\u2208A$, Eq. (35) does *not* preserve the expected first passage times for individual paths to the absorbing state but instead preserves the $A\u2190q$ MFPTs for all transient nodes $q\u2208Q$. That is, Eq. (35) preserves the *path ensemble average* time for the transition from a transient node to the *set* of absorbing nodes. To understand this result, note that the formula for the renormalized waiting time *τ*_{j}′ [Eq. (35)] is an average over all *γ* ∈ Γ ← *j* transitions [cf. Eq. (37)] and is associated with each of the renormalized probabilities for these transitions. We now proceed to provide a formal proof that Eq. (35) not only preserves the probability and mean time for the local *γ* ∈ Γ ← *j* transitions but also preserves the $A\u2190q$ MFPTs for all transient nodes $q\u2208Q$ that remain noneliminated.

The overall probability associated with a *a* ← *q* first passage path that proceeds via at least one transition between nodes of the set Γ, on a network where the *n*th node has been eliminated, can be factorized into probabilities for segments of the path divided as follows: the portion of the path from the starting transient node $q\u2208Q$ to a node *j* of the set Γ (denoted *ξ*^{(j,q)}), the portion of the path from a node *γ* of the set Γ to the absorbing node $a\u2208A$ (denoted *ξ*^{(a,γ)}), and transitions within nodes of the set Γ. The probability for any *γ* ∈ Γ ← *j* transition on the transformed network is simply *T*_{γj}′, and the path segments *ξ*^{(j,q)} and *ξ*^{(a,γ)} are associated with products of transition probabilities $W\xi (j,q)$ and $W\xi (a,\gamma )$, respectively, which do not involve renormalized transition probabilities. Using the reweighted transition probabilities and their derivatives [Eq. (36)], we can hence write the contribution from the family of paths starting from node *q* and ending in node *a*, with a single transition *γ* ← *j* between nodes of the set Γ, to the total $A\u2190B$ MFPT as follows:^{91}

In this expression, there are sums over all path segments *ξ*^{(j,q)} initialized from a particular transient node $q\u2208Q$ and ending at a node *j* ∈ Γ and over all path segments *ξ*^{(a,γ)} beginning at a node *γ* ∈ Γ and terminating at a particular absorbing node $a\u2208A$. Only the second term in Eq. (38) is affected by the renormalization, and the derivative in this term is the same as the local term in Eq. (37). The contribution of this family of paths (for a particular absorbing node *a*) to the overall $A\u2190q$ MFPT is *not* preserved by the graph transformation because each *γ* ∈ Γ ← *j* step is associated with the same averaged *τ*_{j}′. To obtain the result for all $A\u2190q$ first passage paths, we sum over absorbing nodes $a\u2208A$.^{91,144} The contribution of this set of paths to the $A\u2190q$ MFPT *is* conserved, $\u2211a\u2208A\u2211\xi (a,\gamma )W\xi (a,\gamma )=1\u2200\gamma \u2208\Gamma $. Hence, the renormalization equations [Eqs. (34) and (35)], applied any number of times to the Markov chain, preserve the MFPTs $TAq$ for transitions from all transient nodes $q\u2208Q$ that remain noneliminated.

The vector of MFPTs for transitions from all transient nodes, of the set $Q\u2261Ac$, can be obtained by solving the following system of linear equations obtained from a first-step analysis:^{4}

We wish to find the MFPT for the transition from a particular initial node *b*. When all nodes of the set $(A\u222a{b})c$ have been eliminated from the Markov chain according to Eqs. (34) and (35), the only remaining edges in the network represent transitions from the *b*th node to nodes of the absorbing macrostate $A$, and the *b* ← *b* self-loop. The first-step relation for the MFPTs [Eq. (39)] therefore reduces to a direct solution for the $A\u2190b$ MFPT,

To compute the overall $A\u2190B$ MFPT [Eq. (20)], for an initial macrostate $B\u2286Q$, the MFPTs for transitions from each node of the set $B$ to the absorbing state $A$ are required. When the initial set of nodes $B$ is small, this calculation can be achieved easily in practice as follows. Each individual MFPT, $TAb$, for nodes $b\u2208B$, is determined via Eq. (40) by eliminating all nodes of the set $B\b$, after first eliminating all nodes of the set $I\u2261(A\u222aB)c$ and storing the resulting renormalized network. Then, only the former computation needs to be repeated for each initial node $b\u2208B$ to obtain $TAB$. When the absorbing state $A$ is small, the MFPT for the reverse ($B\u2190A$) direction can be computed similarly with comparatively little additional computational effort by eliminating all nodes of the set $A\a$ from the renormalized network resulting from elimination of all nodes of the set $I$, for each node $a\u2208A$ in turn.

## IV. FURTHER STATE REDUCTION ALGORITHMS

The renormalization of transition probabilities [Eq. (34)] forms the basis for the family of state reduction methods to robustly compute the properties of a Markov chain. In this section, we extend the theory of renormalization to eliminate blocks of nodes simultaneously and describe numerically stable state reduction procedures to compute many of the dynamical quantities introduced in Sec. II, including uncoupling–coupling methods to determine the stationary distribution (Secs. IV A and IV B) and the REFUND algorithm to compute the group inverse (Sec. IV D).

### A. Exact uncoupling–coupling via stochastic complements

In Sec. III, we argued that renormalization of the transition probabilities upon eliminating a single node *n* in the network [Eq. (34)] preserves the probabilities of individual paths, in their renormalized representation, on the resulting reduced Markov chain. This theory can be extended to allow for elimination of a block of nodes, $N$, in a single step using a matrix inversion operation. The state space of the Markov chain is partitioned as $S\u2261N\u222aZ$ so that we write the transition probability matrix in block form as

where $TNZ$ contains the $n\u2208N\u2190z\u2208Z$ transition probabilities and the other blocks are defined similarly. Hence, $TNN$ is the substochastic matrix for transitions within the subset of nodes to be eliminated, $N$, and $TZZ$ likewise corresponds to the subnetwork comprising the nodes of the macrostate to be retained, $Z\u2261Nc$. The renormalized Markov chain consisting only of the nodes within the set $Z$ is associated with the transition probability matrix^{208}

Equation (42) is referred to as a *stochastic complement* by Meyer.^{73} The renormalized transition probabilities account for the average behavior of paths that visit $N$, including transitions within $N$, analogous to the case of eliminating a single node [Eq. (34)].^{211} Therefore, if there was no direct *i* ← *j* transition in the original network, but there were $i\u2190N\u2190j$ paths, then a direct *i* ← *j* transition is present in the renormalized network for $i,j\u2208Z$. Similarly, it can be shown that the $|Z|$-dimensional vector of renormalized mean waiting or lag times for the remaining network is given by^{93}

Equations (42) and (43) constitute a block formulation of the GT algorithm (Sec. III) that remains numerically stable when each of the blocks of nodes to be eliminated, $N$, constitutes a metastable state so that the $TNN$ matrices do not feature a spectral gap, and hence, the Markovian kernel $(I\u2212TNN)$ can be safely inverted by dense linear algebra methods. Simultaneous elimination of blocks of nodes in a Markov chain can also lead to improved time complexity of the GT algorithm.^{198}

It can be shown that if a Markov chain is irreducible, then any stochastic complement [Eq. (42)] also has a well-defined stationary distribution,^{1} and hence, so do any stochastic complements of that stochastic complement, and so on. Moreover, if a transition probability matrix is partitioned into *N* communities, $C={1,2,\u2026,N}$, then the stochastic complements (i.e., renormalized Markov chains) corresponding to each of the communities have independent stationary distributions, from which the stationary distribution of the original Markov chain can be inferred.^{212} Specifically, the stationary distribution of the original Markov chain partitioned according to

is given by

where $\zeta Y=\u2211y\u2208Y\pi y$ is the coupling factor corresponding to the stochastic complement [Eq. (42)] comprising the nodes of the set $Y$, which has a stationary distribution vector $\pi Y\u2032$, and $\u2211Y\zeta Y=1$. The *N*-dimensional vector of coupling factors,

is the stationary distribution for the stochastic matrix **C** with elements $CXY=1X\u22a4TXY\pi Y\u2032$ for all communities $X,Y\u2208C$.^{212}^{,}**C** is referred to as the aggregation matrix, since its elements are simply the intercommunity transition probabilities when a local equilibrium^{6} is established within each of the separate macrostates. If the community structure $C$ appropriately characterizes the metastable communities of nodes, then **C** is well-conditioned, and hence, ** ζ** can be determined accurately with relatively little computational effort, since the number of communities

*N*is typically small. The coupling factors take a particularly simple form in the case of a two-level partition into sets $N$ and $Z\u2261Nc$ [Eq. (41)], namely,

^{73}

and $\zeta N=1\u2212\zeta Z$.

These observations suggest the following uncoupling–coupling algorithm to compute the stationary distribution vector. In the uncoupling phase, an irreducible Markov chain and its derived stochastic complements are repeatedly decomposed into two or more renormalized Markov chains of reduced dimensionality based on a determined partitioning $C$ at each iteration. For the set of stochastic complements resulting from the final iteration of the uncoupling phase, linear algebra methods (Sec. II) or the GTH algorithm^{130,213} (Sec. IV C) can be used to compute the independent stationary distributions for each of the renormalized Markov chains with state space $Y$, $\pi Y\u2032\u2200Y\u2208C$. Then, the aggregation matrix **C** is constructed from this information, and the coupling factors $\zeta Y$ associated with each of the stochastic complements are obtained as the stationary distribution of **C**. The vectors ${\pi Y\u2032}$ and corresponding coupling factors ${\zeta Y}$ yield the stationary distribution of the parent Markov chain, comprising the nodes in *all* communities $Y\u2208C$ [Eq. (45)]. That is, the final iteration of the uncoupling stage has been undone. Repeated coupling of the stationary distributions for the stochastic complements at each level of the hierarchical partitioning that was performed in the uncoupling phase eventually recovers the stationary distribution of the original Markov chain. This uncoupling–coupling procedure based on stochastic complementation [Eq. (42)] is illustrated in Fig. 2.

In practice, this exact uncoupling–coupling algorithm has been found to be highly effective for determining the stationary distribution of a nearly reducible Markov chain.^{214} To understand this observation, consider an iteration of the uncoupling phase, where the Markov chain **T** has *N* metastable macrostates and is partitioned into the set of *N* communities $C$, which accurately characterizes the metastable sets of nodes. For each community $Z\u2208C$, the renormalized transition matrix $TZZ\u2032$ [Eq. (42)] is then close to the (nearly stochastic) block $TZZ$ [Eq. (41)] of the parent transition matrix.^{73} Since the dynamics within the state space $Z$ are fast compared to escape from this subnetwork, neither the absorbing Markov chain $TZZ$ nor its derived stochastic complement $TZZ\u2032$ have any subdominant eigenvalues close to unity, and the renormalized Markov chain $TZZ\u2032$ is therefore well-conditioned. Hence, inversion of the Markovian kernel to determine the corresponding stochastic complement [Eq. (42)], and computation of the stationary distribution for this stochastic complement, is numerically stable.^{215}

More formally, since each of the metastable macrostates gives rise to a slowly decaying dynamical eigenmode, the nearly reducible Markov chain **T** necessarily has at least *N* − 1 eigenvalues *λ*_{k} that are close to unity, in addition to the eigenvalue *λ*_{1} = 1 that is associated with the stationary distribution. Furthermore, each of the stochastic complements $TZZ\u2032$, corresponding to communities $Z\u2208C$, necessarily has a unit eigenvalue.^{212} Each of the unique unit eigenvalues for the *N* stochastic complements are associated with one of the *N* dominant eigenvalues for the original Markov chain, and this mapping becomes exact in the limit where the original Markov chain is completely reducible, i.e., where the separate blocks $Z\u2208C$ are themselves irreducible.^{72} Hence, by the continuity of the eigenspectrum, if there is a spectral gap after the *N* dominant eigenvalues of the original Markov chain, then each of the *N* stochastic complements must have a spectral gap after the unit eigenvalue.^{73} In fact, it can be shown that there exists an upper bound on the second dominant eigenvalue for a stochastic complement of a reversible Markov chain.^{216} Thus, if the community structures used to partition the stochastic complements during the uncoupling phase appropriately characterize all metastable sets of nodes in the relevant Markov chains so that the aggregation matrices are also well-conditioned, then the entire uncoupling–coupling procedure is numerically stable. Moreover, the uncoupling–coupling procedure is readily parallelizable, owing to the independence of the stationary distributions for the stochastic complements derived from a parent Markov chain.^{212}

### B. Iterative aggregation–disaggregation

Stochastic complementation is close in spirit to iterative aggregation–disaggregation (IAD) methods^{217–220} to compute the stationary distribution ** π**. Both methods use an

*N*-way partitioning of a Markov chain based on the community structure $C$ [Eq. (44)]. Whereas the uncoupling–coupling in stochastic complementation is exact, IAD uses the substochastic blocks of a partitioned Markov chain directly, thereby avoiding the matrix inversion operations required to compute the stochastic complements [Eq. (42)].

To infer an approximation to the stationary distribution of the parent Markov chain in IAD, the normalized right eigenvector $\psi Y(1)$ associated with the dominant eigenvalue (which is less than unity) is computed for each of the substochastic matrices $TYY$, corresponding to communities $Y\u2208C$. The vector of coupling factors ** ζ*** [cf. Eq. (46)] that are associated with these eigenvectors is determined as the stationary distribution for a

*N*×

*N*-dimensional stochastic coupling matrix

**C*** with elements $CXY*=1X\u22a4TXY\psi Y(1)\u2200X,Y\u2208C$. Note that thus far, the IAD procedure is analogous to the exact uncoupling–coupling method described in Sec. IV A, except that the quantities are inexact since the substochastic blocks are used in place of the stochastic complements. The eigenvectors and associated coupling factors together yield an initial approximation to the stationary distribution of the original Markov chain [cf. Eq. (45)], $\pi \u2248(\zeta 1*\psi 11,\u2026,\zeta 1*\psi 1N)\u22a4$, which is iteratively refined as follows:

Let $\pi *=(\pi 1*,\u2026,\pi N*)\u22a4$ denote the current estimate for the stationary distribution. First, the vectors $\pi Y\u2200Y\u2208C$ are normalized to yield the vectors ${\pi \u0304Y}$, and updated coupling factors ${\zeta Y*}$ are computed as the stationary distribution of a new stochastic aggregation matrix **C*** with elements $CXY*=1X\u22a4TXY\pi \u0304Y*\u2200X,Y\u2208C$ (aggregation step). The new coupling factors yield the vector $z=(\zeta 1*\pi \u03041*,\u2026,\zeta N*\pi \u0304N*)\u22a4$. Second, an updated estimate for ** π** is obtained by solving the following

*N*systems of linear equations (disaggregation step):

^{221}

These two steps are repeated until the estimate for the stationary distribution of the parent Markov chain converges, ** π*** →

**. Further refinements to this procedure that improve numerical stability and convergence were reported in Ref. 222.**

*π*In practice, this procedure typically converges to the true stationary distribution rapidly, provided that the community structure is such that the blocks of the transition matrix are close to being stochastic.^{215,223} IAD shares many of the same advantages as exact uncoupling–coupling, most notably the numerical stability conferred when the community structure reflects the nearly reducible structure of a Markov chain exhibiting metastability and the possibility of parallelization. The GTH algorithm (Sec. IV C) can be used to solve the linear systems of equations that arise in both the aggregation and disaggregation steps of IAD, leading to improved numerical stability.^{224}

### C. Grassmann–Taksar–Heyman algorithm

The Grassmann–Taksar–Heyman (GTH) algorithm^{130,213} is essentially a nodewise iterative formulation of exact uncoupling–coupling via stochastic complementation (Sec. IV A). Consider the elimination of the *n*th node in a Markov chain by renormalization [Eq. (34)], as in the nodewise iterative formulation of the GT algorithm (Sec. III). Since the equilibrium distribution is equal to the proportion of time spent at a node in the infinite time limit, the stationary distribution of the parent Markov chain, ** π**, must be proportional to that of the reduced model,

**′,**

*π*From Eq. (49) and the global balance equation for the stationary distribution of the parent Markov chain at the *n*th node, *π*_{n} = ∑_{γ}*π*_{γ}*T*_{nγ}, we have^{128}

where *T*_{ij} is the *i* ← *j* transition probability for the original Markov chain.

Equations (49) and (50) suggest the following nodewise iterative procedure to compute the stationary distribution of an irreducible Markov chain. The GTH algorithm uses renormalization of transition probabilities [cf. Eq. (34)] to eliminate all nodes *n* of the Markov chain for which $|S|\u2265n>1$. The stationary probability for the reduced Markov chain comprising the single remaining node (*n* = 1) is, of course, known to be *π*_{1}′ = 1. The algorithm then employs a back substitution phase to “undo” the state reduction stage, thereby recovering the original Markov chain and associated stationary distribution.^{40} The GTH algorithm, which is essentially equivalent to a Gaussian elimination,^{40} is given as pseudocode in Algorithm 1. Like the GT algorithm, the GTH algorithm is numerically stable,^{225} since the relation 1 − *T*_{nn} ≡ ∑_{γ≠n}*T*_{γn} can be exploited to avoid problematic subtraction operations when *T*_{nn} → 1.^{207}

input: transition probability matrix T |

set of nodes $n\u2208S\{1}$ to be eliminated from the state space $S$ |

output: stationary probability distribution vector π |

/* elimination phase */ |

for $n=|S|,|S|\u22121,\u2026,2$ do |

S_{n} = ∑_{γ<n}T_{γn} (≡1 − T_{nn}); // confers numerical stability |

for j < n do |

T_{nj} ← T_{nj}/S_{n}; |

for i < n do |

T_{ij} ← T_{ij} + T_{in}T_{nj}; |

/* back substitution phase */ |

π_{1} ← 1; |

μ ← 1; |

for $n=2,\u2026,|S|\u22121,|S|$ do |

$\pi n=Tn1+\u2211\gamma =2n\u22121\pi \gamma Tn\gamma $; |

μ ← μ + π_{n}; |

/* normalization */ |

for $n=1,\u2026,|S|\u22121,|S|$ do |

π_{n} ← π_{n}/μ; |

return ; π |

input: transition probability matrix T |

set of nodes $n\u2208S\{1}$ to be eliminated from the state space $S$ |

output: stationary probability distribution vector π |

/* elimination phase */ |

for $n=|S|,|S|\u22121,\u2026,2$ do |

S_{n} = ∑_{γ<n}T_{γn} (≡1 − T_{nn}); // confers numerical stability |

for j < n do |

T_{nj} ← T_{nj}/S_{n}; |

for i < n do |

T_{ij} ← T_{ij} + T_{in}T_{nj}; |

/* back substitution phase */ |

π_{1} ← 1; |

μ ← 1; |

for $n=2,\u2026,|S|\u22121,|S|$ do |

$\pi n=Tn1+\u2211\gamma =2n\u22121\pi \gamma Tn\gamma $; |

μ ← μ + π_{n}; |

/* normalization */ |

for $n=1,\u2026,|S|\u22121,|S|$ do |

π_{n} ← π_{n}/μ; |

return ; π |

### D. FUND and REFUND algorithms

Recall that the key macroscopic dynamical properties of an irreducible Markov chain, including moments of the first passage time distributions for all pairwise transitions between nodes and moments of the mixing time distributions for relaxation processes from alternative starting nodes, can be computed from any generalized matrix inverse (Sec. II B). The FUND^{78,226,227} and REFUND^{128} algorithms provide numerically stable procedures to compute the group inverse **A**^{#} [Eqs. (8)–(10)] associated with an irreducible Markov chain and can be readily adapted to obtain related generalized inverses, such as the fundamental matrix **Z**.

The FUND algorithm is based on the fact that a generalized inverse **X** satisfying^{134}

can be obtained from the result of the first phase of the GTH algorithm (Algorithm 1) and that the group inverse can be written directly in terms of any generalized inverse **X** satisfying Eq. (51),^{128}

The first step of the FUND algorithm is to obtain a LU decomposition of the Markovian kernel, **I** − **T** = **UL**, where **U** and **L** are upper- and lower-triangular matrices, respectively. To see that this factorization is achieved naturally and robustly by the elimination phase of the GTH algorithm (Sec. IV C), suppose that the elements of the stochastic matrix **T** are overwritten during the procedure, yielding the matrix **T***. Let **F** denote the strictly lower-triangular matrix and **G** the upper-triangular matrix containing the corresponding elements of **T*** so that **T*** = **F** + **G** + (**I** − **S**), where **S** is the diagonal matrix with nonzero elements *S*_{nn} = *S*_{n} (cf. Algorithm 1). It can be shown that^{40,226}

The **U** matrix has all column sums equal to zero. The diagonal elements of **U** can be computed by enforcing this constraint, which thereby provides an additional opportunity to enforce numerical stability.^{78} The diagonal elements of **L** are all equal to −1, and hence, **L** is non-singular.

The LU decomposition of Eq. (53) can be used to solve Eq. (51) in two stages. First, the (unique) solution **Y** to the problem $YL=I\u2212\pi 1S\u22a4$ is determined by backward substitution. The second step is to solve **XU** = **Y**.^{134} Since the first column of **U** is identically zero, the first column of **Y** is also necessarily the null vector, and therefore, the generalized inverse **X** is not a unique solution to this equation.^{226} The simplest choice is to set the first column of **X** to be the null vector, and the remaining component vectors can then be obtained by forward substitution.^{227} The group inverse **A**^{#} then follows straightforwardly from **X** [Eq. (52)], where ** π** is computed by following through the later stages of the GTH algorithm.

The FUND algorithm outlined above is usually stable, since there is no significant numerical error associated with the forward and backward substitution procedures to determine **Y** and **X**. However, a difficulty may sometimes arise where **L** is ill-conditioned even though the Markovian kernel **I** − **T** is not. A small refinement of the FUND algorithm proposed in Ref. 78 addresses this issue. The time complexity of either formulation of the FUND algorithm is $O(|S|3)$.

An explicit formula relating the group inverses associated with a renormalized Markov chain where a single node has been eliminated and the corresponding parent Markov chain was derived in Ref. 128. This relation allows for a state reduction procedure to compute the group inverse that has a similar structure to the GTH algorithm, namely, the REFUND algorithm. The first stage of this algorithm is an elimination phase to iteratively reduce the Markov chain by renormalization, yielding the group inverse for the Markov chain with only a single node remaining as a trivial solution. A backward pass phase then computes the group inverses for the sequence of Markov chains resulting from the restoration of nodes in turn, given the group inverse for the reduced Markov chain where the node of the current iteration is eliminated. The recursive phase of the REFUND procedure also computes the stationary distribution of the reduced Markov chain at each iteration, which is required to compute the group inverse of the parent Markov chain. The *n*-dimensional stationary distribution vector for the reduced Markov chain with the *n*th node restored, *π*_{n}, is normalized at each iteration of the backward pass phase. This approach differs from the GTH algorithm (Algorithm 1), where the stationary distribution is only normalized at the final step. REFUND has comparable stability to the refined FUND algorithm and likewise has asymptotic time complexity $O(|S|3)$. The REFUND algorithm is given as pseudocode in Algorithm 2.

input: transition probability matrix T |

set of nodes $n\u2208S\{1}$ to be eliminated from the state space $S$ |

output: group inverse A^{#} |

stationary distribution π |

/* elimination phase. Note that nodes are eliminated in reverse order, thereby effectively performing the |

triangular decomposition of the Markovian kernel: I − T = UL */ |

for $n=|S|,|S|\u22121,\u2026,2$ do |

/* p_{:n} is the column and $q:n\u22a4$ the row vector, respectively, corresponding to the node to be eliminated (n) |

in the stochastic matrix for the reduced Markov chain at the current iteration, not including |

the diagonal element. Both vectors have dimension (n − 1) */ |

$p:n,q:n\u22a4\u2190$ GetBlock(T, n); |

S_{n} ← ∑_{γ<n}T_{γn} (≡1 − T_{nn}); // factors $Sn\u22001<n<|S|$ are stored in a vector |

$T:n,:n\u2032\u2190T:n,:n+Sn\u22121p:nq:n\u22a4$; // GTH elimination of the nth node |

$T:n+1,:n+1\u2190T:n,:n\u2032p:nSn\u22121q:n\u22a4Tnn$; // overwrite the elements of the transition matrix |

/* the stationary distribution and group inverse for the reduced Markov chain with only one |

node remaining are trivial */ |

π_{1} ← (1); |

$A1#\u2190(0)$; |

/* backward pass phase. At this stage, the first diagonal element of T, T_{11} = 1, corresponds to the stochastic |

matrix for a reduced Markov chain comprising only a single node. The other |

off-diagonal elements of T are the vectors p_{:n} and $q:n\u22a4$ for $1<n\u2264|S|$ */ |

for $n=2,\u2026,|S|\u22121,|S|$ do |

/* $p:n\u22a4$ and $q:n\u22a4$ are defined above, but $q:n\u22a4$ is now scaled by $Sn\u22121$ compared to the $q:n\u22a4$ vector returned by GetBlock |

during the elimination phase */ |

$p:n,q:n\u22a4\u2190$ GetBlock (T, n); |

$\alpha \u2190(1+\pi n\u22121\u22a4q:n)\u22121$; |

$\beta \u2190\alpha \pi n\u22121\u22a4q:n/Sn(\u2261(1\u2212\alpha )/Sn)$; // confers numerical stability |

$r\u2190\alpha An\u22121#q:n$; |

$t\u22a4\u2190\beta p:n\u22a4An\u22121#$; |

$c\u2190\beta (\alpha +p:n\u22a4r)$; |

c ← cπ_{n−1} − t; |

/* stationary distribution for the reduced Markov chain with the nth node restored */ |

$\pi n\u2190\alpha \pi n\u22121\u22a4q:n\pi n\u22121$; // this stationary vector is normalized |

$\delta \u2190(\pi n\u22121\u22a4q:n)\u22121(\u2261\alpha /(1\u2212\alpha ))$; // confers numerical stability |

/* the group inverse for the parent Markov chain, for which the nth node is restored, is |

recovered by an explicit formula */ |

$An#\u2190An\u22121#\u2212[r\pi n\u22121\u22a4+1n\u22121c\u22a4]\u22a4\u2212\delta cr\u22a4\u2212c1n\u22121\u22a4\delta c$; // 1_{n−1} is the (n − 1)-dimensional unit vector |

$A#\u2190A|S|#$; $\pi \u2190\pi |S|$; |

return A^{#}, ; π |

/* function to partition the relevant block of the matrix T. During the elimination phase, this block |

corresponds to the stochastic matrix for the reduced Markov chain at the current iteration |

(the other elements are overwritten during this phase) */ |

function GetBlock (T, n) |

let $T:n+1,:n+1=T:n,:np:nq:n\u22a4Tnn$; // reduced Markov chain partitioned into blocks |

return $p:n,q:n\u22a4$; |

input: transition probability matrix T |

set of nodes $n\u2208S\{1}$ to be eliminated from the state space $S$ |

output: group inverse A^{#} |

stationary distribution π |

/* elimination phase. Note that nodes are eliminated in reverse order, thereby effectively performing the |

triangular decomposition of the Markovian kernel: I − T = UL */ |

for $n=|S|,|S|\u22121,\u2026,2$ do |

/* p_{:n} is the column and $q:n\u22a4$ the row vector, respectively, corresponding to the node to be eliminated (n) |

in the stochastic matrix for the reduced Markov chain at the current iteration, not including |

the diagonal element. Both vectors have dimension (n − 1) */ |

$p:n,q:n\u22a4\u2190$ GetBlock(T, n); |

S_{n} ← ∑_{γ<n}T_{γn} (≡1 − T_{nn}); // factors $Sn\u22001<n<|S|$ are stored in a vector |

$T:n,:n\u2032\u2190T:n,:n+Sn\u22121p:nq:n\u22a4$; // GTH elimination of the nth node |

$T:n+1,:n+1\u2190T:n,:n\u2032p:nSn\u22121q:n\u22a4Tnn$; // overwrite the elements of the transition matrix |

/* the stationary distribution and group inverse for the reduced Markov chain with only one |

node remaining are trivial */ |

π_{1} ← (1); |

$A1#\u2190(0)$; |

/* backward pass phase. At this stage, the first diagonal element of T, T_{11} = 1, corresponds to the stochastic |

matrix for a reduced Markov chain comprising only a single node. The other |

off-diagonal elements of T are the vectors p_{:n} and $q:n\u22a4$ for $1<n\u2264|S|$ */ |

for $n=2,\u2026,|S|\u22121,|S|$ do |

/* $p:n\u22a4$ and $q:n\u22a4$ are defined above, but $q:n\u22a4$ is now scaled by $Sn\u22121$ compared to the $q:n\u22a4$ vector returned by GetBlock |

during the elimination phase */ |

$p:n,q:n\u22a4\u2190$ GetBlock (T, n); |

$\alpha \u2190(1+\pi n\u22121\u22a4q:n)\u22121$; |

$\beta \u2190\alpha \pi n\u22121\u22a4q:n/Sn(\u2261(1\u2212\alpha )/Sn)$; // confers numerical stability |

$r\u2190\alpha An\u22121#q:n$; |

$t\u22a4\u2190\beta p:n\u22a4An\u22121#$; |

$c\u2190\beta (\alpha +p:n\u22a4r)$; |

c ← cπ_{n−1} − t; |

/* stationary distribution for the reduced Markov chain with the nth node restored */ |

$\pi n\u2190\alpha \pi n\u22121\u22a4q:n\pi n\u22121$; // this stationary vector is normalized |

$\delta \u2190(\pi n\u22121\u22a4q:n)\u22121(\u2261\alpha /(1\u2212\alpha ))$; // confers numerical stability |

/* the group inverse for the parent Markov chain, for which the nth node is restored, is |

recovered by an explicit formula */ |

$An#\u2190An\u22121#\u2212[r\pi n\u22121\u22a4+1n\u22121c\u22a4]\u22a4\u2212\delta cr\u22a4\u2212c1n\u22121\u22a4\delta c$; // 1_{n−1} is the (n − 1)-dimensional unit vector |

$A#\u2190A|S|#$; $\pi \u2190\pi |S|$; |

return A^{#}, ; π |

/* function to partition the relevant block of the matrix T. During the elimination phase, this block |

corresponds to the stochastic matrix for the reduced Markov chain at the current iteration |

(the other elements are overwritten during this phase) */ |

function GetBlock (T, n) |

let $T:n+1,:n+1=T:n,:np:nq:n\u22a4Tnn$; // reduced Markov chain partitioned into blocks |

return $p:n,q:n\u22a4$; |

### E. Other state reduction methods

Many procedures based on the concept of renormalization have been proposed as numerically stable approaches to solve various other linear algebra problems for Markov chains and related systems.^{131} In Ref. 113, an extension of the GTH algorithm was derived to compute the variance and higher moments of the FPT distributions for the transitions from each of the transient nodes of a reducible Markov chain to the absorbing state. This procedure provides a stable method to compute the variance of the FPT distribution, which can also exploit sparsity of a Markov chain and thereby avoid excessive memory usage. The vector of MFPTs for transitions from all transient nodes to the absorbing state can likewise be calculated in a single computation by extending the GT algorithm (Sec. III A) with a backward pass phase.^{112} Similar to the GTH (Sec. IV C) and REFUND (Sec. IV D) algorithms, this procedure is based on a recursive formula to compute the MFPTs for iteratively restored nodes, after storing relevant matrix elements during the forward pass phase.^{132,134}

In Ref. 112, a state reduction procedure was proposed to determine the expected numbers of times that transient nodes *j* are visited on first passage paths prior to absorption, *θ*_{j}. This algorithm effectively performs a matrix inversion of the Markovian kernel corresponding to the transient nodes of a reducible Markov chain. This operation is achieved while retaining numerical precision by defining an augmented matrix where each node of the network is connected to a “dummy” partner, and state reduction is used to eliminate the non-dummy transient nodes. The corresponding quantity for reactive paths, $\theta \u0303j$, can be obtained by performing this procedure on a reweighted Markov chain that represents the reactive process,^{191} which is derived from the original transition probabilities and the committor probabilities. As shown in Ref. 112, it is then possible to robustly obtain the reactive visitation probabilities $rj+$ for all transient nodes $j\u2208Q$ for both nonequilibrium and equilibrium transition path ensembles. Alternatively, the probability that a particular node, or set thereof, is visited along a reactive path can be computed robustly and with minimal memory usage by employing a state reduction procedure that follows naturally from the concept of renormalization.^{114} This approach is valuable in many applications, not only for nearly reducible Markov chains, since the usual method to compute visitation probabilities for nodes on first passage^{1} or reactive^{112} paths uses matrix inversion and is therefore not scalable. Furthermore, the conventional method cannot be used to compute the visitation probabilities for macrostates. The reactive visitation probability is an important dynamical property that intuitively quantifies the kinetic relevance of states with respect to the productive transition. Thus, since there are also state reduction methods to compute the stationary distribution (Secs. IV A–IV C) and committor probabilities (Sec. V A 4), renormalization can be used for robust computation of all microscopic properties characterizing the transition path ensemble (Sec. II E).

In Ref. 228, a state reduction framework was formulated for application to Markov decision processes (MDPs). A MDP augments the state space of a Markov chain with alternative sets of probabilities for transitions from the nodes, each associated with a different *action* that is available to the decision-making agent. A set of actions corresponding to each of the nodes of the MDP is a *policy*, which defines how the agent operates. The *i* ← *j* transition under action *a*_{j} is associated with a reward *R*_{ij}(*a*_{j}). The reward at the *t*th step is usually weighted by *γ*^{t}, where *γ* is a discount factor to ensure that the reward for a trajectory converges, 0 < *γ* < 1. The usual optimization problem is to determine the *optimal policy*, namely that which maximizes the expected reward when starting from a specified initial node. The algorithm of Ref. 228 is a robust method to find the optimal policy by policy iteration using repeated application of the following two-stage procedure. First, a state reduction algorithm is used to determine the vector of expected rewards starting from each node when the agent obeys the current policy. Second, these expected rewards are used in an improvement step to obtain an update to a suboptimal policy. This numerically stable method is useful in many practical applications, where nodes associated with large rewards may be rarely visited under an initial policy.

## V. ALGORITHMS FOR SIMULATING PATHWAYS

The state reduction framework presented in Secs. III and IV provides numerically stable procedures to compute almost all dynamical properties of a Markov chain. However, there are two notable quantities introduced in Sec. II that have not yet been obtained by state reduction methods. First, there is no state reduction algorithm to exactly compute the time-dependent occupation probability distribution vector **p**(*t*), given a transition rate matrix **K**. Instead, **p**(*t*) must be computed by matrix exponentiation [Eq. (3)] or eigendecomposition [Eq. (21)]. Second, we desire a state reduction method to compute the committor probabilities for nodes [Eq. (29)], which constitute the central object in transition path theory (Sec. II E).

In the present section, we describe the kinetic path sampling (kPS) method to efficiently simulate trajectories, and hence **p**(*t*), for a nearly reducible Markov chain. The kPS algorithm applies a state reduction procedure to the currently occupied metastable macrostate of the Markovian network, followed by a backward pass phase to sample the numbers of internode transitions, and therefore the time, associated with a trajectory escaping from the subnetwork. The formulation of the graph transformation algorithm (Sec. III) employed in kPS leads to an exact state reduction procedure to compute the committor probabilities for all nodes of a Markov chain (Sec. V A 4).

In principle, numerical estimates for the $A\u2190B$ FPT distribution [Eq. (22)], and for the occupation probability distribution **p**(*t*), can be obtained by explicit simulation of $A\u2190B$ first passage paths using any algorithm that samples the solution to the linear master equation [Eq. (1)] exactly. The standard procedure to simulate trajectories on a Markovian network in continuous-time is rejection-free kinetic Monte Carlo (kMC).^{81,82} In the kMC algorithm, a trajectory that currently occupies the *j*th node is advanced to the next node using two random numbers *r*_{1}, *r*_{2} ∈ (0, 1] drawn from a uniform distribution.^{229} The first random number is used to sample an *i* ← *j* transition in proportion to the branching probabilities *P*_{ij}, and the second is used to increment the simulation clock by Δ*t* = *τ*_{j} ln *r*_{2}.^{110} The probabilities of the sampled paths then agree exactly with the linear master equation [Eq. (1)]. By using the branching probability matrix, this procedure avoids self-loop transitions and so prevents the system from becoming trapped in any one node associated with small outgoing transition rates. However, for nearly reducible Markov chains, the trajectories exhibit a strong tendency to “flicker” within metastable communities of nodes,^{84–87} which can cause kMC simulation to be unfeasibly inefficient.^{83,100}

For Markov chains featuring a rare event, large gains in efficiency can be achieved by defining the currently occupied metastable community of nodes (or basin) $B$, sampling a node at the boundary $\u2202A\u2286A$ of the absorbing macrostate $A\u2261Bc$ and sampling an escape time for the trajectory segment escaping to the absorbing boundary. The kinetic path sampling^{83,84} (kPS) (Sec. V A) and Monte Carlo with absorbing Markov chain^{101–104} (MCAMC) (Sec. V B) algorithms provide exact methods to do this. The kPS and MCAMC algorithms forfeit resolution of the trajectory within the metastable macrostates, which in any case ought to be unproductive and hence of little interest, and gain the desirable property that their efficiency is essentially independent of the metastability of the Markov chain.^{83}

The kPS and MCAMC methods require a partitioning of the network into metastable communities. This clustering can be specified *a priori*, for example, with a suitable community detection algorithm,^{230} or constructed on-the-fly, for instance, using a breadth-first search procedure with criteria for including nodes in the basin.^{231} The algorithms can be used in conjunction with standard rejection-free kMC, automatically switching to the advanced method when flickering of a trajectory within a subset of nodes is detected.^{83}

### A. Kinetic path sampling

#### 1. Renormalization phase

To simulate a trajectory segment in the kinetic path sampling^{83,84} (kPS) algorithm, the state space $S\u2261B\u222aA$ is divided into the currently occupied community $B$ and an absorbing macrostate $A$. The nodes of the basin $B\u2261E\u222aT$ are further divided into the set of eliminated nodes $E\u2286B$ and the set of retained nodes $T\u2282B$. The set $T$ may be empty, $T=\u2205$. We also define the absorbing boundary $\u2202A\u2286A$ as the subset of nodes of the absorbing macrostate that are directly connected to one or more nodes of the basin $B$. The total number of nodes in the set $E\u222aT\u222a\u2202A$ is denoted *N*_{c}. The definitions of these states are illustrated in Fig. 3.

To simulate a trajectory escaping from the active community $B$ to the absorbing boundary $\u2202A$, kPS uses a stochastic matrix corresponding to the relevant subnetwork $E\u222aT\u222a\u2202A$. In the first phase of the kPS algorithm, nodes $1,\u2026,|E|\u2208E$ are eliminated, in turn, by renormalization (Sec. III), while retained nodes $|E|+1,\u2026,|B|\u2208T$ and absorbing boundary nodes $|B|+1,\u2026,Nc\u2208\u2202A$ remain noneliminated. The input to the sampling stage of the kPS algorithm is the set of $|E|+1$ transition probability matrices {**T**^{(n)}}, $0\u2264n\u2264|E|$, with **T**^{(0)} corresponding to the transition probability matrix for the subnetwork $B\u222a\u2202A$ of the original (untransformed) network. **T**^{(0)} has dimensions $Nc\xd7|B|$, since the absorbing boundary nodes have no outgoing transitions. Successive stochastic matrices (with *n* > 0) are computed by the iterative elimination of the $|E|$ nodes $n\u2208E$ from this subnetwork using the GT algorithm (Sec. III). In the renormalization, the transition probabilities for *all* pairs of nodes are updated according to

[cf. Eq. (34)]. Transitions from eliminated to noneliminated nodes are preserved by Eq. (54), but transitions from noneliminated to eliminated nodes have zero probability. As in Eq. (34), only the probabilities for transitions between pairs of nodes that are both directly connected to the *n*th node are affected by the renormalization. However, unlike renormalization using Eq. (34), connections involving eliminated nodes must also be considered. The mean waiting or lag times for transitions from nodes are *not* renormalized [cf. Eq. (35)] in the kPS algorithm, since the state reduction procedure is reversed before sampling the time associated with a trajectory escaping from the basin to the absorbing boundary.

The formulation of GT expressed in Eq. (54) is analogous to a LU decomposition^{215,232} of a stochastic matrix.^{83} Exploiting this analogy reduces the memory requirements of the kPS algorithm, since the intermediate transition matrices arising from the iterative elimination of nodes need not be stored. Instead, it is only necessary to store the original and final stochastic matrices, **T**^{(0)} and $T(|E|)$, respectively, and the matrices **L** and **U** that contain the elements required to construct **T**^{(n−1)} from **T**^{(n)}. Specifically, the **L** and **U** matrices have elements [cf. Eq. (54)]

#### 2. Sampling dynamics on the renormalized network and iterative reverse randomization

Recall that renormalization preserves the path probabilities to individual absorbing nodes (Sec. III B).^{91} This property of the state reduction framework is the basis for the kPS algorithm. To sample the absorbing boundary node $\alpha \u2208\u2202A$ at which the trajectory segment terminates, the following probability vectors are used to simulate successive *i* ← *j* transitions on the renormalized subnetwork comprising the nodes of the set $B\u222a\u2202A$,

starting from the currently occupied node $\u03f5\u2208B$. Here, $t*,j(|E|)$ denotes the vector containing the elements of the *j*th column of $T(|E|)$, and $t*,j(j)$ (for $j>|E|$) denotes the *j*th column of the transition matrix given by the elimination [Eq. (54)] of node *j* from **T**^{(0)}. In this sampling procedure, transitions between eliminated nodes are not permitted [cf. Eq. (54)], but transitions from noneliminated transient nodes $j\u2208T$ to eliminated nodes $i\u2208E$ are possible. This setup greatly reduces the number of steps required to reach a node at the absorbing boundary. If $T=\u2205$, then the trajectory necessarily reaches the absorbing boundary in a single transition. If $E=\u2205$, then Eq. (56) reduces to the standard rejection-free kMC algorithm.^{233}

Let **h** denote the $|E|$-dimensional vector with elements *h*_{j} equal to the number of transitions from node $j\u2208E$ to nodes *i* > *j*, i.e., transitions to nodes that are noneliminated in the stochastic matrix at the *j*th iteration of the renormalization phase. Let ** η** denote the $|B|$-dimensional vector with elements

*η*

_{j}equal to the total number of transitions from the

*j*th node. We also define the set of $|E|+1$ hopping matrices {

**H**

^{(n)}}, $0\u2264n\u2264|E|$, each of dimension $(Nc\u2212n)\xd7|E|$. The element $Hij(n)$ corresponds to the number of $i\u2190j\u2208E$ transitions on the renormalized stochastic matrix resulting from the elimination of

*n*nodes [Eq. (54)]. Hence, $\eta j=\u2211\gamma \u2208B\u222a\u2202AH\gamma j(0)$. Note that there are no transitions to eliminated nodes (with indices

*j*≤

*n*) in the hopping matrix of the

*n*th iteration, hence the stated dimensionality.

In the next stage of the kPS algorithm, an iterative reverse randomization procedure is used to sample the matrix **H**^{(0)} with elements $Hij(0)$ corresponding to the numbers of *i* ← *j* transitions on the original (untransformed) subnetwork, with the associated stochastic matrix **T**^{(0)}. This procedure exploits the fact that **H**^{(n−1)} can be sampled directly from **H**^{(n)}, **T**^{(n)}, and **T**^{(n−1)} without requiring explicit simulation of the dynamics on the network corresponding to **T**^{(n−1)}, in which the *n*th node is not eliminated. In this sense, a single iteration of the reverse randomization procedure “undoes” a single iteration of the renormalization [Eq. (54)]. The relationship between renormalization and iterative reverse randomization is illustrated in Fig. 4. The sampling of successive hopping matrices is based on the set of *N*_{c} × *n*-dimensional matrices {**G**^{(n)} (*τ*)}, for $0<n\u2264|E|$, with elements $Gij(n)$ given by the ratio of *i* ← *j* transition probabilities in Markov chains where the *n*th node is the next to be eliminated in the renormalization phase and where the *n*th node has been eliminated,

Hence, $Gij(n)$ is the fraction of *i* ← *j* transitions in **T**^{(n−1)} that are direct, i.e., that do not proceed via the *n*th node. If either *i* or *j* are not directly connected to *n* in **T**^{(n−1)}, then $Gij(n)=1$.

Since the total number of *i* ← *j* kMC transitions along the *α* ← ϵ trajectory on the original Markov chain is known from ** η**, which is derived from

**H**

^{(0)}and

**h**, the time $tA$ elapsed along the path for escape from the community $B$ can be sampled. The pseudocode for the categorical sampling, iterative reverse randomization, and transition time sampling procedures detailed in this section is given in Algorithm 3. Here, ∼ denotes a random number drawn from a distribution, Γ(

*α*,

*β*) is the gamma distribution with shape parameter

*α*and rate parameter

*β*

^{−1}, and B(

*h*,

*p*) and NB(

*r*,

*p*) are the binomial and negative binomial distributions, respectively, with trial number

*h*, success number

*r*, and success probability

*p*.

input: sets of nodes $E\u2260\u2205$, $T$, $B\u2261E\u222aT$, $A\u2261Bc$, and $\u2202A\u2286A$ (Fig. 3) |

$Nc=|E\u222aT\u222a\u2202A|$ |

initially occupied node $\u03f5\u2208B$ |

set of $|E|+1$ stochastic matrices {T^{(n)}}, $0\u2264n\u2264|E|$, from renormalization [Eq. (54)] of nodes in $E$ |

set of $|E|$ matrices {G^{(n)}}, for $0<n\u2264|E|$, derived from the {T^{(n)}} matrices via Eq. (57) |

output: absorbing node $\alpha \u2208\u2202A$ |

hopping matrix H^{(0)} with elements $Hij(0)$ equal to the numbers of $i\u2190j\u2208E$ internode transitions |

along the sampled α ← ϵ path on T^{(0)} |

vector with elements ηη_{j} equal to the total number of transitions from node $j\u2208B$ |

time elapsed $tA$ for the α ← ϵ trajectory for escape from $B$ |

initialize $H(|E|)$ (dimension $(Nc\u2212|E|)\xd7|E|$); |

initialize h (dimension $|E|$); |

initialize (dimension $|B|$); η |

β ← ϵ; |

/* Categorical sampling procedure to sample an absorbing boundary node $\alpha \u2208\u2202A$ and numbers of internode |

transitions on the renormalized subnetwork $B\u222a\u2202A$ */ |

while $\beta \u2209\u2202A$ do |

γ ∼ c_{β} [Eq. (56)]; |

if $\beta \u2208E$ then |

$H\gamma \beta (|E|)\u2190H\gamma \beta (|E|)+1$; |

else |

η_{β} ← η_{β} + 1; |

β ← γ; |

α ← β; |

/* Sample numbers of transitions from noneliminated nodes */ |

for $\delta \u2208T$ do |

$f\delta \u223cNB(\eta \delta ,1\u2212T\delta \delta (0))$; |

η_{δ} ← η_{δ} + f_{δ}; |

/* Iterative reverse randomization to generate the numbers of internode transitions on the |

original subnetwork $B\u222a\u2202A$ */ |

for $n\u2190|E|$ to 1 do |

initialize H^{(n−1)} (dimension $(Nc\u2212(n\u22121))\xd7|E|$); |

for $\beta \u2208E\{n}$ and γ ∈ {n + 1, …, N_{c}} do |

$H\gamma \beta (n\u22121)\u223cB(H\gamma \beta (n),G\gamma \beta (n))$; |

for $\beta \u2208E\{n}$ do |

$Hn\beta (n\u22121)\u2190\u2211\gamma \u2208{n+1,\u2026,Nc}H\gamma \beta (n)\u2212H\gamma \beta (n\u22121)$; |

for γ ∈ {n + 1, …, N_{c}} do |

$H\gamma n(n\u22121)\u2190H\gamma n(n)+\u2211\beta \u2208E\{n}H\gamma \beta (n)\u2212H\gamma \beta (n\u22121)$; |

$hn\u2190\u2211\gamma \u2208{n+1,\u2026,Nc}H\gamma n(n\u22121)$; |

$Hnn(n\u22121)\u223cNB(hn,1\u2212Tnn(n\u22121))$; |

$\eta n\u2190hn+Hnn(n\u22121)$; |

deallocate H^{(n)}, T^{(n)}; |

/* sample the transition time for the $\alpha \u2208\u2202A\u2190\u03f5\u2208B$ trajectory */ |

$tA\u21900$; |

for $j\u2208B\u222aT$ do |

Δ_{j} ∼ Γ(η_{j}, τ_{j}); |

$tA\u2190tA+\Delta j$; |

deallocate H^{(0)}, T^{(0)}, h, ; η |

return $tA$, α; |

input: sets of nodes $E\u2260\u2205$, $T$, $B\u2261E\u222aT$, $A\u2261Bc$, and $\u2202A\u2286A$ (Fig. 3) |

$Nc=|E\u222aT\u222a\u2202A|$ |

initially occupied node $\u03f5\u2208B$ |

set of $|E|+1$ stochastic matrices {T^{(n)}}, $0\u2264n\u2264|E|$, from renormalization [Eq. (54)] of nodes in $E$ |

set of $|E|$ matrices {G^{(n)}}, for $0<n\u2264|E|$, derived from the {T^{(n)}} matrices via Eq. (57) |

output: absorbing node $\alpha \u2208\u2202A$ |

hopping matrix H^{(0)} with elements $Hij(0)$ equal to the numbers of $i\u2190j\u2208E$ internode transitions |

along the sampled α ← ϵ path on T^{(0)} |

vector with elements ηη_{j} equal to the total number of transitions from node $j\u2208B$ |

time elapsed $tA$ for the α ← ϵ trajectory for escape from $B$ |

initialize $H(|E|)$ (dimension $(Nc\u2212|E|)\xd7|E|$); |

initialize h (dimension $|E|$); |

initialize (dimension $|B|$); η |

β ← ϵ; |

/* Categorical sampling procedure to sample an absorbing boundary node $\alpha \u2208\u2202A$ and numbers of internode |

transitions on the renormalized subnetwork $B\u222a\u2202A$ */ |

while $\beta \u2209\u2202A$ do |

γ ∼ c_{β} [Eq. (56)]; |

if $\beta \u2208E$ then |

$H\gamma \beta (|E|)\u2190H\gamma \beta (|E|)+1$; |

else |

η_{β} ← η_{β} + 1; |

β ← γ; |

α ← β; |

/* Sample numbers of transitions from noneliminated nodes */ |

for $\delta \u2208T$ do |

$f\delta \u223cNB(\eta \delta ,1\u2212T\delta \delta (0))$; |

η_{δ} ← η_{δ} + f_{δ}; |

/* Iterative reverse randomization to generate the numbers of internode transitions on the |

original subnetwork $B\u222a\u2202A$ */ |

for $n\u2190|E|$ to 1 do |

initialize H^{(n−1)} (dimension $(Nc\u2212(n\u22121))\xd7|E|$); |

for $\beta \u2208E\{n}$ and γ ∈ {n + 1, …, N_{c}} do |

$H\gamma \beta (n\u22121)\u223cB(H\gamma \beta (n),G\gamma \beta (n))$; |

for $\beta \u2208E\{n}$ do |

$Hn\beta (n\u22121)\u2190\u2211\gamma \u2208{n+1,\u2026,Nc}H\gamma \beta (n)\u2212H\gamma \beta (n\u22121)$; |

for γ ∈ {n + 1, …, N_{c}} do |

$H\gamma n(n\u22121)\u2190H\gamma n(n)+\u2211\beta \u2208E\{n}H\gamma \beta (n)\u2212H\gamma \beta (n\u22121)$; |

$hn\u2190\u2211\gamma \u2208{n+1,\u2026,Nc}H\gamma n(n\u22121)$; |

$Hnn(n\u22121)\u223cNB(hn,1\u2212Tnn(n\u22121))$; |

$\eta n\u2190hn+Hnn(n\u22121)$; |

deallocate H^{(n)}, T^{(n)}; |

/* sample the transition time for the $\alpha \u2208\u2202A\u2190\u03f5\u2208B$ trajectory */ |

$tA\u21900$; |

for $j\u2208B\u222aT$ do |

Δ_{j} ∼ Γ(η_{j}, τ_{j}); |

$tA\u2190tA+\Delta j$; |

deallocate H^{(0)}, T^{(0)}, h, ; η |

return $tA$, α; |

The final section of Algorithm 3, in which $tA$ is sampled, assumes that the input transition probability matrices correspond to a CTMC. Recall that in the continuous-time case, the time for a *i* ← *j* transition is sampled from an exponential distribution with rate parameter $\tau j\u22121$.^{110} If the linearized transition probability matrix [Eq. (57)] is used instead of the branching probability matrix, then the mean waiting times for nodes on the untransformed subnetwork **T**^{(0)} are uniform, *τ*_{j} ≡ *τ* ∀*j*. Hence, the time $tA$ elapsed along the escape trajectory on the untransformed network can be drawn from a single gamma distribution, with the shape parameter equal to the total number of internode transitions on **T**^{(0)} and rate parameter *τ*^{−1}. That is, $tA\u223c\Gamma (\u2211j\u2208E\u222aT\eta j,\tau )$. In the discrete-time case, the time step is fixed and uniform for all transitions, equal to the lag time *τ*, and so $tA=\tau \u2211j\u2208E\u222aT\eta j$.

Algorithm 3 describes the generation of a single escape trajectory from the currently occupied community $B$, given the set of transition matrices {**T**^{(n)}} computed by renormalization (Sec. V A 1). In a kPS simulation to sample complete $A\u2190B$ first passage paths, the main loop of the kPS algorithm (comprising renormalization, categorical sampling, iterative reverse randomization, and sampling of a transition time) is repeated many times to yield a stochastic trajectory that hits the target state $A$.

#### 3. Model example

For clarity, we illustrate the kPS algorithm for a model example. The network shown in Fig. 4 consists of four nodes. The set $E$ comprises three nodes *β* < *n* < *γ* that are to be eliminated by graph transformation, in order of increasing indices. The absorbing boundary comprises a single node, $\u2202A\u2261{\alpha}$, and there are no retained nodes so that $E\u2261B$. Therefore, when all three nodes of the set $E$ have been eliminated, the only available moves are to the absorbing node *α*. Hence, in the categorical sampling procedure based on Eq. (56), the escape trajectory reaches the absorbing boundary from the initial node ϵ in a single transition. The corresponding element $H\alpha \u03f5(\gamma )$ of the hopping matrix **H**^{(γ)}, which contains the numbers of transitions on the network **T**^{(γ)} where all nodes of the set $E$ are eliminated, is then incremented by one.

To understand the effect of renormalization, in the form employed within kPS [Eq. (54)], consider the elimination of node *n* from the stochastic matrix **T**^{(β)} to give **T**^{(n)}, as shown in Fig. 4. Prior to the elimination of node *n*, node *β* is already eliminated, and hence, no edges to node *β* are included in the network **T**^{(β)}. Upon the elimination of node *n*, edges to node *n*, including the *n* ← *n* self-loop, are likewise removed from the network. To compensate for the removal of these transitions, the transition probabilities of edges from node *n* to neighboring noneliminated nodes (i.e., *γ* and *α*) increase. The transition probabilities for edges between pairs of nodes that are both directly connected to node *n*, and for which the terminal node is noneliminated, similarly increase. This operation involves updating the *γ* ← *γ* and *α* ← *α* self-loops, as well as the *α* ← *γ*, *γ* ← *α*, and *α* ← *β* edges, which already exist in the network, and the addition of a *γ* ← *β* edge (indicated by ∗ in Fig. 4) in the renormalized network, since nodes *β* and *γ* were not directly connected in **T**^{(β)}. Figure 4 also shows the reverse randomization procedure to sample **H**^{(β)} from **H**^{(n)}, hopping matrices for which node *n* is noneliminated and eliminated (i.e., corresponding to the networks **T**^{(β)} and **T**^{(n)}), respectively. The numbers of transitions along the edges to be updated are generated randomly in a way that compensates for the changes in the transition probabilities resulting from the corresponding single iteration of the renormalization phase, as outlined in Algorithm 3.

A single iteration of the reverse randomization procedure, given as pseudocode in the third **for** loop of Algorithm 3, comprises four individual steps. Figure 5 shows the effect of each of these individual steps [(i)–(iv)] on the hopping matrix **H**^{(n)} to yield the matrix **H**^{(β)} (cf. Fig. 4). In step (i), the numbers of transitions along each edge from a node of the set $E$ to a noneliminated node, excluding edges associated with node *n*, are updated by drawing new values from a binomial distribution. For each transition along an edge of the renormalized network **T**^{(n)}, a Bernoulli trial is conducted with success probability equal to the ratio of transition probabilities for the edge in the networks **T**^{(β)} and **T**^{(n)}, where node *n* is eliminated and noneliminated, respectively. It is not necessary to consider edges associated with a node that is not directly connected to node *n* in **T**^{(β)} because renormalization does not affect these edges, and hence, the success probability in the Bernoulli trials is unity. Similarly, if two nodes are not directly connected to one another but are both connected to node *n* in **T**^{(β)}, then the ratio of transition probabilities associated with this edge in **T**^{(β)} and **T**^{(n)} is necessarily zero, and hence, there are no transitions along this (nonexistent) edge in **H**^{(β)}. Thus, the *γ* ← *β* edge in Fig. 5 is removed after step (i).

In step (ii), the numbers of transitions along edges from nodes of the set $E$ to node *n*, excluding the *n* ← *n* self-loop, increase to account for any decreases in the numbers of transitions from nodes of the set $E$ in step (i). Similarly, in step (iii), the numbers of transitions from node *n* to noneliminated nodes increase to account for any decreases in the numbers of transitions to noneliminated nodes in step (i). Finally, in step (iv), the number of *n* ← *n* self-loop transitions is drawn from a negative binomial distribution, where the number of trials is equal to the number of transitions from node *n* to noneliminated nodes. By repeated application of this reverse randomization procedure, the numbers of transitions from all eliminated (and any transient) nodes on the original subnetwork **T**^{(0)} are obtained, and hence, a time for the trajectory can be sampled.

#### 4. Committor and absorption probabilities

The formulation of the state reduction methodology employed in kPS [Eq. (54)] also provides a numerically stable procedure to compute committor probabilities (Sec. II E). Recall that the forward $A\u2190B$ committor probability for the *j*th node, $qj+$ [Eqs. (28) and (29)], is defined as the probability that a trajectory at node *j* hits the absorbing macrostate $A$ before hitting the initial macrostate $B$.^{97} By definition, $qb+=0$ for $b\u2208B$ and $qa+=1$ for $a\u2208A$.^{111} The committor probabilities for all nodes are obtained following the renormalization phase of the kPS computation if the state space is divided so that $A\u2261A\u222aB$ and $E\u2261(A\u222aB)c$. Then, after repeated application of Eq. (54) to the nodes in the set $E$, the only transitions from eliminated nodes in the transformed network $T(|E|)$ are to nodes in either of the endpoint states $A$ or $B$. The forward committor probability for the *j*th node, $j\u2208E$, is therefore simply

This procedure is readily generalizable to computing the set of committor probabilities to any number of alternative attractor states in a single computation.^{112}

This formulation of GT also allows for the robust computation of the absorption probabilities.^{1} The probability *B*_{aj} that a trajectory initialized at the *j*th node is absorbed at node $a\u2208A$ is given straightforwardly from the transition probabilities of the renormalized network where only the nodes of the set $Ac$ remain noneliminated, i.e., using $A\u2261A$ and $E\u2261Ac$, via $Ba\u2208A,j\u2209A=Taj(|E|)$.

### B. Monte Carlo with absorbing Markov chains

The Monte Carlo with absorbing Markov chain^{101–104} (MCAMC) algorithm provides an alternative approach to simulate pathways for a nearly reducible Markov chain and is similar in spirit to kPS. In the first passage time analysis^{234} (FPTA) variant of the MCAMC method, the problem of sampling a transition time $tA$ and exit node at the absorbing boundary $\alpha \u2208\u2202A$, for a trajectory escaping from an initial node of the currently occupied community $\u03f5\u2208B$, is solved exactly using eigendecomposition (Sec. II).

The probability that the trajectory has exited the community $B$ at time *t* is equal to the sum of occupation probabilities for the absorbing boundary nodes at that time, which in the continuous-time case is given by [cf. Eq. (21)]

where we have used the fact that the initial probability distribution is localized at the ϵ-th node. The Markov chain for the subnetwork $B\u222a\u2202A$ is reducible because the nodes of the state $\u2202A$ are treated as absorbing so that $p\u2202A(t\u2192\u221e)=1$. An exit time $tA$ can therefore be sampled by drawing a random number *r*_{1} ∈ (0, 1] and solving for $p\u2202A(tA)=r1$ numerically using a bracketing and bisection method.^{234} The iterative calculation to determine $tA$ in the FPTA method is initialized from the mean exit time, which is given by

where we have used the Perron–Frobenius theorem, *γ*_{1} = 0.^{107} The approximate mean rate method avoids the iterative calculation to determine $tA$ by simply using $\u27e8tA\u27e9$ to advance the simulation clock.^{235} The relative occupation probability distribution of absorbing nodes at any given time is also known from the eigendecomposition [Eq. (59)], and therefore, an exit node *α* can be sampled by drawing a second random number *r*_{2} ∈ (0, 1] and comparing with the probability distribution $p\alpha \u2032(tA)/p\u2202A(tA)\u2200\alpha \u2032\u2208\u2202A$, analogous to the procedure for selecting a move in the standard kMC algorithm.^{81,82}

### C. Practical considerations for advanced simulation algorithms

The time complexity to simulate a trajectory segment escaping from the currently occupied basin $B$ to the absorbing boundary $\u2202A$ is $O(|B\u222a\u2202A|3)$ for both the kPS^{83} and MCAMC^{92} algorithms. Importantly, the central processing unit (CPU) time to execute a single iteration of the main loop in the kPS and MCAMC algorithms does not depend on the actual number of transitions along the path. This feature makes the methods extremely powerful when the basins are chosen to accurately characterize the metastable sets of nodes, where the trajectory segments are sufficiently long that simulation of the path by kMC is unfeasible. Thus, the computational overhead associated with the more advanced algorithms is offset.^{100} The choice of community structure that is leveraged in these advanced simulation algorithms is therefore crucial to their success in simulating trajectories on nearly reducible Markov chains. The partitioning is also a critical consideration when eliminating blocks of nodes simultaneously in state reduction procedures, since inversion of the Markovian kernel for a community of nodes is numerically unstable if the subnetwork encompasses a separation of characteristic timescales. This block operation can be used in the GT algorithm to compute the $A\u2190B$ MFPT (Sec. III A) and in algorithms to compute the expected number of node visits^{112} and state visitation probabilities^{114} (Sec. IV E) and also forms the basis of exact uncoupling–coupling to compute the stationary distribution (Sec. IV A). Likewise, a condition for efficient convergence of iterative aggregation–disaggregation is that the diagonal blocks of the partitioned Markov chain are nearly stochastic (Sec. IV B).

A discussion of appropriate community detection procedures to identify metastable macrostates is beyond the scope of the present review. However, we briefly mention a recent advance that provides a framework to refine an initial partitioning, which is useful in practical applications, where the efficiency and/or numerical stability of computations is sensitive to perturbations of the community structure. The central theorem of this approach is that for a given partitioning of a CTMC into a set of *N* communities, there exists an upper bound on the second dominant eigenvalue of the *N* × *N*-dimensional lumped rate matrix given by the local equilibrium approximation (LEA).^{145,157,236} An analogous theorem applies in discrete-time. This result suggests a variational optimization procedure to refine an initial clustering, where the assigned macrostates for one or more nodes at the intercommunity boundaries are randomly switched to that of a neighboring community.^{237} The second dominant eigenvalue $\gamma 2C$ of the updated lumped Markov chain given by the LEA^{6,144} is then computed. The latter operation is efficient if *N* is not large, especially since the full eigenspectrum is not required and therefore Krylov subpsace methods (Sec. II D) can be employed. Moreover, the eigendecomposition is numerically stable if the lumped Markov chain does not encompass a separation of characteristic timescales, which in any case is required for the Markovian approximation to the coarse-grained dynamics to be valid.

An increase in $\gamma 2C$ essentially guarantees that the reduced Markovian network better approximates the slowest relaxation process of the original Markov chain.^{238,239} This eigenvalue therefore provides a rigorous metric to improve an initial clustering in an interpretable way. Recently, it was shown that there also exists an upper bound for the Kemeny constant $\zeta KC$ [Eq. (15)] of a reduced Markov chain lumped according to the community structure $C$ and with coarse-grained intercommunity transition probabilities or rates given by the LEA.^{108} $\zeta KC$ therefore provides an alternative objective function for the variational optimization procedure, which may be preferable to using the second dominant eigenvalue $\gamma 2C$. The Kemeny constant is a sum of eigenvalues [Eq. (18)] and therefore quantifies the extent to which the coarse-grained Markov chain reproduces *all* relaxation processes of the original Markov chain, with slower dynamical eigenmodes receiving a larger weighting. In the simplest possible implementation, the variational optimization is performed using a greedy approach, where node-switching moves that increase $\gamma 2C$ (or $\zeta KC$) are always accepted. More sophisticated stochastic optimization approaches, such as simulated annealing, calculate an acceptance probability for a proposed move.^{237}

The computational overhead associated with the kPS and MCAMC algorithms may become significant if a metastable community comprises a large number of nodes. It is not an option to split a large basin into separate macrostates because then escape of a trajectory to an absorbing boundary does not constitute a long-timescale process. The only way to reduce the computational expense for a given iteration of either simulation algorithm is to reduce the dimensionality of the relevant subnetwork. A simple way to achieve this goal is recursive regrouping^{6} to subsume sets of nodes that are interconnected by transition rates that exceed a specified threshold,^{52} with the new intergroup transition rates given by the LEA.^{157} If the basin is metastable, then the error resulting from this procedure ought to be small, since the regrouping of nodes that are interconnected by fast transition rates should not have a significant effect on the slow dynamics for escape from the community. A more advanced pre-processing strategy is partial graph transformation,^{240} where renormalization [Eqs. (34) and (35)] is used to eliminate a subset of chosen nodes within the predefined communities.^{86} This framework exploits the fact that, for a typical Markov chain, it is likely that the path probability distribution for the ensemble of escape trajectories from a given basin is localized.^{100} For instance, the probability distribution of reactive trajectories [Eq. (31)] may be concentrated in a small fraction of nodes, or escape to a particular node at the absorbing boundary may be strongly favored. Therefore, in principle, it should be possible to identify the subset of basin nodes to be eliminated that minimizes the information loss on the slow dynamics for escape from the community. Empirically, we find that it is favorable to retain nodes at the intercommunity boundary, as well as nodes with large (quasi-) stationary probabilities.^{240}

## VI. CONCLUSIONS

We have provided an overview of linear algebra (Sec. II) and state reduction methods (Secs. III–V A) for the exact numerical analysis of Markovian network dynamics. In Sec. II, we surveyed expressions for properties characterizing both the global and local dynamical behavior of finite Markov chains. We began by noting that macroscopic quantities, such as moments of the first passage and mixing time distributions, can be computed from a fundamental matrix associated with an irreducible Markov chain (Sec. II B) or using eigendecomposition (Sec. II C). We also defined quantities that characterize an $A\u2190B$ transition from an initial ($B$) to an absorbing ($A$) state at a microscopic level of detail, including the committor probabilities for nodes, which are the central object in analyzing the features of the productive transition (Sec. II E). For nearly reducible Markov chains, which feature a separation of characteristic timescales, ill-conditioning typically prohibits the solution of a given system of linear equations by conventional algorithms. While preconditioning schemes can be employed to aid the convergence of sparse linear algebra methods applied to Markov chains exhibiting metastability, this framework is not readily generalizable, since nontrivial and system-specific considerations arise (Sec. II D).

We therefore focused on state reduction algorithms, which have inherent numerical stability, to analyze arbitrary discrete- and continuous-time Markov chains. State reduction methods are based on renormalization [Eq. (34)] to sequentially eliminate nodes or blocks thereof. Some state reduction algorithms (such as the GTH^{130,213} and REFUND^{128} algorithms) also incorporate a backward pass phase to restore nodes in turn, recursively computing the properties of the successive Markov chains having solved the trivial one-node reduced system. Section III detailed the graph transformation algorithm to robustly compute the $A\u2190B$ MFPT. In Sec. IV, state reduction procedures were described that enable computation of all dynamical quantities outlined in Sec. II for nearly reducible Markov chains. The stationary distribution can be computed by exact uncoupling–coupling (Sec. IV A), iterative aggregation–disaggregation (Sec. IV B), or the Grassmann–Taksar–Heyman algorithm (Sec. IV C). The fundamental matrix of an irreducible Markov chain, and hence moments of the FPT distribution for all pairwise internode transitions, can be determined by the REFUND algorithm (Sec. IV D). Other state reduction methods, including procedures to compute node properties that characterize the ensemble of $A\u2190B$ transition paths,^{97,99} such as the committor^{112} and visitation^{114} probabilities for states, were summarized in Secs. IV E and V A 4. Finally, we noted that sampling trajectories using kinetic Monte Carlo is unfeasibly inefficient for systems exhibiting rare event dynamics and presented an account of kinetic path sampling, which extends the state reduction methodology to sample the numbers of internode transitions along pathways (Sec. V).

The metastable regime is crucially important, since realistic dynamical models typically exhibit a particular rare event that is the process of interest.^{52–66} The state reduction methodology reviewed herein allows for extensive numerical analysis of nearly reducible Markov chains, which would otherwise be intractable, and hence is valuable in many practical applications. The ability to extract observable quantities and perform detailed analyses of computationally challenging models will lead to new insights into the dynamical behavior of complex systems. In particular, we can probe the relationship between the local features of a Markovian network and the slow global dynamics, which is typically influenced strongly by a small number of states that facilitate the dominant transition mechanisms.^{87,100,112,115}

## SUPPLEMENTARY MATERIAL

Many of the methods described in this work, including the LU decomposition formulation of the graph transformation (GT) algorithm to compute MFPTs, committor probabilities, and absorption probabilities, the Grassmann–Taksar–Heyman (GTH) algorithm, a state reduction procedure to compute visitation probabilities, and kinetic path sampling (kPS), are implemented in DISCOTRESS, the DIscrete State COntinuous Time Rare Event Simulation Suite. DISCOTRESS is a flexible C++ program for efficient simulation and numerically stable analysis of the stochastic dynamics on arbitrary discrete- and continuous-time finite Markov chains, including nearly reducible Markov chains that are computationally challenging. DISCOTRESS is freely available software under the GNU General Public License and can be found online at github.com/danieljsharpe/DISCOTRESS. An implementation of the graph transformation (GT) algorithm to compute the $A\u2190B$ MFPT for the transition between two endpoint states $A$ and $B$ is included in the PATHSAMPLE program, which is available at www-wales.ch.cam.ac.uk/PATHSAMPLE/ under the GNU General Public License.

## ACKNOWLEDGMENTS

D.J.S. gratefully acknowledges the Cambridge Commonwealth, European and International Trust for a Ph.D. scholarship and additional financial support from the EPSRC. D.J.W. gratefully acknowledges support from the EPSRC.

The authors have no conflicts to disclose.

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.