Solving the chemical master equation directly is difficult due to the curse of dimensionality. We tackle that challenge by a numerical scheme based on the quantized tensor train (QTT) format, which enables us to represent the solution in a compressed form that scales linearly with the dimension. We recast the finite state projection in this QTT framework and allow it to expand adaptively based on proven error criteria. The end result is a QTT-formatted matrix exponential that we evaluate through a combination of the inexact uniformization technique and the alternating minimal energy algorithm. Our method can detect when the equilibrium distribution is reached with an inexpensive test that exploits the structure of the tensor format. We successfully perform numerical tests on high-dimensional problems that had been out of reach for classical approaches.

Complex interacting networks are prevalent across many fields of science and engineering, ranging from chemical kinetics and pharmacology to social sciences.1 In biochemistry, such networks notably arise through the reactions between the cell’s components such as DNA and RNA molecules. Since these key molecules often appear in low copy numbers, the intrinsic randomness of their interactions becomes significant.2–5 This motivates a stochastic framework that treats the chemical populations as a continuous-time, discrete-state Markov process. Finding the time-dependent probability distribution of this process amounts to solving the chemical master equation (CME).6 

Many recent direct CME solvers build on the finite state projection (FSP) of Munsky and Khammash,7 which allows the extremely large or even infinite CME to be truncated to a finite, more tractable, system of linear ordinary differential equations (ODEs). These solvers improve efficiency with selective choices of the projection and adaptive time-stepping schemes.8–12 Alternatively, one can represent the FSP-truncated system using a different basis, such as orthogonal polynomials,13 wavelets,14 and radial basis functions.15 We also mention the multi-finite buffers introduced by Cao et al.16 In addition, there are special methods that exploit specific properties of the underlying system such as the time scale separation17 and the quasi-steady state assumption.18 One can also develop hybrid methods to reduce the dimensionality.19 

In this paper, we are interested in representing the high-dimensional solution to the CME in a low-rank tensor format. Similar endeavors have proved effective in several high-dimensional problems,20–22 and research efforts for solving the CME are benefiting from the ideas such as the early work of Hegland and Garcke23 who proposed a tensor-based CME solver using the canonical polyadic (CP) format. Also, Liao et al.24 recently applied tensor methods to fit steady-state data using the stationary solution to the chemical Fokker-Planck equation. We focus here on the tensor train (TT) format,25,26 in particular the quantized tensor train (QTT) variant.27 We note that there have been other promising studies to solve the CME using the QTT format.28,29 These previous studies employed the QTT scheme on the macro-time steps, whereas our contribution further develops the scheme by introducing new adaptive integration strategies to vary both the FSP and the integration steps, resulting in substantially reduced execution times as comparative numerical tests will show.

The organization of the paper is as follows. Sections II and III are review sections, briefly outlining, respectively, the CME and then the tensor train and its quantized variant. From there the rest of the paper delves into our new QTT-based CME solver, featuring in Sec. IV a tensor formulation of the FSP on hyper-rectangles with absorbing boundaries, after which we describe in Sec. V how we allow the FSP to extend adaptively and how we update the cores of the QTT format accordingly. We then develop in Sec. VI a way to use the inexact uniformization method30,31 to find the QTT-formatted matrix exponential with a tensor-flavored implementation that utilizes the alternating minimal energy (AMEn) iterative algorithm.32 It is also where we describe a way to detect the equilibrium distribution at essentially no extra cost. We perform numerical tests in Sec. VII and give some concluding remarks in Sec. VIII.

Consider a system of N different chemical species that are interacting via M chemical reactions; the state of the system is a vector x=(x1,,xN)T of non-negative integers counting each species. The kth reaction, 1kM, is associated with the stoichiometric vector νk=(νk(1),,νk(N))T.

The evolution of the system is modeled as a continuous-time, discrete-state Markov process. The state space is a subset of the lattice of N-tuple of non-negative integers. A reaction k changes the current state x to x+νk, with propensity αk(x)0. In this work, we assume that the propensity functions are separable, i.e.,

(1)

where αk()0 are some scalar functions. This assumption is broad enough to encompass important cases in practice, including the standard mass-action kinetics where

and the Hill functions

Let P(x,t) denote the probability for the system to be in state x at time t and assume that the initial probability distribution at t = 0 is known. The probability distribution evolves over time according to the CME, which has the well-known form

Alternatively, we can think of the probability mass function P as a probability vector p indexed by the states and define the transition rate matrix by

(2)

The CME can then be stated as a system of linear ODEs

where p0 is the initial probability vector representing the initial condition of the biochemical system. This form leads to the exact solution

In practice, however, we cannot directly use the matrix exponential of the full matrix due to its enormous size. Reducing the complexity of the CME through the finite state projection and its tensor train decomposition is the focus of our present work.

By an N-way tensor X of size I1××IN, we mean a set of real numbers X(i1,,iN) where the multi-index is an N-tuple of integers with ik{1,,Ik}. The numbers Ik are called the mode sizes of the tensor X, and the number N is called the order of that tensor. A tensor can be viewed either as a function on the N-dimensional lattice ×s=1NIs or as a linearized vector of length k=1NIk indexed by the elements of this set. It is helpful to keep both aspects in mind when reading our description. We denote the set of all I1××IN tensors by RI1××IN. To keep the presentation succinct, we will make no distinction between a tensor and its vectorization. The usage will be clear from the context.

A linear operator from RI1××IN to RI1××IN can be represented by a tensor 𝒜 of size I1××IN×I1××IN. We can think of 𝒜 as a square matrix whose rows and columns are, respectively, indexed by (i1,,iN)I1××IN and (j1,,jN)I1××IN. Depending on the context, we shall also identify such a tensor with the matrix of size k=1NIk×k=1NIk resulting from its matricization.

The tensor train (TT) representation of a tensor XRI1××IN takes the form

(3)

where Xk(ik)Rrk1×rk, ik{1,,Ik}, with the boundary conditions r0 = rN = 1 to make (3) a scalar. The quantities rk are called the TT ranks of X, and the three dimensional arrays

(4)

are called the TT cores. We shall denote these TT cores by Xk(),k{1,,N}.

While an explicit storage of the probability vector of the CME would require k=1NIk=O(IN) entries, where I=max{I1,,IN}, the TT format only stores the TT cores at a total cost of O(rTT2NI) entries, where rTT=max{r1,,rN1}. Thus, provided rTT is small, the storage of the TT format scales linearly with N, leading to dramatic savings from the explicit format where the storage scales exponentially with N. Any tensor can be written in the TT format by the TT-SVD algorithm that is based on the singular value decomposition (SVD).26 

The CME falls in the common case of a matrix A indexed by rows (i1,,iN) and columns (j1,,jN), which is usually represented in the matrix TT format

(5)

where each Ak(ik,jk) is an rk1×rk matrix with the boundary conditions r0 = rN = 1. Whereas accessing a conventional matrix entry often amounts to a simple table lookup, the TT format implies that doing so through (5) now requires a sequence of smaller matrix-vector evaluations, evocative of how the advent of fast computerized algorithms superseded past mathematical tables of function values. Nevertheless, we should not lose sight of the fact that there is really no computational overhead in using (5) when the dramatic savings in storage makes possible to successfully solve problems that could not be solved in the first place. Consider also that individual entries will seldom be needed if we devise matrix-free algorithms that leverage the functional representation. Moreover, efficient CPUs can execute substantial computations without adversely affecting the CPU time.

Something else worth highlighting at this point is that our adaptive scheme starts with a relatively small tensor, meaning that it quickly obtains the initial decomposition (through the TT-SVD mentioned earlier), and it then uses cheap updates to gradually account for more likely states of the CME as explained in Sec. IV.

A difficulty in dealing with the TT format is that even the most basic operations of addition and multiplication increase the TT ranks as we now illustrate. Consider two tensors X,YRI1××IN that are already given in the TT format, i.e.,

The sum Z=X+Y can also be written exactly in the TT format with cores Z1(),,ZN() given by the matrix concatenation

The matrix-vector product between A given by (5), and an I1××IN TT tensor X given by (3) has a TT form with cores Y1,,YN given by

Hence, the resulting tensor ranks increase quickly after each product. This discourages the direct use of Krylov subspace approximation techniques33–35 that rely heavily on matrix-vector products. Fortuitously, there is an efficient way to evaluate the matrix exponential through the uniformization method as we shall show in Sec. VI B.

So far, we have focused on the standard TT format that separates the “physical” modes of the tensor. In the case of the CME, these modes correspond to the species and their sizes could be large if there are several hundreds of molecules in each species as we will encounter in the numerical examples in Sec. VII. It then becomes more advantageous to reshape the tensor into another tensor of higher order with smaller mode sizes and apply the TT decomposition on the latter. This is the idea behind the quantized tensor train (QTT) format.27,36 In fact, Khoromskij27 proves that the QTT rank remains constant (or small) for a wide class of function-generated tensors, and practical experience corroborates that the QTT rank scales linearly in the number of modes and logarithmically in the mode sizes (cf. Table I). This is why our implementation will use the QTT format that we discuss next.

TABLE I.

Memory requirements of tensor formats for a tensor of size I××I, where rTT is the maximum of the TT ranks and rQTT is the maximum of the QTT ranks. In general, these ranks are not the same.

FormatExplicitTensor trainQuantized tensor train
Storage cost IN O(rTT2NI) O(rQTT2Nlog2I) 
FormatExplicitTensor trainQuantized tensor train
Storage cost IN O(rTT2NI) O(rQTT2Nlog2I) 

With the assumption that Is=2Ls, a tensor X of order N and size I1××IN can be reshaped into a tensor Y of order L:=L1++LN and size 2××2 through the correspondence

where ik(s)1{0,1} are obtained from the binary representation of is − 1, i.e.,

(6)

in which the notation ()2 means the base 2 representation of an integer. We also refer to ik(s) as the virtual indices of the tensor X. The QTT format of X is then defined as the TT format of the reshaped tensor Y, and we call the cores (respectively, ranks) of the TT representation of Y the QTT cores (respectively, QTT ranks) of X.

The QTT format requires O(rQTT2Nlog2I) entries, where I=max{I1,,IN} and rQTT is the maximum of the QTT ranks. When rQTT is small, the storage complexity is logarithmic in the mode size I and linear in the order N. Dolgov and Khoromskij28 have analyzed the QTT ranks of the CME matrix, while Kazeev and Schwab37 gave bounds in the context of the stationary solution. Our implementation uses the TT-Toolbox (https://github.com/oseledets/TT-Toolbox) that provides handy routines to operate in the QTT format, which we combine with an adaptive finite state projection as we describe next.

The size of the CME could be infinite or extremely large. To alleviate this problem, Munsky and Khammash7 introduced the finite state projection (FSP) that truncates the problem to a finite size. To obtain the QTT formulation of the FSP, we will restrict to a hyper-rectangle and then write the FSP-truncated CME operator in terms of Kronecker products as in previous studies.23,28,29 The hyper-rectangle is

(7)

where Ik are positive integers and can be interpreted as maximum copy numbers of the chemical species. Recalling the separability assumption (1), we define the vector

Now, using Kronecker products, the FSP truncation of the CME operator corresponding to the hyper-rectangle (7) is

(8)

where I is the identity matrix of size I1IN×I1IN. Sk(s) are Is×Is shift matrices defined by

In our implementation, the matrices Sk(s) are generated quickly with the routine tt_qshift of the TT-Toolbox (cf. the work of Kazeev and Khoromskij38 for the mathematical foundation of the routine). Given a time step τ, the one-step solution to the CME is then approximated by

(9)

with padding by zeros as necessary, and [p(0)]H the associated restriction of the initial condition to H.

In practice, we use a step-by-step integration scheme, and as shown in Sec. V, the truncation error will be controlled by imposing a condition based on a bound similar to that of the classic FSP.7 While previous studies employed the tensor-structured FSP for the whole integration time,28,29 we will make the FSP adaptive on-the-fly in several ways as explained later. We begin first by describing a new variant of the FSP targeted at hyper-rectangles where we dynamically adjust their boundaries according to the spread of the probability mass.

Our starting point is to restrict the original CME to the subset of states in the hyper-rectangle H, where the states on its boundary are turned into artificial absorbing states in order to keep track of the probability mass that flows out of H.

Definition IV.1.
A state x=(x1,,xN)TH is said to be on the boundary of H, denoted by H, if there is a chemical reaction that can take it out of H. In other words,
(10)
For 1sN, let H(s) denote the sth boundary of the hyper-rectangle, defined as
(11)
Clearly,
(12)
We seek to downsize the CME by restricting it to H and turning H into fictitious absorbing states. We will see that the corresponding transition rate matrix can be obtained by a simple modification of (8). Denote νmax(s)=max1kMνk(s), the maximum increment any reaction can make to the population of species s. In most models, νmax(s){1,2}. First, we define the restricted propensity factors
and stack them into a vector of the form
(13)
With this notation, the smaller transition rate matrix takes the form
(14)
It is worth noting that the routines in the TT-toolbox were so efficient that we could afford regenerating the QTT representation of matrix (14) many times in our adaptive scheme (to be detailed in Sec. V) without observing a significant overhead.

The purpose of making the boundary states absorbing is to keep track of the probability mass that might leak out of the current FSP so that it can be used to monitor the error. The amount of probability mass absorbed in the boundary states can be bounded by summing up marginal distributions as stated below.

Lemma IV.1.
Letp:H[0,1]be a probability mass function defined onHandHdenote the set of boundary states ofH, we have
(15)
wherep(s)is the s-th marginal distribution corresponding to species s, defined as

Proof.

This follows from (12) and the fact that p is a probability measure. □

Using the preceding lemma and the same techniques as in the proof of the original FSP theorem, we have the following result.

Theorem IV.2.
Assume that the hyper-rectangleHdefined in(7)contains the support of the initial distributionp(0)of the CME. Consider the reduced Markov process with state spaceH, transition rate matrix as in(14), and initial distributionp^(0)=[p(0)]H. Letp^(x;τ)be the transient probability distribution of the reduced Markov process at timeτandp^(s)the corresponding marginal distributions of the s-th species,s=1,,N. Letδ>0be a chosen tolerance value. If
(16)
then the error satisfies
(17)

Proof.
For xHH, p^(x,τ) is the probability that the system is at state x at time τ and never transits outside of HH in the time interval [0,τ]. We have the inequality
Let X be the whole state space and keep in mind that X=HHC=(HH)(HH)C. Then, based on the fact that p and p^ are probability vectors, we have
The expression 2xHp^(x;τ) is less than Nδ based on Lemma IV.1 and inequality (16), which completes the proof. □

The theorem provides a computable error bound when the FSP is restricted to a hyper-rectangle with absorbing boundaries. It also suggests a way to improve accuracy through probability tracking and expansion in the directions where the probability mass leaks the most. Indeed, if (16) fails for some species s, we can selectively increase the corresponding bounds Is and try again with the enlarged H. This will be the strategy used in our step-by-step integration scheme that we explain next.

Having described just one step of our scheme, we now show how it is embedded in an adaptive step-by-step integration procedure that advances the solution through time points t0:=0<t1<<tK:=tf using the recurrence

(18)

where τk is the stepsize and Hk is the hyper-rectangle enclosing the most likely states of the CME within the time interval [tk,tk+τk], with A¯Hk and pk in the QTT format. All these quantities change adaptively across the K total number of steps eventually needed. Recall that operand vectors are padded with zeros for consistency. Section VI will address the selection of the stepsize and the evaluation of the matrix exponential in (18).

We focus here on explaining how adaptivity in our scheme is driven by the control of the local truncation error between consecutive iterates with the goal of approximating the final solution p(tf) at a certain accuracy. To see how this is done, let pk+1(s) be the sth marginal distribution derived from the approximation pk+1 and define the vector

(19)

From Theorem IV.2, the local error when going from tk to tk+τk is bounded by ek+11.

Let 𝜀FSP>0 be a user-prescribed error tolerance; we impose the condition that the global sum of the local errors satisfies

by requiring that

(20)

otherwise, we expand the current hyper-rectangle of the FSP in the directions that violate (20) and retake the step with the ensuing larger projection and an adjusted stepsize τk determined by the uniformization method in Sec. VI.

Our approach is somewhat reminiscent of the sliding window technique9 in that we use a hyper-rectangle. However, we use a posteriori criteria to expand the window instead of allowing the window to slide. Our current implementation does not support sliding hyper-rectangles. We also perform the expansion of the hyper-rectangles by simply doubling the mode sizes in the direction of the highest loss of probability. We discuss next how the corresponding extension of the current solution is done in the QTT format.

When criterion (20) is not met, some sides of the hyper-rectangle have to be increased. Traditional FSP methods7,8,11 that store the solution using a standard vector simply pad zero entries corresponding to the newly added states. However, our method stores the solution in the QTT format, and exactly how the padding is achieved needs some further explanation. We choose to expand by adding a new QTT core. Since each QTT core corresponds to a factor of 2 in the mode sizes of the tensor, our expansion has the effect of doubling the lengths of the sides of the current hyper-rectangle.

Let p be the current solution associated with the hyper-rectangle H={0,,2L11}××{0,,2LN1}. Assume that we double the sth side to obtain

and we want to find the QTT representation of the augmented tensor p^ such that

Let the QTT format of p be

with L=L1++LN cores, then the QTT representation of p^ preserves all the existing cores of p and inserts just one more core after the (L1++Ls)-th core. Specifically, there are now L^=L+1 cores P^(k)() such that P^(k)()=P(k)() for all pairs (,k) except for a newly added P^Ls+1(s)() which we define as

The pseudo-code in Algorithm 1 summarizes the overall structure of our scheme. We describe next how the matrix exponential can be evaluated in the QTT format.

Algorithm 1.

Adaptive FSP-QTT. All matrices and vectors are implicitly understood as being in the QTT format.

Input: Number of species N, number of reactions M, stoichiometry vectors νj and separable propensities αj(x) 
for j=1,,M
Initial states and their probabilities. Final time tf
Initial hyper-rectangle H with sizes 2L1××2LN
Error tolerance 𝜀FSP for FSP truncation. 
Output: Approximate solution pK to the CME at time tf (K is the number of steps). 
1: Form the matrix A¯H defined by (14) in QTT format. 
2: Form the initial distribution p0 based on the initial states and their probabilities. 
3: Set t := 0, k := 0. 
4: whilet<tfdo 
5: Compute the tentative pk+1:=exp(τkA¯H)pk with an appropriate stepsize τk
6: if (criteria (20) fails for any s) then 
7: Set Ls: = Ls + 1 for those s where (20) is not satisfied. 
8: Form the new matrix A¯H and update the QTT representation of pk
9: else 
10: t:=t+τk
11: k := k + 1. 
12: end if 
13: end while 
Input: Number of species N, number of reactions M, stoichiometry vectors νj and separable propensities αj(x) 
for j=1,,M
Initial states and their probabilities. Final time tf
Initial hyper-rectangle H with sizes 2L1××2LN
Error tolerance 𝜀FSP for FSP truncation. 
Output: Approximate solution pK to the CME at time tf (K is the number of steps). 
1: Form the matrix A¯H defined by (14) in QTT format. 
2: Form the initial distribution p0 based on the initial states and their probabilities. 
3: Set t := 0, k := 0. 
4: whilet<tfdo 
5: Compute the tentative pk+1:=exp(τkA¯H)pk with an appropriate stepsize τk
6: if (criteria (20) fails for any s) then 
7: Set Ls: = Ls + 1 for those s where (20) is not satisfied. 
8: Form the new matrix A¯H and update the QTT representation of pk
9: else 
10: t:=t+τk
11: k := k + 1. 
12: end if 
13: end while 

In this section, we implicitly understand vectors and matrices as already in the QTT format.

Let A be the transition rate matrix of the truncated CME in one step of our scheme (18). We drop the bar and the subscript H for brevity and use v to denote the operand probability vector. To advance the solution with stepsize τ, we approximate

using the uniformization method. It evaluates the action of the exponential operator by introducing

(21)

where I is the identity matrix, and 𝜃max(diag(A)) is an upper bound on the absolute values of the diagonal entries of A, which are non-positive as seen in (2). From this, we have exp(τA)=e𝜃τexp(𝜃τP) so that the Taylor approximation yields

(22)

Formula (22) is numerically more stable than a naïve Taylor approximation since P is column-stochastic with non-negative entries bounded by one. The degree m is determined by ensuring the scalar inequality

(23)

which guarantees that the difference in L1-norm between the approximation and the true solution in (22) is below Tolunifτtf.

The selection of the stepsize τ follows the classic uniformization,31 which determines a uniform stepsize for a fixed matrix A. In our case, however, the matrix can expand when the hyper-rectangle is enlarged, and the stepsize τ is not kept uniform but is instead adjusted at each expansion. To be specific, suppose that at step k we have a new matrix A with the corresponding 𝜃max(diag(A)), we now wish to choose a new stepsize τ as if the integration domain was [tk, tf]. From there, as in the classical case,31 we take

(24)

and then set τ:=(tftk)nstep(k). The safety parameter σ is chosen to prevent underflow in the factor e𝜃τ in the uniformization formula (22), and we will experiment with either 10 or 100 in our numerical examples. We note that there is a possibility of tweaking σ adaptively to improve performance, as a larger σ means a larger τ, but this remains to be investigated.

As noted before, an essential component of the uniformization method is the determination of the parameter 𝜃. Preferably, this upper bound should be close to the maximum magnitude of the diagonal entries of A. While it is straightforward in small matrices to find it simply by an exhaustive search, this is not the case with the CME where the diagonal entries are buried deep in an enormous tensor of size 2L1××2LN and so more advanced methods are required. In our implementation, we use the tt_max routine in the TT-toolbox for the estimation of 𝜃.

To implement (22), one could think of simply computing the actions of the matrix P sequentially. This usual approach is not practical in the QTT format. The reason is that matrix-vector products significantly increase the rank, and thus the storage requirement, of the computed vectors to the point of defeating the savings of the tensor format. We instead use a completely different approach based on solving a linear system as follows. Let fj:=Pjv,j=0,,m, then the terms fj satisfy the recurrence

(25)

These relations lead to a linear system of the form

(26)

or, more succinctly,

(27)

where S1 is the lower shift matrix of size (m+1)×(m+1), e1=(1,0,,0)T is the unit vector of length m + 1, and f=(f0T,,fmT)T. We then set this problem in the QTT format and solve it by the Alternating Minimal Energy (AMEn) method of Dolgov and Savostyanov.32 

It is remarkable that even though (25)–(27) are mathematically equivalent representations of f, we observed experimentally that the compact tensor storage format copes better by far with the data-rich representation in (27) than the traditional approach in (25) that repeatedly uses matrix-vector products to disjointly retrieve the terms individually. Our observation here is consistent with the experience in other applications of the QTT format39 (see also Ref. 28), where the all-inclusive linear system formulation is found to be more efficient than sequential evaluations.

If the solution to (26) or (27) is found exactly, one can easily evaluate (22) by

(28)

which amounts to a tensor contraction between the QTT format of f and the vector e𝜃τ(𝜃τ)jj!j=0m. In practice, the iterative solver only returns an approximate solution f̃=(f̃0T,,f̃mT)T and we instead have

(29)

The evaluation of (22) through an approximate solution to (27) is closely related to the inexact uniformization method.31 Let the residual be

Using the partitioning r=(r0T,,rmT)T, where each rk has the same length as each vector fk, we have

Similar to the classical case (Theorem 3.1 in Ref. 31), we can establish the formula for the gap

(30)

where the only notable difference here is that we also include the initial error r0. We then easily arrive at the bound

Thus, the error gap in evaluating (22) through the numerical solution to (27) is bounded by the norm of the residual.

The uniformization framework provides a way to detect when an equilibrium distribution of the Markov chain is reached, namely, through checking if fk+1=fk,k=0,,m1. When the inexact evaluation mentioned above is used, the QTT structure of the solution to (27) gives rise to a simple way of checking this.

Let 2L1××2LN be the size of the tensor terms fj in (25). The tensor form of the solution f to (27) will be of size ×s=1N2Ls×(m+1). Hence, the QTT representation of f consists of L + q cores, where the first L:=L1++LN cores correspond to the quantization of the N “physical” modes of the CME, and the remaining q cores result from the quantization of the last mode of size (m + 1). The Lth rank of the QTT format of f characterizes the separability between the tensors fj. More precisely, we have the following result.

Lemma VI.1.
Let the QTT representation of the exact solution to(27)be
(31)
so thatvec(fj):=vec(f(:,,:,j))forj=0,,m, wherevec(u)stands for the vectorization ofu. If rL = 1, thenf0=f1==fm.

Proof.
Assume that rL = 1. The Lth canonical unfolding of f can then be decomposed into
(32)
but note also that
so we have fk=(βkβ0)f0,k=1,,m. Since the entries of each fk sum to 1, we have β0=β1==βm. This shows that f0==fm. □

Although the theory seems convoluted, the resulting criterion for detecting the equilibrium amounts to simply checking that the Lth QTT rank of f is unity. This quantity is readily available after completion of the AMEn algorithm.

Algorithm 2 summarizes in pseudo-code all the essential building blocks of our step-by-step framework, consisting of the FSP, QTT, inexact Uniformization, and AMEn methods. We clarify here that the use of an adaptive FSP in our method is not targeted at improving the compression rate of the tensor, which is already impressive thanks to the use of the QTT format. Rather, its purpose is to improve computational time, with the added benefit of freeing the user from finicky selections of appropriate values for the species bounds Ik. The improved computational time comes from the fact that, from a numerical viewpoint, smaller hyper-rectangles likely yield smaller norms for the matrix A¯H, thus benefiting the matrix exponential (or the corresponding linear system of ODEs). Also, a smaller hyper-rectangle means that fewer cores are needed in the QTT format. The AMEn algorithm32 requires sweeping through the cores of the tensor train, and a shorter train means shorter sweeps. The downside of having adaptive hyper-rectangles is the overhead of monitoring the error and the potential cost of failed steps (for solving the CME on a rectangle that is too small), but we found this overhead not to be of significant consequence. The following testing section includes numerical experiments to assess the efficiency of our algorithm on well-known test problems in the CME literature. The section reports comparisons of our adaptive approach with the fixed approach that uses a non-adaptive hyper-rectangle large enough from the outset, showing that adaptivity is very beneficial to a problem such as the Goutsias example.

Algorithm 2.

Adaptive FSP-QTT-Uniformization-AMEn. All matrices and vectors are implicitly understood as being in the QTT format.

Input: Number of species N, number of reactions M, stoichiometry vectors νj and separable propensities αj(x) 
for j=1,,M
Initial states and their probabilities. Final time tf
Initial hyper-rectangle H with sizes 2L1××2LN
Error tolerances: 𝜀FSP for FSP, TolAMEn for AMEn, Tolunif for uniformization. 
Output: Approximate solution pK to the CME at time tf (K is the number of steps). 
1: Form the matrix A¯H defined by (14) in QTT format. 
2: Form the initial distribution p0 based on the initial states and their probabilities. 
3: Find 𝜃max(diagA¯H) and determine the stepsize τ and Taylor degree m
4: Form P:=1𝜃A¯H+I
5: Set t := 0, k := 0. 
6: whilet<tfdo 
7: Solve (27) with AMEn to obtain f=(f0TfmT)T
8: Compute w:=j=0me𝜃τ(𝜃τ)jj!fj
9: if (criteria (20) fails for any s) then 
10: Set Ls := Ls + 1 for those s where (20) is not satisfied. 
11: Form the new matrix A¯H and update the QTT representation of pk
12: Determine 𝜃max(diagA¯H), stepsize τ and Taylor degree m
13: Form P:=1𝜃A¯H+I
14: else 
15: if ((L1++LN+1)-th QTT rank of f is 1) then 
16: STOP ‘equilibrium reached’. 
17: end if 
18: t:=t+τ
19: k := k + 1. 
20: pk:=w
21: end if 
22: end while 
Input: Number of species N, number of reactions M, stoichiometry vectors νj and separable propensities αj(x) 
for j=1,,M
Initial states and their probabilities. Final time tf
Initial hyper-rectangle H with sizes 2L1××2LN
Error tolerances: 𝜀FSP for FSP, TolAMEn for AMEn, Tolunif for uniformization. 
Output: Approximate solution pK to the CME at time tf (K is the number of steps). 
1: Form the matrix A¯H defined by (14) in QTT format. 
2: Form the initial distribution p0 based on the initial states and their probabilities. 
3: Find 𝜃max(diagA¯H) and determine the stepsize τ and Taylor degree m
4: Form P:=1𝜃A¯H+I
5: Set t := 0, k := 0. 
6: whilet<tfdo 
7: Solve (27) with AMEn to obtain f=(f0TfmT)T
8: Compute w:=j=0me𝜃τ(𝜃τ)jj!fj
9: if (criteria (20) fails for any s) then 
10: Set Ls := Ls + 1 for those s where (20) is not satisfied. 
11: Form the new matrix A¯H and update the QTT representation of pk
12: Determine 𝜃max(diagA¯H), stepsize τ and Taylor degree m
13: Form P:=1𝜃A¯H+I
14: else 
15: if ((L1++LN+1)-th QTT rank of f is 1) then 
16: STOP ‘equilibrium reached’. 
17: end if 
18: t:=t+τ
19: k := k + 1. 
20: pk:=w
21: end if 
22: end while 

The computing environment is MATLAB 2016b on an ASUS UX51VZA laptop running Ubuntu Linux with 8 GB of RAM and 2.1 GHz Intel core i7 CPU, using routines from the TT-toolbox version 2.2 (https://github.com/oseledets/TT-Toolbox). We specifically use the MEX-Fortran implementation of AMEn in the file fort_amen_solve.ffor fast executions of the tensor linear system solves. In all examples, the tolerance parameters are set at (𝜀FSP,Tolunif,TolAMEn)=(104,104,105). The safety parameter σ in (24) is set at 100 for the toggle switch and the Goutsias examples and at 10 for the p53 example.

We will also present comparative results of our method with the other well-known QTT-based CME solvers28,29 that we summarize here as follows:

  • The hp-DG-QTT solver of Kazeev et al.29 integrates the CME using the hp-discontinuous Galerkin time discretization that results in a time-stepping scheme with a tensor-formatted linear system solved at each step with the DMRG (Density Matrix Renormalization Group) algorithm.40 Since DMRG was found to be less efficient than AMEn in several settings,32 we also implemented a slight variant of the original algorithm by replacing DMRG with AMEn. All the algorithmic parameters of Kazeev et al.29 are kept, except the DMRG solver tolerance that we set to 10−5 (we found the strict tolerance there to be slower), and our implementation makes calls to the efficient MEX routines in the TT-toolbox.

  • The block state-time approach of Dolgov and Khoromskij28 also integrates the CME on a fixed hyper-rectangle, but they use the Crank-Nicolson scheme instead of hp-DG. Furthermore, the Crank-Nicolson steps are not evaluated sequentially but are combined into a block state-time linear system solved with AMEn. The integration interval [0, tf] is broken into subintervals of uniform size T0, and an all-inclusive solver is used on each interval [(q1)T0,qT0], q=1,2,. We set Nt = 1024 time steps for the Crank-Nicolson time-discretization on each subinterval and vary T0 in {10−3tf, 10−2tf, 10−1tf}. We always use the AMEn tolerance 10−5 in our implementation. In what follows, we will call this the QTT-block-Crank-Nicolson approach.

Both of these existing QTT methods use a fixed FSP with a large enough hyper-rectangle as explained in Sec. IV. This is a major difference with our method (Algorithm 2) that uses an adaptive hyper-rectangle with absorbing boundaries.

We revisit the example solved by Kazeev et al.,29 which is a stochastic model for the toggle switch of Gardner et al.41 There are two species U and V that inhibit the production of each other. Table II shows the values of the parameters.29 We start with an initial rectangle of size 25×25 and integrate to tf := 100. Our method detects equilibrium at t14, which leads to an early termination. We ended up using only about 785 s of CPU time (Fig. 1). The equilibrium distribution displays biomodality, which is a well-known characteristic of this system (Fig. 2).

TABLE II.

Reaction channels of the genetic toggle switch example. The parameters29 are α1=5000, α2=1600, β=2.5, γ=1.5, δ1=δ2=1. ([X] is the number of copies of the species X.)

ReactionPropensity
U δ1[U] 
U α11+[V]β 
V δ2[V] 
V α21+[U]γ 
ReactionPropensity
U δ1[U] 
U α11+[V]β 
V δ2[V] 
V α21+[U]γ 
FIG. 1.

Results of Algorithm 2 on the toggle switch example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal QTT rank in the QTT representation of the solution. Note how the number of cores increases as the FSP explores larger regions of the state space.

FIG. 1.

Results of Algorithm 2 on the toggle switch example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal QTT rank in the QTT representation of the solution. Note how the number of cores increases as the FSP explores larger regions of the state space.

Close modal
FIG. 2.

Toggle switch example. Equilibrium distribution, reached at t14.

FIG. 2.

Toggle switch example. Equilibrium distribution, reached at t14.

Close modal

To compare our solution with the two existing CME solvers that also used the QTT format,28,29 we fixed their FSPsize to 213×211, a tight range suggested by our method and which is slightly smaller than the hyper-rectangle tested by Kazeev et al.29 From the results reported in Table III and Fig. 3, we see that our method was about at least two times faster than previous approaches that deliver results with comparable accuracy on this problem. In addition, our results confirm the effectiveness of AMEn over DMRG for the hp-DG scheme.

TABLE III.

Toggle switch example. Performance of different QTT-based CME solvers. The column “Gap” is the distance (in Frobenius norm) of each method to our new method (Algorithm 2). The output of the block state-time scheme of Dolgov and Savostyanov32 gets closer to ours in the settings with finer time mesh.

MethodCPU time (s)Gap (.F)
New method (Algorithm 2785.21 … 
hp-DG-QTT-DMRG 8773.58 5.85 × 10−4 
hp-DG-QTT-AMEn 2113.32 2.86 × 10−5 
QTT-block-Crank-Nicolson (T0 = 0.1) 2106.66 2.73 × 10−5 
QTT-block-Crank-Nicolson (T0 = 1) 9194.16 6.50 × 10−3 
QTT-block-Crank-Nicolson (T0 = 10) 469.58 5.28 × 10−2 
MethodCPU time (s)Gap (.F)
New method (Algorithm 2785.21 … 
hp-DG-QTT-DMRG 8773.58 5.85 × 10−4 
hp-DG-QTT-AMEn 2113.32 2.86 × 10−5 
QTT-block-Crank-Nicolson (T0 = 0.1) 2106.66 2.73 × 10−5 
QTT-block-Crank-Nicolson (T0 = 1) 9194.16 6.50 × 10−3 
QTT-block-Crank-Nicolson (T0 = 10) 469.58 5.28 × 10−2 
FIG. 3.

Toggle switch example. Accumulated CPU time of our method and previous QTT schemes. On the legend: hp-DG-QTT-DMRG is the method of Kazeev et al.,29 and hp-DG-QTT-AMEn is our re-implementation of that method by replacing DMRG with AMEn32 for comparison purposes; QTT-block-CN is the method of Dolgov and Khoromskij28 with the subinterval length T0 := 0.1. We only show settings that yield comparable answers (see Table III for more details).

FIG. 3.

Toggle switch example. Accumulated CPU time of our method and previous QTT schemes. On the legend: hp-DG-QTT-DMRG is the method of Kazeev et al.,29 and hp-DG-QTT-AMEn is our re-implementation of that method by replacing DMRG with AMEn32 for comparison purposes; QTT-block-CN is the method of Dolgov and Khoromskij28 with the subinterval length T0 := 0.1. We only show settings that yield comparable answers (see Table III for more details).

Close modal

The Goutsias model refers to a simplified version of a time-varying model of transcription regulation proposed by Goutsias.42 This model is considered challenging to CME solvers8,9,11,43 because its probability mass spreads quickly in the directions of the first three species, pushing the FSP to expand into millions of states. Table IV shows the parameters chosen as in the Krylov-FSP,8 and the initial state is

We set tf := 1000, which propels the time horizon much further than in existing attempts on this example. Our method took about 3 h to reach the end time, which is expected for this challenging problem. However, what is surprising is that the QTT representation used less than 112 830 entries to store the QTT solution (Fig. 4). This contrasts with previous results8,11 where a much shorter time tf = 300 already demands one million entries to store the approximate solution. The marginal distributions from the QTT format fit tightly with those generated by one million trajectories using Gillespie’s stochastic simulation algorithm (SSA),44 as seen in Fig. 5.

TABLE IV.

Reaction channels in the Goutsias model of regulated transcription. The parameter A=6.0221415×1023 is Avogadro’s number, and V = 10−15L is the system volume chosen for this experiment. ([X] is the number of copies of the species X.)

ReactionPropensityRate constant (s−1)
1. RNARNA+M c1(RNAc1 = 0.043 
2. M c2(Mc2 = 0.0007 
3. DNA.DRNA+DNA.D c3[DNA.Dc3 = 0.0715 
4. RNA c4(RNAc4 = 0.0039 
5. DNA+DDNA.D c5[DNA][Dc5=0.012×109AV 
6. DNA.DDNA+D c6[DNA.Dc6 = 0.4791 
7. DNA.D+DDNA.2D c7[DNA.D][Dc7=0.00012×109AV 
8. DNA.2DDNA.D+D c8[DNA.2Dc8=0.8765×1011 
9. M+MD c92[M]([M]1) c9=0.05×109AV 
10. DM+M c10[Dc10 = 0.5 
ReactionPropensityRate constant (s−1)
1. RNARNA+M c1(RNAc1 = 0.043 
2. M c2(Mc2 = 0.0007 
3. DNA.DRNA+DNA.D c3[DNA.Dc3 = 0.0715 
4. RNA c4(RNAc4 = 0.0039 
5. DNA+DDNA.D c5[DNA][Dc5=0.012×109AV 
6. DNA.DDNA+D c6[DNA.Dc6 = 0.4791 
7. DNA.D+DDNA.2D c7[DNA.D][Dc7=0.00012×109AV 
8. DNA.2DDNA.D+D c8[DNA.2Dc8=0.8765×1011 
9. M+MD c92[M]([M]1) c9=0.05×109AV 
10. DM+M c10[Dc10 = 0.5 
FIG. 4.

Results of Algorithm 2 on the Goutsias example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal QTT rank in the QTT representation of the solution.

FIG. 4.

Results of Algorithm 2 on the Goutsias example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal QTT rank in the QTT representation of the solution.

Close modal
FIG. 5.

Goutsias example. Marginal distributions of monomer (left) and dimer (right), generated from our QTT solution (solid) and one million SSA trajectories (bar).

FIG. 5.

Goutsias example. Marginal distributions of monomer (left) and dimer (right), generated from our QTT solution (solid) and one million SSA trajectories (bar).

Close modal

We conduct further tests to compare our method with the QTT approach of Dolgov and Khoromskij,28 using subinterval lengths T0{1,10,100} (cf. Fig. 6 and Table V). However, setting T0 := 10 turned out to yield very poor performance, with the method taking over 10 h only to reach t = 650. Thus, we omit the case T0 := 10 in this example.

FIG. 6.

Goutsias example. Accumulated CPU time of our method and previous QTT schemes. On the legend: QTT-block-CN is the method of Dolgov and Khoromskij28 with the subinterval lengths T0 set as 0.1. We only show the settings that yield comparable answers (see Table V for more details).

FIG. 6.

Goutsias example. Accumulated CPU time of our method and previous QTT schemes. On the legend: QTT-block-CN is the method of Dolgov and Khoromskij28 with the subinterval lengths T0 set as 0.1. We only show the settings that yield comparable answers (see Table V for more details).

Close modal
TABLE V.

Goutsias example. Performance of different QTT-based CME solvers. The column “Gap” is the distance (in Frobenius norm) of each method to ours (cf. Algorithm 2).

MethodCPU time (s)Gap (.F)
New method (Algorithm 29 955.06 … 
QTT-block-Crank-Nicolson (T0 = 1) 14 227.07 1.81 × 10−4 
QTT-block-Crank-Nicolson (T0 = 100) 840.69 7.47 × 10−3 
MethodCPU time (s)Gap (.F)
New method (Algorithm 29 955.06 … 
QTT-block-Crank-Nicolson (T0 = 1) 14 227.07 1.81 × 10−4 
QTT-block-Crank-Nicolson (T0 = 100) 840.69 7.47 × 10−3 

The p53 gene is an important barrier against the development of cancerous cells. Here, we revisit a model45 of p53 regulation (cf. Table VI) that we attempted to solve before using the sparse tensor format.46 The QTT format is far more well-suited to this particular problem as we will see here. In our experiment, we set the integration time tf := 1000 and, to make the problem challenging, an initial state of high protein populations

Our method was able to solve the problem in 183.14 s, with the QTT format using no more than 116 952 entries in its representation of the solution. Figure 7 illustrates the performance and adaptivity of our algorithm. The marginal distributions from the QTT format fit tightly with those generated by one million SSA trajectories, as seen in Fig. 8. Figure 9 and Table VII show some comparisons between our method and the scheme of Dolgov and Khoromskij.28 

TABLE VI.

Stochastic model of p53 regulation.45 ([X] is the number of copies of the species X.)

ReactionPropensityParameters
1. p53 kp kp = 0.5 
2. p53 dp[p53]+k1[p53][MDM2nuc] k1=9.963×106 
   dp=1.925×105 
3. RNAnuc km+k2[p53]1.8kD1.8+[p53]1.8 (km,k2,kD)=(1.5×103,1.5×102,740) 
4. RNAnucRNAcyt k0[RNAnuc] k0=8.0×104 
5. RNAcyt drc[RNAcyt] drc=1.444×104 
6. RNAcytRNAcyt+MDM2cyt kT[RNAcyt] kT=1.66×102 
7. MDM2cytMDM2nuc ki[MDM2cyt] ki=9.0×104 
8. MDM2nuc+MDM2nucMDM2nuc dmn[MDM2nuc]([MDM2nuc]1) dmn=1.66×107 
9. MDM2nuc+ARFMDM2nuc.ARF k3[MDM2nuc][ARF] k3=9.963×106 
10. ARF ka ka = 0.5 
11. ARF da[ARF] da=3.209×105 
ReactionPropensityParameters
1. p53 kp kp = 0.5 
2. p53 dp[p53]+k1[p53][MDM2nuc] k1=9.963×106 
   dp=1.925×105 
3. RNAnuc km+k2[p53]1.8kD1.8+[p53]1.8 (km,k2,kD)=(1.5×103,1.5×102,740) 
4. RNAnucRNAcyt k0[RNAnuc] k0=8.0×104 
5. RNAcyt drc[RNAcyt] drc=1.444×104 
6. RNAcytRNAcyt+MDM2cyt kT[RNAcyt] kT=1.66×102 
7. MDM2cytMDM2nuc ki[MDM2cyt] ki=9.0×104 
8. MDM2nuc+MDM2nucMDM2nuc dmn[MDM2nuc]([MDM2nuc]1) dmn=1.66×107 
9. MDM2nuc+ARFMDM2nuc.ARF k3[MDM2nuc][ARF] k3=9.963×106 
10. ARF ka ka = 0.5 
11. ARF da[ARF] da=3.209×105 
FIG. 7.

Results of Algorithm 2 on the p53 example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal TT rank in the QTT representation of the solution.

FIG. 7.

Results of Algorithm 2 on the p53 example. Left plot: accumulated CPU time and storage requirement of the QTT solution across integration steps. Right plot: number of cores and the maximal TT rank in the QTT representation of the solution.

Close modal
FIG. 8.

Results of Algorithm 2 on the p53 example. Marginal distributions of cytoplasmic MDM2 (left) and nuclear p53 (right), generated from the QTT solution (solid) and one million SSA trajectories (bar).

FIG. 8.

Results of Algorithm 2 on the p53 example. Marginal distributions of cytoplasmic MDM2 (left) and nuclear p53 (right), generated from the QTT solution (solid) and one million SSA trajectories (bar).

Close modal
FIG. 9.

p53 example. Accumulated CPU time of our method and existing QTT schemes. On the legend: QTT-block-CN refers to the method of Dolgov and Khoromskij28 with subinterval lengths T0 set to 1, 10, and 100.

FIG. 9.

p53 example. Accumulated CPU time of our method and existing QTT schemes. On the legend: QTT-block-CN refers to the method of Dolgov and Khoromskij28 with subinterval lengths T0 set to 1, 10, and 100.

Close modal
TABLE VII.

p53 example. Difference in performance between our method and the global scheme of Dolgov and Khoromskij.28 The column “Gap” is the distance in Frobenius norm from the solution computed in each scheme to that of our adaptive method.

MethodCPU time (s)Gap (.F)
New method (Algorithm 2183.14 … 
QTT-block-Crank-Nicolson (T0 = 1) 311.93 5.91 × 10−7 
QTT-block-Crank-Nicolson (T0 = 10) 91.36 4.99 × 10−7 
QTT-block-Crank-Nicolson (T0 = 100) 178.67 3.23 × 10−6 
MethodCPU time (s)Gap (.F)
New method (Algorithm 2183.14 … 
QTT-block-Crank-Nicolson (T0 = 1) 311.93 5.91 × 10−7 
QTT-block-Crank-Nicolson (T0 = 10) 91.36 4.99 × 10−7 
QTT-block-Crank-Nicolson (T0 = 100) 178.67 3.23 × 10−6 

We perform more tests to study the effect of using adaptive hyper-rectangles in our method. To do so, we solve the three test problems in Sec. VII again using Algorithm 2 with adaptive hyper-rectangles turned off. For the fixed FSP in these tests, we use the largest hyper-rectangles selected by the adaptive version (cf. Table VIII). Note that when the hyper-rectangle is fixed, Algorithm 2 will use uniform Taylor degree m and stepsize τ across all time steps. The non-adaptive setting is relieved from the overhead of checking the FSP error and related costs, and this could result in a marginal gain over the adaptive version in some problems (toggle switch and p53). However, we see that the fixed FSP leads to an increase of about 2.5 times in computational time in the Goutsias model. The large hyper-rectangle forces the uniformization to select a small stepsize in the Taylor approximation, and this leads to increased computational cost in a long integration time.

TABLE VIII.

Results of Algorithm 2 for the settings with adaptive hyper-rectangle (left) and fixed hyper-rectangle (right). nspecies: number of species/dimensions; m: Taylor truncation index; τmin: minimum stepsize; τmax: maximum stepsize; τavg: average stepsize; CPU: computational time (in seconds). Note that the non-adaptive stepsize uses a uniform stepsize for the whole integration.

AdaptiveNon-adaptive
Problem (nspecies)nstepmτminτmaxτavgCPU (s)nstepmτCPU (s)
Toggle switch (2) 1887 255 7.58 × 10−3 1.51 × 10−2 7.71 × 10−3 785.21 2 259 255 7.58 × 10−3 700.35 
Goutsias (6) 8844 255 3.36 × 10−2 1.78 × 10+00 1.13 × 10−1 9955.06 29 722 255 3.36 × 10−2 24 600.75 
p53 (6) 439 31 1.53 × 10+00 6.21 × 10+00 2.28 × 10+00 183.14 655 31 1.53 × 10+00 127.09 
AdaptiveNon-adaptive
Problem (nspecies)nstepmτminτmaxτavgCPU (s)nstepmτCPU (s)
Toggle switch (2) 1887 255 7.58 × 10−3 1.51 × 10−2 7.71 × 10−3 785.21 2 259 255 7.58 × 10−3 700.35 
Goutsias (6) 8844 255 3.36 × 10−2 1.78 × 10+00 1.13 × 10−1 9955.06 29 722 255 3.36 × 10−2 24 600.75 
p53 (6) 439 31 1.53 × 10+00 6.21 × 10+00 2.28 × 10+00 183.14 655 31 1.53 × 10+00 127.09 

In this work, we have developed a new approach for solving the chemical master equation by combining the finite state projection, tensor train decomposition, inexact uniformization, and AMEn methods. In comparison to previous CME solvers that have also leveraged the tensor train format,28,29 our method makes a distinctive use of a new method for efficiently evaluating the action of a QTT-formatted matrix exponential using a Taylor approximation in a data-rich tensor representation, with an adaptive FSP and stepsize selection strategy. The numerical tests confirm the practical benefits of these adaptive mechanisms. Overall, our method yields substantial savings in execution time on top of the impressive savings in memory enabled by the QTT/TT format. Our QTT-based inexact uniformization can potentially be used to evaluate the transient solution to other large-scale Markov chains such as those arising in stochastic automata networks.47 

The present work opens new directions for further developments. For example, instead of just expanding the adaptive hyper-rectangles in our scheme, they could slide and/or shrink, with the effect of reducing the number of QTT cores in the representation of the solution. When advancing the integration or reporting the final results, there could be other refinements stemming from the error control, provided that the interior states of the hyper-rectangles could be economically distinguished and treated differently from the boundary states that were turned into artificial absorbing states. We note further that the quantized tensor approximation could be constructed not only for the tensor train26 but also for other formats such as those in the Hierarchical Tucker family48 (this is also discussed by Khoromskij27), and it might be insightful to further compare the tensor train with other formats. However, practical experience indicates that the TT/QTT provides us with the simplest implementation with impressive compression rates. It could also be useful to generalize the way we recast the inexact uniformization in the QTT format in tandem with AMEn to other applications that make use of the matrix exponential. Conversely, alternative ways of evaluating the matrix exponential in tensor format49 could lead to other variants of the algorithm presented here.

This work was supported by NSF Grant No. DMS-1320849. The authors thank the anonymous referees for their comments.

1.
J.
Goutsias
and
G.
Jenkinson
,
Phys. Rep.
529
,
199
(
2013
).
2.
M. B.
Elowitz
,
A. J.
Levine
,
E. D.
Siggia
, and
P. S.
Swain
,
Science
297
,
1183
(
2002
).
3.
C. V.
Rao
,
D. M.
Wolf
, and
A. P.
Arkin
,
Nature
420
,
231
(
2002
).
4.
M.
Kaern
,
T. C.
Elston
,
W. J.
Blake
, and
J. J.
Collins
,
Nat. Rev. Genet.
6
,
451
(
2005
).
5.
B.
Munsky
,
B.
Trinh
, and
M.
Khammash
,
Mol. Syst. Biol.
5
,
318
(
2009
).
6.
B.
Munsky
,
Z.
Fox
, and
G.
Neuert
,
Methods
85
,
12
(
2015
).
7.
B.
Munsky
and
M.
Khammash
,
J. Chem. Phys.
124
,
044104
(
2006
).
8.
K.
Burrage
,
M.
Hegland
,
S.
MacNamara
, and
R. B.
Sidje
, in
150th Markov Anniversary Meeting, Charleston, SC, USA
, edited by
A. N.
Langville
and
W. J.
Stewart
(
Boson Books
,
2006
), pp.
21
38
.
9.
V.
Wolf
,
R.
Goel
,
M.
Mateescu
, and
T. A.
Henzinger
,
BMC Syst. Biol.
4
,
42
(
2010
).
10.
B.
Munsky
and
M.
Khammash
,
J. Comput. Phys.
226
,
818
(
2007
).
11.
R. B.
Sidje
and
H. D.
Vo
,
Math. Biosci.
269
,
10
(
2015
).
12.
K. N.
Dinh
and
R. B.
Sidje
,
Phys. Biol.
13
,
035003
(
2016
).
13.
P.
Deuflhard
,
W.
Huisinga
,
T.
Jahnke
, and
M.
Wulkow
,
SIAM J. Sci. Comput.
30
,
2990
(
2008
).
14.
T.
Jahnke
and
T.
Udrescu
,
J. Comput. Phys.
229
,
5724
(
2010
).
15.
I.
Kryven
,
S.
Röblitz
, and
C.
Schütte
,
BMC Syst. Biol.
9
,
67
(
2015
).
16.
Y.
Cao
,
A.
Terebus
, and
J.
Liang
,
Multiscale Model. Simul.
14
,
923
(
2016
).
17.
S.
Peleš
,
B.
Munsky
, and
M.
Khammash
,
J. Chem. Phys.
125
,
204104
(
2006
).
18.
S.
MacNamara
,
A. M.
Bersani
,
K.
Burrage
, and
R. B.
Sidje
,
J. Chem. Phys.
129
,
095105
(
2008
).
19.
T.
Jahnke
,
Multiscale Model. Simul.
9
,
1646
(
2011
).
20.
T. G.
Kolda
and
B. W.
Bader
,
SIAM Rev.
51
,
455
(
2008
).
21.
B. N.
Khoromskij
,
Chemom. Intell. Lab. Syst.
110
,
1
(
2012
).
22.
L.
Grasedyck
,
D.
Kressner
, and
C.
Tobler
,
GAMM Mitt.
36
,
53
(
2013
); e-print arXiv:1302.7121v1.
23.
M.
Hegland
and
J.
Garcke
,
ANZIAM J.
52
,
C628
(
2011
).
24.
S.
Liao
,
T.
Vejchodský
, and
R.
Erban
,
J. R. Soc., Interface
12
,
20150233
(
2015
).
25.
I. V.
Oseledets
and
E.
Tyrtyshnikov
,
SIAM J. Sci. Comput.
31
,
3744
(
2009
).
26.
I. V.
Oseledets
,
SIAM J. Sci. Comput.
33
,
2295
(
2011
).
27.
B. N.
Khoromskij
,
Constr. Approximation
34
,
257
(
2011
).
28.
S.
Dolgov
and
B. N.
Khoromskij
,
Numer. Linear Algebra Appl.
22
,
197
(
2013
).
29.
V.
Kazeev
,
M.
Khammash
,
M.
Nip
, and
C.
Schwab
,
PLoS Comput. Biol.
10
,
e1003359
(
2014
).
30.
W. J.
Stewart
,
Introduction to the Numerical Solution of Markov Chains
(
Princeton University Press
,
Princeton, NJ
,
1994
).
31.
R. B.
Sidje
,
K.
Burrage
, and
S.
MacNamara
,
SIAM J. Sci. Comput.
29
,
2562
(
2007
).
32.
S.
Dolgov
and
D.
Savostyanov
,
SIAM J. Sci. Comput.
36
,
A2248
(
2014
).
33.
Y.
Saad
,
SIAM J. Numer. Anal.
29
,
209
(
1992
).
34.
R. B.
Sidje
,
ACM Trans. Math. Software
24
,
130
(
1998
).
35.
H. D.
Vo
and
R. B.
Sidje
,
Numer. Linear Algebra Appl.
24
,
e2090
(
2017
).
36.
I. V.
Oseledets
,
SIAM J. Matrix Anal. Appl.
31
,
2130
(
2010
).
37.
V.
Kazeev
and
C.
Schwab
,
SIAM J. Matrix Anal. Appl.
36
,
1221
(
2015
).
38.
V.
Kazeev
,
B. N.
Khoromskij
, and
E.
Tyrtyshnikov
,
SIAM J. Sci. Comput.
35
,
A1511
(
2013
).
39.
S. V.
Dolgov
,
B. N.
Khoromskij
, and
I. V.
Oseledets
,
SIAM J. Sci. Comput.
34
,
A3016
(
2012
).
40.
S. V.
Dolgov
and
I. V.
Oseledets
,
SIAM J. Sci. Comput.
34
,
A2718
(
2012
).
41.
T. S.
Gardner
,
C. R.
Cantor
, and
J. J.
Collins
,
Nature
403
,
339
(
2000
).
42.
J.
Goutsias
,
J. Chem. Phys.
122
,
184102
(
2005
).
43.
S.
Macnamara
,
K.
Burrage
, and
R. B.
Sidje
,
Multiscale Model. Simul.
6
,
1146
(
2008
).
44.
D. T.
Gillespie
,
J. Phys. Chem.
81
,
2340
(
1977
).
45.
G. B.
Leenders
and
J. A.
Tuszynski
,
Front. Oncol.
3
,
64
(
2013
).
46.
H. D.
Vo
and
R. B.
Sidje
,
Phys. Biol.
13
,
035001
(
2016
).
47.
B.
Plateau
and
W. J.
Stewart
, in
Computational Probability
, edited by
W. K.
Grassmann
(
Springer US
,
Boston, MA
,
2000
), pp.
113
151
.
48.
L.
Grasedyck
,
SIAM J. Matrix Anal. Appl.
31
,
2029
(
2010
).
49.
I.
Gavrilyuk
and
B.
Khoromskij
,
Comput. Methods Appl. Math.
11
,
273
(
2011
).