Solving the chemical master equation directly is difficult due to the curse of dimensionality. We tackle that challenge by a numerical scheme based on the quantized tensor train (QTT) format, which enables us to represent the solution in a compressed form that scales linearly with the dimension. We recast the finite state projection in this QTT framework and allow it to expand adaptively based on proven error criteria. The end result is a QTT-formatted matrix exponential that we evaluate through a combination of the inexact uniformization technique and the alternating minimal energy algorithm. Our method can detect when the equilibrium distribution is reached with an inexpensive test that exploits the structure of the tensor format. We successfully perform numerical tests on high-dimensional problems that had been out of reach for classical approaches.
I. INTRODUCTION
Complex interacting networks are prevalent across many fields of science and engineering, ranging from chemical kinetics and pharmacology to social sciences.1 In biochemistry, such networks notably arise through the reactions between the cell’s components such as DNA and RNA molecules. Since these key molecules often appear in low copy numbers, the intrinsic randomness of their interactions becomes significant.2–5 This motivates a stochastic framework that treats the chemical populations as a continuous-time, discrete-state Markov process. Finding the time-dependent probability distribution of this process amounts to solving the chemical master equation (CME).6
Many recent direct CME solvers build on the finite state projection (FSP) of Munsky and Khammash,7 which allows the extremely large or even infinite CME to be truncated to a finite, more tractable, system of linear ordinary differential equations (ODEs). These solvers improve efficiency with selective choices of the projection and adaptive time-stepping schemes.8–12 Alternatively, one can represent the FSP-truncated system using a different basis, such as orthogonal polynomials,13 wavelets,14 and radial basis functions.15 We also mention the multi-finite buffers introduced by Cao et al.16 In addition, there are special methods that exploit specific properties of the underlying system such as the time scale separation17 and the quasi-steady state assumption.18 One can also develop hybrid methods to reduce the dimensionality.19
In this paper, we are interested in representing the high-dimensional solution to the CME in a low-rank tensor format. Similar endeavors have proved effective in several high-dimensional problems,20–22 and research efforts for solving the CME are benefiting from ideas such as the early work of Hegland and Garcke23 who proposed a tensor-based CME solver using the canonical polyadic (CP) format. Also, Liao et al.24 recently applied tensor methods to fit steady-state data using the stationary solution to the chemical Fokker-Planck equation. We focus here on the tensor train (TT) format,25,26 in particular the quantized tensor train (QTT) variant.27 We note that there have been other promising studies to solve the CME using the QTT format.28,29 These previous studies employed the QTT scheme on macro-time steps, whereas our contribution further develops the approach by introducing new adaptive integration strategies that vary both the FSP and the integration steps, resulting in substantially reduced execution times, as comparative numerical tests will show.
The organization of the paper is as follows. Sections II and III are review sections, briefly outlining, respectively, the CME and then the tensor train and its quantized variant. From there the rest of the paper delves into our new QTT-based CME solver, featuring in Sec. IV a tensor formulation of the FSP on hyper-rectangles with absorbing boundaries, after which we describe in Sec. V how we allow the FSP to extend adaptively and how we update the cores of the QTT format accordingly. We then develop in Sec. VI a way to use the inexact uniformization method30,31 to find the QTT-formatted matrix exponential with a tensor-flavored implementation that utilizes the alternating minimal energy (AMEn) iterative algorithm.32 It is also where we describe a way to detect the equilibrium distribution at essentially no extra cost. We perform numerical tests in Sec. VII and give some concluding remarks in Sec. VIII.
II. THE CHEMICAL MASTER EQUATION
Consider a system of N different chemical species interacting via M chemical reactions; the state of the system is a vector $\mathbf{x} = (x_1, \dots, x_N)$ of non-negative integers counting the copy numbers of the species. The kth reaction is associated with the stoichiometric vector $\nu_k \in \mathbb{Z}^N$, which records the net change in the copy numbers when that reaction fires.
The evolution of the system is modeled as a continuous-time, discrete-state Markov process. The state space is a subset of the lattice of N-tuples of non-negative integers. A reaction k changes the current state $\mathbf{x}$ to $\mathbf{x} + \nu_k$ and fires with propensity $\alpha_k(\mathbf{x})$. In this work, we assume that the propensity functions are separable, i.e.,
$$\alpha_k(\mathbf{x}) = \prod_{s=1}^{N} w_{k,s}(x_s), \qquad k = 1, \dots, M, \qquad (1)$$
where the $w_{k,s}$ are scalar functions of the individual copy numbers. This assumption is broad enough to encompass important cases in practice, including standard mass-action kinetics as well as Hill-type propensity functions.
Let $P(\mathbf{x}, t)$ denote the probability for the system to be in state $\mathbf{x}$ at time t and assume that the initial probability distribution at t = 0 is known. The probability distribution evolves over time according to the CME, which has the well-known form
$$\frac{\partial P(\mathbf{x}, t)}{\partial t} = \sum_{k=1}^{M} \Big[ \alpha_k(\mathbf{x} - \nu_k)\, P(\mathbf{x} - \nu_k, t) - \alpha_k(\mathbf{x})\, P(\mathbf{x}, t) \Big].$$
Alternatively, we can think of the probability mass function as a probability vector $\mathbf{p}(t)$ indexed by the states and define the transition rate matrix $\mathbf{A}$ by
$$\mathbf{A}(\mathbf{y}, \mathbf{x}) = \begin{cases} \alpha_k(\mathbf{x}), & \text{if } \mathbf{y} = \mathbf{x} + \nu_k \text{ for some } k,\\ -\sum_{k=1}^{M} \alpha_k(\mathbf{x}), & \text{if } \mathbf{y} = \mathbf{x},\\ 0, & \text{otherwise}. \end{cases}$$
The CME can then be stated as a system of linear ODEs
$$\frac{d\mathbf{p}(t)}{dt} = \mathbf{A}\, \mathbf{p}(t),$$
where $\mathbf{p}(0) = \mathbf{p}_0$ is the initial probability vector representing the initial condition of the biochemical system. This form leads to the exact solution
$$\mathbf{p}(t) = e^{t\mathbf{A}}\, \mathbf{p}_0.$$
In practice, however, we cannot directly use the matrix exponential of the full matrix due to its enormous size. Reducing the complexity of the CME through the finite state projection and its tensor train decomposition is the focus of our present work.
III. TENSOR TRAIN REPRESENTATION AND BASIC TENSOR ARITHMETIC
A. Tensor notations and basic operations
By an N-way tensor of size , we mean a set of real numbers where the multi-index is an N-tuple of integers with . The numbers are called the mode sizes of the tensor , and the number N is called the order of that tensor. A tensor can be viewed either as a function on the N-dimensional lattice or as a linearized vector of length indexed by the elements of this set. It is helpful to keep both aspects in mind when reading our description. We denote the set of all tensors by . To keep the presentation succinct, we will make no distinction between a tensor and its vectorization. The usage will be clear from the context.
A linear operator from to can be represented by a tensor of size . We can think of as a square matrix whose rows and columns are, respectively, indexed by and . Depending on the context, we shall also identify such a tensor with the matrix of size resulting from its matricization.
B. Tensor train representation
The tensor train (TT) representation of a tensor $\mathcal{P}$ of size $n_1 \times n_2 \times \cdots \times n_N$ takes the form
$$\mathcal{P}(i_1, i_2, \dots, i_N) = \mathbf{G}_1(i_1)\, \mathbf{G}_2(i_2) \cdots \mathbf{G}_N(i_N), \qquad (3)$$
where each $\mathbf{G}_k(i_k)$ is an $r_{k-1} \times r_k$ matrix, $i_k = 1, \dots, n_k$, with the boundary conditions $r_0 = r_N = 1$ to make (3) a scalar. The quantities $r_k$ are called the TT ranks of the tensor, and the three-dimensional arrays of size $r_{k-1} \times n_k \times r_k$ that collect the matrices $\mathbf{G}_k(1), \dots, \mathbf{G}_k(n_k)$ are called the TT cores. We shall denote these TT cores by $\mathcal{G}_1, \dots, \mathcal{G}_N$.
While an explicit storage of the probability vector of the CME would require entries, where , the TT format only stores the TT cores at a total cost of entries, where . Thus, provided rTT is small, the storage of the TT format scales linearly with N, leading to dramatic savings from the explicit format where the storage scales exponentially with N. Any tensor can be written in the TT format by the TT-SVD algorithm that is based on the singular value decomposition (SVD).26
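To make this storage bookkeeping concrete, here is a small NumPy sketch (an illustrative stand-in for the MATLAB TT-Toolbox used later) that evaluates entries of a tensor directly from randomly chosen TT cores and compares the number of stored entries against the full array. The mode sizes, ranks, and helper names are arbitrary illustrative choices, not taken from any CME model.

```python
import numpy as np

# Illustrative TT cores G_k of shape (r_{k-1}, n_k, r_k) with boundary ranks r_0 = r_N = 1.
rng = np.random.default_rng(0)
mode_sizes = [4, 5, 3, 6]          # n_1, ..., n_N
ranks = [1, 2, 3, 2, 1]            # r_0, r_1, ..., r_N
cores = [rng.standard_normal((ranks[k], mode_sizes[k], ranks[k + 1]))
         for k in range(len(mode_sizes))]

def tt_entry(cores, idx):
    """Evaluate P(i_1,...,i_N) = G_1(i_1) G_2(i_2) ... G_N(i_N)."""
    mat = np.ones((1, 1))
    for G, i in zip(cores, idx):
        mat = mat @ G[:, i, :]     # multiply the r_{k-1} x r_k slice selected by i_k
    return mat[0, 0]

# Reconstruct the full tensor entry by entry and compare the storage counts.
full = np.array([tt_entry(cores, idx)
                 for idx in np.ndindex(*mode_sizes)]).reshape(mode_sizes)
tt_storage = sum(G.size for G in cores)
print("full storage:", full.size, "entries;  TT storage:", tt_storage, "entries")
```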
The CME falls in the common case of a matrix indexed by rows and columns , which is usually represented in the matrix TT format
where each core is an $r_{k-1} \times r_k$ matrix, with the boundary conditions $r_0 = r_N = 1$. Whereas accessing a conventional matrix entry often amounts to a simple table lookup, the TT format implies that doing so through (5) requires a sequence of small matrix-vector evaluations, evocative of how the advent of fast computerized algorithms superseded the mathematical tables of function values of the past. Nevertheless, there is effectively no computational overhead in using (5) when the dramatic savings in storage make it possible to solve problems that could not otherwise be tackled at all. Moreover, individual entries are seldom needed if we devise matrix-free algorithms that leverage the functional representation, and the extra floating-point work involved is easily absorbed by modern CPUs without a noticeable impact on the run time.
Something else worth highlighting at this point is that our adaptive scheme starts with a relatively small tensor, meaning that it quickly obtains the initial decomposition (through the TT-SVD mentioned earlier), and it then uses cheap updates to gradually account for more likely states of the CME as explained in Sec. IV.
C. Operations in the TT format
A difficulty in dealing with the TT format is that even the most basic operations of addition and multiplication increase the TT ranks as we now illustrate. Consider two tensors that are already given in the TT format, i.e.,
The sum can also be written exactly in the TT format with cores given by the matrix concatenation
The matrix-vector product between given by (5), and an TT tensor given by (3) has a TT form with cores given by
Hence, the resulting tensor ranks increase quickly after each product. This discourages the direct use of Krylov subspace approximation techniques33–35 that rely heavily on matrix-vector products. Fortuitously, there is an efficient way to evaluate the matrix exponential through the uniformization method as we shall show in Sec. VI B.
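The rank growth under addition can be illustrated with the standard block construction for the sum of two TT tensors: the first cores are concatenated along the column-rank axis, the last cores along the row-rank axis, and the interior cores are assembled block-diagonally, so the TT ranks add up. The following sketch uses made-up sizes and helper names purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = [3, 4, 2]                                      # mode sizes
rF = [1, 2, 3, 1]                                  # TT ranks of the first tensor
rG = [1, 2, 2, 1]                                  # TT ranks of the second tensor
F = [rng.standard_normal((rF[k], n[k], rF[k + 1])) for k in range(3)]
G = [rng.standard_normal((rG[k], n[k], rG[k + 1])) for k in range(3)]

def tt_full(cores):
    """Contract TT cores (shape (r_prev, n_k, r_next)) into the full tensor."""
    W = np.ones((1, 1))                            # (entries so far, current rank)
    for C in cores:
        W = np.einsum('ia,ajb->ijb', W, C).reshape(-1, C.shape[2])
    return W.reshape(*[C.shape[1] for C in cores])

def tt_add(A, B):
    """Cores of A + B: boundary cores concatenated, interior cores block-diagonal."""
    N, out = len(A), []
    for k in range(N):
        Ak, Bk = A[k], B[k]
        ra1, nk, ra2 = Ak.shape
        rb1, _, rb2 = Bk.shape
        if k == 0:
            C = np.concatenate([Ak, Bk], axis=2)   # shape (1, n_1, rA_1 + rB_1)
        elif k == N - 1:
            C = np.concatenate([Ak, Bk], axis=0)   # shape (rA_{N-1} + rB_{N-1}, n_N, 1)
        else:
            C = np.zeros((ra1 + rb1, nk, ra2 + rb2))
            C[:ra1, :, :ra2] = Ak
            C[ra1:, :, ra2:] = Bk
        out.append(C)
    return out

S = tt_add(F, G)
print("TT ranks of the sum:", [C.shape[0] for C in S] + [1])   # [1, rF+rG, ..., 1]
assert np.allclose(tt_full(S), tt_full(F) + tt_full(G))
```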
So far, we have focused on the standard TT format that separates the “physical” modes of the tensor. In the case of the CME, these modes correspond to the species and their sizes could be large if there are several hundreds of molecules in each species as we will encounter in the numerical examples in Sec. VII. It then becomes more advantageous to reshape the tensor into another tensor of higher order with smaller mode sizes and apply the TT decomposition on the latter. This is the idea behind the quantized tensor train (QTT) format.27,36 In fact, Khoromskij27 proves that the QTT rank remains constant (or small) for a wide class of function-generated tensors, and practical experience corroborates that the QTT rank scales linearly in the number of modes and logarithmically in the mode sizes (cf. Table I). This is why our implementation will use the QTT format that we discuss next.
D. The quantized tensor train format
With the assumption that each mode size is a power of two, $n_s = 2^{L_s}$, a tensor $\mathcal{P}$ of order N and size $n_1 \times \cdots \times n_N$ can be reshaped into a tensor $\widetilde{\mathcal{P}}$ of order $L = L_1 + \cdots + L_N$ and size $2 \times 2 \times \cdots \times 2$ through the correspondence
$$\mathcal{P}(i_1, \dots, i_N) = \widetilde{\mathcal{P}}(j_{1,1}, \dots, j_{1,L_1}, \dots, j_{N,1}, \dots, j_{N,L_N}),$$
where the binary digits $j_{s,1}, \dots, j_{s,L_s} \in \{0, 1\}$ are obtained from the binary representation of $i_s - 1$, i.e.,
$$i_s - 1 = j_{s,1} + 2\, j_{s,2} + \cdots + 2^{L_s - 1} j_{s,L_s},$$
with the digits ordered from the least significant to the most significant. We also refer to the $j_{s,\ell}$ as the virtual indices of the tensor $\mathcal{P}$. The QTT format of $\mathcal{P}$ is then defined as the TT format of the reshaped tensor $\widetilde{\mathcal{P}}$, and we call the cores (respectively, ranks) of the TT representation of $\widetilde{\mathcal{P}}$ the QTT cores (respectively, QTT ranks) of $\mathcal{P}$.
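As a small illustration of the quantization idea, the sketch below reshapes a length-2^L vector into an order-L binary tensor and estimates its QTT ranks with a basic TT-SVD sweep; a geometric vector q^i is a simple instance of a function-generated vector whose QTT ranks are all equal to one, in contrast with a random vector. This is a plain NumPy sketch, independent of the TT-Toolbox, and the tolerance and sizes are illustrative.

```python
import numpy as np

def tt_svd_ranks(tensor, tol=1e-12):
    """TT-SVD sweep: split off one mode at a time with an SVD and record
    the numerical ranks (the TT/QTT ranks of the tensor)."""
    shape = tensor.shape
    ranks = [1]
    M = tensor.reshape(shape[0], -1)              # first unfolding: mode 1 vs the rest
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))
        ranks.append(r)
        # carry the remainder to the right and unfold the next mode
        M = (np.diag(s[:r]) @ Vt[:r]).reshape(r * shape[k + 1], -1)
    ranks.append(1)
    return ranks

L = 10
i = np.arange(2 ** L)
v = 0.97 ** i                                     # a "function-generated" vector
# Quantization: with column-major (Fortran) reshaping, the first binary mode
# corresponds to the least significant bit of the index i.
qtt_tensor = v.reshape([2] * L, order='F')
print("QTT ranks of q**i:", tt_svd_ranks(qtt_tensor))          # all equal to 1
rand_tensor = np.random.default_rng(2).standard_normal(2 ** L).reshape([2] * L, order='F')
print("QTT ranks of a random vector:", tt_svd_ranks(rand_tensor))
```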
The QTT format requires entries, where and rQTT is the maximum of the QTT ranks. When rQTT is small, the storage complexity is logarithmic in the mode size and linear in the order N. Dolgov and Khoromskij28 have analyzed the QTT ranks of the CME matrix, while Kazeev and Schwab37 gave bounds in the context of the stationary solution. Our implementation uses the TT-Toolbox (https://github.com/oseledets/TT-Toolbox) that provides handy routines to operate in the QTT format, which we combine with an adaptive finite state projection as we describe next.
IV. TENSOR-STRUCTURED FINITE STATE PROJECTION
A. Standard finite state projection on a hyper-rectangle
The size of the CME could be infinite or extremely large. To alleviate this problem, Munsky and Khammash7 introduced the finite state projection (FSP) that truncates the problem to a finite size. To obtain the QTT formulation of the FSP, we will restrict to a hyper-rectangle and then write the FSP-truncated CME operator in terms of Kronecker products as in previous studies.23,28,29 The hyper-rectangle is
where are positive integers and can be interpreted as maximum copy numbers of the chemical species. Recalling the separability assumption (1), we define the vector
Now, using Kronecker products, the FSP truncation of the CME operator corresponding to the hyper-rectangle (7) is
where $\mathbf{I}_{n_s}$ is the identity matrix of size $n_s$, and the shift matrices move probability mass along the sth coordinate by the stoichiometric change of the corresponding reaction; they have ones on the matching sub- or super-diagonal and zeros elsewhere.
In our implementation, the matrices are generated quickly with the routine tt_qshift of the TT-Toolbox (cf. the work of Kazeev and Khoromskij38 for the mathematical foundation of the routine). Given a time step , the one-step solution to the CME is then approximated by
with padding by zeros as necessary, and the associated restriction of the initial condition to .
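To make the Kronecker-product structure concrete, here is a minimal NumPy sketch for a made-up two-species, two-reaction network with separable propensities. It assembles the plain FSP truncation (without the absorbing-boundary modification introduced below) from shift matrices and diagonal propensity factors, and checks the result against a direct enumeration of the states; the reaction network, rate constants, and truncation sizes are illustrative only.

```python
import numpy as np

# Toy system with N = 2 species and M = 2 reactions (illustrative rates):
#   R1:  0  -> S1       with propensity c1           (nu_1 = (+1,  0))
#   R2:  S1 -> S2       with propensity c2 * x1      (nu_2 = (-1, +1))
c1, c2 = 2.0, 0.5
n1, n2 = 4, 3                        # FSP hyper-rectangle {0..n1-1} x {0..n2-1}

def shift(n, nu):
    """n x n shift matrix moving probability mass from state x to x + nu."""
    J = np.zeros((n, n))
    for x in range(n):
        if 0 <= x + nu < n:
            J[x + nu, x] = 1.0
    return J

# Separable propensity factors w_{k,s}(x_s) on the truncated ranges.
w = {1: (np.full(n1, c1),    np.ones(n2)),      # reaction 1: c1 * 1
     2: (c2 * np.arange(n1), np.ones(n2))}      # reaction 2: (c2*x1) * 1
nu = {1: (+1, 0), 2: (-1, +1)}

# Kronecker assembly: A = sum_k ( J1^{nu_k1} (x) J2^{nu_k2} - I ) diag(w_k1) (x) diag(w_k2)
I = np.eye(n1 * n2)
A_kron = sum((np.kron(shift(n1, nu[k][0]), shift(n2, nu[k][1])) - I)
             @ np.kron(np.diag(w[k][0]), np.diag(w[k][1]))
             for k in (1, 2))

# Reference: assemble the same FSP-truncated generator by direct state enumeration.
A_ref = np.zeros((n1 * n2, n1 * n2))
idx = lambda x1, x2: x1 * n2 + x2                # species 1 is the first (slow) Kronecker factor
for x1 in range(n1):
    for x2 in range(n2):
        for k in (1, 2):
            a = w[k][0][x1] * w[k][1][x2]        # propensity alpha_k(x)
            A_ref[idx(x1, x2), idx(x1, x2)] -= a
            y1, y2 = x1 + nu[k][0], x2 + nu[k][1]
            if 0 <= y1 < n1 and 0 <= y2 < n2:
                A_ref[idx(y1, y2), idx(x1, x2)] += a
print("Kronecker assembly matches direct enumeration:", np.allclose(A_kron, A_ref))
```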
In practice, we use a step-by-step integration scheme, and as shown in Sec. V, the truncation error will be controlled by imposing a condition based on a bound similar to that of the classic FSP.7 While previous studies employed the tensor-structured FSP for the whole integration time,28,29 we will make the FSP adaptive on-the-fly in several ways as explained later. We begin first by describing a new variant of the FSP targeted at hyper-rectangles where we dynamically adjust their boundaries according to the spread of the probability mass.
B. Finite state projection on a hyper-rectangle with absorbing boundary
Our starting point is to restrict the original CME to the subset of states in the hyper-rectangle , where the states on its boundary are turned into artificial absorbing states in order to keep track of the probability mass that flows out of .
The amount of probability mass absorbed in the boundary states measures how much has leaked out of the current FSP and can therefore be used to monitor the error; it can be bounded by summing up marginal distributions as stated below.
This follows from (12) and the fact that the solution of the FSP with absorbing boundary remains a probability measure. □
Using the preceding lemma and the same techniques as in the proof of the original FSP theorem, we have the following result.
The theorem provides a computable error bound when the FSP is restricted to a hyper-rectangle with absorbing boundaries. It also suggests a way to improve accuracy through probability tracking and expansion in the directions where the probability mass leaks the most. Indeed, if (16) fails for some species s, we can selectively increase the corresponding bounds and try again with the enlarged . This will be the strategy used in our step-by-step integration scheme that we explain next.
V. STEP-BY-STEP INTEGRATION WITH ADAPTIVE HYPER-RECTANGLES
A. Overview
Having described just one step of our scheme, we now show how it is embedded in an adaptive step-by-step integration procedure that advances the solution through time points using the recurrence
where is the stepsize and is the hyper-rectangle enclosing the most likely states of the CME within the time interval , with and in the QTT format. All these quantities change adaptively across the K total number of steps eventually needed. Recall that operand vectors are padded with zeros for consistency. Section VI will address the selection of the stepsize and the evaluation of the matrix exponential in (18).
We focus here on explaining how adaptivity in our scheme is driven by the control of the local truncation error between consecutive iterates with the goal of approximating the final solution at a certain accuracy. To see how this is done, let be the sth marginal distribution derived from the approximation and define the vector
From Theorem IV.2, the local error when going from tk to is bounded by .
Let be a user-prescribed error tolerance; we impose the condition that the global sum of the local errors satisfies
by requiring that
otherwise, we expand the current hyper-rectangle of the FSP in the directions that violate (20) and retake the step with the ensuing larger projection and an adjusted stepsize determined by the uniformization method in Sec. VI.
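The sketch below is only meant to illustrate, under assumed notation, the kind of quantities entering criteria (16) and (20): compute the marginal of each species from the current joint distribution, read off the mass sitting in its boundary layer, and flag the directions that exceed a per-step budget. The budget eps*tau/t_final, the distribution, and all numbers are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

# Illustrative joint distribution P over a 3-species hyper-rectangle (made-up numbers).
rng = np.random.default_rng(3)
P = rng.random((8, 8, 4))
P /= P.sum()

eps, tau, t_final = 1e-4, 0.5, 100.0     # FSP tolerance, current stepsize, final time
budget = eps * tau / t_final             # assumed per-step error budget (illustrative choice)

expand = []
for s in range(P.ndim):
    # marginal distribution of species s: sum over all the other axes
    axes = tuple(a for a in range(P.ndim) if a != s)
    marginal = P.sum(axis=axes)
    boundary_mass = marginal[-1]         # probability sitting in the last layer of direction s
    if boundary_mass > budget:
        expand.append(s)                 # directions in which the hyper-rectangle should grow
print("species whose sides should be doubled:", expand)
```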
Our approach is somewhat reminiscent of the sliding window technique9 in that we use a hyper-rectangle. However, we use a posteriori criteria to expand the window instead of allowing the window to slide. Our current implementation does not support sliding hyper-rectangles. We also perform the expansion of the hyper-rectangles by simply doubling the mode sizes in the direction of the highest loss of probability. We discuss next how the corresponding extension of the current solution is done in the QTT format.
B. Expanding the hyper-rectangle and padding the solution in QTT format
When criterion (20) is not met, some sides of the hyper-rectangle have to be increased. Traditional FSP methods7,8,11 that store the solution using a standard vector simply pad zero entries corresponding to the newly added states. However, our method stores the solution in the QTT format, and exactly how the padding is achieved needs some further explanation. We choose to expand by adding a new QTT core. Since each QTT core corresponds to a factor of 2 in the mode sizes of the tensor, our expansion has the effect of doubling the lengths of the sides of the current hyper-rectangle.
Let be the current solution associated with the hyper-rectangle . Assume that we double the sth side to obtain
and we want to find the QTT representation of the augmented tensor such that
Let the QTT format of be
with its cores; then the QTT representation of the augmented tensor preserves all the existing cores and inserts just one more core, placed right after the last core associated with the sth species. The two slices of this new core, corresponding to the two values of the added virtual index, are the identity matrix (index value 0) and the zero matrix (index value 1), so that the original entries are reproduced on the lower half of the doubled range and zeros fill the upper half.
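A minimal sketch of this padding construction is given below, assuming the convention that the first QTT core of a species carries the least significant bit of its index: appending one extra core whose slices are the identity (new bit 0) and the zero matrix (new bit 1) doubles the range while padding with zeros, as verified against explicit concatenation for a single-species vector. The contraction helper and sizes are illustrative.

```python
import numpy as np

def qtt_to_vec(cores):
    """Contract QTT cores (shape (r_prev, 2, r_next)); cores[0] carries the least
    significant bit of the index, so core k contributes the weight 2**k."""
    W = np.ones((1, 1))                          # (entries so far, current rank)
    for G in cores:
        T = np.einsum('ia,ajb->jib', W, G)       # the new bit j is more significant than i
        W = T.reshape(2 * W.shape[0], G.shape[2])
    return W[:, 0]

rng = np.random.default_rng(4)
ranks = [1, 3, 2, 1]
cores = [rng.standard_normal((ranks[k], 2, ranks[k + 1])) for k in range(3)]
v = qtt_to_vec(cores)                            # a vector of length 2**3 = 8

# Doubling the range: append one core whose slice for the new bit 0 is the identity
# (here 1x1, since the trailing rank is 1) and whose slice for the new bit 1 is zero.
pad = np.zeros((1, 2, 1))
pad[0, 0, 0] = 1.0
v_padded = qtt_to_vec(cores + [pad])             # length 16

print("padding by one extra QTT core reproduces [v; 0]:",
      np.allclose(v_padded, np.concatenate([v, np.zeros(8)])))
```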
The pseudo-code in Algorithm 1 summarizes the overall structure of our scheme. We describe next how the matrix exponential can be evaluated in the QTT format.
ALGORITHM 1.
Input: Number of species N, number of reactions M, stoichiometry vectors and separable propensities for k = 1, …, M.
Initial states and their probabilities. Final time tf.
Initial hyper-rectangle with sizes .
Error tolerance for FSP truncation.
Output: Approximate solution to the CME at time tf (K is the number of steps).
1: Form the matrix defined by (14) in QTT format.
2: Form the initial distribution based on the initial states and their probabilities.
3: Set t := 0, k := 0.
4: while t < tf do
5: Compute the tentative solution with an appropriate stepsize.
6: if (criterion (20) fails for any s) then
7: Set Ls := Ls + 1 for those s where (20) is not satisfied.
8: Form the new matrix and update the QTT representation of the current solution.
9: else
10: Accept the tentative solution and set t := t + stepsize.
11: k := k + 1.
12: end if
13: end while
VI. EVALUATING THE MATRIX EXPONENTIAL IN QTT FORMAT
In this section, we implicitly understand vectors and matrices as already in the QTT format.
A. Uniformization method
Let $\mathbf{A}$ be the transition rate matrix of the truncated CME in one step of our scheme (18). We drop the bar and the subscript for brevity and use $\mathbf{p}$ to denote the operand probability vector. To advance the solution with stepsize $\tau$, we approximate $\mathbf{w}(\tau) = e^{\tau \mathbf{A}}\, \mathbf{p}$ using the uniformization method. It evaluates the action of the exponential operator by introducing
$$\bar{\mathbf{A}} = \mathbf{I} + \frac{1}{\alpha}\, \mathbf{A},$$
where $\mathbf{I}$ is the identity matrix and $\alpha$ is an upper bound on the absolute values of the diagonal entries of $\mathbf{A}$, which are non-positive as seen in (2). From this, we have $e^{\tau\mathbf{A}} = e^{-\alpha\tau}\, e^{\alpha\tau \bar{\mathbf{A}}}$, so that the Taylor approximation yields
$$\mathbf{w}(\tau) \approx e^{-\alpha\tau} \sum_{j=0}^{m} \frac{(\alpha\tau)^j}{j!}\, \bar{\mathbf{A}}^j\, \mathbf{p}. \qquad (22)$$
Formula (22) is numerically more stable than a naïve Taylor approximation of $e^{\tau\mathbf{A}}$ since $\bar{\mathbf{A}}$ is column-stochastic with non-negative entries bounded by one. The degree m is determined as the smallest integer for which the Poisson tail $1 - e^{-\alpha\tau} \sum_{j=0}^{m} (\alpha\tau)^j / j!$ falls below the prescribed uniformization tolerance, which guarantees that the difference in L1-norm between the approximation in (22) and the true solution is below that tolerance.
The selection of the stepsize follows the classic uniformization,31 which determines a uniform stepsize for a fixed matrix. In our case, however, the matrix can expand when the hyper-rectangle is enlarged, and the stepsize is not kept uniform but is instead adjusted at each expansion. To be specific, suppose that at step k we have a new matrix with the corresponding $\alpha$; we now wish to choose a new stepsize as if the integration domain were [tk, tf]. From there, as in the classical case,31 we take the stepsize and Taylor degree dictated by (24). The safety parameter appearing in (24) is chosen to prevent underflow of the factor $e^{-\alpha\tau}$ in the uniformization formula (22), and we will experiment with either 10 or 100 in our numerical examples. We note that there is a possibility of tweaking this parameter adaptively to improve performance, but this remains to be investigated.
As noted before, an essential component of the uniformization method is the determination of the parameter $\alpha$. Preferably, this upper bound should be close to the maximum magnitude of the diagonal entries of the matrix. While for small matrices it is straightforward to find it simply by an exhaustive search, this is not the case with the CME, where the diagonal entries are buried deep in an enormous tensor, and so more advanced methods are required. In our implementation, we use the tt_max routine in the TT-Toolbox for the estimation of $\alpha$.
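The following dense NumPy/SciPy sketch replays the uniformization steps just described on a small random generator, so that SciPy's expm can serve as a reference. The stepsize and the simple loop used to pick the Taylor degree are illustrative choices and do not reproduce the paper's exact stepsize formula (24).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n = 30
# Random CME-like generator: off-diagonal entries >= 0, columns summing to zero.
A = rng.random((n, n)) * (1 - np.eye(n))
A -= np.diag(A.sum(axis=0))
p = rng.random(n)
p /= p.sum()                                     # a probability vector
tau = 0.2                                        # illustrative stepsize

alpha = np.max(np.abs(np.diag(A)))               # upper bound on |diagonal| (here the exact max)
A_bar = np.eye(n) + A / alpha                    # column-stochastic, entries in [0, 1]

# Pick the Taylor degree m so that the Poisson tail 1 - sum_{j<=m} e^{-at}(at)^j/j! <= tol.
tol, m = 1e-10, 0
term = partial = np.exp(-alpha * tau)
while 1.0 - partial > tol and m < 10_000:
    m += 1
    term *= alpha * tau / m
    partial += term

# Evaluate w(tau) = e^{-alpha*tau} sum_{j=0}^{m} (alpha*tau)^j/j! * A_bar^j p sequentially.
v = p.copy()
w = v.copy()
for j in range(1, m + 1):
    v = (alpha * tau / j) * (A_bar @ v)
    w += v
w *= np.exp(-alpha * tau)

print("Taylor degree m =", m)
print("L1 error vs. expm:", np.linalg.norm(w - expm(tau * A) @ p, 1))
```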
B. Inexact uniformization in QTT format
To implement (22), one could think of simply computing the actions of the matrix $\bar{\mathbf{A}}$ sequentially. This usual approach is not practical in the QTT format. The reason is that matrix-vector products significantly increase the rank, and thus the storage requirement, of the computed vectors to the point of defeating the savings of the tensor format. We instead use a completely different approach based on solving a linear system as follows. Let $\mathbf{v}_j = \frac{(\alpha\tau)^j}{j!}\, \bar{\mathbf{A}}^j\, \mathbf{p}$ denote the Taylor terms in (22); then they satisfy the recurrence
$$\mathbf{v}_0 = \mathbf{p}, \qquad \mathbf{v}_j = \frac{\alpha\tau}{j}\, \bar{\mathbf{A}}\, \mathbf{v}_{j-1}, \quad j = 1, \dots, m.$$
These relations lead to a linear system of the form
or, more succinctly,
where is the lower shift matrix of size , is the unit vector of length m + 1, and . We then set this problem in the QTT format and solve it by the Alternating Minimal Energy (AMEn) method of Dolgov and Savostyanov.32
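A dense sketch of this all-at-once reformulation is given below: the Taylor terms are stacked into a single block lower-bidiagonal linear system, solved in one shot (a dense direct solve here stands in for the AMEn solve of the QTT-formatted system), and the uniformization sum is then recovered by a contraction with a vector of ones. The generator, stepsize, and degree are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n, m, tau = 20, 25, 0.1
A = rng.random((n, n)) * (1 - np.eye(n))
A -= np.diag(A.sum(axis=0))                       # CME-like generator (columns sum to zero)
p = rng.random(n)
p /= p.sum()

alpha = np.max(np.abs(np.diag(A)))
A_bar = np.eye(n) + A / alpha

# Stack v_0,...,v_m (v_0 = p, v_j = (alpha*tau/j) A_bar v_{j-1}) into one long vector x
# and assemble the block lower-bidiagonal system  M x = e_1 (x) p.
M = np.eye((m + 1) * n)
for j in range(1, m + 1):
    M[j * n:(j + 1) * n, (j - 1) * n:j * n] = -(alpha * tau / j) * A_bar
rhs = np.zeros((m + 1) * n)
rhs[:n] = p

x = np.linalg.solve(M, rhs)                       # AMEn solves the QTT version of this system
# Recovering the uniformization sum is a contraction of x with a vector of ones.
w = np.exp(-alpha * tau) * x.reshape(m + 1, n).sum(axis=0)

print("L1 error vs. expm:", np.linalg.norm(w - expm(tau * A) @ p, 1))
```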
It is remarkable that even though (25)–(27) are mathematically equivalent representations of , we observed experimentally that the compact tensor storage format copes better by far with the data-rich representation in (27) than the traditional approach in (25) that repeatedly uses matrix-vector products to disjointly retrieve the terms individually. Our observation here is consistent with the experience in other applications of the QTT format39 (see also Ref. 28), where the all-inclusive linear system formulation is found to be more efficient than sequential evaluations.
The approximation (22) is then recovered by summing the blocks $\mathbf{v}_0, \dots, \mathbf{v}_m$ of the solution and scaling by $e^{-\alpha\tau}$, which amounts to a tensor contraction between the QTT format of the solution and a vector of ones. In practice, the iterative solver only returns an approximate solution, and we instead obtain an inexact evaluation of (22), whose error we now quantify.
The evaluation of (22) through an approximate solution to (27) is closely related to the inexact uniformization method.31 Let the residual be
Using the partitioning , where each has the same length as each vector , we have
Similar to the classical case (Theorem 3.1 in Ref. 31), we can establish the formula for the gap
where the only notable difference here is that we also include the initial error . We then easily arrive at the bound
C. Detecting equilibrium
The uniformization framework provides a way to detect when an equilibrium distribution of the Markov chain is reached, namely, through checking whether $\bar{\mathbf{A}}\, \mathbf{p} = \mathbf{p}$ (equivalently, $\mathbf{A}\, \mathbf{p} = \mathbf{0}$). When the inexact evaluation mentioned above is used, the QTT structure of the solution to (27) gives rise to a simple way of checking this.
Let $n$ be the size of the tensor terms in (25). The tensor form of the solution to (27) will be of size $n \times (m+1)$. Hence, the QTT representation of this solution consists of L + q cores, where the first L cores correspond to the quantization of the N “physical” modes of the CME, and the remaining q cores result from the quantization of the last mode of size (m + 1). The Lth rank of the QTT format of the solution characterizes the separability between the tensors $\mathbf{v}_0, \dots, \mathbf{v}_m$. More precisely, we have the following result.
Although the theory seems convoluted, the resulting criterion for detecting the equilibrium amounts to simply checking that the Lth QTT rank of is unity. This quantity is readily available after completion of the AMEn algorithm.
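As a dense stand-in for the rank test, the sketch below checks the same separability directly: it stacks the Taylor terms as columns and computes the numerical rank of that matrix, which collapses to one exactly when the current distribution is stationary (so that Āp = p and all terms are parallel). In the QTT setting, this is the separability that the Lth rank reported by AMEn measures. The generator and tolerances here are illustrative.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(7)
n, m, tau = 15, 20, 0.3
A = rng.random((n, n)) * (1 - np.eye(n))
A -= np.diag(A.sum(axis=0))                       # CME-like generator, columns sum to zero
alpha = np.max(np.abs(np.diag(A)))
A_bar = np.eye(n) + A / alpha

def taylor_terms(p):
    """Stack the uniformization Taylor terms v_j = (alpha*tau)^j/j! * A_bar^j p as columns."""
    V, v = [p], p
    for j in range(1, m + 1):
        v = (alpha * tau / j) * (A_bar @ v)
        V.append(v)
    return np.column_stack(V)

p_eq = null_space(A)[:, 0]
p_eq /= p_eq.sum()                                # stationary distribution: A p_eq = 0
p_rand = rng.random(n)
p_rand /= p_rand.sum()

for label, p in [("stationary p", p_eq), ("random p", p_rand)]:
    rank = np.linalg.matrix_rank(taylor_terms(p), tol=1e-10)
    print(f"{label}: numerical rank of [v_0, ..., v_m] = {rank}")
```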
Algorithm 2 summarizes in pseudo-code all the essential building blocks of our step-by-step framework, consisting of the FSP, QTT, inexact Uniformization, and AMEn methods. We clarify here that the use of an adaptive FSP in our method is not targeted at improving the compression rate of the tensor, which is already impressive thanks to the use of the QTT format. Rather, its purpose is to improve computational time, with the added benefit of freeing the user from finicky selections of appropriate values for the species bounds . The improved computational time comes from the fact that, from a numerical viewpoint, smaller hyper-rectangles likely yield smaller norms for the matrix , thus benefiting the matrix exponential (or the corresponding linear system of ODEs). Also, a smaller hyper-rectangle means that fewer cores are needed in the QTT format. The AMEn algorithm32 requires sweeping through the cores of the tensor train, and a shorter train means shorter sweeps. The downside of having adaptive hyper-rectangles is the overhead of monitoring the error and the potential cost of failed steps (for solving the CME on a rectangle that is too small), but we found this overhead not to be of significant consequence. The following testing section includes numerical experiments to assess the efficiency of our algorithm on well-known test problems in the CME literature. The section reports comparisons of our adaptive approach with the fixed approach that uses a non-adaptive hyper-rectangle large enough from the outset, showing that adaptivity is very beneficial to a problem such as the Goutsias example.
ALGORITHM 2.
Input: Number of species N, number of reactions M, stoichiometry vectors and separable propensities for k = 1, …, M.
Initial states and their probabilities. Final time tf.
Initial hyper-rectangle with sizes .
Error tolerances: for FSP, for AMEn, for uniformization.
Output: Approximate solution to the CME at time tf (K is the number of steps).
1: Form the matrix defined by (14) in QTT format.
2: Form the initial distribution based on the initial states and their probabilities.
3: Find α and determine the stepsize and Taylor degree m.
4: Form the uniformization matrix in QTT format.
5: Set t := 0, k := 0.
6: while t < tf do
7: Solve (27) with AMEn to obtain .
8: Compute .
9: if (criterion (20) fails for any s) then
10: Set Ls := Ls + 1 for those s where (20) is not satisfied.
11: Form the new matrix and update the QTT representation of the current solution.
12: Determine α, the stepsize, and the Taylor degree m.
13: Form the new uniformization matrix.
14: else
15: if (the Lth QTT rank of the solution of (27) is 1) then
16: STOP ‘equilibrium reached’.
17: end if
18: Accept the computed solution.
19: k := k + 1.
20: Set t := t + stepsize.
21: end if
22: end while
VII. NUMERICAL EXAMPLES
The computing environment is MATLAB 2016b on an ASUS UX51VZA laptop running Ubuntu Linux with 8 GB of RAM and a 2.1 GHz Intel Core i7 CPU, using routines from the TT-Toolbox version 2.2 (https://github.com/oseledets/TT-Toolbox). We specifically use the MEX-Fortran implementation of AMEn in the file fort_amen_solve.f for fast execution of the tensor linear system solves. In all examples, the tolerance parameters are set at . The safety parameter in (24) is set at 100 for the toggle switch and the Goutsias examples and at 10 for the p53 example.
We will also present comparative results of our method with the other well-known QTT-based CME solvers28,29 that we summarize here as follows:
The hp-DG-QTT solver of Kazeev et al.29 integrates the CME using the hp-discontinuous Galerkin time discretization that results in a time-stepping scheme with a tensor-formatted linear system solved at each step with the DMRG (Density Matrix Renormalization Group) algorithm.40 Since DMRG was found to be less efficient than AMEn in several settings,32 we also implemented a slight variant of the original algorithm by replacing DMRG with AMEn. All the algorithmic parameters of Kazeev et al.29 are kept, except the DMRG solver tolerance that we set to 10−5 (we found the strict tolerance there to be slower), and our implementation makes calls to the efficient MEX routines in the TT-toolbox.
The block state-time approach of Dolgov and Khoromskij28 also integrates the CME on a fixed hyper-rectangle, but they use the Crank-Nicolson scheme instead of hp-DG. Furthermore, the Crank-Nicolson steps are not evaluated sequentially but are combined into a block state-time linear system solved with AMEn. The integration interval [0, tf] is broken into subintervals of uniform size T0, and an all-inclusive solver is used on each interval , . We set Nt = 1024 time steps for the Crank-Nicolson time-discretization on each subinterval and vary T0 in {10−3tf, 10−2tf, 10−1tf}. We always use the AMEn tolerance 10−5 in our implementation. In what follows, we will call this the QTT-block-Crank-Nicolson approach.
A. A stochastic model of genetic toggle switch
We revisit the example solved by Kazeev et al.,29 which is a stochastic model for the toggle switch of Gardner et al.41 There are two species U and V that inhibit the production of each other. Table II shows the values of the parameters.29 We start with an initial rectangle of size and integrate to tf := 100. Our method detects equilibrium at , which leads to an early termination. We ended up using only about 785 s of CPU time (Fig. 1). The equilibrium distribution displays bimodality, which is a well-known characteristic of this system (Fig. 2).
To compare our solution with the two existing CME solvers that also used the QTT format,28,29 we fixed their FSP size to , a tight range suggested by our method that is slightly smaller than the hyper-rectangle tested by Kazeev et al.29 From the results reported in Table III and Fig. 3, we see that our method was at least two times faster than the previous approaches, which deliver results of comparable accuracy on this problem. In addition, our results confirm the effectiveness of AMEn over DMRG for the hp-DG scheme.
TABLE III.
Method | CPU time (s) | Gap
---|---|---
New method (Algorithm 2) | 785.21 | …
hp-DG-QTT-DMRG | 8773.58 | 5.85 × 10−4
hp-DG-QTT-AMEn | 2113.32 | 2.86 × 10−5
QTT-block-Crank-Nicolson (T0 = 0.1) | 2106.66 | 2.73 × 10−5
QTT-block-Crank-Nicolson (T0 = 1) | 9194.16 | 6.50 × 10−3
QTT-block-Crank-Nicolson (T0 = 10) | 469.58 | 5.28 × 10−2
B. Goutsias model
The Goutsias model refers to a simplified version of a time-varying model of transcription regulation proposed by Goutsias.42 This model is considered challenging to CME solvers8,9,11,43 because its probability mass spreads quickly in the directions of the first three species, pushing the FSP to expand into millions of states. Table IV shows the parameters chosen as in the Krylov-FSP,8 and the initial state is
We set tf := 1000, which propels the time horizon much further than in existing attempts on this example. Our method took about 3 h to reach the end time, which is expected for this challenging problem. However, what is surprising is that the QTT representation needed fewer than 112 830 entries to store the solution (Fig. 4). This contrasts with previous results8,11 where the much shorter final time tf = 300 already demanded one million entries to store the approximate solution. The marginal distributions from the QTT format fit tightly with those generated by one million trajectories using Gillespie’s stochastic simulation algorithm (SSA),44 as seen in Fig. 5.
TABLE IV.
No. | Reaction | Propensity | Rate constant (s−1)
---|---|---|---
1 | | c1(RNA) | c1 = 0.043
2 | | c2(M) | c2 = 0.0007
3 | | c3[DNA.D] | c3 = 0.0715
4 | | c4(RNA) | c4 = 0.0039
5 | | c5[DNA][D] |
6 | | c6[DNA.D] | c6 = 0.4791
7 | | c7[DNA.D][D] |
8 | | c8[DNA.2D] |
9 | | |
10 | | c10[D] | c10 = 0.5
We conduct further tests to compare our method with the QTT approach of Dolgov and Khoromskij,28 using subinterval lengths (cf. Fig. 6 and Table V). However, setting T0 := 10 turned out to yield very poor performance, with the method taking over 10 h only to reach t = 650. Thus, we omit the case T0 := 10 in this example.
TABLE V.
Method | CPU time (s) | Gap
---|---|---
New method (Algorithm 2) | 9 955.06 | …
QTT-block-Crank-Nicolson (T0 = 1) | 14 227.07 | 1.81 × 10−4
QTT-block-Crank-Nicolson (T0 = 100) | 840.69 | 7.47 × 10−3
C. Regulation of p53
The p53 gene is an important barrier against the development of cancerous cells. Here, we revisit a model45 of p53 regulation (cf. Table VI) that we attempted to solve before using the sparse tensor format.46 The QTT format is far better suited to this particular problem, as we will see. In our experiment, we set the integration time tf := 1000 and, to make the problem challenging, an initial state of high protein populations
Our method was able to solve the problem in 183.14 s, with the QTT format using no more than 116 952 entries in its representation of the solution. Figure 7 illustrates the performance and adaptivity of our algorithm. The marginal distributions from the QTT format fit tightly with those generated by one million SSA trajectories, as seen in Fig. 8. Figure 9 and Table VII show some comparisons between our method and the scheme of Dolgov and Khoromskij.28
TABLE VI.
No. | Reaction | Propensity | Parameters
---|---|---|---
1 | | kp | kp = 0.5
2 | | |
3 | | |
4 | | |
5 | | |
6 | | |
7 | | |
8 | | |
9 | | |
10 | | ka | ka = 0.5
11 | | |
TABLE VII.
Method | CPU time (s) | Gap
---|---|---
New method (Algorithm 2) | 183.14 | …
QTT-block-Crank-Nicolson (T0 = 1) | 311.93 | 5.91 × 10−7
QTT-block-Crank-Nicolson (T0 = 10) | 91.36 | 4.99 × 10−7
QTT-block-Crank-Nicolson (T0 = 100) | 178.67 | 3.23 × 10−6
D. Comparison with the non-adaptive version
We perform further tests to study the effect of using adaptive hyper-rectangles in our method. To do so, we solve the three test problems above again using Algorithm 2 with the adaptive hyper-rectangles turned off. For the fixed FSP in these tests, we use the largest hyper-rectangles selected by the adaptive version (cf. Table VIII). Note that when the hyper-rectangle is fixed, Algorithm 2 uses a uniform Taylor degree m and stepsize across all time steps. The non-adaptive setting is relieved of the overhead of checking the FSP error and related costs, and this can result in a marginal gain over the adaptive version in some problems (toggle switch and p53). However, we see that the fixed FSP leads to an increase of about 2.5 times in computational time for the Goutsias model. The large hyper-rectangle forces the uniformization to select a small stepsize in the Taylor approximation, and this leads to increased computational cost over the long integration horizon.
TABLE VIII.
 | Adaptive | | | | | | Non-adaptive | | |
Problem (N) | | m | | | | CPU (s) | | m | | CPU (s)
---|---|---|---|---|---|---|---|---|---|---
Toggle switch (2) | 1887 | 255 | 7.58 × 10−3 | 1.51 × 10−2 | 7.71 × 10−3 | 785.21 | 2 259 | 255 | 7.58 × 10−3 | 700.35
Goutsias (6) | 8844 | 255 | 3.36 × 10−2 | 1.78 × 10+00 | 1.13 × 10−1 | 9955.06 | 29 722 | 255 | 3.36 × 10−2 | 24 600.75
p53 (6) | 439 | 31 | 1.53 × 10+00 | 6.21 × 10+00 | 2.28 × 10+00 | 183.14 | 655 | 31 | 1.53 × 10+00 | 127.09
VIII. CONCLUSION
In this work, we have developed a new approach for solving the chemical master equation by combining the finite state projection, tensor train decomposition, inexact uniformization, and AMEn methods. In comparison to previous CME solvers that have also leveraged the tensor train format,28,29 a distinctive feature of our method is a new way of efficiently evaluating the action of a QTT-formatted matrix exponential using a Taylor approximation in a data-rich tensor representation, together with an adaptive FSP and stepsize selection strategy. The numerical tests confirm the practical benefits of these adaptive mechanisms. Overall, our method yields substantial savings in execution time on top of the impressive savings in memory enabled by the QTT/TT format. Our QTT-based inexact uniformization can potentially be used to evaluate the transient solution of other large-scale Markov chains such as those arising in stochastic automata networks.47
The present work opens new directions for further developments. For example, instead of just expanding the adaptive hyper-rectangles in our scheme, they could slide and/or shrink, with the effect of reducing the number of QTT cores in the representation of the solution. When advancing the integration or reporting the final results, there could be other refinements stemming from the error control, provided that the interior states of the hyper-rectangles could be economically distinguished and treated differently from the boundary states that were turned into artificial absorbing states. We note further that the quantized tensor approximation could be constructed not only for the tensor train26 but also for other formats such as those in the Hierarchical Tucker family48 (this is also discussed by Khoromskij27), and it might be insightful to further compare the tensor train with other formats. However, practical experience indicates that the TT/QTT provides us with the simplest implementation with impressive compression rates. It could also be useful to generalize the way we recast the inexact uniformization in the QTT format in tandem with AMEn to other applications that make use of the matrix exponential. Conversely, alternative ways of evaluating the matrix exponential in tensor format49 could lead to other variants of the algorithm presented here.
ACKNOWLEDGMENTS
This work was supported by NSF Grant No. DMS-1320849. The authors thank the anonymous referees for their comments.