Tensor-SqRA: Modeling the Transition Rates of Interacting Molecular Systems in terms of Potential Energies

Estimating the rate of rare conformational changes in molecular systems is one of the goals of Molecular Dynamics simulations. In the past decades, a lot of progress has been done in data-based approaches towards this problem. In contrast, model-based methods such as the Square Root Approximation (SqRA), directly derive these quantities from the potential energy functions. In this article we demonstrate how the SqRA formalism naturally blends with the tensor structure obtained by coupling multiple systems, resulting in the tensor-based Square Root Approximation (tSqRA). It enables efficient treatment of high-dimensional systems using the SqRA and provides an algebraic expression of the impact of coupling energies between molecular subsystems. Based on the tSqRA, we also develop the Projected Rate Estimation (PRE), a hybrid data-model-based algorithm that efficiently estimates the slowest rates for coupled systems. In addition, we investigate the possibility of integrating low-rank approximations within this framework to maximize the potential of the tSqRA.


Introduction
Classical molecular systems are modeled by a function V (x) which provides the potential energy of the system as a function of the N-dimensional coordinate vector x of the atoms of the system.The potential energy function V is calculated as a sum of interaction energies, and each summand depends on a small subset of the coordinates only.If we assume that parts of the molecular system are spatially far enough apart from each other, then these parts of the system move almost independently.According to this assumption, let the potential energy of the system, e.g., be written as where x 1 is a d-dimensional vector and x 2 is an (N − d)-dimensional vector of disjoint subsets of coordinates of the entire N-dimensional system.The potential defined in eq. ( 1) can be investigated from three different points of view: (i) V 1 and V 2 are analyzed individually as isolated subsystems; (ii) V 1 and V 2 are analyzed as a non-interacting combined system; (iii) V 1 and V 2 interact by means of the potential V c giving rise to a coupled system.In real applications, it is interesting to understand how the term V c of the coupled system changes the rate of rare transitions of the combined system.
In order to answer this fundamental question, the typical approach (known as Markov State Modelling [1]) is to produce classical Molecular Dynamics (MD) simulations at the atomistic level, then the simulation data are used to construct transition probability matrices, whose spectral analysis allows the determination of the time scales of the "macroscopic movements".
As an alternative to the data-based approach of Markov State Modelling, which is widely known and applied, the model-based Square Root Approximation (SqRA) [2][3][4][5] can directly derive transition rate matrices from the potential energy function of the molecular system without taking the detour of generating molecular simulation data.
In this article, we review the fundamentals of SqRA and show how its algebraic structure can be leveraged to calculate the rare event rates of molecular systems with potentials defined as in eq. 1.In particular, we show how SqRA allows to directly represent the kinetic properties of the non-interacting system V (x 1 , x 2 ) in terms of the rate matrices of the isolated subsystems V 1 (x 1 ) and V 2 (x 2 ) using the Kronecker formalism [6], thereby circumventing the curse of dimensionality.Clearly, this simple decomposition breaks down when introducing the coupling term V c (x 1 , x 2 ).
To alleviate this problem, we developed a tensor formulation of SqRA (tSqRA) which allows to represent the coupling terms by Hadamard products.Since these act on the interacting particles only this enables to inherit as much of the underlying decoupled structure as possible.This reduces the complexity while still providing exact results, avoiding the (computationally expensive) production of molecular simulation data to obtain transition rate matrices of the entire molecular system.Using tensor algebra to analyze coupled Markov processes is not a new idea, there exist introductory texts to this kind of approaches, see Dayar [7] or Ludyk [8].Also analytical and linear algebraic methods are well-established and widely known, see Pollock [9] and Jokar [10].Not only in theory, but also in applications, these methods have been used intensively, see Fernandez [11] and Ching [12].However, our manuscript goes one step further than the existing theory and methods by combining the algebraic form of SqRA which provides an explicit link between transition rates and potential energy given by with the tensor algebra approach.
Furthermore, we introduce the Projected Rate Estimation (PRE), a hybrid data-model-based method to obtain the global transition rates of coupled systems as refinements of the transition rates of noninteracting subsystems by carrying out only a few local simulations.
The presented methods do not exploit any low-rank structures in the problem yet and therefore cannot be applied directly to larger systems.However, we demonstrate how the provided formalism lends itself to future low-rank approximations via tensor-trains or -networks, thereby facilitating computations on larger scales [13][14][15][16].

Theory
Originally, SqRA [2,17] was invented to solve the problem of finding a transition rate matrix Q ∈ R M×M of a molecular system with an Euclidean state space discretized into M subsets, such that this matrix is reversible and has a predefined stationary distribution π ∈ R M .It turned out that a regular grid discretization of the state space guarantees a simple algebraic form of the matrix Q and that increasing the size of the grid M → ∞ leads to a convergence of Q towards the Fokker-Planck operator of the underlying overdamped Langevin dynamics of the system [18].Note that for the application of SqRA, a regular grid can be constructed by a hypercubic grid which can be seen as a Voronoi tessellation with equidistant center points in each direction.In many applications of SqRA, it is assumed that the discretization of the state space is fine enough to approximate the continuous stationary distribution by the pointwise evaluation of the Boltzmann distribution at the centers of the cells.In the following sections we will show how the algebraic structure of the SqRA lends itself to the application to combined and even coupled systems.
In section 2.1 we start by introducing the necessary foundations of the SqRA.Section 2.2 then shows how the SqRA matrices for the combined system can be composed from the SqRA matrices of the isolated systems using Kronecker products and sums.In section 2.3 we show how these matrices are changed by the introduction of a coupling between the subsystems.Section 2.4 provides the most general formulation in terms of tensors and Hadamard products paving the way for future low-rank developments.

Linear Algebraic Form of SqRA
Consider an N-dimensional dynamical system governed by the potential energy function V (x) : R N → R, where the state space x is discretized by a hypercubic grid with M center points x i with i = 1, 2, . . ., M. The stationary distribution π is given by the Boltzmann distribution with entries where k B is the Boltzmann constant and T the temperature.The SqRA the defines the sparse rate matrix or alternatively in matrix form Here D is a diagonal matrix with the square roots of the vector π on the diagonal, i.e.D = diag( √ π), A is the the adjacency matrix of the grid, and e M = (1, 1, . . ., 1) T is a column vector with M entries.We denote the off-diagonal part of Q by Q out = D −1 AD and the diagonal part E = diag(D −1 AD e M ) is chosen in such a way that Q has row sum zero.Note that the stationary distribution in eq. ( 2) does not need to be normalized since the quotient in the SqRA's formula cancels out the normalization constant.When definining Q this way it has unit-less entries.To obtain a proper rate matrix, it is necessary to multiply each entry Q i j by a flux term Φ i j that depends on the grid geometry and the diffusion [4].For simplicity we assume a regular grid and constant diffusion such that we can omit Φ ∝ 1 [time units] −1 .
With this assumption, the matrix Q is a transition rate matrix.The rows of Q out comprise the outgoing rates from one given state (subset) of the system to the adjacent states and the diagonal entries of Q are negative and represent the "total exit rate" from a given state.Since these are also the entries of E, we will also refer to E as the "exit rate matrix".Note that Q can be seen as a similarity transform of A − E: Thus, eigenvalues of Q and A − E coincide.As a consequence the implied time-scales of molecular systems, which are derived from these eigenvalues, only depend on the "total exit rates" and on the adjacency of the discretized states of the system.In this regard A − E is like a decomposition of the process into a "entropical" part (adjacency matrix A) and a "energetic" part (exit rates matrix E).The eigenvectors (or their sign structure) of Q on the other hand can be used to identify macro-states in the form of conformations or metastabilities by algorithms like PCCA [19] or PCCA+ [20].If v is an eigenvector of A − E, then D −1 v is the corresponding eigenvector of Q (sharing the same sign structure with v).
Describing transitions in terms of rate matrices is only one possible notation.A very common way of analyzing Markov processes is the use of transition probability matrices instead of rate matrices.Our approach can be transfered to the framework of transition matrices.A Markov Chain which is observed for a specific lag time τ (the time unit is defined by the flux in eq. ( 5)) is represented by a conditional probability matrix where exp(•) denotes the matrix exponential.Eq. ( 6) defines the discrete version of the Koopman operator that transports observable functions f ∈ L ∞ forward in time.Correspondingly, its transpose is a discretization of the propagator P τ which transports probability densities ρ ∈ L 1 .The entries of the ith row of K τ quantify the conditional probability for a system that starts in this state i to end up in the respective states in the time span τ.In contrast to Q, the matrix K τ is usually a dense matrix.However, K τ has the same eigenvectors like Q and if λ is an eigenvalue of Q, then exp(τλ ) is an eigenvalue of K τ .In this regard many results and methods developed for K τ or P τ directly carry over to Q.

The Kronecker Formalism of SqRA for Combined Subsystems
When considering two isolated systems consisting of M 1 and M 2 possible states respectively, the combined system can be in one of M = M 1 × M 2 possible states and the corresponding Q matrix is of size However, since they do not interact they should still be described by the two individual system's rates matrices Q 1 , Q 2 which are of size M 1 × M 1 and M 2 × M 2 only, therefore providing a way more compact representation.In this section, we will show how the SqRA allows for such a compact representation in the form of Kronecker products and sums.We first consider the case of n = 2 combined subsystems, respectively of dimensions L 1 and L 2 such that L 1 + L 2 = N, before stating the more general result for arbitrary n.The combined potential energy is the sum of the subsystems potentials V 1 : R L 1 → R and V 2 : R L 2 → R: where x 1 ∈ R L 1 and x 2 ∈ R L 2 are the state vectors of the subsystems.Similarly, due to its exponential form, the overall stationary distribution defined in eq. ( 2) factors into the product of the marginal distributions of the subsystems The hyper cubic grid discretizing the Euclidean state space is given by the combination of the grids along the two system's sets of coordinates, each consisting of M 1 and M 2 cells respectively for a total number of M = M 1 • M 2 cells.Let us now see how the matrices A, D, E, Q out and Q introduced in eq. ( 4) are rewritten in terms of the smaller matrices associated with the two subsystems.

Adjacency matrix
The matrix A of the coupled system is given by the Kronecker sum of the corresponding adjacency matrices of the subsystems: where I 1 and I 2 are respectively two identity matrices of size (M 1 × M 1 ) and (M 2 × M 2 ).

Diagonal matrix
The stationary distributions π 1 and π 2 of the subsystems are used to build the diagonal matrices ), and the matrix D is given by Kronecker product of the corresponding subsystems matrices: (10)

Off-diagonal matrix
Inserting eq. ( 9) and eq.( 10) into the off-diagonal part of the SqRA (eq.( 4)) we obtain and we see that the off-diagonal part indeed decomposes into the Kronecker sum of the individual systems

Rate matrix
Since transition probabilities are given by the product probabilities for the respective transitions of the subsystems, the discretization of the Koopman operator of the full system can be decomposed as [7] where we applied the Kronecker sum rule for exponential matrices in the last line.This implies

Exit rate matrix
Applying eq. ( 13) and eq.( 11) yields from which one derives For an arbitrary number n of subsystems we summarize this finding in the following proposition: Proposition 1.Consider an N-dimensional system with potential V : R N → R and stationary distribution π : R N → R defined on a state space that can be discretized by an N-dimensional hypercubic grid.Assume that the coordinates can be partitioned into n subsets of size L 1 , L 2 , . . ., L n each, with ∑ n i=1 L i = N, such that the potential V is written as where each potential V i : R L i → R is an L i -dimensional function that depend only on the ith subset of coordinates.Correspondingly, the stationary distribution is rewritten as For each subsystem i, we define the adjacency matrix A i , the diagonal matrix D Then, the SqRA matrices for the entire system are given in terms of the subsystem matrices

Matrix Representation of the SqRA with a Global Coupling Term
The previous section showed that the structure of the SqRA naturally leads to a low-rank representation for the coupled case.We will now study how the results change when adding a coupling between the systems.
As in the previous section we start with the case of n = 2 subsystems first.Given the coupling potential V c the total potential energy of the coupled system is The unnormalized stationary distribution is decomposed as where ).The adjacency relations are not affected by the coupling term, so the adjacency matrix A is the same as in eq. ( 9).On the other hand, each entry of the diagonal matrix D is reweighted by the coupling, leading to where is the (M × M) diagonal matrix built with the stationary distribution of the coupling potential V C , while D 1 and D 2 are defined as for the combined system.Similarly, the inverse satisfies Note that, given that D C is a diagonal matrix, the calculation in eq.( 17) can be interpreted as either a matrix-matrix multiplication or an elementwise multiplication.This section focuses on the matrix formalism, but the elementwise interpretation will play a central role in the next section.The off-diagonal matrix Q out of the coupled system is written as and thus also According to eq. ( 4), the SqRA rate matrix for the coupled system then reads In the general case for n subsystems coupled by a single coupling term, Proposition 1 results are modified as follows: Proposition 2. Consider the situation of Proposition 1 but with a global coupling potential V C : R n → R. The potential thus decomposes into with additional coupling term V C : R n → R and the stationary distribution factorizes as The SqRA matrices for the coupled system are then given by

Generalisation to the Tensor Formulation for Arbitrary Interactions
Using the tensor formalism, previous results can be easily generalized to the case where the potential V is given by a sum of lower order potentials, i.e. potentials that act only on a subset of coordinates, leading to a flexible decomposition in terms of Hadamard products and paving the way for low-rank tensor computations.
To this end let us consider each coordinate as an individual subsystem, i.e. n = N, and introduce the space of tensors of order N, T (N) = R M 1 ×...×M N .Each individual state of the (discretized) system can be understood as a single entry of this tensor, then elements x ∈ T (N) represent distributions or functions over all states.We furthermore introduce the symbolic multi-indices I ∈ I ⊂ P({1, ..., N}) where P denotes the power set, i.e. the set of all possible subsets of indices that can appear.We use these multi-indices in subscript to denote the coordinates upon which the individual lower-order potential contributions V I depend: Further we use Greek superscript letters to denote the individual grid positions of those respective coordinates.For example, the tensor holds the evaluation of all potential contributions of the combinations of first and second coordinates at the respective product grid.To each set of indices I corresponds a tensor D I of order |I| consisting of the square roots of the stationary distribution (eq.( 2)) of the corresponding potential This tensor holds all the information about the interaction between the indices I.We combine the individual interactions to the whole interaction tensor D of order N by means of the (widened) Hadamard product where we "broadcast" or "widen" the elementwise multiplication along all dimensions appearing only on one side.The SqRA tensor D ∈ T (N) then decomposes into the Hadamard factors corresponding to the respective lower order potentials or shortly In the previous sections, we introduced the adjacency matrix A. In the case of a 1-dimensional system it consists of a sparse matrix with two off-diagonals.For the flattened representation of higher-dimensional systems it is a multibanded-matrix with 2×N bands.However, the Kronecker sum representation defined in eq. ( 9), directly translates to a tensor representation with entries with δ being the usual Kronecker delta: δ αα ′ = 1 if α = α ′ and 0 otherwise.In this regard we will think of A as a linear map, mapping tensors of order N to tensors of order N.The action of the SqRA tensor Q : T (N) → T (N) on a state x ∈ T (N) can then be computed via where Let us now discuss the practical implications of this formulation in terms of computational complexity.Let M denote the size of a state vector in T (N).The action of the adjacency operator A on a tensor state x ∈ T (N) can be computed by 2NM floating point operations.The regular grid leads to a banded matrix-representation for flattened states, that allows for a very cache-efficient implementation (c.f.appendix B).Since D ∈ T (N) it requires just as much memory as we need to hold the state x in memory.Note here that when computing D according to eq. ( 28) we evaluate the potential functions only on grids up to the order of the interaction.
For example, consider a system of N particles in 1-d space with pairwise interactions and a grid of m cells for each particle.Using the tensor representation eq. ( 28) each of the N × (N − 1) pairwise potentials gets evaluated on a grid of the size m 2 , resulting in O(N 2 m 2 ) potential evaluations, compared to O(N 2 m N ) evaluations for a naive application on the whole grid.Similarly, a system of n = N /3 particles in 3 dimensional space with bond (pair-wise) and angle (triplet-wise) interactions, discretized to m cells in each of the N = 3n coordinates, requires O(n 3 m 3 ) evaluations instead of O(n 3 m N ).
Put differently: Let j denote the order of the highest order interactions and assume that the number of these interactions is fixed.The computational effort for computing the D tensor then scales with O(m j ) compared to O(m N ) for the classical evaluation on the whole grid, i.e. it does not depend on the full system dimension.Note however, that the number of cells M = m N , still grows exponentially in the dimension.Therefore when using a dense state representation the application of D scale badly.
However, since each D I acts only on a few modes, low-rank representations of the state, in conjunction with approximate low-rank computations of the Hadamard product allow for a closed low-rank representation of Q.In cases where the dynamics do indeed permit low-rank representations, as would be expected for weakly interacting systems, this method promises to break the curse of dimensionality (see also the discussion in Sec. 5).
Finally, note that for the iterated application, just as with the similarity transform in the matrix case, we have Thus, in order to compute the spectrum of Q by an iterative solver we merely need the repeated evaluation of A−E, resulting in O((2N +1)M) floating point operations.Considering that A is inherently low-rank (c.f.eq. ( 9)), using a low-rank state representation promiseS an exponential speedup.
To summarize, using the tensor SqRA (denoted as tSqRA) allows to alleviate the (explicit) exponential dependence of the computation cost for Q in the system dimension, replacing it by the maximal order of interactions.Even though the scalability is still influenced by the state-size (implying an exponential relationship for dense representation), the tensor formalism should naturally facilitate the integration of low-rank techniques to manage this aspect.

Coupling of the eigenvalues and the eigenvectors
We noted previously that the eigenvalues and eigenvectors of Q are of special interest to understand the slow time-scale dynamics of the process.Let us therefore investigate what we can say about the spectrum of the composed and coupled system.Since Q is similar to A − E, which in turn is symmetric, it follows that all eigenvalues are real numbers.Furthermore, the leading eigenvalue of Q matrices is 0 (with a constant eigenvector) and all other eigenvalues are negative.
Let 0 = λ denote the eigenvalues with respective eigenfunctions X i , . . ., X where Λ := i Λ i is a diagonal matrix with eigenvalues as diagonal entries, and X := i X i is a matrix with the eigenvectors of the combined system.In terms of individual eigenvalues λ ( j) this comes down to the eigenvalues being a sum of one of each of the isolated system's eigenvalues, where index j (•) picks an index for each subsystem (depending on j).The corresponding eigenvector X ( j) is then given by the product of the respective subsystem's eigenvectors: When introducing a coupling term, V C , the spectrum is perturbed from λ ( j) to λ ( j) as well as from X ( j) to X ( j) .The easiest case is the coupling of two subsystems.One can observe a repeating algebraic pattern comparing the construction of the rate matrix Q, from the adjacency matrix A, with the construction of coupled systems from uncoupled systems.In eq. ( 4), we have a conversion from A which represents transition rates with regard to a constant potential energy function into a matrix Q of exit rates with regard to a non-constant potential V .Interestingly, this is the same kind of linear algebra like the transition from the positive exit rates of the combined system (matrix Q out ) with a constant coupling energy to the rate matrix of the coupled system Q with a non-constant V I .It is given by a similarity transform of a positive rate matrix using a diagonal matrix followed by subtracting the diagonal matrix of row sums.Although this is a simple linear algebraic correspondence between the matrices, it also shows, that a transition from Q to Q can change the solution of the eigenproblem largely, as the transition from a pure adjacency A to a molecular system (by regarding the potential V ) does.However, by the structure of the equation where ∆E = E − E is the difference of "total leaving rates", one can see that the eigenvalues of Q are identical to the eigenvalues of (Q − ∆E).Furthermore, the eigenvectors of Q are the eigenvectors of (Q − ∆E) except for a componentwise rescaling using the diagonal matrix D −1 C .This rescaling does not have an effect on the sign structure of the eigenvectors.
Lemma 1.The matrix Q is a similarity transform of (Q − ∇E), the whole change of the timescales is therefore driven by a perturbation of the diagonal only.
In conclusion, for a PCCA-based analysis of the influence of a coupling energy term on the slowest processes, one has to analyze how changing the diagonal of Q influences the result of the eigenproblem.

Methods
We applied our theoretical framework to two illustrative examples in Sec. 4.Here we provide the corresponding algorithmic and implementation details.First, we show how we can use the tSqRA to compute the application of Q and thereby also its spectrum without the large memory overhead of the representions of Q in a dense or sparse format, leading to exponential respectively linear memory savings in the number of dimensions.Then, we introduce the Projected Rate Estimation (PRE), an efficient algorithm to estimate the macroscopical transition rates of a coupled system from comparatively few simulations using the tSqRA of the combined system as a prior.

Exploiting the tensor formulation to solve the eigenvalue problem of the coupled system
The use of the tensor formulation does not offer any particular advantage in solving e.g.eigenvalue problem of the combined matrix Q, where it is more convenient to compose the the individual eigenvalues, eqs.(33) and eigenvectors (34).However, in the case of the coupled system Q it permits to avoid the explicit construction of a matrix for the application of Q to a vector, thus enabling the use of matrix-free methods.
Using the results for a global coupling (Section 2.3), we write the rate matrix applied to a column vector v of size (M × 1) as Making use of the definition D C = diag[ √ π C ] (23) and the fact that multiplying a diagonal matrix from the left is equivalent to the element-wise product the left hand term of the matrix-vector product (36) becomes Likewise, for the second term we can write In this way we have transformed the application of Q (36) to a series of element-wise operations as well as the application of Q out : Note that Q out is the result of the Kronecker sum of the matrices Q i and as such can be efficiently stored and computed by applying the individual Q i to the relevant components of the respective input.
In Appendix A.1 we illustrate how this can be achieved by a series of rearrangements of the input followed by a batched matrix-vector product and provide a corresponding python implementation.
Alternatively to this approach of pulling back the computation to matrix algebra, the banded structure of the Kronecker sum can be used to devise a direct cache efficient implementation of its action.We provide a self-contained Julia implementation of this approach, together with an implementation of the tensor formalism from Section 2.4 in Appendix B In any case, both approaches allow us to calculate the eigenvalues of Q using matrix-free solvers like the ARPACK algorithm [21] without ever creating the whole matrix Q.It follows that we can write the vectors v and e as tensors of order n and shape (M 1 , M 2 , . . ., M n ), and the two matrix-vector multiplications as sum of tensor dot products as in the previous section .
This has strong implications in terms of memory consumption.Consider a system with n dimensions, discretized into m bins each: A state vector in that case has a size of 8 • m n bytes, a dense representation of Q would need 8 • m 2n bytes, but even when using a sparse representation it would have approximately 8 • 2nm n bytes.For e.g. a system in n = 9 dimensions with m = 10 bins each, we would need approximately 8 GB for the state, 8 × 10 9 GB for a dense and 134 GB for a sparse Q matrix.Using the tensor decomposition of coupled systems the biggest part of storing Q is the storage of D C which has the same size as a single state vector.

The projected rate estimation
When analyzing a coupled system one is not necessarily interested in the whole matrix Q.In practice it may be sufficient to study the impact of the coupling on the characteristic time-scales, such as the rates between metastable macrostates.
To this end we propose an algorithm that can be understood as hybrid between the formal SqRA and the data-based Koopman estimation by trajectory simulation.We use the SqRA on the isolated systems to compute their spectrum and obtain the eigenfunctions of the combined systems by means of equation eq. ( 34).Using the PCCA+ algorithm we transform these into membership functions that characterize the slow-scale dynamics and span an invariant subspace of the combined system.
We then use a Gillespie algorithm to simulate trajectories according to the coupled system (without the need to compute the whole Q).By projecting these trajectories onto the membership functions of the combined system we can estimate the slowest rates of the interacting coupled.
To this end, let us quickly introduce PCCA+, an algorithm to compute the characteristic time-scales, a process also known as "coarse-graining".The fundamental object of PCCA+ are the membership functions χ ∈ R M×n C obtained by a linear combination of the n C dominant eigenfunctions X of Q via the matrix PCCA+ finds a matrix A such that χ is non-negative and all its row sums equal to one.The projection of Q onto this subspace leads to the coarse-grained generator Q c = A −1 ΛA ∈ R n C ×n C that provides the rates between the macroscopic states characterized by χ.One defining property of Q c and χ is that time propagation of the system (via Q or Q c ) and projection to the subspace χ commute, i.e.Qχ = χQ c , or using the (Moore-Penrose) pseudoinverse Q c = χ + Qχ.Analogous results hold for the Koopman operator K = K τ = exp(τQ) with lag time τ ∈ R (and the coarse grained K c ) which has the advantage that it allows for a Monte-Carlo approximation via simulations of the system, where x τ i are the end points of N sampled trajectories of length τ started in x.The guiding idea of this algorithm is that Kχ = χK c (43) and similarly for the coupled system K χ = χ K c if χ was computed with respect to K, which in practice is not known.However, for small coupling terms, we can expect that the invariant subspace of the combined system would not change too much, i.e. χ ≈ χ.This then leads to 4 Results

Two-Dimensional Example
This 2-dimensional example illustrates our theoretical elaborations.No special increases in efficiency are required to recalculate these examples in a comprehensible manner.

Numerical experiment parameters
As an illustrative example, we considered the overdamped Langevin dynamics of a particle of mass m = 1 amu and friction ξ = 1 ps −1 which moves in a two-dimensional space under the action of the potential energy function The function is made of two potentials V 1 and V 2 describing the dynamics of two non-interacting subsystems along the coordinates x 1 and x 2 respectively.Additionally, a coupling term V 12 can be activated (c ̸ = 0) or deactivated (c=0) by the parameter c.The two-dimensional function, illustrated in fig.1-(a) for c = 0 and c = 1, describes a surface with two wells of different heights separated by a barrier.For our numerical experiments, we assumed standard thermodynamic parameters: the temperature of the system was T = 300 K, the molar Boltzmann constant k B = 0.008314463 kJ • mol −1 • K −1 , and the diffusion constant was D = 2.49 nm 2 ps −1 in each direction.

The rate matrix
In order to build the rate matrix by SqRA, we discretized the x 1 -range [−3.4,3.4] nm and x 2 -range [−3.4,3.4] nm respectively, in M 1 = M 2 = 50 subsets of the same length ∆x 1 = ∆x 2 = 0.13 nm, for a total of M = M 1 × M 2 = 2500 square subsets of the two-dimensional space.
For the combined system (c = 0), we built respectively the rate matrices Q 1 and Q 2 (size 50 × 50), then, we estimated the first four eigenvalues and right eigenvectors solving the eigenvalue problems and where X 1 and X 2 are matrices of size 50 × 4 containing the first four eigenvectors, and λ 1 and λ 2 are four-dimensional vectors containing the corresponding eigenvalues.For the combined system (c = 0), the rate matrix Q of the entire system can be estimated from the Kronecker sum of the rate matrices Q 1 and Q 2 .Exploiting this property (see Ref. [6] for more details), we estimated sixteen eigenvalues λ of Q as the sum of the eigenvalues of Q 1 and Q 2 (eq.( 33)), and the corresponding eigenvectors X as Kronecker product (eq.( 34)), In eqs.( 53) and (54), we introduce the index k = i + 4 j.Note that the eigenvectors X 1 and X 2 have size M 1 = M 2 = 50, while X has size M = 2500.The eigenvalues and the corresponding eigenvectors are reported respectively in fig.1-(b,c).The first eigenvector is constant and is associated to the eigenvalue λ = 0.This eigenvector represents the stationary state of the system when the other eigenmodes have decayed.The other eigenvectors X k , with k > 0, represent the dominant kinetic processes between regions of the space with negative (blue color) and positive values (red color).The associated eigenvalues λ k are negative and represent the timescales − 1 /λ k at which the dominant processes decay.We conclude that the slowest process is a transition between the region of the space with x 1 < 0 and x 1 > 0, the second slowest process is a transition between the region of the space with x 2 < 0 and x 2 > 0 and the third slowest process is a mix transition between the quadrants of the plane.
For the coupled system (c ̸ = 1), the relations in eqs.( 53) and (54) do not hold, i.e. the eigenvectors and eigenvalues of the full system are not directly derived from those of the individual subsystems.In this example, we built the full matrix Q applying eq. ( 20) which requires to provide the diagonal matrix D 12 of size 50 × 50 = 2500.The main effect of the coupling term is to distort the eigenvectors and to break the symmetries of the combined system.With regard to the eigenvalues, we observe a rise in the second eigenvalue λ 1 and a decrease in the third eigenvalue λ 2 .From a physical perspective, this corresponds to an acceleration of the transition along the x 2 coordinate and a slowdown along the x 1 coordinate.In fig.1-(b), we also plot the eigenvalues of the off-diagonal matrices Q out and D −1 12 Q out D 12 .The eigenvalues of the two matrices perfectly match, due to a similarity transform.Thus, it is the matrix E of the total leaving rates defined in eq. ( 23) that affects the timescales of the system.

Coarse-grained rate matrix
The analysis of the system's eigenvectors and eigenvalues indicates that the system oscillates between two metastable states, i.e. two regions of the state space where the system resides for most of the time, with the transition from one state to the other occurring rarely.It is therefore convenient to reduce the matrices Q and Q to matrices Q c and Q c of size 2 × 2, containing the rates k 12 and k 21 between the two metastable regions.For this purpose, we used PCCA+, which provides the membership functions χ containing the probabilities that a state belongs to the identified metastable states.The membership functions can then be used to build the coarse-grained matrix Q c by projecting the fine-grained matrix Q: With regard to the combined system (c = 0), in practice, it is more convenient to exploit the algebraic structure of the rate matrix Q and to apply PCCA+ to Q 1 and Q 2 to obtain the membership functions χ 1 (size 50 × 2) and χ 2 (size 50 × 1), and the rate matrices Q c,1 (size 2 × 2) and Q c,2 (size 1).Note that χ 2 = 1 (size 50) because the system has one metastability along the x 2 direction and Q c,2 = 0.Then, the membership functions of the coupled system are obtained by the Kronecker product while the rate matrix is given by the Kronecker sum Because the system has two metastabilities, the Kronecker product simply results in an increase in the size of χ 1 from 50 entries to 50 × 50 = 2500 entries since χ 2 is a constant function, and the rate matrix Q c is equal to Q c,1 .However, if the second subsystem has two metastabilities as well, then the coupled system would have four metastabilities and the corresponding Q c matrix would have the size 4 × 4.

Coupled system
For the coupled system (c = 1) we applied the PRE algorithm descrived in section 3.2.The algorithm is based on the assumption that the Koopman operator K τ of the coupled system applied to membership functions χ of the combined system, well approximates the resultant of the coarse-grained Koopman operator K τ c applied to χ (see eq. ( 44)).First of all, we verified the validity of this assumption by constructing the discretized coupled Koopman operator as where Q is the rate matrix of the coupled system built by SqRA and expm(•) denotes a matrix exponential.The matrix K τ has been multiplied to the membership functions χ of the combined system and we solved the linear regression problem i.e. we looked for the matrix Kτ c (size 2 × 2) that minimizes the 2-norm ∥ K τ χ − χ Kτ c ∥.The matrix Kτ c approximates the exact coarse-grained Koopman operator K τ c for the coupled system and yields the approximated coarse-grained rate matrix as In fig.2-(b), we plotted the off-diagonal entries of Qc (dashed lines) for several τ values in the range [0,1.5]ps, and compared with the exact values of Q c (solid lines) obtained by applying PCCA+ to Q.We observe that while the exact results do not depend on the choice of lag time τ, the approximate results tend to converge only for large values of τ.However, the approximate rates do not reach the exact values.If building the matrix K τ is not feasible due to the high dimensionality, the action of the Koopman operator on the membership functions χ can only be approximated for some points x in the state space.Provided N R trajectories of length τ, starting in x and reaching the points x τ i , with i = 1, 2, ..., N R yields The trajectories could be generated by solving the underlying equations of motion of the coupled system, for example by using the Euler-Maruyama integration scheme.Instead, we exploited the knowledge of the transition rates between adjacent subsets of the state space, approximated by SqRA, and applied the Gillespie algorithm.
In order to explain the procedure, consider the following example.Let us assume that we know the matrices Q 1 and Q 2 of the one-dimensional individual subsystems and that the system is in a state x = (x 1 , x 2 ) belonging to a subset of the state space with indices (i 1 , i 2 ).The system, in an infinitesimal interval of time, can evolve its state in five ways: two transitions along the x 1 direction, two along the x 2 direction, or no transition.The four transition rates are contained in the Q 1 and Q 2 matrices, but these must be reweighted according to the coupling term V 12 in eq.(50).For example, the forward transition rate along the x 1 direction becomes: Likewise, we obtain the rates and the total leaving rate given by minus the sum of the former.Thus, we have estimated only a row of the matrix Q, which can be used within a standard Gillespie algorithm to estimate: (i) the infinitesimal time interval at which the next transition occurs, (ii) which of the five possible events (four transitions and no transition) takes place.The reweighting of rates for the coupled system occurs at each simulation step, however, for highly dimensional systems this is the only solution since it is not possible to estimate the entire matrix Q.
Since the Gillespie algorithm provides the solution in terms of jumps between the subsets of the state space, it provides the indices ( j 1 , j 2 ) of the arrival subset of x τ and the calculation of the membership functions can be approximated by calculating the value that χ assumes in the target subsets: Repeating this calculation for N R trajectories, one can approximate the Koopman operator according to eq. ( 61) and solve the linear regression problem in eq. ( 59).However, unlike the situation where we have the entire matrix K τ , after solving the linear regression problem, it is necessary to normalize the rows of the matrix K τ c .We studied the sensitivity of the method to the initial points and the lag time τ of the Koopman operator and reported the results in fig. 2. First of all, we studied the dependence of the method on the number of initial points x, randomly drawn from a uniform distribution that covers the state space, setting a lag time equal to τ = 0.5 ps and a number of replicas per point N R = 100.We do not observe any particular improvement in the results by increasing the number of initial points, so for the next analysis, we considered 10 initial points and increased the number of replicas to 500 to reduce the stochastic error.Excluding the first point, which corresponds to a lag time of 0.1 ps, the agreement with the expected results (dashed line) is excellent.

n-dimensional example
Without special mathematical tricks one could not calculate the SqRA for high-dimensional examples.In this example, we heavily use the algebraic structure of tSqRA to rigorously calculate the eigenvalues of a large rate "matrix".That this becomes possible is shown in Fig. 3.

Numerical experiment parameters.
As a second example, we studied a "metastable chain" made of n bimetastable systems interacting via harmonic oscillators.The potential energy function describing the dynamics of this system is written as The solid lines represent the exact results using Q, the dashed lines represent the projected rate estimation (PRE) using the exact K τ = exp(Qτ), and the squares represent the PRE using Gillespie's algorithm to estimate where and The combined system (c = 0) is characterized by n 2 metastable states, while the coupled system (c = 1) has only two metastable states, as illustrated in fig.3-(a) for n = 2 and n = 3, respectively.For the numerical experiments, we used the same thermodynamic parameters as in the previous examples.To build the rate matrix Q i of each bimetastable system by SqRA, we discretized the range [−2.5, 2.5] nm of each coordinate x i into M i = 5 subsets of length ∆x i = 1.0 nm, then the total number of subsets of the system is 5 n .Differently from the previous example, constructing the rate matrices Q and Q of the coupled system (combined and coupled) is not feasible for high values of n.Thus, we used the approach described in section 3.1 to calculate the eigenvalues and eigenvectors without building the full rate matrices but providing to the eigensolver a function that represents their action on a vector of size (M × 1).Fig. 3-(b) shows the first 8 eigenvalues of the rate matrix Q and Q.For the combined case (a), we observe that the number of eigenvalues associated to the slowest processes is equal to n, then the slowest transitions occur along the coordinates x i .On the contrary, the effect of the coupling term is to change the symmetry of the system and to create a hierarchy of sub-processes arising at different time scales.Thus, we observe a decreasing scale of eigenvalues for each n-dimensional system.The first non-zero eigenvalue corresponds to the slowest process occurring along the diagonal of the n-dimensional circle, as shown in the fig.3-(c) by the corresponding eigenvectors for the case with n = 2 and n = 3.Our article is a step towards understanding the origins of these timescales and the influence of interaction energies between subsystems.
Using the tensor approach (tSqRA) we can describe the direct relation between energies and rates of coupled systems which allows for its mathematical analysis based on first principle models of molecular processes.Combining the tSqRA with PCCA+ we also suggest a new hybrid method, the projected rate estimation (PRE), which uses the combined non-interacting subsystems (model-based) as prior for an effective estimation of the macroscopical rates of a coupled system by a comparatively small number of simulations (data-based).
On the computational side, the tSqRA improves the applicability and efficacy of the SqRA for coupled systems even without the use of low-rank approximations.However, we also explain how tSqRA can be extended to incorporate low-rank approximations.Currently the limiting factor for an efficient tensor-based calculation is the dense representation of the system's state, which is succumbed to the curse of dimensionality.However, a low rank representation, e.g. in the form of tensor networks, could alleviate that problem [22].The lacking piece so far is the support for truncated Hadamard-products.
Note in this regard that the adjacency matrix A, being a Kronecker sum, is inherently low rank.Similarly the matrix D is composed of Hadamard products like in eq. ( 28).Assuming that the system under consideration admits a low-rank representation, as to be expected for two weakly coupled systems, we expect the exploitation of these low-rank structures to dramatically improve both memory and compute costs.
From our perspective, it is worthwhile to better comprehend the computational advantage of tensorbased complexity reduction methods for future analysis of molecular systems without having to generate molecular simulation data.
A.2 Python codes for computations of Qv and eigenvalues of Q based on matrices Listing 1: The code defines the function sum_tensor_dots() that calculates the matrix-vector multiplication Qv as sum of tensor dot products, then it solves the eigenvalue problem.In this example, the matrix Q i and the vector v are randomly generated, and it is assumed that each dimension is discretized into the same number of subsets.The function np.tensordot() applies the tensor contraction to axis 1 of the matrix, and the index i of the vector in its tensor formulation.Similar solutions can be also found with the function np.einsum().Note that in Python the indexing starts from 0.

B Julia implementation for multiple low-order interactions
The following code illustrates how to compute the application of Q using the tensor approach developed in section 2.4.apply_A exploits the fact that A is banded (with entries 1) to copy and add the specific bands of the input x to the output y (see Figure 4).compute_D then loops over a given list of potential functions, evaluates them at the grids of the specified dimensions and multiplies them according to eq. (28) along the corresponding modes using Julia's broadcasting capabilities.We finally compute D and E for the example system with potential V (x, y) = x 2 + (xy) 2 on a a square grid of size 10 × 10.The last two functions implement the application of A − E and Q respectively.
Using this we were able to compute a simplified 9-dimensional pentane molecule with 10 cells in each dimension, resulting in a memory demand of 10 9 × 64bit = 8Gb per state.Whilst the computa-

Figure 1 :
Figure 1: 2D model.(a) Potential energy function for combined (left) and coupled system with c = 1 (right); (b) Eigenvalues of the matrix Q out (left) and Q (right) for the combined (blue) and coupled system (red); (c) First four eigenvectors for combined (top) and coupled system (bottom), the red and blue color denotes positive and negative values respectively, the white lines represent the zero-values.

Figure 2 :
Figure 2: 2D model.Transition rates k 12 (blue) and k 21 (red) of the coarse-grained model as functions of the number of starting points (a) and the lag time τ (b).The solid lines represent the exact results using Q, the dashed lines represent the projected rate estimation (PRE) using the exact K τ = exp(Qτ), and the squares represent the PRE using Gillespie's algorithm to estimate K τ .

Figure 3 :
Figure 3: Metastable chain.(a) Combined and coupled potential energy function for n = 2 and n = 3, dark and bright colors represent respectively low and high energy values.;(b) first 8 eigenvalues for n = 2, 3, . . . 10 of the combined and coupled system; (c) second eigenvectors for n = 2 and n = 3 of the combined and coupled system.

Table 1 :
Table of symbols.Observable function of the ith subsystem fi (x i ) L i -dimensional function Observable function of the i ∼ j coupled system f i j (x i , x j ) (L i + L j )-dimensionalfunction Number of subsets in which x i is discretized M i scalar constant Total number of subsets in which x is discretized M scalar constant Indices of the subsets of a discretized coordinate α, β , γ, . . .scalar index Center of the αth subset of the discretized coordinate x i x α i scalar variable