We develop a thorough mathematical analysis to deduce conditions for the accuracy and convergence of different approximations of the memory integral in the Mori-Zwanzig (MZ) equation. In particular, we derive error bounds and sufficient convergence conditions for short-memory approximations, the *t*-model, and hierarchical (finite-memory) approximations. In addition, we derive useful upper bounds for the MZ memory integral, which allow us to estimate *a priori* the contribution of the MZ memory to the dynamics. Such upper bounds are easily computable for systems with finite-rank projections. Numerical examples are presented and discussed for linear and nonlinear dynamical systems evolving from random initial states.

## I. INTRODUCTION

The Mori-Zwanzig (MZ) formulation is a technique from irreversible statistical mechanics that allows the development of formally exact evolution equations for the quantities of interest such as macroscopic observables in high-dimensional dynamical systems.^{45,10,41,42} One of the main advantages of developing such exact equations is that they provide a theoretical starting point to avoid integrating the full (possibly high-dimensional) dynamical system and instead solve directly for the quantities of interest, thus reducing the computational cost significantly. Computing the solution to the Mori-Zwanzig equation, however, is a challenging task that relies on approximations and appropriate numerical schemes. One of the main difficulties lies in the approximation of the MZ memory integral (convolution term), which encodes the effects of the so-called orthogonal dynamics in the time evolution of the quantity of interest. The orthogonal dynamics are essentially a high-dimensional flow satisfying a complex integro-differential equation. Over the years, many techniques have been proposed for approximating the MZ memory integral, the most efficient ones being problem-dependent.^{31,41} For example, in applications to statistical mechanics, Mori’s continued fraction method^{27,18} has been quite successful in determining exact solutions to several prototype problems, such as the dynamics of the auto-correlation function of a tagged oscillator in a harmonic chain.^{17,23} Other effective approaches to approximate the MZ memory integral rely on perturbation methods,^{6,42,8} mode coupling theories,^{2,31} functional approximation techniques,^{40,19,43,21,29} and data-driven methods.^{5,25} In a parallel effort, the applied mathematics community has, in recent years, attempted to derive general easy-to-compute representations of the memory integral. In particular, various approximations such as the *t*-model,^{10,12,34} the modified *t*-model,^{7} and, more recently, renormalized perturbation methods^{36} were proposed to address the approximation of the memory integral in situations where there is no clear separation of scales between resolved and unresolved dynamics.

The main objective of this paper is to develop rigorous estimates of the memory integral and provide convergence analysis of different approximation models of the Mori-Zwanzig equation, such as the short-memory approximation,^{33} the *t*-model,^{12} and hierarchical methods.^{35} In particular, we study the MZ equation corresponding to two broad classes of projection operators: (i) infinite-rank projections (e.g., Chorin’s projection^{10}) and (ii) finite-rank projections (e.g., Mori’s projection^{28}). We develop our analysis for both state-space and probability density function (PDF) space formulations of the MZ equation. These two descriptions are connected by the same duality principle that pairs the Koopman and Frobenious-Perron operators.^{15}

This paper is organized as follows. In Sec. II, we outline the general procedure to derive the MZ equation in the phase space and discuss the common choices of projection operators. In Sec. III, we derive error bounds for the MZ memory integral and provide the convergence analysis of different memory approximation methods, including the *t*-model,^{34,10,12} the short-memory approximation,^{33} and the hierarchical memory approximation technique.^{35} Such estimates are built upon semigroup estimation methods we present in Appendix A. In Sec. IV, we present numerical examples demonstrating the accuracy of the memory approximation/estimation methods we develop throughout the paper. The main findings are summarized in Sec. V. Convergence analysis of the MZ memory term in the probability density function space formulation is presented in Appendix B.

## II. THE MORI-ZWANZIG FORMULATION

Consider the nonlinear dynamical system

evolving on a smooth manifold $S$. For simplicity, let us assume that $S=Rn$. We will consider the dynamics of scalar-valued observables $g:S\u2192C$, and for concreteness, it will be desirable to identify structured spaces of such observable functions. In Ref. 15, it was argued that *C*^{*}-algebras of observables such as $L\u221e(S,C)$ (the space of all measureable, essentially bounded functions on $S$) and $C0(S,C)$ (the space of all continuous functions on $S$, vanishing at infinity) make natural choices. In what follows, we do not require the observables to comprise a *C*^{*}-algebra, but we will want them to comprise a Banach space as the estimation theorems of Sec. III make an extensive use of the norm of this space. Having the structure of a Banach space of observables also gives greater context to the meaning of the linear operators $L$, $K$, $P$, and $Q$ to be defined hereafter.

The dynamics of any scalar-valued observable *g*(*x*) (quantity of interest) can be expressed in terms of a semi-group $K(t,s)$ of operators acting on the Banach space of observables. This is the Koopman operator^{24} which acts on the function *g* as

where

Rather than computing the Koopman operator applicable to all observables, it is often more tractable to compute the evolution only of a (closed) subspace of the quantities of interest. This subspace can be described conveniently by means of a projection operator $P$ with the subspace as its image. Both $P$ and the complementary projection $Q=I\u2212P$ act on the space of observables. The nature, mathematical properties, and connections between $P$ and the observable *g* are discussed in detail in Ref. 15 and summarized in Sec. II A. For now, it suffices to assume that $P$ is a bounded linear operator and that $P2=P$. The MZ formalism describes the evolution of observables initially in the image of $P$. Because the evolution of observables is governed by the semi-group $K(t,s)$, we seek an evolution equation for $K(t,s)P$. By using the definition of the Koopman operator (3) and the well-known Dyson identity

we obtain the operator equation

By applying this equation to an observable function *u*_{0}, we obtain the well-known MZ equation in phase space

Acting on the left with $P$, we obtain the evolution equation for projected dynamics

Note, in fact, that the projection of the second term in (5), i.e., $PetQLQLx0=0$, vanishes since $PQ=0$.

### A. Projection operators

In this section, we make a summary on the commonly used projection operators $P$ in the Mori-Zwanzig framework. To make our definition mathematically sound, we begin by assuming that the Liouville operator (3) acts on observable functions in a *C*^{*}-algebra $A$, for instance, $L\u221e(M,\Sigma ,\mu )$, where $M$ is a space such as $RN$, Σ is a *σ*-algebra on $M$, and *μ* is a measure on Σ. Let $\sigma \u2208A*$ be a positive linear functional on $A$. We define the weighted pre-inner product

This can be used to define a Hilbert space $H=L2(M,\sigma )$, which is the completion of the quotient space

endowed with the inner product ⟨·,·⟩_{σ}. The *L*^{2} norm induced by the inner product is denoted as ∥·∥_{σ}. In the rest of the paper, the positive linear functional *σ* is always induced by a probability distribution $\sigma \u0303$ through

where $\sigma \u0303$ is typically chosen to be the probability density of the initial condition *ρ*_{0} or the equilibrium distribution *ρ*_{eq} in statistical physics. To conform to the literature, we also use notation $\u27e8\u22c5,\u22c5\u27e9\rho 0$, $\u27e8\u22c5,\u22c5\u27e9\rho eq,\u27e8\u22c5,\u22c5\u27e9eq$ to represent the weighted inner product corresponding to different probability measures $\sigma \u0303(\omega )d\omega $. With the Hilbert space determined, we now focus on the following two broad classes of orthogonal projections on $H$.

#### 1. Infinite-rank projections

The first class of projection operators to consider in this setting are the conditional expectations $P$ such that $P*\sigma =\sigma $. In this case, the properties of conditional expectations (in particular, that $P[P(f)gP(h)]=P(f)P(g)P(h)$^{39}) and the fact that $P*\sigma =\sigma $ imply that

so that

for all $f,g\u2208H$. It follows that

Therefore both $P$ and $Q$ are self-adjoint (i.e., orthogonal) projections onto closed subspaces of $H$ and hence contractions $\Vert P\Vert \sigma \u22641$, $\Vert Q\Vert \sigma \u22641$. Chorin’s projection^{12,10} is one of this class and is defined as

Here $x(t;x0)=(x^(t;x^0,x\u03030),x\u0303(t;x^0,x\u03030))$ is the flow map generated by (1) split into resolved ($x^$) and unresoved ($x\u0303$) variables, and $g(x)=g(x^,x\u0303)$ is the quantity of interest. For Chorin’s projection, the positive functional *σ* defining the Hilbert space $H$ may be taken to be integration with respect to the probability measure $\rho 0(x^0,x\u03030)$. Clearly, if *x*_{0} is deterministic, then $\rho 0(x^0,x\u03030)$ is a product of Dirac delta functions. On the other hand, if $x^(0)$ and $x\u0303(0)$ are statistically independent, i.e., $\rho 0(x^0,x\u03030)=\rho ^0(x^0)\rho \u03030(x\u03030)$, then the conditional expectation $P$ simplifies to

In the special case where $u(x^,x\u0303)=x^(t;x^0,x\u03030)$, we have

i.e., the conditional expectation of the resolved variables $x^(t)$ given the initial condition $x^0$. This means that an integration of (9) with respect to $\rho ^0(x^0)$ yields the mean of the resolved variables, i.e.,

Obviously, if the resolved variables $x^(t)$ evolve from a deterministic initial state $x^0$, then the conditional expectation (9) represents the the average of the reduced-order flow map $x^(t;x^0,x\u03030)$ with respect to the PDF of $x\u03030$, i.e., the flow map

In this case, the MZ equation (6) is an exact (unclosed) partial differential equation (PDE) for the multivariate field $X0(t,x^0)$. In order to close such an equation, a mean field approximation of the type $Pf(x^)=f(Px^)$ was introduced by Chorin *et al.* in Refs. 10, 12, and 11, together with the assumption that the probability distribution of *x*_{0} is invariant under the flow generated by (1).

#### 2. Finite-rank projections

Another class of projections is defined by choosing a closed (typically finite-dimensional) linear subspace $V\u2282H=L2(M,\sigma )$ and letting $P$ to be the orthogonal projection onto *V* in the *σ* inner product. An example of such projection is Mori’s projection,^{46} widely used in statistical physics. For finite-dimensional *V*, given a linearly independent set {*u*_{1}, …, *u*_{M}} ⊂ *V* that spans *V*, $P$ can be defined by first constructing the positive definite Gram matrix $Gij=\u27e8ui,uj\u27e9\sigma $. Then

This projection is orthogonal with respect to the $L\sigma 2$ inner product. In statistical physics, a common choice for the positive functional *σ* that generates $H$ is integration with respect to the Gibbs canonical distribution $\rho eq=e\u2212\beta H/Z$, for the Hamiltonian $H=H(p,q)$ and the associated partition function *Z*. Here *q* are generalized coordinates while *p* are kinetic momenta.

## III. ANALYSIS OF THE MEMORY INTEGRAL

In this section, we develop a thorough mathematical analysis of the MZ memory integral

and its approximation. We begin by describing the behavior of the semigroup norms $\Vert etL\Vert $, $\Vert etQLQ\Vert $, and $\Vert etLQ\Vert $ as functions of time, for different choices of projection $P$ and different norms. As we will see, the analysis will give clear computable bounds only in some circumstances, illustrating the difficulty of this problem and the need for further development and insight.

### A. Semigroup estimates

For any Liouville operator $L$ of the form (3) acting on $A=L\u221e(M,\Sigma ,\mu )$ and for any *σ* identified with an element of $L1(M,\Sigma ,\mu )$, the functional $L*\sigma $ (assuming that *σ* lies in the domain of $L*$) is absolutely continuous with respect to *σ* (essentially because $L$ acts locally) and the Radon-Nikodym derivative^{20} may be identified with the negative of the divergence of the vector field *F* with respect to the measure induced by *σ*, i.e.,

where

When $M=RN$ and *σ* has the form $\sigma (u)=\u222bRN\sigma \u0303(x)u(x)dx$, this can be shown more directly using integration by parts. By assuming that $\sigma \u0303(x)$ or *F*_{i} decays to 0 at *∞*, we have

from which we see

Therefore,

and the following are equivalent: (i) $L*\sigma =0$ (i.e., *σ* is invariant); (ii) div_{σ}*F* = 0; (iii) $L$ is the skew-adjoint with respect to the *σ* inner product. More generally, on $H=L2(M,\sigma )$, we find that

Using this numerical abscissa *ω*, we obtain the following $L\sigma 2$ estimation of the Koopman semigroup:

and moreover, *e*^{ωt} is the smallest exponential function that bounds $\Vert etL\Vert \sigma $.^{14} When $P$ and $Q=I\u2212P$ are the orthogonal projections on $L2(M,\sigma )$, we can observe that the numerical abscissa of $QLQ$ is bounded by that of $L$. In fact,

[see Eq. (17)] so that

It should be noticed that this bound for the orthogonal semigroup is not necessarily tight. The tightness of the bound depends on the choice of projection $P$ and comes down to whether functions in the image of $Q$ can be chosen localized to regions where div_{σ}*F* is close to its infimal value.

When estimating the MZ memory integral, we need to deal with the semigroup $etLQ$. It turns out to be extremely difficult to prove strong continuity of such a semigroup in general, due to the unboundedness of $LQ$. It is shown in Appendix A that when $PLQ$ is an unbounded operator, as is typical when $P$ is a conditional expectation, the semigroup $etLQ$ can only be bounded as

for some $MQ>1$, due to the fact that $\Vert etLQ\Vert \sigma $ has an infinite slope at *t* = 0. More work is needed to obtain satisfactory, computable values for $MQ$ and $\omega Q$, in the case where $P$ and $Q$ are the infinite-rank projections. It is also shown that when either $PLQ$ or $LP$ is bounded, for example, when $P$ is a finite-rank projection, we can get computable semigroup bounds of the form

where *ω* = −inf div_{σ}*F* if we use the $L\sigma 2$ estimation of $etL$.

### B. Memory growth

We begin by seeking to bound the MZ memory integral (B4) as a whole and build our analysis from there. A key assumption of our analysis is that the semigroup $etLQ$ is strongly continuous, i.e., the map $t\u21a6etLQg$ is continuous in the norm topology on the space of observables for each fixed *g*.^{16} As we pointed out in Sec. III A, it is very difficult to rigorously prove a strong continuity of $etLQ$ and to obtain a computable upper bound for unbounded generators of the form $LQ$. We leave this problem for future research, and assume for now that that there exist constants $MQ$ and $\omega Q$ such that $\Vert etLQ\Vert \u2264MQet\omega Q$. Throughout this section, ∥·∥ denotes a general Banach norm. We begin with the following simple estimate:

**Memory growth**).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*. Then*

*where*

*and*$C1=MMQ\Vert LQLu0\Vert $

*is a constant. Clearly, lim*

_{t→0}

*M*

_{0}

*(t) = 0.*

*Proof.*

Theorem 3.1 provides an upper bound for the growth of the memory integral based on the assumption that $etL$ and $etLQ$ are strongly continuous semigroups. We emphasize that only for simple cases can such upper bounds be computed analytically (we will compute one of the cases later in Sec. IV) because of the fundamental difficulties in computing the upper bound of $etLQ$. However, it will be shown later that, although the specific expression for *M*_{0}(*t*) is unknown, the *form* of it is already useful as it enables us to derive some verifiable theoretical predictions for general nonlinear systems.

### C. Short memory approximation and the *t*-model

Theorem 3.1 can be employed to obtain upper bounds for well-known approximations of the memory integral. Let us begin with the *t*-model proposed in Ref. 12. This model relies on the approximation

**Memory approximation via the**

*t*

**-model**

^{12}).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*. Then*

*where*

*and*$C1=MMQ\Vert P\Vert 2\Vert LQLu0\Vert $

*.*

*Proof.*

*t*-model. The limit

*t*-model for short integration times. On the other hand, depending on the semigroup constants

*M*,

*ω*, $MQ$, and $\omega Q$ (which may be estimated numerically), the error of the

*t*-model may remain small for longer integration times (see the numerical results in Sec. IV B 2). Next, we study the short-memory approximation proposed in Ref. 33. The main idea is to replace the integration interval [0,

*t*] in (13) by a shorter time interval [

*t*− Δ

*t*,

*t*], i.e.,

*t*∈ [0,

*t*] identifies the effective

*memory length*. The following result provides an upper bound to the error associated with the short-memory approximation.

**Short memory approximation**

^{33}).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*. Then the following error estimate holds true:*

*where*

*and*$C1=MMQ\Vert P\Vert 2\Vert LQLu0\Vert $

*.*

We omit the proof due to its similarity to that of Theorem 3.1. Note that $lim\Delta t\u2192t$*M*_{2}(Δ*t*, *t*) = 0 for all finite *t* > 0.

### D. Hierarchical memory approximation

An alternative way to approximate the memory integral (13) was proposed by Stinis in Ref. 35. The key idea is to repeatedly differentiate (13) with respect to time and establish a hierarchy of PDEs which can eventually be truncated or approximated at some level to provide an approximation of the memory. In this section, we derive this hierarchy of memory equations and perform a thorough theoretical analysis to establish the accuracy and convergence of the method. To this end, let us first define

to be the memory integral (13). By differentiating $w$_{0}(*t*) with respect to time, we obtain

where

By iterating this procedure *n* times, we obtain

where

The hierarchy of Eqs. (27) and (28) is equivalent to the following infinite-dimensional system of PDEs:

evolving from the initial condition $w$_{i}(0) = 0, *i* = 1, 2, … [see Eq. (28)]. With such an initial condition available, we can solve (29) with backward substitution, i.e., from the last equation to the first one, to obtain the following (exact) *Dyson series representation* of the memory integral (26):

So far no approximation was introduced, i.e., the infinite-dimensional system (29) and the corresponding formal solution (30) are *exact*. To make progress in developing a computational scheme to estimate the memory integral (26), it is necessary to introduce approximations. The simplest of these rely on truncating the hierarchy (29) after *n* equations, while simultaneously introducing an approximation of the *n*th order memory integral $w$_{n}(*t*). We denote such an approximation as $wnen(t)$. The truncated system takes the form

The notation $wjn(t)$ (*j* = 0, …, *n* − 1) emphasizes that the solution to (31) is, in general, different from the solution to (29). The initial condition of the system can be set as $win(0)=0$, for all *i* = 0, *…*, *n* − 1. By using backward substitution, this yields the following formal solution:

representing an approximation of the memory integral (26). Note that, for a given system, such approximation depends only on the number of equations *n* in (31) and on the choice of the approximation $wnen(t)$. In the present paper, we consider the following choices:

Approximation by truncation (

*H*-model),

Type-I finite memory approximation (FMA),

Type-II finite memory approximation,

*H*_{t}-model,

The quantities *t*_{n} and Δ*t*_{n} appearing in (34) and (35) will be defined in Theorems 3.5 and 3.6, respectively. The first approximation, i.e., (33), is a truncation of the hierarchy obtained by assuming that $w$_{n}(*t*) = 0. Such approximation was originally proposed by Stinis in Ref. 35, and we shall call it the *H*-model. The Type-I finite memory approximation (FMA) is obtained by applying the short memory approximation to the *n*th order memory integral $w$_{n}(*t*). The Type-II finite memory approximation (FMA) is a modified version of the Type-I, with a larger memory band. The *H*_{t}-model approximation is based on replacing the *n*th order memory integral $w$_{n}(*t*) with a classical *t*-model. Note that in this setting, the classical *t*-model approximation proposed by Chorin and Stinis^{12} is equivalent to a zeroth-order *H*_{t}-model approximation.

Hereafter, we present a thorough mathematical analysis that aims at estimating the error $\Vert w0(t)\u2212w0n(t)\Vert $, where $w$_{0}(*t*) is the full memory at time *t* [see (26) or (30)], while $w0n(t)$ is the solution of the truncated hierarchy (31), with $wnen(t)$ given by (33), (34), (35), or (36). With such error estimates available, we can infer whether the approximation of the full memory $w$_{0}(*t*) with $w0n(t)$ is accurate and, more importantly, if the algorithm to approximate the memory integral converges. To the best of our knowledge, this is the first time a rigorous convergence analysis is performed on various approximations of the MZ memory integral. It turns out that the distance $\Vert w0(t)\u2212w0n(t)\Vert $ can be controlled through the construction of the hierarchy under some constraint on the initial condition.

#### 1. The *H*-model

Setting $wnen(t)=0$ in (31) yields an approximation by truncation, which we will refer to as the *H*-model (hierarchical model). Such a model was originally proposed by Stinis in Ref. 35. Hereafter we provide error estimates and convergence results for this model. In particular, we derive an upper bound for the error $\Vert w0(t)\u2212w0n(t)\Vert $ and sufficient conditions for the convergence of the reduced-order dynamical system. Such conditions are problem dependent, i.e., they involve the Liouvillian $L$, the initial condition *u*_{0}, and the projection $P$.

**Accuracy of the**

*H*

**-model**).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*, and let T > 0 be a fixed integration time. For some fixed n, let*

*Then, for any*1

*≤ p ≤ n and all t ∈ [0, T], we have*

*where*

*and*

*Proof.*

_{0}and its approximation $w0p$,

Theorem 3.4 states that for a given dynamical system (represented by $L$) and quantity of interest (represented by $P$), the error bound $M3p(t)$ is strongly related to {*α*_{j}} which is ultimately determined by the initial condition *x*_{0}. It turns out that by bounding {*α*_{j}}, we can control $M3p(t)$ and therefore the overall error $\Vert w0(t)\u2212w0p(t)\Vert $. The following corollaries discuss sufficient conditions such that the error $\Vert w0(T)\u2212w0n(T)\Vert $ decays as we increase the differentiation order *n* for fixed time *T* > 0.

*Corollary 3.4.1*

**Uniform convergence of the**

*H*-

**model**).

*If {α*

_{j}

*} in Theorem 3.4 satisfy*

*for any fixed time T >*0,

*then there exists a sequence of constants δ*

_{1}

*> δ*

_{2}

*>*⋯

*> δ*

_{n}

*such that*

*Proof.*

*T*> 0 yields

*C*

_{2}=

*C*

_{2}(

*T*) =

*C*

_{1}

*A*

_{1}

*A*

_{2}. If there exists

*δ*

_{p}≥ 0 such that

*δ*

_{p+1}such that

*α*

_{p+1}< (

*p*+ 2)/

*T*. Moreover, the condition

*α*

_{j}< (

*j*+ 1)/

*T*holds for all 1 ≤

*j*≤

*n*. Therefore, we conclude that for any fixed time

*T*> 0, there exists a sequence of constants

*δ*

_{1}>

*δ*

_{2}> ⋯ >

*δ*

_{n}such that $\Vert w0(T)\u2212w0p(T)\Vert \u2264\delta p$, where 1 ≤

*p*≤

*n*.

Corollary 3.4.1 provides a sufficient condition for the error $\Vert w0(t)\u2212w0p(t)\Vert $ to decrease monotonically as we increase *p* in (31). A stronger condition that yields an asymptotically decaying error bound is given by the following Corollary.

*Corollary 3.4.2*

**Asymptotic convergence of the**

*H*

**-model**).

*If α*

_{j}

*in Theorem 3.4 satisfies*

*for some positive constant C, then for any fixed time T >*0

*and arbitrary δ >*0

*, there exists a constant*1

*≤ p < +∞ such that for all n > p,*

*Proof.*

*α*

_{j}<

*C*in the Proof of Theorem 3.4, we obtain

*p*< +

*∞*such that for all

*n*>

*p*, $\Vert w0(T)\u2212w0n(T)\Vert \u2264\delta $.

An interesting consequence of Corollary 3.4.2 is the existence of a *convergence barrier*, i.e., a “hump” in the error plot $\Vert w0(T)\u2212w0p(T)\Vert $ versus *p* generated by the *H*-model. While Corollary 3.4.2 only shows that behavior for an upper bound of the error, not directly the error itself, the feature is often found in the actual errors associated with numerical methods based on these ideas. The following corollary shows that the requirements on {*α*_{j}} can be dropped (we still need *α*_{j} < +*∞*) if we consider relatively short integration times *T*.

*Corollary 3.4.3*

**Short-time convergence of the**

*H*

**-model**).

*For any integer n for which α*

_{j}

*< ∞ for*1

*≤ j ≤ n and any sequence of constants δ*

_{1}

*> δ*

_{2}

*>*⋯

*> δ*

_{n}

*>*0

*, there exists a fixed time T >*0

*such that*

*for*1

*≤ p ≤ n.*

*Proof.*

*α*

_{j}< +

*∞*, we can choose $C=max1\u2264\u2009j\u2264n$

*α*

_{j}. By following the same steps we used in the Proof of Theorem 3.4, we conclude that for

*p*≤

*n*.

Corollaries 3.4.1 and 3.4.2 provide sufficient conditions for the error $\Vert w0(T)\u2212w0n(T)\Vert $ generated by the *H*-model to decay as we increase the truncation order *n*. However, we still need to answer the important question of whether the *H*-model actually provides accurate results for a given nonlinear dynamics ($L$), quantity of interest ($P$) and initial state *x*_{0}. Corollary 3.4.3 provides a partial answer to this question by showing that, at least in the short time period, condition (41) is always satisfied (assuming that {*α*_{j}} are finite). This guarantees the short-time convergence of the *H*-model for any reasonably smooth nonlinear dynamical system and almost any observable. However, for longer integration times *T*, convergence of the *H*-model for arbitrary nonlinear dynamical systems cannot be established in general, which means that we need to proceed on a case-by-case basis by applying Theorem 3.4 or by checking whether the hypotheses of Corollary 3.4.1 or Corollary 3.4.2 are satisfied. On the other hand, the convergence of the *H*-model can be established for any finite integration time in the case of linear dynamical systems, as we have recently shown in Ref. 44. From a practical viewpoint, the implementation of the *H*-model requires computing $(LQ)nLx0$ to high-order in *n*. This is not straightforward if the dynamical systems are nonlinear. However, for linear systems, such terms can be easily computed (see Sec. III E).

#### 2. Type-I finite memory approximation (FMA)

The Type-I finite memory approximation is obtained by solving the system (31) with $wnen(t)$ given by (34). As before, we first derive an upper bound for $\Vert w0(t)\u2212w0n(t)\Vert $ and then discuss sufficient conditions for convergence. Such conditions basically control the growth of an upper bound on $\Vert w0(t)\u2212w0n(t)\Vert $.

**Accuracy of the Type-I FMA**).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups and let T >*0

*be a fixed integration time. If*

*then for each*1

*≤ p ≤ n and for Δt*

_{p}≤ t ≤ T,*where*

*and C*

_{1},

*A*

_{1},

*A*

_{2}

*are as in Theorem 3.4.*

*Proof.*

*p*th level is of the form

*f*

_{p}as

*A*

_{1},

*A*

_{2}are defined in (38), then we have that

We notice that if the effective memory band at each level decreases as we increase the differentiation order *p*, then we can control the error $\Vert w0(t)\u2212w0n(t)\Vert $. The following corollary provides a sufficient condition that guarantees this sort of control of the error.

*Corollary 3.5.1*

**Uniform convergence of the Type-I FMA**).

*If α*

_{j}

*in Therorem 3.5 satisfy*

*then for any T*> 0

*and δ*> 0,

*there exists an ordered sequence*Δ

*t*

_{n}< Δ

*t*

_{n−1}< ⋯ < Δ

*t*

_{1}<

*T such that*

*which satisfies*

*Proof.*

*p*≤

*n*, we set

*t*

_{p}:

*t*

_{p}satisfies

*t*

_{n}, we find that it is a decreasing time sequence 0 < Δ

*t*

_{n}< Δ

*t*

_{n−1}< ⋯ < Δ

*t*

_{1}<

*T*such that $\Vert w0(T)\u2212w0n(T)\Vert \u2264\delta $ holds for all

*t*∈ [0,

*T*] and which satisfies (45).

*Remark.*

*uniform convergence*of the Type-I finite memory approximation. If we replace condition (44) with

*C*is a positive constant (independent on

*T*), then we obtain asymptotic convergence. In other words, for each

*δ*> 0, there exists an integer

*p*such that for all

*n*>

*p*, we have $w0(t)\u2212w0n(t)<\delta $. This result is based on the limit

*p*for which the upper bound on Δ

*t*

_{p}is smaller or equal to zero. In such a case, the Type I FMA degenerates to the

*H*-model, for which Corollary 3.4.2 holds.

#### 3. Type-II finite memory approximation

The Type-II finite memory approximation is obtained by solving the system (31) with $wnen(t)$ given in (35). We first derive an upper bound for $\Vert w0(t)\u2212w0n(t)\Vert $ and then discuss sufficient conditions for convergence.

**Accuracy of the Type-II FMA**).

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*. If*

*then for*1 ≤

*p*≤

*n*,

*where*

*and*$C1=MMQ\Vert P\Vert 2\Vert (LQ)p+1Lu0\Vert $

*.*

*Proof.*

*t*and

*t*

_{p}, respectively.

*Corollary 3.6.1*

**(Uniform convergence of the Type-II FMA**).

*If α*

_{j}

*in Theorem 3.6 satisfy*

*for*1 ≤

*j*≤

*n*,

*then for any arbitrarily small δ*> 0,

*there exists an ordered sequence*0 <

*t*

_{0}<

*t*

_{1}< ⋯ <

*t*

_{n}≤

*T such that*

*which satisfies*

*when*$\omega =\omega Q$

*and*

*when*$\omega \u2260\omega Q$

*.*

*Proof.*

*t*≤

*T*, we can take

*t*≤

*T*, we can take

*α*

_{p}<

*p*/

*T*, and the upper bound of the time sequence satisfies

*t*

_{1}< ⋯ <

*t*

_{n}such that $\Vert w0(t)\u2212w0p(t)\Vert \u2264\delta $ for all 0 ≤

*t*≤

*T*. And since we have proved that this

*δ*-bound on the error holds for all

*t*

_{n}upper bounded as in the two cases above, there exists such an increasing time sequence 0 <

*t*

_{1}< ⋯ <

*t*

_{n}with

*t*

_{n}

*lower*-bounded by the same quantities. Indeed, because of the coarseness of the approximations applied in the proof, there may exist such a time sequence with significantly larger

*t*

_{i}.

*Remark.*

*ϵ*is some arbitrary constant satisfying

*ϵ*> 1, then we have

*j*such that the upper bound for

*t*

_{j}is greater than or equal to

*T*. For such a case, the Type II FMA degenerates to the truncation approximation (

*H*-model), for which Corollary 3.4.2 grants us asymptotic convergence.

#### 4. *H*_{t}-model

The *H*_{t}-model is obtained by solving the system (31) with $wnen(t)$ approximated using Chorin’s *t*-model^{12} [see Eq. (36)]. Convergence analysis can be performed by using the mathematical methods we employed for the proofs of the *H*-model. Note that the classical *t*-model is equivalent to a zeroth-order *H*_{t}-model.

**(Accuracy of the**

*H*

_{t}-

**model)**.

*Let*$etL$

*and*$etLQ$

*be strongly continuous semigroups with upper bounds*$\Vert etL\Vert \u2264Met\omega $

*and*$\Vert etLQ\Vert \u2264MQet\omega Q$

*, and let T >*0

*be a fixed integration time. For some fixed n, let*

*Then, for any*1 ≤

*p*≤

*n*and all

*t*∈ [0,

*T*]

*, we have*

*where*

*and C*

_{1}

*, A*

_{1}

*, A*

_{2}

*are the same as before*.

*Proof.*

*p*th order

*H*

_{t}-model, the difference between the memory term $w$

_{0}and its approximation $w0p$ is

*g*

_{p}(

*t*,

*ω*) may be bounded from above as

One can see that the upper bounds $M6p(t)$ and $M3p(t)$ (see Theorem 3.4) share the same structure, the only difference being the constant out front. Hence by changing *C*_{2} to *C*_{4}, we can prove a series of corollaries similar to 3.4.1–3.4.3. In summary, what holds for the *H*-model also holds for the *H*_{t}-model. For the sake of brevity, we omit the statement and proofs of those corollaries.

### E. Linear dynamical systems

The upper bounds we obtained above are not easily computable for general nonlinear systems and infinite-rank projections, e.g., Chorin’s projection (7). However, if the dynamical system is linear, then such upper bounds are explicitly computable and the convergence of the *H*-model can be established for linear phase space functions in any finite integration time *T*. To this end, consider the linear system *ẋ* = *Ax* with a random initial condition *x*(0) sampled from the joint probability density function

In other words, the initial condition for the quantity of interest *u*(*x*) = *x*_{1}(*t*) is set to be deterministic, while all other variables *x*_{2}, *…*, *x*_{N} are zero-mean and statistically independent at *t* = 0. Here we assume for simplicity that *ρ*_{0j} (*j* = 2, …, *N*) are independent identically distributed (i.i.d.) standard normal distributions. Observe that the Liouville operator associated with the linear system *ẋ* = *Ax* is

where *A*_{ij} are the entries of the matrix *A*. If we choose observable *u* = *x*_{1}(*t*), then Chorin’s projection operator (9) yields the evolution equation for the conditional expectation $E[x1(t)|x1(0)]$, i.e., the *conditional mean path* (11), which can be explicitly written as

where $A11=PLx1(0)$ is the first entry of the matrix *A* and $w$_{0} represents the memory integral (26). Next, we explicitly compute the upper bounds for the memory growth and the error in the *H*-model for this system. To this end, we first notice that the domain of the Liouville operator can be restricted to the linear space

In fact, *V* is invariant under $L$, $P$, and $Q$, i.e., $LV\u2286V$, $PV\u2286V$, and $QV\u2286V$. These operators have the following matrix representations:

where *M*_{11} is the minor of the matrix of *A* obtained by removing the first column and the first row, while

Therefore,

At this point, we set *x*_{01} = *x*_{1}(0) and

Since $x\u03030=(x2(0),\u2026,xN(0))$ is random, $q(t,x01,x\u03030)$ is a random variable. By using Jensen’s inequality $[E(X)]2\u2264E[X2]$, we have the following *L*^{∞} estimate:

On the other hand, we have

For linear dynamical systems, both $\Vert \u22c5\Vert L\rho 02(V)$ and $\Vert \u22c5\Vert L\rho 02$ upper bounds can be used to estimate the norm of the semigroup $etL$. However, for the semigroup $etLQ$, we can only obtain the explicit form of the $\Vert \u22c5\Vert L\rho 02(V)$ bound, which is given by the following perturbation theorem^{16} (see also Appendix A):

**Memory growth**. It is straightforward at this point to compute the upper bound of the memory growth we obtained in Theorem 3.1. Since $\Vert P\Vert L\rho 02=\Vert Q\Vert L\rho 02=1$ ($P$ and $Q$ are the orthogonal projections relative to *ρ*_{0}), we have the following result:

where $\Lambda xi+1(0)$ is a *N* − 1 × *N* − 1 diagonal matrix with $\Lambda ii=\u27e8xi+1(0)\u27e9\rho 0$ and ∥·∥_{2} is the vector 2-norm. **Accuracy of the***H***-model**. We are interested in computing the upper bound of the approximation error generated by the *H*-model (see Sec. III D 1–Theorem 3.4). By using the matrix representation of $L$, $P$, and $Q$, the *n*th order *H*-model MZ Eq. (56) for the linear system can be explicitly written as

where *M*_{11}, *a*, and *b* are defined as before [see Eq. (58)]. The upper bound for the memory term approximation error is explicitly obtained as

where *A*_{1}, *A*_{2} are defined in (38), while *ω* and $\omega Q$ are given in (61) and (62), respectively. Note that the error bound (65) is slightly different from the one we obtained in Theorem 3.4. The reason is that here we choose to bound $\Vert L(QL)nu0\Vert $, instead of the quotient $\alpha n=|L(QL)n+1u0\Vert /\Vert L(QL)nu0\Vert $. For each fixed integration time *T*, the upper bound (65) goes to zero as we send *n* to infinity, i.e.,

This means that the *H*-model converges for all linear dynamical systems with observables in the linear space (57).

### F. Memory estimates for finite-rank projections and Hamiltonian systems

The semigroup estimates we obtained in Sec. III A allow us to compute explicitly an *a priori* estimate of the memory kernel in the Mori-Zwanzig equation if we employ *finite-rank* projections. In this section, we outline the procedure to obtain such an estimate for Hamiltonian dynamical systems. We begin by recalling that, in general, Hamiltonian systems are necessarily divergence-free, i.e.,

Here, *F*(*x*) is the velocity field at the right-hand side of (1), while $\rho eq=e\u2212\beta H/Z$ is the canonical Gibbs distribution. Equation (66) can be easily obtained by noticing that

By combining the estimate (18) with the divergence-free constraint (66), we find that the Koopman semigroup of a Hamiltonian dynamical system is a *contraction* in the $L\rho eq2$ norm, i.e.,

Moreover, the MZ equation (6) with a finite-rank projection $P$ of the form (12) (with *σ* = *ρ*_{eq}) can be reduced to the following Volterra integro-differential equation:

where

Equation (68) is often called the generalized Langevin equation (GLE)^{13,31} for the projected quantity of interest *u*_{i}(*t*). To derive (69a)–(69c), we used the fact that $L$ is skew-adjoint and $Q$ is self-adjoint with respect to the $L\rho eq2$ inner product and that $Q2=Q$. In classical statistical physics, Eq. (69c) is known as the *second fluctuation dissipation theorem*. Next, define the temporal-correlation matrix

By applying $\u27e8uj,(\u22c5)\u27e9eq$ to both sides of Eq. (68), we obtain the following exact evolution equation for *C*_{ij}(*t*):

Moreover, if we employ a one-dimensional Mori’s basis, i.e., *M* = 1, then we obtain the simplified equation

where $C(t)=\u27e8u1(0),u1(t)\u27e9eq$. The main difficulty in solving the GLE (71) [or (72)] lies in computing the memory kernels *K*_{ij}(*t*). Hereafter we prove that such memory kernels can be uniformly bounded by a computable quantity that *depends only on the initial condition of the system*. For the sake of simplicity, we will focus on the one-dimensional GLE (72).

*The memory kernel K(t) in the one-dimensional GLE (*

*71*

*) is uniformly bounded by*$\Vert u\u03071(0)\Vert L\rho eq22/\Vert u1(0)\Vert L\rho eq22$

*, i.e.,*

*Proof.*

*K*(

*t*) satisfies

*ρ*

_{eq}, we have $\Vert etQLQ\Vert L\rho eq2=\Vert QetQLQ\Vert L\rho eq2\u2264\Vert Q\Vert L\rho eq2\Vert etQLQ\Vert L\rho eq2\u22641$. This yields

Theorem 3.8 provides an *a priori* and easily computable upper bound for the memory kernel defining the dynamics of any quantity of interest *u*_{1} that is initially in the Gibbs canonical ensemble $\rho eq=e\u2212\beta H/Z$. In Sec. IV, we will calculate the upper bound (73) analytically and compare it with the exact memory kernel we obtain in prototype linear and nonlinear Hamiltonian systems.

*Remark.*

*ω*

_{0}< 0. In this case, the memory kernel turns out to be uniformly bounded by an exponentially decaying function since

## IV. NUMERICAL EXAMPLES

In this section, we provide simple numerical examples of the MZ memory approximation methods we discussed throughout the paper. Specifically, we study Hamiltonian systems (linear and nonlinear) with finite-rank projections (Mori’s projection) and non-Hamiltonian systems with infinite-rank projections (Chorin’s projection). In both cases, we demonstrate the accuracy of the *a priori* memory estimation method we developed in Secs. III F and III E. We also compute the solution to the MZ equation for non-Hamiltonian systems with the *t*-model, the *H*-model, and the *H*_{t}-model.

### A. Hamiltonian dynamical systems with finite-rank projections

In this section, we consider dimension reduction in linear and nonlinear Hamiltonian dynamical systems with finite-rank projection. In particular, we consider the Mori projection and study the MZ equation for the temporal auto-correlation function of a scalar quantity of interest.

#### 1. Harmonic chains of oscillators

Consider a one-dimensional chain of harmonic oscillators. This is a simple but illustrative example of a linear Hamiltonian dynamical system which has been widely studied in statistical mechanics, mostly in relation with the microscopic theory of Brownian motion.^{3,18,17} The Hamiltonian of the system can be written as

where *q*_{i} and *p*_{i} are, respectively, the displacement and momentum of the *i*th particle, *m* is the mass of the particles (assumed constant throughout the network), and *k* is the elasticity constant that modulates the intensity of the quadratic interactions. We set fixed boundary conditions at the endpoints of the chain, i.e., *q*_{0}(*t*) = *q*_{N+1}(*t*) = 0 and *p*_{0}(*t*) = *p*_{N+1}(*t*) = 0 (particles are numbered from left to right) and *m* = *k* = 1. The Hamilton’s equations are

which can be written in a matrix-vector form as

where *B* is the adjacency matrix of the chain and *D* is the degree matrix (see Ref. 4). Note that (76) is a linear dynamical system. We are interested in the velocity auto-correlation function of a tagged oscillator, say, the one at location *j* = 1. Such an auto-correlation function is defined as

where the average is with respect to the Gibbs canonical distribution $\rho eq=e\u2212\beta H/Z$. It was shown in Ref. 18 that $Cp1(t)$ can be obtained analytically by employing Lee’s continued fraction method. The result is the well-known *J*_{0} − *J*_{4} solution

where *J*_{i}(*t*) is the *i*th Bessel function of the first kind. On the other hand, the Mori-Zwanzig equation derived by the following Mori’s projection

yields the following GLE for $Cp1(t)$:

Here,

since $\u27e8pi(0),qj(0)\u27e9eq=0$, while *K*(*t*) is the MZ memory kernel. For the *J*_{0} − *J*_{4} solution, it is possible to derive the memory kernel *K*(*t*) analytically. To this end, we simply insert (78) into (80) and apply the Laplace transform

to obtain

where $\u0108(s)=L\u2009[Cp1(t)]$ and $K^(s)=L\u2009[K(t)]$. The inverse Laplace transform of (81) can be computed analytically as

With *K*(*t*) available, we can verify the memory estimated we derived in Theorem 3.8. To this end,

Here we used the exact solution of the velocity auto-correlation function and displacement auto-correlation function of the fixed-end harmonic chain given by (see Ref. 18)

#### 2. Hald system

In this section, we study the Hamiltonian system studied by Chorin *et al.* in Refs. 12 and 9. The Hamiltonian function is defined as

while the corresponding Hamilton’s equations of motion are

We assume that the initial state is distributed according to canonical Gibbs distribution $\rho eq=e\u2212H(p,q)/Z$. The partition function *Z* is given by

where *K*_{0}(*t*) is the modified Bessel function of the second kind. We aim to study the properties of the autocorrelation function of the first component *q*_{1}, which is defined as

Obviously, $Cq1(0)=1$. The evolution equation for $Cq1(t)$ is obtained by using the MZ formulation with the Mori’s projection

This yields the GLE

The streaming term $\Omega q1Cq1(t)$ is again identically zero since

Theorem 3.8 provides the following computable upper bound for the modulus of *K*(*t*):

where *U*(*a*, *b*, *y*) is the confluent hypergeometric function of the second kind. In Fig. 2(a), we plot the correlation function $Cq1(t)$ we obtained by sampling (85) and then averaging over all realizations. The samples of the initial condition are generated from the Gibbs equilibrium distribution by using Markov Chain Monte Carlo (MCMC). In Fig. 2(b), we plot the memory kernel *K*(*t*) we computed numerically based on $Cq1(t)$. To compute such a kernel, we inverted numerically the Laplace transform of (88), i.e.,

where $\u0108(s)=L\u2009[Cq1(t)]$. In practice, we replaced $Cq1(t)$ within the time interval [0, 20] with a high-order polynomial interpolant at Gauss-Chebyshev-Lobatto nodes, computed the Laplace transform of such polynomial analytically, and then computed the inverse Laplace transform (90) numerically with the Talbot algorithm.^{1}

### B. Non-Hamiltonian systems with infinite-rank projections

In this section, we study the accuracy of the *t*-model, the *H*-model, and the *H*_{t} model in predicting scalar quantities of interest in non-Hamiltonian systems. In particular, we consider the MZ formulation with Chorin’s projection operator. For the particular case of linear dynamical systems, we also compute the theoretical upper bounds we obtained in Sec. III E for the memory growth and the error in the *H*-model and compare such bounds with exact results.

#### 1. Linear dynamical systems

We begin by considering a low-dimensional linear dynamical system *ẋ* = *Ax* evolving from a random initial state with density *ρ*_{0}(*x*) to verify the MZ memory estimates we obtained in Sec. III E. For simplicity, we choose *A* to be negative definite

In this case, the origin of the phase space is a stable node and it is easy to estimate $etL\rho 0$. We set *x*_{1}(0) = 1 and *x*_{2}(0), *x*_{3}(0) independent standard normal random variables. In this setting, the semigroup estimates (61) and (62) are explicit,

Therefore, we obtain the following explicit upper bounds for the memory integral and the error of the *H*-model [see Eqs. (63) and (65)]:

Next, we compare these error bounds with numerical results obtained by solving numerically the *H*-model (64). For example, the second-order *H*-model reads

In Fig. 3, we demonstrate the convergence of the *H*-model to the benchmark solution computed by Monte-Carlo simulation as we increase the *H*-model differentiation order. In Fig. 4, we plot the bound on the memory growth [Eq. (92)] and the bound in the memory error [Eq. (93)] together with exact results.

*Remark.*

*H*-model (64) for the 100-dimensional linear dynamical system defined by the matrix (

*N*= 100)

*B*=

*e*

^{C}Λ

*e*

^{−C}and

*H*-model converges as we increase the differentiation order in any finite time interval, in agreement with the theoretical prediction of Sec. III E.

#### 2. Nonlinear dynamical systems

##### a. Lorenz-63 system.

Consider the classical Lorenz-63 model

where *σ* = 10 and *β* = 8/3. The phase space Liouville operator for this ordinary differential equation (ODE) is

We choose the resolved variables to be $x^={x1,x2}$ and aim at formally integrating out $x\u0303=x3$ by using the Mori-Zwanzig formalism. To this end, we set $x3(0)\u223cN(0,1)$ and consider the zeroth-order *H*_{t}-model (*t*-model)

where $x1m(t)=E[x1(t)|x1(0),x2(0)]$ and $x2m(t)=E[x2(t)|x1(0),x2(0)]$ are the *conditional mean paths*. To obtain this system, we introduced the following mean field closure approximation:

Higher-order *H*_{t}-models can be derived based on (98). As is well known, if *r* < 1, the fixed point (0, 0, 0) is a global attractor and exponentially stable. In this case, the *t*-model (zeroth-order *H*_{t}-model) yields an accurate prediction of the conditional mean path for long time (see Fig. 6). On the other hand, if we consider the chaotic regime at *r* = 28, then the *t*-model and its higher-order extension, i.e., the *H*_{t}-model, are accurate only for relatively short time. This is in agreement with our theoretical predictions. In fact, different from linear systems where the hierarchical representation of the memory integral can be proven to be convergent for long time, in nonlinear systems, the memory hierarchy is, in general, provably convergent only in a short time period (Theorem 3.7 and Corollary 3.4.3). This does not mean that the *H*-model or the *H*_{t}-model is not accurate for nonlinear systems. It just means that the accuracy depends on the system, the quantity of interest, and the initial condition.

##### b. Modified Lorenz-96 system.

As an example of a high-dimensional nonlinear dynamical system, we consider the following modified Lorenz-96 system:^{22,26}

where *F* is constant. As is well known, depending on the values of *N* and *F*, this system can exhibit a wide range of behaviors.^{22} Suppose we take the resolved variables to be $x^={x1,x2}$. Correspondingly, the unresolved ones, i.e., those we aim at integrating through the MZ framework, are $x\u0303={x3,\u2026,xN}$, which we set to be independent standard normal random variables. By using the mean field approximation (98), we obtain the following zeroth-order *H*_{t}-model (*t*-model) of the modified Lorenz-96 system (99):

In Fig. 7, we study the accuracy of the *H*_{t}-model in representing the conditional mean path with *F* = 5 and *N* = 100. It is seen that the the *H*_{t}-model converges only for short time (in agreement with the theoretical predictions), and it provides results that are more accurate than the classical *t*-model.

## V. SUMMARY

In this paper, we developed a thorough mathematical analysis to deduce conditions for the accuracy and convergence of different approximations of the memory integral in the Mori-Zwanzig (MZ) equation. In particular, we studied the short memory approximation, the *t*-model, and various hierarchical memory approximation techniques. We also derived useful upper bounds for the MZ memory integral which allowed us to estimate *a priori* its contribution to the dynamics. Such upper bounds are easily computable for systems with finite-rank projections—such as Mori’s projection. Furthermore, if the system is Hamiltonian, then the MZ memory bounds we derived are tight. This can be attributed to the fact that the Koopman operator of a Hamiltonian dynamical system is always a contraction relative to the $L\rho eq2$ norm weighted by the Gibbs canonical distribution *ρ*_{eq}. In the more general case of dissipative systems, however, we do not have any guarantee that the MZ memory bounds we obtained are tight. Indeed, for dissipative systems, the numerical abscissa in Eq. (17) [see also Eqs. (21a) and (21b)] can be positive, yielding an exponential growth of the memory bounds. In the presence of infinite-rank projections—such as Chorin’s projection—we found that it is extremely difficult to obtain explicitly computable upper bounds for the MZ memory or any of its approximations discussed in this paper. The main difficulties are related to the lack of an explicitly computable upper bound for the operator semigroup $etLQ$, which involves the unbounded generator $LQ$ (see the discussion in Sec. III A and Appendix A). To demonstrate the accuracy of the approximation methods and error bounds we developed in this paper, we provided numerical examples involving Hamiltonian dynamical systems (linear and nonlinear) and more general dissipative nonlinear systems evolving from random initial states. The numerical results we obtained are found to be in agreement with our theoretical predictions.

## ACKNOWLEDGMENTS

This work was supported by the Air Force Office of Scientific Research Grant No. FA9550-16-586-1-0092.

### APPENDIX A: SEMIGROUP BOUNDS VIA FUNCTION DECOMPOSITION

In looking for the *numerical abscissa*^{14} (i.e., the logarithmic norm) of $LQ$, we seek to bound

Notice that if $PLQ=0$, then $LQ=QLQ$ so that we have the previously proven bound

In this section, we consider what happens when $PLQ\u22600$. To that end, let us note that $x\u2208D(LQ)$ may be decomposed as

where $\alpha \u2208C$ and $Py$ are orthogonal to $PLQx$. In other words, we define $Py$ as

and define *α* as

Then

Since we assume $PLQ\u22600$, there exists *x* such that $PLQx\u22600$ and then for any *α* such that

we have

so that for $PLQ\u22600$,

Now, fix any $Qx\u2208D(L)\u22600$ and consider the expression

where

Then, for this fixed $Qx$,

Differentiating with respect to *a* and setting equal to zero, we find that the latter expression is extremized when

i.e., when

Since *β*^{2} > 0, $\xi +a\beta 21+a2\beta 2$ is maximized at $\xe2=\u2212\xi +\xi 2+\beta 2\beta 2$. Then

so that

Therefore,

When $PLQ$ is unbounded, which is the typical case when $P$ is an infinite-rank projection, such as most conditional expectations, there is unlikely to be a finite numerical abscissa for $LQ$. In particular, notice that if div_{σ}(*F*) is a bounded function (bounded both above and below), then $\xi (Qx)$ is bounded for all $Qx$ while $\beta (Qx)$ is unbounded, in which case

It follows Refs. 37 and 30 that in these cases, $\Vert etLQ\Vert \sigma $ has an infinite slope at *t* = 0, and therefore there is no finite *ω* such that $\Vert etLQ\Vert \sigma \u2264e\omega t$ for all *t* ≥ 0 (see Ref. 14). Assuming still that $LQ$ generates a strongly continuous semigroup, we must look to bound the semigroup as $\Vert etLQ\Vert \sigma \u2264Me\omega t$, where

and

On the other hand, if $PLQ$ is a bounded operator, for example, when $P$ is a finite-rank projection [e.g., Mori’s projection (12)], there exists a finite value for the numerical abscissa. Indeed, in this case, since *ζ* is bounded by *ω* = −inf div_{σ}(*F*), the numerical abscissa of $LQ$ may be bounded as

Alternatively, in the case of finite rank $P$, the operator $LQ$ may be thought of as a bounded perturbation of $L$, i.e., $LQ=L\u2212LQ$, and the numerical abscissa of $LQ$ can be bounded using the bounded perturbation theorem Ref. 16, III.1.3, obtaining

Either of these bounds for $\omega LQ$ can be used to bound the semigroup norm

Which of these two estimates gives the tighter bound will generally depend on the values of $\Vert PLQ\Vert \sigma $ and $\Vert LP\Vert \sigma $. It may be noted, however, that, when *σ* is invariant, *ω* = 0 and $L$ is skew-adjoint so that

### APPENDIX B: THE MORI-ZWANZIG FORMULATION IN PDF SPACE

It was shown in Ref. 15 that the Banach dual of (4) defines an evolution in the probability density function space. Specifically, the joint probability density function of the state vector *u*(*t*) that solves Eq. (1) is pushed forward by the Frobenius-Perron operator $F(t,0)$ [Banach dual of the Koopman operator (3)],

where

By introducing a projection $P$ in the space of probability density functions and its complement $Q=I\u2212P$, it is easy to show that the projected density $Pp$ satisfies the MZ equation^{42}

In Appendixes B 1 and B 2, we perform an analysis of different types of approximations of the MZ memory integral

The main objective of such analysis is to establish rigorous error bounds for widely used approximation methods and also propose new provably convergent approximation schemes.

#### 1. Analysis of the memory integral

In this section, we develop the analysis of the memory integral arising in the PDF formulation of the MZ equation. The starting point is the definition (B4). As before, we begin with the following estimate of upper bound estimation of the integral:

**(Memory growth)**.

*Let*$etMQ$

*and*$etM$

*be strongly continuous semigroups with upper bounds*$\Vert etM\Vert \u2264Met\omega $

*and*$\Vert etMQ\Vert \u2264MQet\omega Q$

*, and let T >*0

*be a fixed integration time. Then for any*0

*≤ t ≤ T*,

*we have*

*where*

*and*$C4=max0\u2264s\u2264T\Vert P\Vert \Vert MPMp(s)\Vert $

*. Moreover, N(t) satisfies*$limt\u21920$

*N(t) =*0

*.*

*Proof.*

**(Memory approximation via the**

*t*

**-model)**.

*Let*$etMQ$

*and*$etM$

*be strongly continuous semigroups with bounds*$\Vert etM\Vert \u2264Met\omega $

*and*$\Vert etMQ\Vert \u2264MQet\omega Q$

*, and let T >*0

*be a fixed integration time. If the function*$k(s,t)=PMe(t\u2212s)QMQMPp(s)$

*(integrand of the memory term) is at least twice differentiable respect to s for all t ≥*0

*, then*

*where N*

_{1}(

*t*)

*is defined as*

*and C*

_{4}

*is as in Theorem B.1.*

*Proof.*

#### 2. Hierarchical memory approximation in PDF space

The hierarchical memory approximation methods we discussed in Sec. III D can also be developed in the PDF space. To this end, let us first define

By repeatedly differentiating $v$_{0}(*t*) with respect to time [assuming $v$_{0}(*t*) smooth enough], we obtain the hierarchy of equations

where

By following closely the discussion in Sec. III D, we introduce the hierarchy of memory equations

and approximate the last term in such hierarchy in different ways. Specifically, we consider

Hereafter we establish the accuracy of the approximation schemes resulting from the substitution of each $vnen(t)$ above into (B6).

**(Accuracy of the**

*H*

**-model)**.

*Let*$etM$

*and*$etMQ$

*be strongly continuous semigroups, T >*0

*a fixed integration time, and*

*Then for*1 ≤

*q*≤

*n*,

*we have*

*where*

*, and C*

_{4}

*is as in Theorem B.1.*

*Proof.*

*n*th level can be bounded as

*Corollary B.3.1*

**(Uniform convergence of the**

*H*

**-model)**.

*If β*

_{i}

*in Theorem B.3 satisfy*

*for any fixed time T, then there exits a sequence δ*

_{1}>

*δ*

_{2}> ⋯ >

*δ*

_{n}

*such that*

*where*1 ≤

*q*≤

*n*.

*Corollary B.3.2*

**(Asymptotic convergence of the**

*H*

**-model)**.

*If β*

_{i}

*in Theorem B.3 satisfy*

*for some constant C, then for any fixed time T and arbitrary δ >*0

*, there exits an integer q such that for all n > q,*

**(Accuracy of Type-I FMA)**.

*Let*$etM$

*and*$etMQ$

*be strongly continuous semigroups, T >*0

*be a fixed integration time, and let*

*Then for*1 ≤

*q*≤

*n*,

*where*

*and C*

_{4}

*is as in Theorem B.1*.

*Proof.*

*Corollary B.4.1*

**(Uniform convergence of Type-I FMA)**.

*If β*

_{i}

*in Theorem B.4 satisfy*

*then for any δ*> 0,

*there exists an ordered time sequence*Δ

*t*

_{n}< Δ

*t*

_{n−1}< ⋯ < Δ

*t*

_{1}<

*T such that*

*which satisfies*

**(Accuracy of Type-II FMA)**.

*Let*$etM$

*and*$etMQ$

*be strongly continuous semigroups and T > 0 be a fixed integration time. Set*

*Then for*1 ≤

*q*≤

*n*,

*where*

*is defined in (*

*48*

*), and C*

_{4}

*is as in Theorem B.1.*

*Proof.*

*where*$fq(\omega Q,t)$

*is as in*(48).

*Corollary B.5.1*

**(Uniform convergence of Type-II FMA)**.

*If β*

_{i}

*in Theorem B.5 satisfy*

*for all t*∈ [0,

*T*],

*then for arbitrarily small δ*> 0,

*there exists an ordered time sequence*0 <

*t*

_{0}<

*t*

_{1}< …

*t*

_{n}<

*T such that*

*which satisfies*

*Proof.*

*t*≤

*T*, we can take (for $\omega Q>0$)

*t*

_{1}< ⋯ <

*t*

_{n}such that $\Vert v0(T)\u2212v0n(T)\Vert \u2264\delta $. As in Theorem 3.6, this

*δ*-bound on the error holds for all

*t*

_{n}(with the upper bound as above), which implies the existence of such an increasing time sequence 0 <

*t*

_{1}< ⋯ <

*t*

_{n}with

*t*

_{n}bounded from below by the same quantities.