This article establishes cutoff stability, also known as abrupt thermalization, for generic multidimensional Hurwitz stable Ornstein–Uhlenbeck systems with (possibly degenerate) Lévy noise at fixed noise intensity. The results are based on several quantitative lower and upper ergodicity bounds, some of which make use of the shift linearity property of the Wasserstein–Kantorovich–Rubinstein distance recently established by the authors. The results cover irregular systems such as Jacobi chains and, more generally, networks of coupled harmonic oscillators with a heat bath (including Lévy excitations) at constant temperature on the outer edges, as well as the so-called Brownian gyrator.

The Wasserstein–Kantorovich–Rubinstein (WKR) metric is a statistically robust and computationally flexible metric between probability laws. Certain replica techniques allow us to establish new upper and lower bounds for the thermalization of Ornstein–Uhlenbeck systems driven by Brownian motion or other Lévy drivers. We show that, in the case of the 1D linear oscillator with Brownian forcing and the Brownian gyrator, lengthy explicit calculations establish the property of cutoff stability, also known as abrupt convergence. With the help of the previously established ergodicity bounds, we obtain this property without any additional calculation, assuming only Hurwitz stability and genericity of the interaction matrix. As a showcase for the complexity of the systems covered by our theorem, for which explicit calculations are out of the question, we study Jacobi chains and more general networks of coupled harmonic oscillators with a fixed-amplitude Brownian or Lévy-type external heat bath forcing.

## I. INTRODUCTION

Since the days of von Smoluchowski,^{1} Langevin,^{2} and Uhlenbeck and Ornstein^{3} more than a century ago, and even earlier,^{4} the Ornstein–Uhlenbeck process and its extensions to higher and infinite dimensions and to different noises have remained intensely studied objects in statistical physics, neuronal networks, probability, and statistics. Despite their apparent simplicity and an ever better understanding of them, their (multidimensional) dynamics and ergodicity remain an active field of research; see, for instance, Refs. 5–17 and the numerous references therein. Among several competing concepts for measuring the thermalization of the current state of such systems to their respective dynamic equilibria, such as relative entropy, total variation, the Hellinger distance, and others,^{18–23} the WKR distance (see Definition 2.5) stands out due to its statistical robustness;^{24–28} explicit formulas in the Gaussian case (see, for instance, Refs. 29–31); its deep connections to optimal transport and the Monge–Kantorovich problem; and an extensive calculus that allows for many explicit calculations and sharp bounds (see, for instance, Refs. 24, 26, and 32–40).

In this paper, we quantify the ergodicity in the WKR distance for multidimensional Lévy-driven Ornstein–Uhlenbeck systems with fixed noise amplitude [see Formula (1.3) and Sec. II]. The novelty of our approach consists in a particular change of perspective on the classical *cutoff phenomenon* (mathematical terminology), or *abrupt thermalization* (physics terminology), for linear systems with additive noise. Essentially, the entire mathematical and physics literature on the cutoff phenomenon in discrete time and space describes it, roughly speaking, as an asymptotic threshold phenomenon for a family of objects parametrized by an *internal parameter* $\epsilon$ of the system, which often represents the (inverse) size of the state space, the dimension of the space, or, for instance, the noise amplitude. Standard references in this highly active field of research include Refs. 41–60, starting with the seminal papers by Diaconis and Aldous on card shuffling.^{61–64} In the physics literature, this concept has recently received considerable attention in the context of quantum Markov chains,^{65} chemical reaction kinetics,^{66} quantum information processing,^{67} statistical mechanics,^{57,68} coagulation-fragmentation equations,^{69,70} dissipative quantum circuits,^{71} open quadratic fermionic systems,^{72} neuronal models,^{73} granular flows,^{74} and chaotic microfluid mixing.^{75}

In Refs. 32 and 76–83, the authors studied the so-called cutoff phenomenon for abstract Langevin equations with $\epsilon$-small, additive Lévy noise $dL$ (see Definition 2.8) given by the following stochastic differential equation:

(1) A time scale $(t_\epsilon)_{\epsilon>0}$ induces a *(simple) cutoff phenomenon* if, as $\epsilon\to0$, $D_{\epsilon,x}(\delta t_\epsilon)$ tends to the maximal value $M$ of the distance for $\delta<1$ and to $0$ for $\delta>1$.

(2) A time scale $(t_\epsilon)_{\epsilon>0}$ induces a *window cutoff phenomenon* if $\liminf_{\epsilon\to0}D_{\epsilon,x}(t_\epsilon+r)$ tends to $M$ as $r\to-\infty$ and $\limsup_{\epsilon\to0}D_{\epsilon,x}(t_\epsilon+r)$ tends to $0$ as $r\to\infty$.

(3) A time scale $(t_\epsilon)_{\epsilon>0}$ induces a *profile cutoff phenomenon* with *cutoff profile* $P_x$ if $P_x(r)=\lim_{\epsilon\to0}D_{\epsilon,x}(t_\epsilon+r)$ exists for all $r\in\mathbb{R}$ and $P_x$ tends to $M$ at $-\infty$ and to $0$ at $\infty$.

*not* depend on the noise intensity parameter $\epsilon$, that is, for *fixed* noise amplitude $\sigma$. In other words, the object of study of this article is the Ornstein–Uhlenbeck system (1.3), which does not depend on any parameter $\epsilon$ in any sense. More precisely, we consider the dynamics of the unique strong solution $X=(X_t)_{t\ge0}$ of the following stochastic differential equation:

$$dX_t=-AX_t\,dt+\sigma\,dL_t,\qquad X_0=x\in\mathbb{R}^m,\qquad t\ge0.\tag{1.3}$$

*cutoff stability*. We use the notions of *simple cutoff stability*, *window cutoff stability*, and *profile cutoff stability* for $D_{\epsilon,x}(t)$ satisfying (1), (2), or (3), respectively, for a time scale $(t_\epsilon)_{\epsilon>0}$.

In this situation, a parameter $\epsilon>0$ still appears in (1.2), but in contrast to (1.4), where it plays the role of an *internal* parameter, here it plays the role of an *external* yardstick parameter, which controls the asymptotic WKR mixing times. In Ref. 88, the authors established such a type of “nonasymptotic” cutoff phenomenon for a process with fixed multiplicative noise under certain commutativity conditions. In Ref. 89, it was established for an infinite-dimensional linear energy shell model with scalar random energy injection. This article closes this gap in the literature and studies the concept in the most natural and useful finite-dimensional setting with additive noise.

We stress that the situation of (1.3) is more complicated than that of (1.1), since it is not quasideterministic in the sense of being essentially a deterministic system with an $\epsilon$-small, though random, perturbation. Instead, (1.3) features a full-blown dynamical equilibrium, which might be rather irregular in the sense of not admitting a density. This difficulty is enhanced by the fact that $-A$ is only Hurwitz stable but not diagonalizable in general, which is natural, for instance, in the case of linear oscillators with friction. Therefore, arbitrarily large Jordan blocks with possibly non-real eigenvalues are permitted, and they are present in the limiting distribution. It is one of the advantages of the WKR distance, in comparison to the total variation distance, that it does not require any particular regularity beyond the existence of certain moments. In particular, it does not exclude degenerate noise injection into the system, as in the case of the linear oscillator (see Example 4.3) or networks thereof (see Examples 4.4 and 4.5). The WKR distance thus avoids technicalities such as controllability associated with the Kalman conditions and hypoellipticity, which are typically present for results in the total variation distance and the relative entropy; see Ref. 90 (Chapter 6) and the references therein. We consider additive perturbations by multidimensional Lévy noise processes with first moments, which include Brownian motion, deterministic linear functions, compound Poisson processes, and their possibly infinite superpositions, such as $\alpha$-stable processes with $1<\alpha\le2$, among others. By a standard enhancement of the state space, we also cover the situation of Ornstein–Uhlenbeck noise perturbations with each of the preceding types of noise.

The article is organized into three main parts. First, in Theorem 2.15 of Subsection II C, we provide the state of the art, including new general lower and upper bounds on $W_p(X_t(x),\mu)$ of order $p>0$. In Subsection II D, we collect particularly useful Gaussian bounds for $W_p$, $p\ge2$, applied in Subsection III A.

Second, using the results of Sec. II, we study *cutoff stability* for systems of the form (1.3). We start with non-degenerate Gaussian systems (1.3), for which we use the explicit formulas of Subsection II A in order to establish cutoff stability for systems (1.3) for the first time in a simple case. More precisely, for a normal drift matrix $A$ and a non-degenerate dispersion matrix $\sigma$, we provide new explicit formulas for the $W_2$ distance in Theorem 2.6, which then imply cutoff stability in the sense of (1.4). In Example 3.4, we continue with the study of the scalar damped harmonic oscillator subject to moderate Brownian forcing, which has a degenerate dispersion matrix $\sigma$ in the product space of position and momentum and is therefore not covered by the formulas in Theorem 2.6. We establish the presence of cutoff stability (1.4) for this elementary, though degenerate, system by explicit calculations, which illustrate the remarkable level of complexity and the general infeasibility of sticking to explicit calculations, even for linear 2D Gaussian systems.

Third, in Theorem 3.7 of Subsection III B, we show that the non-asymptotic bounds (Theorem 2.15 in Subsection II C) are good enough to establish cutoff stability (1.4) in considerably greater generality than Theorem 2.6. Theorem 3.7 directly covers Example 3.4, the Brownian gyrator in Example 4.2, a biophysical transcription–translation linear oscillator model in Example 4.3, and the benchmark system of a Jacobi chain of oscillators with a heat bath of constant noise intensity on the outer edges in Example 4.4. More precisely, in Theorem 3.7, we establish cutoff stability for general $\sigma$ under generic assumptions on $A$, which are substantially weaker than the assumptions in Sec. III A. In particular, they include Hurwitz stable but non-normal interaction matrices $A$, possibly degenerate dispersion matrices $\sigma$, and a large class of Lévy drivers, including Brownian motion and $\alpha$-stable Lévy flights for $\alpha>1$. In Example 4.5, we comment on the validity of our results for more general network topologies.

In Appendix A, the reader finds a list of the most relevant properties of the WKR distances.

## II. NON-ASYMPTOTIC ERGODICITY ESTIMATES FOR THE MULTIDIMENSIONAL OU PROCESS

In this section, we show non-asymptotic ergodicity bounds for solutions of the system (1.3) under the following hypotheses.

### (Positivity)

The matrix $A\in\mathbb{R}^{m\times m}$ is constant and all its eigenvalues have strictly positive real parts.

A matrix such that $\u2212A$ satisfies Hypothesis 2.1 is called Hurwitz stable.

### (Diffusion matrix)

The matrix $\sigma\in\mathbb{R}^{m\times n}$ is constant.

We stress that Hypothesis 2.3 on our model (1.3) states that the diffusion matrix is *fixed* and non-small. In fact, there is no particular parameter dependence whatsoever. For convenience, we formulate the following elementary lemma for Hurwitz stable matrices.

Let $A,B\in\mathbb{R}^{m\times m}$ be Hurwitz stable matrices. Then, we have the following:

- $A$ is invertible and $A^{-1}$ is Hurwitz stable.
- If $AB=BA$, then $A+B$ is Hurwitz stable. If $AB\ne BA$, there are counterexamples.
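As a quick numerical illustration of Lemma 2.4, the following Python sketch (assuming NumPy is available) checks item (1) for a concrete matrix and exhibits a pair of non-commuting Hurwitz stable matrices whose sum is not Hurwitz stable. The helper `is_hurwitz` and the particular matrices are illustrative choices, not taken from the article.

```python
import numpy as np

def is_hurwitz(M: np.ndarray) -> bool:
    """Return True if every eigenvalue of M has strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(M).real < 0))

# Two triangular Hurwitz matrices (double eigenvalue -1) that do not commute.
A = np.array([[-1.0, 10.0],
              [0.0, -1.0]])
B = np.array([[-1.0, 0.0],
              [10.0, -1.0]])

print(is_hurwitz(A), is_hurwitz(B))        # both are Hurwitz stable
print(np.allclose(A @ B, B @ A))           # they do not commute
print(np.linalg.eigvals(A + B))            # eigenvalues -2 +/- 10, i.e. 8 and -12
print(is_hurwitz(A + B))                   # the sum is not Hurwitz stable
print(is_hurwitz(np.linalg.inv(A)))        # item (1): A^{-1} is Hurwitz stable
```

The construction relies on large off-diagonal entries: each matrix alone is triangular with a stable diagonal, but their sum is symmetric with an eigenvalue of positive sign.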

The proof of Lemma 2.4 is given in Appendix C. For the extensive literature on the underlying matrix theory, we refer to Refs. 91, 92, and 93.

### WKR distance of order p > 0

The main basic properties of the WKR distance are gathered in Lemma 1.1 of Appendix A. For more details, see Refs. 26 and 40.

For convenience of notation, we do not distinguish between a random variable $X$ and its law $\mathbb{P}_X$ as an argument of $W_p$. That is, for random variables $X$ and $Y$ and a probability measure $\mu$, we write $W_p(X,Y)$ instead of $W_p(\mathbb{P}_X,\mathbb{P}_Y)$, $W_p(X,\mu)$ instead of $W_p(\mathbb{P}_X,\mu)$, etc.
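Definition 2.5 itself is not reproduced in this section. For orientation only, the display below states the standard optimal-transport definition with the exponent $\min\{1,1/p\}$, which makes $W_p$ a metric also for $p\in(0,1)$; we assume, without being able to confirm it from this section, that this is the convention of Definition 2.5. Here $\mathcal{C}(\mu_1,\mu_2)$ (our notation) denotes the set of couplings of two probability measures $\mu_1,\mu_2$ on $\mathbb{R}^m$ with finite $p$-th moments.

```latex
W_p(\mu_1,\mu_2) := \inf_{\Pi \in \mathcal{C}(\mu_1,\mu_2)}
  \left( \int_{\mathbb{R}^m \times \mathbb{R}^m} |u - v|^p \, \Pi(du, dv) \right)^{\min\{1,\, 1/p\}}
```

For $p\ge1$ this is the usual $p$-th root of the optimal transport cost, while for $p\in(0,1)$ the cost itself is used.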

### A. A formula for the WKR-2 distance

Denote by $\mathcal{N}(\mathfrak{m},C)$ the $m$-dimensional normal distribution with expectation $\mathfrak{m}\in\mathbb{R}^m$ and covariance matrix $C$. For a square matrix $C=(C_{i,j})\in\mathbb{R}^{m\times m}$, we denote its trace by $\mathrm{Tr}(C):=\sum_{j=1}^m C_{j,j}$. For any matrix $M$ with real coefficients, we denote by $M^*$ its transpose, while for any matrix $M$ with complex coefficients, $M^*$ denotes the Hermitian transpose.

We show an exact formula, of which we are not aware in the literature, for the WKR distance of order $2$ between a standard multidimensional OU process (with $\sigma\sigma^*=I_m$ and $L$ a standard Brownian motion in $\mathbb{R}^m$) and its invariant measure $\mu=\mathcal{N}(0,\Sigma_\infty)$; see Remark 2.20 (3).

#### ($W_2$-ergodicity formula for normal interaction matrices and full Brownian forcing)

Assume that $\sigma\sigma^*=I_m$.

- If $A$ is a positive definite symmetric matrix with eigenvalues $0<\lambda_1\le\lambda_2\le\cdots\le\lambda_m$ and corresponding orthonormal eigenvectors $v_1,\dots,v_m$, then for any $x\in\mathbb{R}^m$ and $t\ge0$ it follows that
  $$W_2(X_t(x),\mu)=\Big(\sum_{j=1}^m e^{-2\lambda_j t}\langle x-A^{-1}\sigma b,v_j\rangle^2+\sum_{j=1}^m\frac{1}{2\lambda_j}\,\frac{e^{-4\lambda_j t}}{\big(\sqrt{1-e^{-2\lambda_j t}}+1\big)^2}\Big)^{1/2}.\tag{2.3}$$
- If $A$ is a normal matrix, that is, $AA^*=A^*A$, and $A+A^*$ has the eigenvalues $0<\phi_1\le\phi_2\le\cdots\le\phi_m$ with corresponding (generalized) orthonormal eigenvectors $v_1,\dots,v_m\in\mathbb{R}^m$, then for any $x\in\mathbb{R}^m$ and $t\ge0$ it follows that
  $$W_2(X_t(x),\mu)=\Big(\sum_{j=1}^m e^{-\phi_j t}\langle x-A^{-1}\sigma b,v_j\rangle^2+\sum_{j=1}^m\frac{1}{\phi_j}\,\frac{e^{-2\phi_j t}}{\big(\sqrt{1-e^{-\phi_j t}}+1\big)^2}\Big)^{1/2},\tag{2.4}$$
  where $\phi_j=2\,\mathrm{Re}(\lambda_j)$, $j=1,\dots,m$, and $\lambda_j$ are the eigenvalues of $A$ (ordered ascendingly by their real parts).
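To make formula (2.4) concrete, the sketch below compares it with a direct evaluation of the Gaussian $W_2$ distance, that is, the Pythagorean (Gelbrich) formula $W_2^2=|m_1-m_2|^2+\mathrm{Tr}\big(S_1+S_2-2(S_1^{1/2}S_2S_1^{1/2})^{1/2}\big)$, for a normal matrix $A$. It assumes NumPy and SciPy, takes $\sigma=I_m$ and $b=0$, and uses the identity $\Sigma_t=\Sigma_\infty-e^{-At}\Sigma_\infty e^{-A^*t}$ for the covariance of $X_t(x)$; the function names and test matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, sqrtm, solve_continuous_lyapunov

def gaussian_w2(m1, S1, m2, S2):
    """W_2 distance between N(m1, S1) and N(m2, S2) via the Gaussian (Gelbrich) formula."""
    r1 = sqrtm(S1)
    trace_term = np.trace(S1 + S2 - 2.0 * sqrtm(r1 @ S2 @ r1)).real
    return np.sqrt(np.sum((m1 - m2) ** 2) + max(trace_term, 0.0))

def w2_formula_normal(A, x, t):
    """Right-hand side of (2.4) for normal A, sigma sigma^* = I and b = 0."""
    phi, V = np.linalg.eigh(A + A.T)      # phi_j and orthonormal eigenvectors v_j of A + A^*
    proj = V.T @ x                         # <x, v_j>
    decay = np.exp(-phi * t)               # e^{-phi_j t}
    mean_part = np.sum(decay * proj**2)
    var_part = np.sum(decay**2 / (phi * (np.sqrt(1.0 - decay) + 1.0) ** 2))
    return np.sqrt(mean_part + var_part)

def w2_direct(A, x, t):
    """W_2(X_t(x), mu) computed from the Gaussian laws, with sigma = I and b = 0."""
    m = A.shape[0]
    S_inf = solve_continuous_lyapunov(-A, -np.eye(m))   # (-A) S + S (-A)^T + I = 0
    E = expm(-A * t)
    S_t = S_inf - E @ S_inf @ E.T                        # covariance of X_t(x)
    return gaussian_w2(E @ x, S_t, np.zeros(m), S_inf)

A = np.array([[1.0, -2.0], [2.0, 1.0]])   # normal, non-symmetric: A A^T = A^T A = 5 I
x = np.array([0.3, -1.2])
for t in (0.5, 1.0, 3.0):
    print(t, w2_formula_normal(A, x, t), w2_direct(A, x, t))
```

The two columns should agree up to rounding; for a non-normal $A$ the closed form no longer applies, which is consistent with the normality assumption in item (2).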

- The main insight from formulas (2.3) and (2.4) is that the WKR-2 distance (implicitly, due to the Pythagorean theorem) naturally reflects the dynamics of both the mean and the variance of the Ornstein–Uhlenbeck process. In the case $m=n=1$, the solution of
  $$dX_t(x)=-\lambda X_t(x)\,dt+dB_t,\qquad X_0(x)=x,$$
  has the limiting distribution $\nu=\mathcal{N}\big(0,\tfrac{1}{2\lambda}\big)$ and
  $$\mathbb{E}[X_t(x)]=e^{-\lambda t}x\qquad\text{and}\qquad\mathrm{Var}(X_t(x))=\frac{1}{2\lambda}\big(1-e^{-2\lambda t}\big),$$
  that is, the variance adjusts to the limiting variance $\tfrac{1}{2\lambda}$ twice as fast as the mean converges to $0$.
- In the case of Lévy drivers, we observe that an $m$-dimensional pure jump Lévy process $L$ cannot, in general, be decomposed by a principal axes transform, as in the case of multivariate Brownian motion, into a vector of independent scalar Lévy processes
  $$L=(L_1,\dots,L_m).$$
  Such Lévy processes clearly exist, but they correspond only to Lévy flights with jumps parallel to the axes, which is a very special subcase of limited interest; see Ref. 87.
- We conjecture the mean vs. variance separation of scales of item (2) to be true for all Lévy processes with second moments. Let $L=(L_s)_{s\ge0}$ be a symmetric $\alpha$-stable process with $0<\alpha\le2$; more precisely, the characteristic function of the marginal $L_t$ at time $t\ge0$ is given by $\mathbb{E}[e^{izL_t}]=e^{-t|z|^\alpha}$, $z\in\mathbb{R}$. By Lemma 17.1 in Ref. 87, the characteristic function of the Ornstein–Uhlenbeck process $X_t=e^{-\lambda t}x+\sigma e^{-\lambda t}\int_0^t e^{\lambda s}\,dL_s$ is given by
  $$\mathbb{R}\ni z\mapsto\mathbb{E}[e^{izX_t}]=\exp\Big(ie^{-\lambda t}xz-\int_0^t|e^{-\lambda s}\sigma z|^\alpha\,ds\Big)=\exp\Big(ie^{-\lambda t}xz-\sigma^\alpha\,\frac{1-e^{-\lambda\alpha t}}{\lambda\alpha}\,|z|^\alpha\Big),$$
  which yields $X_t=e^{-\lambda t}x+\sigma\big(\frac{1-e^{-\lambda\alpha t}}{\lambda\alpha}\big)^{1/\alpha}L_1$, where the equality holds in distribution. Hence, the invariant measure $\mu$ is the law of $\sigma\big(\frac{1}{\lambda\alpha}\big)^{1/\alpha}L_1$. Therefore, for $1<p<\alpha$, it follows that
  $$W_p(X_t,\mu)=W_p\Big(e^{-\lambda t}x+\sigma\Big(\frac{1-e^{-\lambda\alpha t}}{\lambda\alpha}\Big)^{1/\alpha}L_1,\ \sigma\Big(\frac{1}{\lambda\alpha}\Big)^{1/\alpha}L_1\Big)\le\Big(\mathbb{E}\Big[\Big|e^{-\lambda t}x-\sigma(\lambda\alpha)^{-1/\alpha}\big(1-(1-e^{-\lambda\alpha t})^{1/\alpha}\big)L_1\Big|^p\Big]\Big)^{1/p}.$$
  We see that, for $x\ne0$ (or, more generally, $x\ne\lambda^{-1}\sigma\mathbb{E}[L_1]$, see Remark 2.10), the right-hand side converges to $0$ as $t\to\infty$ at order $e^{-\lambda t}$. However, starting precisely at $x=0$, the Taylor expansion
  $$1-(1-y^\alpha)^{1/\alpha}=\frac{1}{\alpha}y^\alpha+O(y^{2\alpha}),\qquad y\to0,$$
  with $y=e^{-\lambda t}$ yields the accelerated asymptotic rate $e^{-\lambda\alpha t}$ as $t\to\infty$.
In higher dimensions, no general explicit formulas are known for the WKR-2 distance (or any other WKR-$p$ distance) between non-Gaussian distributions. For one-dimensional formulas, see, for instance, Sec. 3 in Ref. 94 and the references therein. Thus, one is sent back to the original optimization over all couplings (or replicas). Optimizers, so-called optimal couplings, are unknown in general, which is why the general case of multidimensional Lévy drivers with second moments seems hard to prove.
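In one dimension, the explicit formulas alluded to above rest on the classical fact that, for $p\ge1$, the monotone (quantile) coupling is optimal. For two empirical measures with the same number of atoms, $W_p$ therefore reduces to pairing sorted samples, as in the following Python sketch (NumPy assumed; the function name is ours). No analogue of this shortcut is available in higher dimensions, which is precisely the difficulty described here.

```python
import numpy as np

def wasserstein_1d_empirical(xs, ys, p=2.0):
    """W_p (p >= 1) between two empirical measures on R with equally many atoms.

    In one dimension the monotone (quantile) coupling is optimal for p >= 1,
    so it suffices to pair the order statistics of the two samples.
    """
    if p < 1:
        raise ValueError("the quantile coupling is only guaranteed optimal for p >= 1")
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
print(wasserstein_1d_empirical(sample, sample + 0.5, p=2))   # a pure shift by 0.5
```

A pure shift by a constant $c$ moves an empirical measure by exactly $|c|$ in $W_p$, which is the one-dimensional shadow of the shift property used in the ergodicity bounds.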

With no identities for the optimal coupling at hand, we can only prove suboptimal upper bounds, as given in Theorems 2.15 and 2.16, which cannot distinguish the mean-variance split of item (3). These results, however, hold for general WKR-$p$ distances, $p\ge1$, and are not restricted to order $p=2$. We note that in the non-Gaussian case even these new suboptimal lower and upper bounds are not straightforward. In particular, we stress that lower bounds are typically hard to obtain. While these estimates do not allow us to establish finer properties such as *profile cutoff stability* [see item (3) in the introduction], they still yield the weaker properties of *simple cutoff stability* and *window cutoff stability*.

Ref. 30 (Proposition 7) yields

By Ref. 95 (Chapter 5), we have

In the sequel, we calculate $\phi_j$, $j=1,\dots,m$. Since $A$ is a normal matrix, we have $A=U^*DU$, where $UU^*=U^*U=I_m$ and $D=\mathrm{diag}(\lambda_1,\dots,\lambda_m)$. Recall that $A\in\mathbb{R}^{m\times m}$. Then, $A^T=A^*=U^*D^*U$, where $T$ denotes the transpose. Thus, $A+A^T=U^*(D+D^*)U$, which yields that the eigenvalues of $A+A^*$ are $2\,\mathrm{Re}(\lambda_j)$, $j=1,\dots,m$, where $\lambda_j$, $j=1,\dots,m$, are the eigenvalues of $A$.
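The spectral identity used in this step can be checked numerically. The Python sketch below (NumPy assumed; matrices chosen for illustration) confirms that the eigenvalues of $A+A^T$ equal $2\,\mathrm{Re}(\lambda_j)$ for a real normal matrix and shows that the identity fails for a non-normal one, so normality is genuinely needed.

```python
import numpy as np

# Illustrative real normal matrix: a scaled rotation block (eigenvalues 1 +/- 2i)
# and a 1x1 block (eigenvalue 3).
A = np.array([[1.0, -2.0, 0.0],
              [2.0,  1.0, 0.0],
              [0.0,  0.0, 3.0]])
assert np.allclose(A @ A.T, A.T @ A)             # A is normal

lam = np.linalg.eigvals(A)                        # eigenvalues of A
phi = np.linalg.eigvalsh(A + A.T)                 # eigenvalues of A + A^T, ascending
print(np.sort(2 * lam.real), phi)                 # both are [2, 2, 6]

# A non-normal matrix: eigenvalues 1, 1, but A + A^T has eigenvalues -3 and 7.
A_nn = np.array([[1.0, 5.0],
                 [0.0, 1.0]])
print(np.sort(2 * np.linalg.eigvals(A_nn).real), np.linalg.eigvalsh(A_nn + A_nn.T))
```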

This completes the proof.

### B. Hypotheses on the non-Brownian Lévy perturbations

#### (Lévy noise)

The driving noise $L=(L_t)_{t\ge0}$ is a Lévy process in $\mathbb{R}^n$, that is, a stochastic process starting at $0\in\mathbb{R}^n$ with stationary and independent increments and right-continuous paths (with finite left limits).

The class of Lévy processes $L$ contains several cases of interest: (1) $n$-dimensional standard Brownian motion, (2) $n$-dimensional symmetric and asymmetric $\alpha$-stable Lévy flights, (3) $n$-dimensional compound Poisson processes, and (4) deterministic linear functions $t\mapsto\gamma t$, $\gamma\in\mathbb{R}^n$.

Under (2) and (3), the paths contain jump discontinuities. Furthermore, the requirement of right-continuous paths with left limits (RCLL for short, or càdlàg from the French “continue à droite, limite à gauche”) is not restrictive, since it can always be ensured up to a null set of paths.

When $L$ has at least a first moment, we point out that $L$ need not be centered in general. However, by the Lévy property of stationary and independent increments (see Definition 2.8), it follows that $L_t=\tilde L_t+bt$ a.s., where $b\in\mathbb{R}^n$ and $\tilde L=(\tilde L_t)_{t\ge0}$ is a centered Lévy process. In other words, the mean of the solution of (1.3) started at $x=0$ and its limiting distribution are not necessarily centered at the origin, but at $A^{-1}(I_m-e^{-At})\sigma b$ and $A^{-1}\sigma b$, respectively. All our results are valid for any $b\in\mathbb{R}^n$.
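The centering formula can be cross-checked by integrating the mean equation $\frac{d}{dt}m_t=-Am_t+\sigma b$, $m_0=x$, which the expectation of the solution of (1.3) satisfies when $\mathbb{E}[L_t]=bt$. The Python sketch below (NumPy and SciPy assumed; the matrices, $b$, and the function names are illustrative) compares the closed form $e^{-At}x+A^{-1}(I_m-e^{-At})\sigma b$ with a numerical ODE solution. Since $\sigma\in\mathbb{R}^{m\times n}$, the drift $b$ of $L$ is taken in $\mathbb{R}^n$, here with $n=3\ne m=2$.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

def mean_closed_form(A, sigma, b, x, t):
    """E[X_t(x)] = e^{-At} x + A^{-1} (I - e^{-At}) sigma b."""
    E = expm(-A * t)
    return E @ x + np.linalg.solve(A, (np.eye(A.shape[0]) - E) @ (sigma @ b))

def mean_ode(A, sigma, b, x, t):
    """Integrate m' = -A m + sigma b, m(0) = x, numerically up to time t."""
    sol = solve_ivp(lambda s, m: -A @ m + sigma @ b, (0.0, t), x, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

A = np.array([[1.0, 0.5], [-0.3, 2.0]])            # eigenvalues with positive real parts
sigma = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]])
b = np.array([0.2, -0.1, 0.4])
x = np.array([1.0, -2.0])
for t in (0.5, 2.0, 40.0):
    print(t, mean_closed_form(A, sigma, b, x, t), mean_ode(A, sigma, b, x, t))
print("limit A^{-1} sigma b:", np.linalg.solve(A, sigma @ b))
```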

We denote by $|\cdot|$ the norm induced by the standard Euclidean inner product $\langle\cdot,\cdot\rangle$ in $\mathbb{R}^m$. Moreover, we use the standard Frobenius matrix norm $\|M\|^2=\sum_{i,j}M_{i,j}^2$, $M\in\mathbb{R}^{m\times n}$. We denote the expectation with respect to the underlying probability space $(\Omega,\mathcal{A},\mathbb{P})$ by $\mathbb{E}$.

The following hypothesis is necessary and sufficient for the existence of a limiting measure.

The time-one marginal of $L$ satisfies $\mathbb{E}[\log(1+|L_1|)]<\infty$.

Note that Hypothesis 2.11 includes Brownian motion, all $\alpha$-stable Lévy flights, and compound Poisson processes whose jump measure has a finite logarithmic moment. We point out that under Hypotheses 2.1, 2.3, and 2.11 there is a unique stationary probability distribution $\mu$ for the random dynamics (1.3). Moreover, for any initial datum $x\in\mathbb{R}^m$, $X_t(x)$ converges in distribution to $\mu$ as $t\to\infty$; see, for instance, Refs. 13, 96, and 97 for the Gaussian case.

### C. Ergodicity bounds via disintegration for $W_p$, $p\geq 1$

In order to measure the convergence toward the dynamic equilibrium by $W_p$, $p\ge1$, we assume the following condition, which is stronger than Hypothesis 2.11.

#### (Finite moment)

There is $p>0$ such that $\mathbb{E}[|L_1|^p]<\infty$.

Note that Hypothesis 2.12 yields $\mathbb{E}[|L_t|^p]<\infty$ and $\mathbb{E}[|X_t(x)|^p]<\infty$ for any $t\ge0$ and $x\in\mathbb{R}^m$.

Since convergence in $W_p$ is equivalent to convergence in distribution together with the convergence of the $p$-th absolute moments, we have to ensure that the thermalization coming from Hypothesis 2.11 also holds in the stronger WKR sense.

#### (Ergodicity in $W_p$)

Assume Hypotheses *2.1*, *2.3*, and *2.12* for some $p>0$. Then, there is a unique probability measure $\mu$ on $\mathbb{R}^m$ such that for all $x\in\mathbb{R}^m$

This result is shown in Ref. 98 (Proposition 2.2), where the statistical characteristics of the unique equilibrium distribution $\mu$, such as its $p$-th moments, are also given.

We now formulate the first main result on the ergodicity bounds for the marginal of $X$ at time $t$.

#### (Quantitative ergodicity bounds for Lévy driven Ornstein–Uhlenbeck systems)

Assume Hypotheses *2.1*, *2.3*, and *2.12* for some $p>0$. Then, for all $t\ge0$ and $x\in\mathbb{R}^m$, we have the following bounds:

- Upper bounds:
  $$W_p(X_t(x),\mu)\le\left\{\begin{array}{l}|e^{-At}x|+W_p(X_t(0),\mu),\\[2pt]\Big(\displaystyle\int_{\mathbb{R}^m}|e^{-At}(x-y)|^p\,\mu(dy)\Big)^{\min\{1,1/p\}},\end{array}\right.$$
  and, in particular,
  $$W_p(X_t(0),\mu)\le\Big(\int_{\mathbb{R}^m}|e^{-At}y|^p\,\mu(dy)\Big)^{\min\{1,1/p\}}.$$
- Lower bounds:
  $$W_p(X_t(x),\mu)\ge\begin{cases}|e^{-At}x|-W_p(X_t(0),\mu) & \text{if } p\ge1,\\ \big|e^{-At}x+\mathbb{E}[X_t(0)]-\int_{\mathbb{R}^m}z\,\mu(dz)\big| & \text{if } p\ge1,\\ |e^{-At}x|^p-2\,\mathbb{E}[|X_t(0)|^p]-W_p(X_t(0),\mu) & \text{if } p\in(0,1),\\ 0 & \text{if } p>0,\end{cases}$$
  where, with the identity matrix $I_m\in\mathbb{R}^{m\times m}$,
  $$\mathbb{E}[X_t(0)]=e^{-At}\int_0^t e^{As}\sigma\mathbb{E}[L_1]\,ds=A^{-1}(I_m-e^{-At})\sigma\mathbb{E}[L_1].$$
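For orientation, some of these bounds can be sanity-checked for $p=2$ in the simplest Gaussian case $m=n=1$, $A=\lambda>0$, $L=B$, where $W_2$ is explicit: for one-dimensional Gaussians, $W_2^2$ is the squared difference of the means plus the squared difference of the standard deviations. The Python sketch below (NumPy assumed; parameter values and function names are arbitrary) verifies the first upper bound and the first two lower bounds on a time grid. This is an illustration in a special case, not a proof.

```python
import numpy as np

def ou_1d_w2(x, lam, sigma, t):
    """Exact W_2(X_t(x), mu) for dX_t = -lam X_t dt + sigma dB_t.

    X_t(x) ~ N(e^{-lam t} x, s_t^2) and mu = N(0, s_inf^2); for 1D Gaussians,
    W_2^2 = (difference of means)^2 + (difference of standard deviations)^2.
    """
    s_inf = sigma / np.sqrt(2.0 * lam)
    s_t = s_inf * np.sqrt(1.0 - np.exp(-2.0 * lam * t))
    return np.hypot(np.exp(-lam * t) * x, s_t - s_inf)

def check_bounds(x, lam, sigma, times, tol=1e-12):
    """Check the p = 2 bounds listed above in this 1D Gaussian case; True if all hold."""
    for t in times:
        w = ou_1d_w2(x, lam, sigma, t)
        shift = abs(np.exp(-lam * t) * x)      # |e^{-At} x|
        w0 = ou_1d_w2(0.0, lam, sigma, t)      # W_2(X_t(0), mu)
        if not w <= shift + w0 + tol:          # first upper bound
            return False
        if not shift - w0 <= w + tol:          # first lower bound
            return False
        if not shift <= w + tol:               # mean lower bound (E[L_1] = 0 here)
            return False
    return True

print(check_bounds(x=2.5, lam=0.8, sigma=1.3, times=np.linspace(0.1, 8.0, 40)))
```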

The proof is given in Appendix B. It draws heavily on the properties of the WKR distance gathered in Lemma 1.1 of Appendix A. By Jensen’s inequality, we have $\mathbb{E}[|X_t(0)|^p]\le\mathbb{E}[|X_t(0)|]^p$ for $p\in(0,1]$. An upper bound of $\mathbb{E}[|X_t(0)|]$ is given in Ref. 96 (pp. 1000–1001).

### D. Ergodicity bounds via Gaussian estimates for $W_p$, $p\geq 2$

It is remarkable that, for $p\ge2$, meaningful Gaussian estimates can be given for WKR distances between general non-Gaussian Lévy-OU processes and their equilibrium, in the following sense.

#### (Gaussian ergodicity bounds for non-Brownian, Lévy Ornstein–Uhlenbeck systems)

Let Hypotheses *2.1*, *2.3*, and *2.12* be satisfied for some $p\ge2$. Then, for all $t\ge0$ and $x\in\mathbb{R}^m$, it follows

#### Proof of Theorem 2.16

Ref. 30 (Proposition 7) yields

- By the Pythagorean theorem given in Ref. 30 (Proposition 7), it is clear (consider $x=0$) that
  $$\mathrm{Tr}\Big(\Sigma_t+\Sigma_\infty-2\big(\Sigma_t^{1/2}\Sigma_\infty\Sigma_t^{1/2}\big)^{1/2}\Big)\ge0,$$
  and hence, for all $t\ge0$ and $x\in\mathbb{R}^m$, we obtain the weaker lower bound $|e^{-At}x|\le W_p(X_t(x),\mu)$. Since the preceding trace terms are hard to calculate, we give upper bounds for $p=2$, which are easier to obtain and which turn out to be sharp whenever $A$ is a normal matrix (see Remark 2.20).
- Note that for a centered pure jump Lévy process $L$ with finite second moment and $p=2$, Itô’s isometry gives
  $$\mathbb{E}\Big[\Big|\int_t^\infty e^{-Ar}\sigma\,dL_r\Big|^2\Big]=\int_t^\infty\int_{\mathbb{R}^n}|e^{-Ar}\sigma z|^2\,\nu(dz)\,dr,$$
  where $\nu$ is the Lévy jump measure associated with $L$; see Refs. 84 and 87.

Let the hypotheses of Theorem *2.16* be satisfied for $p=2$. If $L=B=(B_1,\dots,B_m)$ is a standard Brownian motion in $\mathbb{R}^m$, we have

The quadratic variation estimate in Corollary 2.18 can be generalized to the Lévy case.

^{86} In addition, there exists a positive constant $K_p$ such that

We stress that, in general, the trace in (2.14) is hard to compute.

- We also point out that the commutativity of $\Sigma_t$ and $\Sigma_\infty$ is hard to verify due to (2.16). Inspecting the expression
  $$\Sigma_t\Sigma_\infty=\int_0^t\int_0^\infty e^{-Ar}\sigma\sigma^*e^{-A^*r}e^{-As}\sigma\sigma^*e^{-A^*s}\,ds\,dr,$$
  one can see that, even for $\sigma=I_m$, the commutativity of $\Sigma_t$ and $\Sigma_\infty$ is equivalent to the normality of $A$, that is, $A^*A=AA^*$. In this case, we have
  $$\Sigma_t\Sigma_\infty=\int_0^t\int_0^\infty e^{-(A+A^*)r}e^{-(A+A^*)s}\,ds\,dr=\int_0^t e^{-(A+A^*)r}\,dr\,(A+A^*)^{-1}=(A+A^*)^{-2}\big(I_m-e^{-(A+A^*)t}\big).$$
- If $L=B=(B_1,\dots,B_n)$ is a standard Brownian motion in $\mathbb{R}^n$, it follows that
  $$\frac{d}{dt}\Sigma_t=-A\Sigma_t-\Sigma_tA^*+\sigma\sigma^*.$$
- Assume that $-A$ is Hurwitz stable. Then, $m_t^x\to0$ as $t\to\infty$. Moreover, $\Sigma_t\to\Sigma_\infty$, where $\Sigma_\infty$ is the unique solution of the matrix Lyapunov equation
  $$(-A)\Sigma_\infty+\Sigma_\infty(-A)^*+\sigma\sigma^*=0.$$
  Uniqueness follows from the Hurwitz stability of $-A$, and $\Sigma_\infty$ is positive definite whenever $\sigma\sigma^*$ is positive definite. The precise formula (2.13) may be hard to compute explicitly; we refer to Refs. 93 (Theorem 1, p. 443) and 99.
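The items above can be explored numerically. The Python sketch below (NumPy and SciPy assumed; all matrices are illustrative) solves the Lyapunov equation for $\Sigma_\infty$ with `scipy.linalg.solve_continuous_lyapunov` (which solves $aX+Xa^H=q$), forms $\Sigma_t=\Sigma_\infty-e^{-At}\Sigma_\infty e^{-A^*t}$, compares the commutator $[\Sigma_t,\Sigma_\infty]$ for a normal and a non-normal $A$ with $\sigma=I_2$, and checks the matrix ODE for $\Sigma_t$ by a central difference.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def sigma_inf(A, sigma):
    """Stationary covariance: solve (-A) S + S (-A)^T + sigma sigma^T = 0."""
    return solve_continuous_lyapunov(-A, -sigma @ sigma.T)

def sigma_t(A, sigma, t):
    """Covariance of X_t(x) under Brownian forcing: S_inf - e^{-At} S_inf e^{-A^T t}."""
    S = sigma_inf(A, sigma)
    E = expm(-A * t)
    return S - E @ S @ E.T

def commutator_norm(A, sigma, t):
    """Frobenius norm of Sigma_t Sigma_inf - Sigma_inf Sigma_t."""
    St, S = sigma_t(A, sigma, t), sigma_inf(A, sigma)
    return np.linalg.norm(St @ S - S @ St)

I2 = np.eye(2)
A_normal = np.array([[1.0, -2.0], [2.0, 1.0]])
A_non_normal = np.array([[1.0, 3.0], [0.0, 2.0]])
print(commutator_norm(A_normal, I2, 0.5))       # zero up to rounding
print(commutator_norm(A_non_normal, I2, 0.5))   # clearly non-zero

# Central-difference check of d/dt Sigma_t = -A Sigma_t - Sigma_t A^T + sigma sigma^T.
t, h = 0.7, 1e-5
lhs = (sigma_t(A_non_normal, I2, t + h) - sigma_t(A_non_normal, I2, t - h)) / (2 * h)
St = sigma_t(A_non_normal, I2, t)
rhs = -A_non_normal @ St - St @ A_non_normal.T + I2
print(np.max(np.abs(lhs - rhs)))                # small finite-difference error
```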

## III. CUTOFF STABILITY FOR HURWITZ-STABLE OU SYSTEMS

The main motivation of this section is to first establish the phenomenon with the help of explicit formulas for the Gaussian OU process. We then use the ergodicity bounds of Sec. II to establish cutoff stability for generic Lévy-OU processes.

### A. Cutoff stability of OU systems with normal drift and Brownian forcing

We apply Theorem 2.6 to establish cutoff stability for this process.

#### (Cutoff stability for $W_2$ for non-degenerate Gaussian forcing)

Assume the hypotheses of Theorem *2.6* and fix some $x\in\mathbb{R}^m$.

- If $x\ne0$ with $\langle x,v_1\rangle\ne0$, then we have the following cutoff stability for $t_\epsilon:=\frac{1}{\mathrm{Re}(\lambda_1)}|\ln(\epsilon)|$:
  $$\lim_{\epsilon\to0}\frac{W_2(X_{\delta\cdot t_\epsilon}(x),\mu)}{\epsilon}=\begin{cases}\infty & \text{for }\delta\in(0,1),\\ 0 & \text{for }\delta>1.\end{cases}\tag{3.1}$$
- If $x\ne0$ with $\langle x,v_1\rangle=0$ and
  $$\rho:=\min\{\mathrm{Re}(\lambda_j):j\in\{1,\dots,m\},\ \langle x,v_j\rangle\ne0\}<2\,\mathrm{Re}(\lambda_1),$$
  then we have the cutoff stability (3.1) for $t_\epsilon:=\frac{1}{\rho}|\ln(\epsilon)|$.
- If $x=0\in\mathbb{R}^m$, we have the cutoff stability (3.1) for $t_\epsilon:=\frac{1}{2\,\mathrm{Re}(\lambda_1)}|\ln(\epsilon)|$.
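The mechanism behind Corollary 3.1 can be seen numerically in the scalar case $m=n=1$, $A=\lambda>0$, $\sigma=1$, $b=0$, where formula (2.3) is explicit. The Python sketch below (NumPy assumed; $\lambda$, $x$, and the function names are arbitrary choices) evaluates $W_2(X_{\delta t_\epsilon}(x),\mu)/\epsilon$ with $t_\epsilon=|\ln\epsilon|/\lambda$ for decreasing $\epsilon$: the ratio blows up for $\delta<1$, vanishes for $\delta>1$, and stays close to $|x|$ at $\delta=1$.

```python
import numpy as np

def w2_ou_1d(x, lam, t):
    """Formula (2.3) for m = n = 1 with sigma = 1 and b = 0."""
    y = np.exp(-2.0 * lam * t)
    return np.sqrt(y * x**2 + y**2 / (2.0 * lam * (np.sqrt(1.0 - y) + 1.0) ** 2))

def cutoff_ratio(x, lam, delta, eps):
    """W_2(X_{delta t_eps}(x), mu) / eps with t_eps = |ln eps| / lam (case x != 0)."""
    t_eps = abs(np.log(eps)) / lam
    return w2_ou_1d(x, lam, delta * t_eps) / eps

lam, x = 1.5, 0.7
for delta in (0.8, 1.0, 1.2):
    print(delta, [round(float(cutoff_ratio(x, lam, delta, eps)), 4)
                  for eps in (1e-2, 1e-4, 1e-8, 1e-12)])
```

Dividing by $\epsilon$ plays the role of the external yardstick: the threshold at $\delta=1$ appears only after this normalization, since for fixed $\epsilon$ the distance itself simply decays exponentially.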

The proof of Corollary 3.1 is straightforward with the help of the formulas obtained in Theorem 2.6. In fact, Corollary 3.1 can be further sharpened as follows.

#### (Window cutoff stability)

Assume the hypotheses of Theorem *2.6* and fix some $x\in\mathbb{R}^m$. Then, we have

which is an infinite-dimensional problem. Instead, we only need the spectrum of the matrix $A$.

As mentioned in Remark 2.17, the case of degenerate noise is hard to treat explicitly; in particular, the formulas obtained in Theorem 2.6 are not valid. However, we present the very special case of a damped 1D harmonic oscillator perturbed by a (non-small) Brownian motion, in which the noise is degenerate but explicit calculations can still be carried out. Nevertheless, it is only in Sec. III B that we can establish cutoff stability, for instance, for the $m$-dimensional damped harmonic oscillator perturbed by an $m$-dimensional Lévy process, including an $m$-dimensional Brownian motion.
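To illustrate that degenerate noise causes no difficulty for evaluating $W_2$ itself in the Gaussian case, the Python sketch below treats a damped oscillator $dq_t=p_t\,dt$, $dp_t=(-\kappa q_t-\gamma p_t)\,dt+s\,dB_t$, written in the form (1.3) with $A=\begin{pmatrix}0&-1\\\kappa&\gamma\end{pmatrix}$ and the degenerate $\sigma=(0,s)^*$. This parametrization and the values $\kappa=4$, $\gamma=0.5$, $s=1$ (a subcritically damped case) are our own hypothetical choices and need not match Example 3.4. Although $\sigma\sigma^*$ is singular, the stationary covariance $\Sigma_\infty=\mathrm{diag}\big(s^2/(2\gamma\kappa),\,s^2/(2\gamma)\big)$ is positive definite, and $W_2(X_t(0),\mu)$ follows from the Gaussian formula (NumPy and SciPy assumed).

```python
import numpy as np
from scipy.linalg import expm, sqrtm, solve_continuous_lyapunov

def gaussian_w2(m1, S1, m2, S2):
    """W_2 between N(m1, S1) and N(m2, S2) via the Gaussian (Gelbrich) formula."""
    r1 = sqrtm(S1)
    trace_term = np.trace(S1 + S2 - 2.0 * sqrtm(r1 @ S2 @ r1)).real
    return np.sqrt(np.sum((m1 - m2) ** 2) + max(trace_term, 0.0))

kappa, gamma, s = 4.0, 0.5, 1.0                         # hypothetical parameters (subcritical)
A = np.array([[0.0, -1.0], [kappa, gamma]])
sigma = np.array([[0.0], [s]])                            # noise acts on the momentum only
S_inf = solve_continuous_lyapunov(-A, -sigma @ sigma.T)  # (-A) S + S (-A)^T + sigma sigma^T = 0

def w2_from_origin(t):
    """W_2(X_t(0), mu): both laws are centered, so only the covariances matter."""
    E = expm(-A * t)
    S_t = S_inf - E @ S_inf @ E.T                         # covariance of X_t(0)
    return gaussian_w2(np.zeros(2), S_t, np.zeros(2), S_inf)

print(S_inf)                                              # diag(0.25, 1.0) for these values
print([round(float(w2_from_origin(t)), 5) for t in (1.0, 4.0, 8.0, 12.0)])
```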

#### (Cutoff stability of a harmonic oscillator driven by Brownian motion)

In summary, we have verified the asymptotics of Theorem 3.7 of order $e^{-2\gamma t}$ by direct calculation for the degenerate case of the harmonic oscillator with moderate Brownian forcing. As in the small noise regime treated in Ref. 32 (Section 4.2.4), subcritical damping does not exhibit a true limit in (3.3), as the oscillations in Fig. 1 clearly show.

### B. Cutoff stability of generic OU systems driven with Lévy forcing

In this subsection, we treat general $\sigma\in\mathbb{R}^{m\times n}$, $L$ with values in $\mathbb{R}^n$ with finite first moment, and $A\in\mathbb{R}^{m\times m}$ such that $-A$ is Hurwitz stable. Additionally, we assume that $A$ has the following generic structure.

#### (Generic interaction force)

We say that $A\in\mathbb{R}^{m\times m}$ is generic if it has $m$ distinct (possibly complex-valued) eigenvalues $\lambda_1,\dots,\lambda_m$.
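Definition 3.5 is easy to test numerically, with the caveat that floating-point eigenvalues are only approximate. The Python sketch below (NumPy assumed; the function name is ours) declares $A$ generic when all pairwise eigenvalue gaps exceed a tolerance. The tolerance is a heuristic and not part of the definition: a defective matrix with a repeated eigenvalue may be misclassified if rounding splits that eigenvalue by more than the tolerance.

```python
import numpy as np

def is_generic(A: np.ndarray, tol: float = 1e-9) -> bool:
    """Heuristic test of Definition 3.5: m pairwise distinct (possibly complex) eigenvalues.

    Eigenvalues closer than `tol` are treated as equal.
    """
    lam = np.linalg.eigvals(A)
    gaps = np.abs(lam[:, None] - lam[None, :])   # pairwise distances |lambda_i - lambda_j|
    np.fill_diagonal(gaps, np.inf)               # ignore the diagonal i == j
    return bool(np.all(gaps > tol))

print(is_generic(np.array([[0.0, -1.0], [4.0, 0.5]])))   # damped oscillator: complex pair, generic
print(is_generic(np.array([[1.0, 1.0], [0.0, 1.0]])))    # Jordan block: repeated eigenvalue
```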

Let $A$ satisfy Hypothesis *2.1* and be generic in the sense of Definition *3.5*. Then, for each $x\in\mathbb{R}^m$, $x\ne0$, there exist $\rho=\rho(x)>0$ and $C_i(x)>0$, $i=1,2$, such that

The proof is given in Appendix D. With this result in mind, we now state the main theorem.
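An exponent of this type can be computed from the eigen-decomposition of a generic $A$: write $x=\sum_j c_jw_j$ in an eigenbasis $(w_j)$ and take the smallest $\mathrm{Re}(\lambda_j)$ over the indices with $c_j\ne0$, in analogy with the quantity $\rho$ of Corollary 3.1. We cannot confirm from this section that this is exactly how $\rho(x)$ in (3.4) is defined, so the Python sketch below (NumPy and SciPy assumed; the matrix, vectors, and function name are hypothetical) is only an illustration: it computes this exponent and checks that $|e^{-At}x|e^{\rho t}$ stays within positive bounds on a time grid.

```python
import numpy as np
from scipy.linalg import expm

def decay_rate(A, x, tol=1e-12):
    """Smallest Re(lambda_j) among eigen-directions in which x has a non-zero component.

    Assumes A is generic (distinct eigenvalues), so x = V c has a unique solution c.
    """
    lam, V = np.linalg.eig(A)
    c = np.linalg.solve(V, x.astype(complex))
    active = np.abs(c) > tol
    return lam.real[active].min()

# Hypothetical generic matrix satisfying Hypothesis 2.1: eigenvalues 0.25 +/- 1.98i and 2.
A = np.array([[0.0, -1.0, 0.0],
              [4.0,  0.5, 0.3],
              [0.0,  0.0, 2.0]])
x = np.array([1.0, -0.5, 0.2])
rho = decay_rate(A, x)
ts = np.linspace(5.0, 40.0, 200)
ratios = np.array([np.linalg.norm(expm(-A * t) @ x) * np.exp(rho * t) for t in ts])
print(rho, ratios.min(), ratios.max())   # rho = 0.25; the ratio oscillates between positive bounds
```

The oscillation of the ratio comes from the complex eigenvalue pair; because the real and imaginary parts of the corresponding eigenvector are linearly independent, the slow component never vanishes, which keeps the ratio bounded away from zero.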

#### (Generic cutoff stability for Lévy Ornstein–Uhlenbeck systems)

Let $A$ satisfy Hypothesis *2.1* and assume that $A$ is generic in the sense of Definition *3.5*. Assume further that $L$ satisfies Hypothesis *2.12* for some $p\ge1$ and that $\sigma$ satisfies Hypothesis *2.3*. For $x\ne0$ such that $x\ne A^{-1}\sigma\mathbb{E}[L_1]$, choose $\rho_x>0$ as in (3.4) and set

Theorem 3.7 generalizes Corollary 3.1, for any given initial condition $x$, to the case of a generic matrix $A$ and non-Gaussian Lévy noise with finite first moments. In addition, it covers degenerate noise; for instance, Example 3.4 is covered without any of the lengthy calculations. In Example 4.4, we show how even more complex systems, such as coupled chains of oscillators with a moderate external heat bath, are included. The proof is given after the subsequent corollary.

Since convergence in the WKR distance of order $p\u22651$ is equivalent to the simultaneous convergence in distribution and the convergence of the absolute moments of order $p\u22651$, see Ref. 40 (Theorem 6.9), we also obtain the respective (pre-)cutoff stability for the $p$-th absolute moments.
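For $p\geq 1$, the moment part of this statement follows from a short argument. We add it as a standard illustration, not as the article's proof; here $|\cdot|$ is the Euclidean norm and all moments are assumed finite. The three lines below are obtained as follows:

- Minkowski's inequality, applied to an arbitrary coupling of $X$ and $Y$, gives the first line.
- The left-hand side does not depend on the coupling, so taking the infimum over couplings gives the second.
- Jensen's inequality, applied to a coupling that is optimal for $\mathcal{W}_p$, gives the third for $1\leq q\leq p$.

$$
\begin{aligned}
\Big|\big(\mathbb{E}[|X|^p]\big)^{1/p}-\big(\mathbb{E}[|Y|^p]\big)^{1/p}\Big| &\leq \big(\mathbb{E}[|X-Y|^p]\big)^{1/p} \quad\text{for every coupling of } X \text{ and } Y,\\
\Big|\big(\mathbb{E}[|X|^p]\big)^{1/p}-\big(\mathbb{E}[|Y|^p]\big)^{1/p}\Big| &\leq \mathcal{W}_p(X,Y),\\
\mathcal{W}_q(X,Y) &\leq \mathcal{W}_p(X,Y), \qquad 1\leq q\leq p.
\end{aligned}
$$

In particular, $\mathcal{W}_p$-convergence entails convergence of the absolute moments of every order $q\in[1,p]$. This matches the range of $q$ in Corollary 3.8.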

#### (Observable pre-cutoff stability)


Let the assumptions of Theorem 3.7 be satisfied. Then, for all $1\leq q\leq p$ and $x\neq 0$, we have for all $\delta>1$

#### Proof of Theorem 3.7:


Next, we prove Corollary 3.8, for which we use the following lemma, shown in Ref. 29 (p. 972, Lemma B.2).

#### Proof of Corollary 3.8:


In fact, the result can be sharpened further, as follows; we state this without proof.

#### (Window cutoff stability)


## IV. EXAMPLES

We stress that the matrices $A$ appearing in the examples of this section are generic in the sense of Definition 3.5. The quantitative upper and lower bounds of Theorem 2.15 are valid for them and available with far less effort than lengthy explicit computations, as we illustrate below for specific models. Moreover, our quantitative upper and lower bounds cover multidimensional, non-decoupled Lévy noise with finite first moment and the distance $\mathcal{W}_p$ for any $p\geq 1$. By Theorem 3.7, we obtain cutoff stability at the explicitly given time scale $t_\epsilon$.

### (A biophysical transcription–translation model in equilibrium)


This model is taken from Ref. 102 (p. 1251, left column, first display), with a constant DNA–mRNA transcription rate $k_B>0$ and a constant internal transcriptional noise level $q$. The positive constants $\gamma_R$ and $\gamma_p$ represent the degradation rates of the mRNA and the protein, respectively, while $k_p>0$ represents the amount of mRNA needed to produce a protein.

### (Cutoff stability of a Jacobi chain under fixed amplitude Lévy forcing with first moments)


### (More general networks)


For more general network topologies of harmonic oscillators, with some of the oscillators connected to heat reservoirs at different temperatures, we refer to Refs. 103–106. While the authors there typically work with nonlinear interaction potentials, our setting only covers quadratic potentials. In Ref. 105, the authors study crystal-type extensions of linear Jacobi chains, which were generalized in Refs. 103, 104, and 106.

- In Refs. 103 and 104, the admissible network topologies are encoded in the controllability of the pair $(-A,\sigma)$. This covers both the coupling to the heat reservoirs and the interactions among the springs. Controllability is equivalent to the well-known Kalman condition: there exists some $m^*\leq m$ such that
$$\operatorname{span}\{\sigma e_i, A\sigma e_i, A^2\sigma e_i,\dots, A^{m^*-1}\sigma e_i,\ i=1,\dots,n\}=\mathbb{R}^m.$$
In Ref. 106, the authors give explicit sufficient conditions for controllability in terms of the network topology, decomposing the graph of connected springs into a linear sequence of “nicely connected” layers of masses. Let $\mathcal{G}$ be a finite set of masses, $E\subset\mathcal{G}\times\mathcal{G}$ the set of connections, and $\mathcal{B}\subset\mathcal{G}$ the set of masses connected to the heat reservoirs. Then $\mathcal{B}$ is *nicely connected* to a vertex $v\in\mathcal{G}\setminus\mathcal{B}$ ($\mathcal{B}\rightsquigarrow v$, for short) if there exists $b\in\mathcal{B}$ such that $(b,v)\in E$, but $b$ is not connected to any other vertex $v'\in\mathcal{G}\setminus\mathcal{B}$. It is worth noting that, for $\mathcal{B}\rightsquigarrow v$, it suffices that at least one $b\in\mathcal{B}$ satisfies the preceding condition, while all other connections of $v$ to $b'\in\mathcal{B}$ may violate it. Denote by $T\mathcal{B}$ (the first layer of) all vertices $v\in\mathcal{G}\setminus\mathcal{B}$ to which $\mathcal{B}$ is nicely connected. If $\mathcal{G}=\bigcup_{n\geq n_0}T^n\mathcal{B}$, where $T^{n+1}\mathcal{B}=T(T^n\mathcal{B})$, $n\geq 0$, then condition C1 in Ref. 106 is satisfied. The additional conditions C2–C5 are:
  - (C2) non-degeneracy of the (possibly nonlinear) interaction potentials;
  - (C3) homogeneity and coercivity of the (possibly nonlinear) interaction potentials;
  - (C4) local injectivity of the interaction forces;
  - (C5) asymptotic domination of the interaction potentials over the pinning potentials.

  Under these conditions, the convergence in law is exponentially fast. Natural applications of such systems include, for instance, the micromolecular dynamics of the dendritic spine of a neuronal cell; see Ref. 107 (Chapter 5, Subsection 5.2.9, formula (5.27)).
- We present a simple network of three completely connected oscillators with one heat reservoir connected to the first mass (see Fig. 3), which does not satisfy (C1) in Ref. 106. The respective stochastic differential equation reads
$$dX_t(x)=-AX_t(x)\,dt+\sigma\, dB_t,\qquad X_0(x)=x,$$
where
$$A=\begin{pmatrix} 1&0&0&3&-1&-2\\ 0&0&0&-1&3&-1\\ 0&0&0&-2&-1&3\\ -1&0&0&0&0&0\\ 0&-1&0&0&0&0\\ 0&0&-1&0&0&0\end{pmatrix}.$$
By the definition of nice connectedness, it is clear that node $1$ does not control the complete graph. However, the spectrum $\{\lambda_1,\bar\lambda_1,\lambda_2,\bar\lambda_2,\lambda_3,\bar\lambda_3\}$ of $-A$ is given by
$$\lambda_1\approx -0.25039+2.12688\,i,\qquad \lambda_2\approx -0.04139+1.96062\,i,\qquad \lambda_3\approx -0.208261+0.49001\,i.$$
All real parts are strictly negative, so $-A$ is Hurwitz stable and generic in the sense of Definition 3.5. After the lengthy but explicit calculations for the Brownian gyrator and the oscillator in Example 3.4, it is clear that symbolic calculations could in principle still be carried out here, but they become increasingly infeasible.
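Both conditions discussed above can be checked mechanically. The Python sketch below is our own illustration, not code from Refs. 103–106.

- `kalman_rank` computes the rank of $[\sigma, A\sigma,\dots,A^{m-1}\sigma]$. By the Cayley–Hamilton theorem, $m^*=m$ suffices.
- `nicely_connected_cover` iterates the layer map $T$.

Two assumptions are ours. First, the layers are accumulated as $\mathcal{S}_{k+1}=\mathcal{S}_k\cup T\mathcal{S}_k$; the precise form of C1 should be taken from Ref. 106. Second, for the three-oscillator network, the rows of $A$ are read as acting on $(p_1,p_2,p_3,q_1,q_2,q_3)$, and the scalar noise is taken to enter the momentum of mass 1, i.e., $\sigma=e_1$ (hypothetical).

```python
import numpy as np


def kalman_rank(A, sigma) -> int:
    """Numerical rank of the Kalman matrix [sigma, A sigma, ..., A^(m-1) sigma].

    By Cayley-Hamilton, powers up to m - 1 suffice; full rank m means that
    the pair (A, sigma), equivalently (-A, sigma), is controllable.
    """
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    blocks = [np.asarray(sigma, dtype=float).reshape(m, -1)]
    for _ in range(m - 1):
        blocks.append(A @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks)))


def nicely_connected_cover(vertices, edges, boundary) -> bool:
    """Return True if cumulative 'nicely connected' layers exhaust the graph.

    Assumed reading (hypothetical): S_0 = boundary, S_{k+1} = S_k | T(S_k),
    where T(S) collects every v outside S that has a neighbour b in S whose
    only neighbour outside S is v.
    """
    nbrs = {v: set() for v in vertices}
    for u, w in edges:  # springs are undirected
        nbrs[u].add(w)
        nbrs[w].add(u)
    covered = set(boundary)
    while True:
        layer = set()
        for b in covered:
            outside = nbrs[b] - covered
            if len(outside) == 1:  # b is attached to exactly one new vertex
                layer |= outside
        if not layer:
            return covered == set(vertices)
        covered |= layer


# Three completely connected oscillators; rows of A read as acting on
# (p1, p2, p3, q1, q2, q3), which is an assumption (see the lead-in).
A = np.array([[1, 0, 0, 3, -1, -2],
              [0, 0, 0, -1, 3, -1],
              [0, 0, 0, -2, -1, 3],
              [-1, 0, 0, 0, 0, 0],
              [0, -1, 0, 0, 0, 0],
              [0, 0, -1, 0, 0, 0]], dtype=float)
sigma = np.eye(6)[:, [0]]  # hypothetical: scalar noise on the momentum of mass 1

print("Kalman rank:", kalman_rank(A, sigma))
print("complete graph, B={1}:", nicely_connected_cover({1, 2, 3}, [(1, 2), (1, 3), (2, 3)], {1}))
print("chain 1-2-3, B={1}:", nicely_connected_cover({1, 2, 3}, [(1, 2), (2, 3)], {1}))
```

For the complete three-node graph with $\mathcal{B}=\{1\}$, the layer iteration stops immediately, in line with the failure of (C1) noted above. Under the stated reading of $A$ and $\sigma$, however, the Kalman matrix has full rank. Hence controllability and the layer condition are genuinely different requirements.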
Note that even if we generalize $B$ to a scalar Lévy process $L$, the (suboptimal) upper and lower ergodicity bounds of Theorem 2.15 and the Gaussian upper and lower bounds of Theorem 2.16 remain valid. They yield exponential convergence toward the invariant measure at a rate of order $e^{-0.04139\,t}$.

In addition, Theorems 3.7 and 3.10 yield (simple) cutoff stability and window cutoff stability in the sense of items (1) and (2) in Sec. I, for generic initial values $x$, along the asymptotic time scale $t_\epsilon:=-\ln(\epsilon)/0.04139$, $\epsilon\in(0,1)$. Corollary 3.8 implies pre-cutoff for all existing higher absolute moments of $X$ along the same time scale $t_\epsilon$.
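The spectral data and the time scale $t_\epsilon$ can be reproduced with a few lines of NumPy. In this sketch, which is our own illustration, we rebuild $A$ from two ingredients:

- the stiffness matrix $K$ of the three completely connected springs;
- the damping matrix $\Gamma=\operatorname{diag}(1,0,0)$ of the reservoir at mass 1.

This interpretation of the entries of $A$ is our assumption. The helper `cutoff_time` is hypothetical and simply evaluates $t_\epsilon=-\ln(\epsilon)/\rho$, where $\rho$ is minus the largest real part in the spectrum of $-A$. Up to rounding in the last digit, the computed spectrum should agree with the listed values. As consistency checks, the trace of $-A$ equals $-1$ and its determinant equals $\det K=5$.

```python
import numpy as np

# Assumed reading of A: stiffness matrix K of the three completely connected
# springs and damping Gamma = diag(1, 0, 0) from the reservoir at mass 1.
K = np.array([[3.0, -1.0, -2.0],
              [-1.0, 3.0, -1.0],
              [-2.0, -1.0, 3.0]])
Gamma = np.diag([1.0, 0.0, 0.0])

# State (p1, p2, p3, q1, q2, q3); dX = -A X dt + sigma dB.
A = np.block([[Gamma, K], [-np.eye(3), np.zeros((3, 3))]])

spectrum = np.linalg.eigvals(-A)
rho = -spectrum.real.max()  # distance of the spectrum of -A from the imaginary axis


def cutoff_time(eps: float, rate: float = rho) -> float:
    """Time scale t_eps = -ln(eps) / rate for eps in (0, 1)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return float(-np.log(eps) / rate)


print("spectrum of -A:", np.round(np.sort_complex(spectrum), 5))
print("trace:", np.trace(-A), "det:", round(np.linalg.det(-A), 8))
print(f"rho = {rho:.5f}, t_eps for eps = 0.01: {cutoff_time(0.01):.1f}")
```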

The preceding result highlights the advantage of the WKR distance: our results in Secs. II and III do not require any controllability (or irreducibility) properties, in contrast to typical results for the total variation distance or the relative entropy.

## V. CONCLUSION

This article provides upper and lower bounds on the WKR-$p$ distance between the time-$t$ marginal of a multidimensional Ornstein–Uhlenbeck process with fixed (non-small) Brownian or Lévy noise amplitude and its dynamic equilibrium, see Theorem 2.15. We also establish a new identity for the WKR distance between Ornstein–Uhlenbeck systems driven by non-degenerate Brownian motion, $\sigma\sigma^*=I$, with normal (or diagonalizable) interaction matrix, see Theorem 2.6. This identity reveals the following thermalization scenario as time $t$ grows: a fast adaptation of the scale to that of the limiting distribution, followed by a recentering of the location at a slower pace. This type of behavior is conjectured to hold for more general Lévy-driven systems.
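The scale-then-location scenario can already be seen in the simplest non-degenerate case. The following Python sketch is our own illustration, not the article's Theorem 2.6. It assumes the scalar process $dX_t=-aX_t\,dt+s\,dB_t$, $X_0=x$. The time-$t$ law of this process is $N(xe^{-at}, s^2(1-e^{-2at})/(2a))$, and its invariant law is $N(0,s^2/(2a))$. The sketch uses the standard one-dimensional Gaussian formula $\mathcal{W}_2(N(m_1,v_1),N(m_2,v_2))^2=(m_1-m_2)^2+(\sqrt{v_1}-\sqrt{v_2})^2$. The helper names `ou_w2_terms` and `ou_w2` are hypothetical.

```python
import math


def ou_w2_terms(x: float, t: float, a: float = 1.0, s: float = 1.0):
    """Location and scale parts of W_2(law of X_t(x), invariant law) for the
    scalar OU process dX = -a X dt + s dB, X_0 = x (assumed model)."""
    if a <= 0 or s <= 0 or t < 0:
        raise ValueError("need a > 0, s > 0 and t >= 0")
    sd_inf = s / math.sqrt(2.0 * a)                          # std of N(0, s^2/(2a))
    sd_t = sd_inf * math.sqrt(1.0 - math.exp(-2.0 * a * t))  # std at time t
    location = abs(x) * math.exp(-a * t)  # mean mismatch, decays like e^{-a t}
    scale = sd_inf - sd_t                 # std mismatch, asymptotically like e^{-2 a t}
    return location, scale


def ou_w2(x: float, t: float, a: float = 1.0, s: float = 1.0) -> float:
    """1D Gaussian formula: W_2 = sqrt((m1 - m2)^2 + (sd1 - sd2)^2)."""
    location, scale = ou_w2_terms(x, t, a, s)
    return math.hypot(location, scale)


for t in (0.0, 1.0, 3.0, 6.0):
    loc, sc = ou_w2_terms(x=10.0, t=t)
    print(f"t={t:3.1f}  location={loc:.2e}  scale={sc:.2e}  W2={ou_w2(10.0, t):.2e}")
```

The location term decays like $|x|e^{-at}$. For large $t$, the scale term behaves like $\tfrac{s}{2\sqrt{2a}}e^{-2at}$. So the scale equilibrates first, and the slower recentering dominates the distance. The location term alone equals $\epsilon$ at $t=\ln(|x|/\epsilon)/a$.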

These non-asymptotic results are applied to cutoff stability in Theorems 3.7 and 3.10, that is, abrupt thermalization to $\epsilon$-small WKR distances along a particular $\epsilon$-dependent time scale. In Corollary 3.8, it is shown that, in our general setting, the observables (absolute moments) also converge abruptly to those of the limiting distribution.

Applications include:

- the Brownian or Lévy gyrator;
- a single harmonic oscillator, for instance, in a genetic transcription–translation model;
- Jacobi chains of linear oscillators with heat baths at the outer ends;
- more general network topologies.

For the single harmonic oscillator and the Brownian gyrator, the WKR-2 distances are calculated explicitly, illustrating the limitations of explicit formulas.

## ACKNOWLEDGMENTS

G.B. would like to express his gratitude to the University of Helsinki, Department of Mathematics and Statistics, for all the facilities used during the realization of this work. The authors thank Professor Juan Manuel Pedraza, Physics Department at Universidad de los Andes, for helpful discussions, which have led to Examples 4.3 and 4.5. They also thank the anonymous referees for the careful reading and helpful suggestions which have improved the quality of the manuscript.

The research of G.B. has been supported by the Academy of Finland, via an Academy project (Project No. 339228) and the Finnish Centre of Excellence in Randomness and Structures (Project No. 346306). The research of M.A.H. has been supported by the project “Mean deviation frequencies and the cutoff phenomenon” (No. INV-2023-162-2850) of the School of Sciences (Facultad de Ciencias) at Universidad de los Andes.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

All authors have contributed equally to the paper.


**Gerardo Barrera:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Michael A. Högele:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

Data sharing is not applicable to this article as no datasets were generated or analyzed in this study.

### APPENDIX A: PROPERTIES OF THE WKR DISTANCE

Recall the WKR distance $\mathcal{W}_p$ of order $p$ given in Definition 2.5.

#### (Properties of the WKR distance)


Let $p>0$, let $x,y\in\mathbb{R}^m$ be deterministic vectors and $c\in\mathbb{R}$, and let $X,Y$ be random vectors in $\mathbb{R}^m$ with finite $p$-th moment. Then, we have the following.

- The WKR distance is a **metric** (or distance), in the sense of being definite, symmetric and satisfying the triangle inequality.
- **Translation invariance:** $\mathcal{W}_p(x+X,y+Y)=\mathcal{W}_p(x-y+X,Y)$.
- **Homogeneity:**
$$\mathcal{W}_p(cX,cY)=\begin{cases}|c|\,\mathcal{W}_p(X,Y), & \text{if } p\in[1,\infty),\\ |c|^p\,\mathcal{W}_p(X,Y), & \text{if } p\in(0,1).\end{cases}$$
- **Shift linearity:** For $p\geq 1$ it follows that
$$\mathcal{W}_p(x+X,X)=|x|. \qquad \text{(A1)}$$
For $p\in(0,1)$, equality (A1) is false in general. However, the following inequality holds:
$$\max\{|x|^p-2\,\mathbb{E}[|X|^p],0\}\leq \mathcal{W}_p(x+X,X)\leq |x|^p.$$
- **Domination:** For any given coupling $\mathcal{T}$ between $X$ and $Y$, it follows that
$$\mathcal{W}_p(X,Y)\leq\Big(\int_{\mathbb{R}^m\times\mathbb{R}^m}|u-v|^p\,\mathcal{T}(du,dv)\Big)^{\min\{1/p,1\}}.$$
- **Characterization:** Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random vectors with finite $p$-th moments and $X$ a random vector with finite $p$-th moment. Then, the following statements are equivalent:
  - $\mathcal{W}_p(X_n,X)\to 0$ as $n\to\infty$.
  - $X_n\xrightarrow{d}X$ as $n\to\infty$ and $\mathbb{E}[|X_n|^p]\to\mathbb{E}[|X|^p]$ as $n\to\infty$.
- **Contractivity:** Let $F:\mathbb{R}^m\to\mathbb{R}^k$, $k\in\mathbb{N}$, be Lipschitz continuous with Lipschitz constant $1$. Then, for any $p>0$,
$$\mathcal{W}_p(F(X),F(Y))\leq\mathcal{W}_p(X,Y).$$
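Shift linearity and homogeneity for $p\geq 1$ can be checked numerically on the real line. There, the monotone (sorted) coupling is optimal for the convex cost $|u-v|^p$ between empirical measures with equally many, equally weighted atoms. The sketch below is our own illustration. `empirical_wp_1d` is a hypothetical helper, and it computes distances between empirical measures, not between the underlying laws.

```python
import numpy as np


def empirical_wp_1d(u, v, p: float = 1.0) -> float:
    """W_p (p >= 1) between two 1D empirical measures with the same number of
    equally weighted atoms, via the monotone (sorted) coupling, which is
    optimal on the real line for the convex cost |u - v|^p."""
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(u - v) ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=1000)  # heavy-ish tails, finite second moment
other = rng.normal(size=1000)
shift, c = 2.5, -3.0

# Shift linearity: W_p(x + X, X) = |x| for p >= 1.
print(empirical_wp_1d(shift + sample, sample, p=1), empirical_wp_1d(shift + sample, sample, p=2))
# Homogeneity: W_p(cX, cY) = |c| W_p(X, Y) for p >= 1.
print(empirical_wp_1d(c * sample, c * other, p=2), abs(c) * empirical_wp_1d(sample, other, p=2))
```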

### APPENDIX B: PROOF OF THEOREM 2.15

#### Proof of Theorem 2.15


### APPENDIX C: PROOF OF LEMMA 2.4

By Ref. 95 (Chapter 5), we have $X_t=e^{-At}e^{-Bt}x$. Then, for any $\epsilon>0$, the submultiplicativity of the norm implies

### APPENDIX D: PROOF OF LEMMA 3.6

## REFERENCES

*An Invitation to Statistics in Wasserstein Space*, Springer Briefs in Probability and Mathematical Statistics (Springer, 2020).

*An Invitation to Optimal Transport, Wasserstein Distances, and Gradient Flows*, EMS Textbook in Mathematics (EMS Press, Berlin, 2021).

*Optimal Transportation. Theory and Applications*, London Mathematical Society Lecture Note Series Vol. 413, edited by Y. Ollivier, H. Pajot, and C. Villani (Cambridge University Press, Cambridge, 2014).

*Numerical Analysis 1997 (Dundee, 1997)* (Addison Wesley Longman, Harlow, 1998), pp. 150–178.

*Markov Chains and Mixing Times*

*Seminar on Probability, XVII*, Lecture Notes in Mathematics Vol. 986 (Springer, Berlin, 1983), pp. 243–297.

*Proceedings of the ASME 2008 Dynamic Systems and Control Conference, Parts A and B, Ann Arbor, Michigan, 20–22 October* (ASME, 2008), pp. 1405–1412.

*Lévy Processes and Stochastic Calculus*

*Stochastic Differential Equations and Applications*

*Stochastic Integration and Differential Equations. A New Approach*, Applications of Mathematics Vol. 21 (Springer-Verlag, Berlin, 1990).

*Lévy Processes and Infinitely Divisible Distributions*

*Stochastic Processes and Applications, Diffusion Processes, the Fokker-Planck and Langevin Equations*

*Operator Theory: Advances and Applications*, edited by R. Nagel and U. Schlotterbeck (Birkhäuser/Springer, Cham, 2017), Vol. 257.

*Positive Definite Matrices*, Princeton Series in Applied Mathematics (Princeton University Press, Princeton, 2007).

*The Theory of Matrices*, 2nd ed., Computer Science and Applied Mathematics (Academic Press, Orlando, 1985).

*Lie Groups, Lie Algebras, and Representations. An Elementary Introduction*, 2nd ed., Springer Graduate Texts in Mathematics Vol. 222 (Springer, 2015).

*Stochastic Analysis and Diffusion Processes*, Oxford Graduate Texts in Mathematics Vol. 24 (Oxford University Press, Oxford, 2014), xii+352 pp., MR3156223.


*Physics, Chemistry, and Biology* (Springer, New York, 2013).