We prove the circular law for a class of non-Hermitian random block band matrices with genuinely sublinear bandwidth. Namely, we show that there exists τ ∈ (0, 1) so that if the bandwidth of the matrix X is at least n^{1−τ} and the nonzero entries are iid random variables with mean zero and slightly more than four finite moments, then the limiting empirical eigenvalue distribution of X, when properly normalized, converges in probability to the uniform distribution on the unit disk in the complex plane. The key technical result is a least singular value bound for shifted random block band matrices with genuinely sublinear bandwidth, which improves on a result of Cook [Ann. Probab. 46, 3442 (2018)] in the band matrix setting.
I. INTRODUCTION
Random band matrices play an important role in mathematics and physics. Unlike many classical matrix ensembles, band matrices with small bandwidth are not of mean-field type and involve short-range interactions. As such, band matrices interpolate between classical mean-field models with delocalized eigenvectors (when the bandwidth is large) and models with localized eigenvectors and Poisson eigenvalue statistics (when the bandwidth is small).22 In addition, random band matrices have been studied in the context of nuclear physics, quantum chaos, theoretical ecology, systems of interacting particles, and neuroscience.3–6,29,49,50,83 Many mathematical results have been established for the eigenvalues and eigenvectors of random band matrices, especially Hermitian models; we refer the reader to Refs. 3, 8, 12, 13, 15, 16, 24, 27–29, 36, 38–41, 52–56, 58, 63, 64, 68, 74–76, 78, and 86 and references therein.
In this paper, we focus on non-Hermitian random block band matrices. Before we introduce the model, we define some notation and recall some previous results for non-Hermitian random matrices with independent entries. For an n × n matrix A, we let λ1(A), λ2(A), …, λn(A) denote the eigenvalues of A (counted with algebraic multiplicity). The empirical spectral measure μA of A is defined as
μA ≔ (1/n) ∑_{k=1}^{n} δ_{λk(A)},
where δz denotes a point mass at z.
The circular law describes the limiting empirical spectral measure for a class of random matrices with independent and identically distributed (iid) entries.
Definition (iid matrix). Let ξ be a complex-valued random variable. An n × n matrix X is called an iid random matrix with atom variable (or atom distribution) ξ if the entries of X are iid copies of ξ.
The circular law asserts that if X is an n × n iid random matrix with atom variable ξ having mean zero and unit variance, then the empirical spectral measure of X/√n converges almost surely to the uniform probability measure on the unit disk centered at the origin in the complex plane. This was proved by Tao and Vu in Refs. 80 and 81 and is the culmination of a large number of results by many authors.10,37,42,43,45,47,61,62 We refer the reader to the survey20 for more complete bibliographic and historical details. Local versions of the circular law have also been established.7,25,26,85,87 The eigenvalues of other models of non-Hermitian random matrices have been studied in recent years; see, for instance, Refs. 1, 2, 14, 17–19, 31, 33–35, 44, 46, 57, 65–67, 69, 70, 72, and 84 and references therein.
Another model of non-Hermitian random matrices takes the form X ⊙ A, where the entries of the n × n matrix X are iid random variables with mean zero and unit variance and A is a deterministic matrix. Here, A ⊙ B denotes the Hadamard product of the matrices A and B, with elements given by (A ⊙ B)ij = AijBij. The matrix A provides the variance profile for the model, and this model includes band matrices when A has a band structure. The empirical eigenvalue distribution of such matrices was studied in Ref. 34. For example, the following result from Ref. 34 describes sufficient conditions for the limiting empirical spectral distribution to be given by the circular law.
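To fix ideas, here is a minimal numpy sketch of the X ⊙ A construction with a band variance profile; the dimensions and the 0/1 indicator profile are illustrative choices of ours, not taken from Ref. 34.

```python
import numpy as np

n, w = 500, 50
i, j = np.indices((n, n))
A = (np.abs(i - j) <= w).astype(float)                # 0/1 indicator of a band of half-width w
X = np.random.default_rng(1).standard_normal((n, n))  # iid entries, mean 0, variance 1
M = X * A                                             # Hadamard product X ⊙ A: a band matrix
```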
More generally, the results in Ref. 34 also apply to cases when conditions (1) and (2) are relaxed and the limiting empirical spectral measure is not given by the circular law. However, the results in Ref. 34, unlike the results in this paper, require the number of non-zero entries to be proportional to n^2 for the limit to be non-trivial.
A. The model and result
In this paper, we focus on a model where the number of non-zero entries is polynomially smaller than n2. We now introduce the model of random block band matrices we will study.
Note that each row and column of X has 3bn nonzero random variables. Using the notation [m] ≔ {1, 2, …, m} for the discrete interval, where m ≔ n/bn, we define the n × n periodic block band matrix X with atom variable ξ and bandwidth bn as
X ≔ (1/√(3bn)) (X_{ij})_{i,j ∈ [m]},   (4)
where each X_{ij} is an independent bn × bn iid random matrix with atom variable ξ when i − j ≡ 0, ±1 (mod m), and X_{ij} = 0 otherwise.
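To make the block structure concrete, the following Python sketch samples such a matrix under the block-circulant reading of (4) above; the function name is ours, and the 1/√(3bn) normalization is chosen so that each row has unit total variance.

```python
import numpy as np

def periodic_block_band(n, b, sample):
    """Sample an n x n periodic block band matrix: an m x m grid of b x b
    blocks (m = n/b), with independent iid blocks on the diagonal and the
    two adjacent block diagonals (wrapping around periodically), all other
    blocks zero, normalized by 1/sqrt(3b)."""
    m = n // b
    assert m * b == n and m >= 3, "need b | n and at least three blocks"
    X = np.zeros((n, n))
    for i in range(m):
        for d in (-1, 0, 1):                 # block tridiagonal + periodic wrap
            j = (i + d) % m
            X[i*b:(i+1)*b, j*b:(j+1)*b] = sample((b, b))
    return X / np.sqrt(3 * b)
```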
One motivation for the periodic block band matrix introduced above comes from theoretical ecology. Population densities and food webs, for example, can be modeled by a system involving a large random matrix.6,59 The eigenvalues of this random matrix play an important role in the analysis of the stability of the system, and the circular law and elliptic law have previously been exploited for this purpose.6 It has been observed that many of these systems correspond to sparse random matrices with block structures (known as “modules” or “compartments”).6,79 The periodic block band matrix introduced above is one such model with a very specific network structure.
Our main result below establishes the circular law for the periodic block band model defined above when bn is genuinely sublinear. To the best of our knowledge, this is the first result to establish the circular law as the limiting spectral distribution for matrices with genuinely sublinear bandwidth.
Theorem 1.4 (Circular law for random block band matrices). There exists c > 0 such that the following holds. Let ξ be a complex-valued random variable with mean zero, unit variance, and E|ξ|^{4+ϵ} < ∞ for some ϵ > 0. Assume that X is an n × n periodic block band matrix with atom variable ξ and bandwidth bn, where cn ≥ bn ≥ n^{32/33} log n. Then, the empirical spectral measure of X converges in probability as n → ∞ to the uniform probability measure on the unit disk in the complex plane centered at the origin.
We prove Theorem 1.4 by showing that there exist constants c, τ > 0 so that the empirical spectral measure of X converges to the circular law under the assumption that the bandwidth bn satisfies cn ≥ bn ≥ n^{1−τ} log n. In fact, the proof reveals that τ can be taken to be τ ≔ 1/33, as stated in Theorem 1.4, although this particular value can likely be improved by optimizing some of the exponents in the proof.
A few remarks concerning the assumptions of Theorem 1.4 are in order. First, the restriction on the bandwidth, bn ≥ n^{1−τ} log n with τ = 1/33, is of a technical nature, and we believe that this condition can be significantly relaxed. For instance, we give an exponential lower bound on the least singular value of X − zI for fixed z ∈ ℂ in Theorem 2.1 below. If this bound could be improved to, say, polynomial in n, then we could improve the value of τ to 1/2. It is possible that other methods could improve this restriction even further. Second, the assumption that the entries have finite 4 + ϵ moments is due to the sublinear bandwidth growth rate. Our calculation requires higher moment assumptions for slower bandwidth growth, as can be seen from the Proof of Theorem 3.1.
A numerical simulation of Theorem 1.4 is presented in Fig. 1.
Numerical simulations for the eigenvalues of X when X is an n × n periodic block band matrix with bandwidth bn for various atom distributions. (a) X has Gaussian atom variable with n = 10 000 and bn = 100. (b) X has Rademacher atom variable with n = 10 000 and bn = 100. (c) X has Gaussian atom variable with n = 10 000 and bn = 10. (d) X has Rademacher atom variable with n = 10 000 and bn = 10.
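Panels such as (a) and (b) can be approximated in a few lines, reusing the hypothetical periodic_block_band constructor sketched in Sec. I A (we use a smaller n than in the figure to keep the dense eigensolver fast):

```python
import numpy as np
import matplotlib.pyplot as plt
# periodic_block_band: the hypothetical constructor sketched in Sec. I A.

rng = np.random.default_rng(0)
atoms = {
    "Gaussian": lambda s: rng.standard_normal(s),
    "Rademacher": lambda s: rng.choice([-1.0, 1.0], size=s),
}
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (name, atom) in zip(axes, atoms.items()):
    ev = np.linalg.eigvals(periodic_block_band(3000, 100, atom))
    ax.scatter(ev.real, ev.imag, s=1)
    ax.set_title(f"{name} atoms, n = 3000, b = 100")
    ax.set_aspect("equal")
plt.show()
```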
B. Notation and overview
We use asymptotic notation under the assumption that n → ∞. The notations X = O(Y) and Y = Ω(X) denote the estimate |X| ≤ CY for some constant C > 0 and all n ≥ C. We write X = o(Y) if |X| ≤ anY for some sequence an that tends to zero as n → ∞.
For convenience, we do not always indicate the size of a matrix in our notation. For example, to denote an n × n matrix A, we simply write A instead of An when the size is clear. We use bn to denote the size of each block matrix and cn ≔ 3bn for the number of non-zero entries per row and column. We let e1, e2, …, en be the standard basis elements of ℂ^n. For a matrix A, aij will be the (i, j)-th entry, ak will be the kth column, A(k) represents the matrix A with its kth column set to zero, and Hk will be the span of the columns of A(k). Furthermore, A* is the complex conjugate transpose of the matrix A, and when A is a square matrix, we let
Az ≔ A − zI,
where I denotes the identity matrix and z ∈ ℂ.
For the spectral information of an n × n matrix A, we designate
λ1(A), λ2(A), …, λn(A)
to be the eigenvalues of A (counted with algebraic multiplicity) and
μA ≔ (1/n) ∑_{k=1}^{n} δ_{λk(A)}
to be the empirical measure of the eigenvalues. Here, δz represents a point mass at z ∈ ℂ. Similarly, we denote the singular values of A by
s1(A) ≥ s2(A) ≥ ⋯ ≥ sn(A) ≥ 0
and the empirical measure of the squared-singular values as
νA ≔ (1/n) ∑_{k=1}^{n} δ_{sk(A)²}.
Additionally, we use ‖A‖ to mean the standard ℓ2 → ℓ2 operator norm of A.
For a vector v = (v1, v2, …, vn) ∈ ℂ^n, ‖v‖ ≔ (∑_{i=1}^{n} |vi|²)^{1/2} denotes its Euclidean norm.
Finally, we use the following standard notation from analysis and linear algebra. The set of unit vectors in ℂ^n will be denoted by S^{n−1}, i.e., S^{n−1} ≔ {v ∈ ℂ^n : ‖v‖ = 1}, and the disk of radius r in the complex plane by D(r) ≔ {z ∈ ℂ : |z| ≤ r}. For any set S ⊆ ℂ^n and vector v ∈ ℂ^n, dist(v, S) ≔ inf_{u ∈ S} ‖v − u‖ denotes the distance from v to S. |S| denotes the cardinality of the finite set S.
The rest of this paper is devoted to the Proof of Theorem 1.4. The proof proceeds via Girko’s Hermitization procedure (see Ref. 20), which is now a standard technique in the study of non-Hermitian random matrices. Following Ref. 54, we study the empirical eigenvalue distribution of (Xz)*Xz for z ∈ ℂ. In particular, we establish a rate of convergence of its Stieltjes transform to the Stieltjes transform of the limiting measure in Sec. III. The key technical tool in our proof is a lower bound on the least singular value of Xz presented in Sec. II. In Sec. IV, following the method of Bai,10 these two key ingredients are combined and the Proof of Theorem 1.4 is given. The Appendix contains a number of auxiliary results.
II. LEAST SINGULAR VALUE
In this section, we present our key least singular value bound, Theorem 2.1. The crucial feature of our result is that the lower bound on the least singular value is only singly exponentially small in m. While this is most likely suboptimal and, indeed, we conjecture that our bound can be substantially improved, it is still significantly better than previous results in the literature. Notably, the work of Cook32 provides lower bounds on the least singular value for more general structured sparse random matrices; however, specialized to our setting, the lower bound there is doubly exponentially small in m [see Eq. (3.8) in Ref. 32], which only translates to a circular law for bandwidth (at best) Ω(n/log n).
We consider the translated periodic block band model Xz = X − zI, where X is as defined in (4) and z ∈ ℂ is fixed. Recall that m = n/bn. Throughout this section, we will assume that bn ≥ m ≥ m0, where m0 is a sufficiently large constant. Recall that for an n × n matrix A, we let s1(A) ≥ s2(A) ≥ ⋯ ≥ sn(A) ≥ 0 denote its singular values.
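As a quick numerical illustration of the quantity controlled by Theorem 2.1 (again using the hypothetical constructor sketched in Sec. I A):

```python
import numpy as np
# periodic_block_band: the hypothetical constructor sketched in Sec. I A.

rng = np.random.default_rng(2)
n, b, z = 1200, 100, 0.5 + 0.3j
Xz = periodic_block_band(n, b, lambda s: rng.standard_normal(s)) - z * np.eye(n)
s = np.linalg.svd(Xz, compute_uv=False)      # s[0] = s_1(Xz), s[-1] = s_n(Xz)
print("s_1(Xz) =", s[0], " s_n(Xz) =", s[-1])
```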
Let us define the event
We begin by showing (Lemma 2.4) that this event holds with high probability. This will allow us to restrict ourselves to this event for the remainder of this section. In order to bound the probability of this event, we will need the following two results on the smallest and largest singular values of (shifts of) complex random matrices with iid entries.
The next proposition can be readily deduced from Theorem 5.9 in Ref. 9 along with the standard Chernoff bound.
Applying the above two propositions [along with the triangle inequality for the operator norm] and using the union bound, we immediately obtain the following:
For the remainder of this section, we will restrict ourselves to this event. For any v ∈ ℂ^n, we let v[1], v[2], …, v[m] be the division of the coordinates of v into m vectors, so that v[i] ∈ ℂ^{bn} consists of the coordinates indexed by [(i − 1)bn + 1, ibn]. We will use vi to denote the ith coordinate of v. For convenience, we use the convention that the indices wrap around, meaning, for example, that v[m+1] = v[1].
For α, β ∈ (0, 1), let
Lα,β ≔ {v ∈ S^{n−1} : |{i ∈ [n] : |vi| ≥ β/√n}| ≥ αn},
i.e., Lα,β consists of those unit vectors that have sufficiently many large coordinates. For us, α and β are constants depending on K, which will be specified later. Then, as S^{n−1} = Lα,β ∪ (S^{n−1} ∖ Lα,β), we can decompose the least singular value problem into two terms,
P(sn(Xz) ≤ t) ≤ P(inf_{v ∈ Lα,β} ‖Xzv‖ ≤ t) + P(inf_{v ∈ S^{n−1} ∖ Lα,β} ‖Xzv‖ ≤ t).   (5)
A. Reduction to the distance problem
We begin with a lemma due to Rudelson and Vershynin, which converts the first term in (5) into a question about the distance of a random vector to a random subspace.
The distance of xk − zek to Hk can be bounded from below by |u*(xk − zek)|, where u is any unit vector orthogonal to Hk. Our next goal is to obtain some structural information about any vector normal to Hk. For convenience of notation, we will henceforth assume that k = 1; the same arguments are readily seen to hold for other values of k as well. Moreover, since the distribution of Xz is invariant under transposition, we may also assume that x1 − ze1 is the first row of Xz and that H1 is the subspace spanned by all the rows except for the first.
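The inner-product lower bound used here is elementary and easy to check numerically; in the codimension-one case relevant to us, it is in fact an equality. A self-contained sketch (all names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
B = rng.standard_normal((n, n - 1))        # columns span a codimension-1 subspace H
x = rng.standard_normal(n)

Q, _ = np.linalg.qr(B, mode="complete")    # last column of Q is a unit normal to H
u = Q[:, -1]
proj = Q[:, :n - 1] @ (Q[:, :n - 1].T @ x) # orthogonal projection of x onto H
dist = np.linalg.norm(x - proj)            # exact distance from x to H

# dist(x, H) >= |<x, u>| for any unit u orthogonal to H (equality in codim 1)
assert abs(dist - abs(u @ x)) < 1e-10
```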
B. Structure of normal vectors and approximately null vectors
Recall that H1 is the subspace generated by all the rows of Xz except for the first row. The next proposition establishes that if v is normal to H1, then there are sufficiently many v[i] with large enough norm. Our approach to lower bounding the coordinates of v is similar to the methods used in Ref. 23; our proof is also similar in spirit to the Proof of Proposition 2.9 in Ref. 30.
Note that in the above proof, it is not important that v is precisely normal to H1. Indeed, exactly the same proof allows us to obtain a similar conclusion for approximately null vectors as well.
Our next goal is to show that for α, β sufficiently small depending on K [indeed, the proof shows that we can take α < γ′/(K² log K) and β < γ′/K, where γ′ > 0 is a constant depending only on the distribution of the random variable ξ], we have (8), where γ ∈ (0, 1) is a constant depending only on the distribution of the random variable ξ.
For this, we begin with a standard decomposition of the unit sphere due to Rudelson and Vershynin.73
We will also need the following lemma from Ref. 73.
Lemma 2.10 (Lemma 3.4 from Ref. 73). If v ∈ Incompk(a, κ), then there exist constants γ1, γ2, and γ3 depending only on a and κ such that there are at least γ1k coordinates with γ3k^{−1/2} ≥ |vi| ≥ γ2k^{−1/2}. In fact, we can take γ1 = κ²a/2 and γ3 = κ^{−1/2}.
Now, we are ready to prove (8). Consider a vector v ∈ S^{n−1} such that ‖Xzv‖ ≤ t, where 0 ≤ t ≤ 1. Then, on the event defined at the beginning of this section, it follows from Proposition 2.7 that for any i ∈ [m],
Moreover, since for every i ∈ [m],
it follows that
Consider now the event from Lemma 2.9, where a, κ, and γ are as in that lemma. On this event, if t ≤ γ, then v cannot be compressible, i.e., v must be incompressible. Therefore, we can conclude from Lemma 2.10 that, on this event, any vector v ∈ S^{n−1} with ‖Xzv‖ ≤ t will have at least αn coordinates larger than β/√n, where α = γ′/(K² log K), β = γ′/K, and γ′ > 0 is a constant depending only on γ.
Hence, with this choice of α, β, γ′, the probability of the event in (8) is bounded by
where the last inequality follows by Lemma 2.9. This proves (8).
The next lemma is a direct consequence of Lemma 2.10 and Lemmas 2.5 and 2.7 from Ref. 32.
C. Proof of Theorem 2.1
III. CONVERGENCE OF THE STIELTJES TRANSFORM
In this section, we establish a rate of convergence for the Stieltjes transform of the empirical eigenvalue distribution of (Xz)*Xz.
We state and prove the above theorem under more general conditions than those of Theorem 1.4. In particular, we allow random variables with no finite moments beyond the fourth, although the quantitative estimate improves with the number of existing moments. Furthermore, we do not make use of the lower bound on cn in Theorem 1.4.
We follow the proof strategy from Ref. 54. This previous work demonstrated the convergence of the Stieltjes transform for band matrices rather than block band matrices, so we necessarily make some adaptations. More significantly, we deduce an explicit rate of convergence, which does not appear in Ref. 54.
Our main object of study will be the Stieltjes transform
mn,z(ζ) ≔ (1/n) tr((Xz)*Xz − ζI)^{−1},  ζ ∈ ℂ ∖ [0, ∞).
Define Xz(k) to be the matrix Xz with the kth column set to zero. We define
We also denote
Additionally, we use the shorthand
as this term appears repeatedly in our initial calculations.
For sz(ζ) = mn,z(ζ) or mz(ζ), let us define
The motivation for this definition is that mz(ζ) is known to be a fixed point of this function when the spectrum obeys the circular law; see Sec. 11.4 in Ref. 9. The Proof of Theorem 3.1 can be divided into several key computations. Since we expect mn,z(ζ) to also converge to the fixed point of f, we first relate mn,z(ζ) − mz(ζ) to f(mn,z(ζ)) − mn,z(ζ).
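The following sketch compares the empirical Stieltjes transform mn,z(ζ) with the fixed point of the circular-law self-consistent equation. We assume here the standard equation m = (1 + m)/(|z|² − ζ(1 + m)²) from the circular-law literature (cf. Sec. 11.4 in Ref. 9); the function f defined above may differ from this in finite-n corrections. We evaluate at a real ζ < 0, where the resolvent is well-posed and the limiting transform is the unique real root of the resulting cubic.

```python
import numpy as np
# periodic_block_band: the hypothetical constructor sketched in Sec. I A.

n, b, z, zeta = 1200, 100, 0.5, -1.0
rng = np.random.default_rng(4)
Xz = periodic_block_band(n, b, lambda s: rng.standard_normal(s)) - z * np.eye(n)

# Empirical Stieltjes transform of the squared singular values of Xz.
s2 = np.linalg.svd(Xz, compute_uv=False) ** 2
m_emp = np.mean(1.0 / (s2 - zeta))

# m = (1 + m)/(|z|^2 - zeta (1 + m)^2) rearranges to the cubic
# zeta m^3 + 2 zeta m^2 + (zeta - |z|^2 + 1) m + 1 = 0.
roots = np.roots([zeta, 2 * zeta, zeta - abs(z) ** 2 + 1, 1])
m_lim = roots[np.isclose(roots.imag, 0.0)].real[0]
print(m_emp, m_lim)                          # close for large n and bandwidth
```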
The strategy of our proof is to control the moments of mn,z(ζ) − f(mn,z(ζ)) and then provide a deterministic bound for |mn,z(ζ) − mz(ζ)| in terms of this difference.
We begin with the moments of f(mn,z(ζ)) − mn,z(ζ).
To complete the estimates of (19), we need a lower bound on αk [recall that αk is defined in (10)].
Next, we provide a deterministic upper bound on |1 − rn,z(ζ)|.
Theorem 3.1 follows easily from the above calculations.
IV. PROOF OF THEOREM 1.4
A. Spectral norm bound
Before proving Theorem 1.4, we note the following spectral norm bound on X.
Proposition 4.1 (Spectral norm bound). There exists a constant K > 0 such that ‖X‖ ≤ K with probability 1 − o(1).
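A quick simulation is consistent with this bound (again using the hypothetical constructor sketched in Sec. I A; the displayed values are not taken from the paper):

```python
import numpy as np
# periodic_block_band: the hypothetical constructor sketched in Sec. I A.

rng = np.random.default_rng(6)
for b in (10, 50, 100):
    X = periodic_block_band(1000, b, lambda s: rng.standard_normal(s))
    print(b, np.linalg.norm(X, 2))           # operator norm stays O(1) in b
```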
B. Proof of Theorem 1.4
In order to complete the Proof of Theorem 1.4, we will use the following replacement principle from Ref. 81. Let ‖A‖2 denote the Hilbert–Schmidt norm of the matrix A defined by the formula
‖A‖2 ≔ (tr AA*)^{1/2} = (∑_{i,j} |aij|²)^{1/2}.
Theorem 4.2 (Replacement principle; Theorem 2.1 from Ref. 81). Suppose for each n that G and X are n × n ensembles of random matrices. Assume that the following holds:
- The expression (1/n)‖X‖2² + (1/n)‖G‖2² is bounded in probability (respectively, almost surely).
- For almost all complex numbers z, (1/n) log|det(X − zI)| − (1/n) log|det(G − zI)| converges in probability (respectively, almost surely) to zero, and in particular, for fixed z, these determinants are nonzero with probability 1 − o(1) (respectively, almost surely nonzero for all but finitely many n).

Then, μX − μG converges in probability (respectively, almost surely) to zero.
We will apply the replacement principle to the normalized band matrix X, while the other matrix is taken to be G, where √n G is an n × n matrix whose entries are iid standard Gaussian random variables, i.e., √n G is a Ginibre matrix. As the limiting behavior of μG is known to be almost surely the circular law,81 it will suffice, in order to complete the Proof of Theorem 1.4, to check the two conditions of Theorem 4.2.
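Both conditions rest on elementary identities relating the Hilbert–Schmidt norm and the log-determinant to singular values, namely ‖A‖2² = ∑k sk(A)² and log|det A| = ∑k log sk(A); a short self-contained numerical check (names ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
A = rng.standard_normal((n, n)) / np.sqrt(n)
s = np.linalg.svd(A, compute_uv=False)

# Hilbert-Schmidt norm squared = sum of squared singular values.
assert np.isclose(np.sum(np.abs(A) ** 2), np.sum(s ** 2))

# log|det A| = sum of log singular values; equivalently, (1/n) log|det A|
# is half the integral of log x against the empirical measure of the
# squared singular values, which is how condition (ii) is verified below.
_, logabsdet = np.linalg.slogdet(A)
assert np.isclose(logabsdet, np.sum(np.log(s)))
```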
Condition (i) from Theorem 4.2 follows by the law of large numbers. Thus, it suffices to verify the second condition. To do so, we introduce the following notation inspired by Chapter 11 of Ref. 9. For z ∈ ℂ, we define the following empirical distributions constructed from the squared-singular values of Xz and Gz:
and
It follows that
By Theorem 2.1 and Proposition 4.1, there exists a constant K > 0 (depending on z) such that
with probability 1 − o(1). Here, the largest and smallest singular values of Gz can be controlled by the results in Refs. 80 and 82. We will apply the following lemma.
Returning to (37) and applying the above lemma, we find that
for a constant C > 0, where
for any probability measures μ and ν on [0, ∞). Let νz(·) be the probability measure on [0, ∞) from Theorem 3.1 (or equivalently, the probability measure defined in Sec. 11.4 of Ref. 9). By the triangle inequality, it suffices to show that
and
with probability 1 − o(1). The convergence in (40) follows from Lemma 11.16 from Ref. 9; in fact, the results in Ref. 9 provide a much better error bound, which holds almost surely. Thus, it remains to establish (39), which is a consequence of the following lemma.
ACKNOWLEDGMENTS
The authors thank the anonymous referees for useful feedback and corrections. K. Luh was supported, in part, by the NSF under Grant No. DMS-1702533. S. O’Rourke was supported, in part, by the NSF under Grant Nos. ECCS-1913131 and DMS-1810500.
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
APPENDIX: AUXILIARY TOOLS
Let us define an n × n matrix as , where aij = (A)ij. Then, . In addition, . Therefore, using Lemma A.3 and the fact that , the claim of the corollary follows.□
A simple consequence of the previous concentration inequality is a bound on the moments.
Our final lemma is a technical observation, which is of use in Sec. III.