We prove the circular law for a class of non-Hermitian random block band matrices with genuinely sublinear bandwidth. Namely, we show that there exists τ ∈ (0, 1) so that if the bandwidth of the matrix X is at least $n^{1-\tau}$ and the nonzero entries are iid random variables with mean zero and slightly more than four finite moments, then the limiting empirical eigenvalue distribution of X, when properly normalized, converges in probability to the uniform distribution on the unit disk in the complex plane. The key technical result is a least singular value bound for shifted random block band matrices with genuinely sublinear bandwidth, which improves on a result of Cook [Ann. Probab. 46, 3442 (2018)] in the band matrix setting.

Random band matrices play an important role in mathematics and physics. Unlike many classical matrix ensembles, band matrices with small bandwidth are not of mean-field type and involve short-range interactions. As such, band matrices interpolate between classical mean-field models with delocalized eigenvectors (when the bandwidth is large) and models with localized eigenvectors and Poisson eigenvalue statistics (when the bandwidth is small).22 In addition, random band matrices have been studied in the context of nuclear physics, quantum chaos, theoretical ecology, systems of interacting particles, and neuroscience.3–6,29,49,50,83 Many mathematical results have been established for the eigenvalues and eigenvectors of random band matrices, especially Hermitian models; we refer the reader to Refs. 3, 8, 12, 13, 15, 16, 24, 27–29, 36, 38–41, 52–56, 58, 63, 64, 68, 74–76, 78, and 86 and references therein.

In this paper, we focus on non-Hermitian random block band matrices. Before we introduce the model, we define some notation and recall some previous results for non-Hermitian random matrices with independent entries. For an n × n matrix A, we let $\lambda_1(A), \dots, \lambda_n(A) \in \mathbb{C}$ denote the eigenvalues of A (counted with algebraic multiplicity). The empirical spectral measure $\mu_A$ of A is defined as

$$\mu_A := \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i(A)},$$

where $\delta_z$ denotes a point mass at z.

The circular law describes the limiting empirical spectral measure for a class of random matrices with independent and identically distributed (iid) entries.

Definition 1.1

(iid matrix). Let ξ be a complex-valued random variable. An n × n matrix X is called an iid random matrix with atom variable (or atom distribution) ξ if the entries of X are iid copies of ξ.

The circular law asserts that if X is an n × n iid random matrix with atom variable ξ having mean zero and unit variance, then the empirical spectral measure of $X/\sqrt{n}$ converges almost surely to the uniform probability measure on the unit disk centered at the origin in the complex plane. This was proved by Tao and Vu in Refs. 80 and 81 and is the culmination of a large number of results by many authors.10,37,42,43,45,47,61,62 We refer the reader to the survey20 for more complete bibliographic and historical details. Local versions of the circular law have also been established.7,25,26,85,87 The eigenvalues of other models of non-Hermitian random matrices have been studied in recent years; see, for instance, Refs. 1, 2, 14, 17–19, 31, 33–35, 44, 46, 57, 65–67, 69, 70, 72, and 84 and references therein.
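The statement is easy to probe numerically. The sketch below (Gaussian entries, n = 500, and the specific thresholds are illustrative choices, not part of any theorem) samples an iid matrix, rescales by $1/\sqrt{n}$, and checks that the spectrum roughly fills the unit disk, with $\mathbb{P}(|\lambda| \le r) \approx r^2$ as predicted by the uniform law on the disk.

```python
import numpy as np

# Empirical check of the circular law: the eigenvalues of X/sqrt(n), for X an
# n x n matrix with iid entries of mean zero and unit variance, should fill
# the unit disk roughly uniformly.  Gaussian entries are an illustrative
# choice; the theorem only requires mean zero and unit variance.
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, n))
eigs = np.linalg.eigvals(X / np.sqrt(n))

spectral_radius = np.abs(eigs).max()
# Under the uniform law on the unit disk, P(|lambda| <= r) = r^2.
frac_inside_07 = np.mean(np.abs(eigs) <= 0.7)
```

Increasing n sharpens both checks, reflecting the almost sure convergence in the circular law.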

Another model of non-Hermitian random matrices takes the form $X \circ A$, where the entries of the n × n matrix X are iid random variables with mean zero and unit variance and A is a deterministic matrix. Here, $A \circ B$ denotes the Hadamard product of the matrices A and B, with entries given by $(A\circ B)_{ij} = A_{ij}B_{ij}$. The matrix A provides the variance profile for the model, and this model includes band matrices when A has a band structure. The empirical eigenvalue distribution of such matrices was studied in Ref. 34. For example, the following result from Ref. 34 describes sufficient conditions for the limiting empirical spectral distribution to be given by the circular law.

Theorem 1.2
(Theorem 2.4 from Ref. 34). Let $\xi$ be a complex-valued random variable with mean zero, unit variance, and $\mathbb{E}|\xi|^{4+\epsilon} < \infty$ for some $\epsilon > 0$. Let $X$ be an $n \times n$ iid matrix with atom variable $\xi$, and let $A = \big(\sigma_{ij}^{(n)}\big)$ be an $n \times n$ matrix with non-negative entries, which satisfy
$$\sup_{n \ge 1}\ \max_{1\le i,j\le n}\ \sigma_{ij}^{(n)} \le \sigma_{\max}$$
(1)
for some $\sigma_{\max} \in (0,\infty)$ and
$$\frac{1}{n}\sum_{i=1}^{n}\big(\sigma_{ij}^{(n)}\big)^2 = \frac{1}{n}\sum_{j=1}^{n}\big(\sigma_{ij}^{(n)}\big)^2 = 1$$
(2)
for all $1 \le i, j \le n$. Then, the empirical spectral measure of $\frac{1}{\sqrt{n}}X\circ A$ converges in probability as $n\to\infty$ to the uniform probability measure on the unit disk in the complex plane centered at the origin.
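To make conditions (1) and (2) concrete, the sketch below builds a cyclic band variance profile (the specific profile and sizes are illustrative). The normalization (2) holds exactly for any bandwidth, but $\sigma_{\max} = \sqrt{n/(2k+1)}$ remains bounded, as (1) requires, only when the bandwidth $2k+1$ grows proportionally to $n$, which is precisely the limitation discussed below.

```python
import numpy as np

# A cyclic band variance profile: sigma_ij = sqrt(n/(2k+1)) when the cyclic
# distance |i - j| mod n is at most k, and 0 otherwise.  Each row and column
# carries 2k+1 nonzero variances, so condition (2) holds exactly, while
# sigma_max = sqrt(n/(2k+1)) stays bounded only for bandwidth of order n.
n, k = 100, 10
i = np.arange(n)
gap = np.abs(i[:, None] - i[None, :])
dist = np.minimum(gap, n - gap)                  # cyclic distance
A = np.where(dist <= k, np.sqrt(n / (2 * k + 1)), 0.0)

row_means = (A**2).sum(axis=1) / n               # left-hand side of (2)
col_means = (A**2).sum(axis=0) / n
sigma_max = A.max()
```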

More generally, the results in Ref. 34 also apply to cases when conditions (1) and (2) are relaxed and the limiting empirical spectral measure is not given by the circular law. However, the results in Ref. 34, unlike the results in this paper, require the number of non-zero entries to be proportional to n2 for the limit to be non-trivial.

In this paper, we focus on a model where the number of non-zero entries is polynomially smaller than n2. We now introduce the model of random block band matrices we will study.

Definition 1.3
(Periodic block band matrix). Let $b_n \ge 1$ be an integer that divides $n$, and let $\xi$ be a complex-valued random variable. We consider the $n \times n$ periodic block band matrix $\tilde X$ with atom variable (or atom distribution) $\xi$ and bandwidth $b_n$, defined to be the tridiagonal periodic block matrix
$$\tilde X := \begin{pmatrix} \tilde D_1 & \tilde U_2 & & & \tilde T_m \\ \tilde T_1 & \tilde D_2 & \tilde U_3 & & \\ & \tilde T_2 & \tilde D_3 & \ddots & \\ & & \ddots & \ddots & \tilde U_m \\ \tilde U_1 & & & \tilde T_{m-1} & \tilde D_m \end{pmatrix},$$
(3)
where the entries not displayed are taken to be zero. Here, $\tilde D_1, \tilde U_1, \tilde T_1, \dots, \tilde D_m, \tilde U_m, \tilde T_m$ are $b_n \times b_n$ independent iid random matrices, each having atom variable $\xi$, and $m := n/b_n$. For convenience, we use the convention that the indices wrap around, meaning, for example, that $\tilde U_{m+1} = \tilde U_1$.

Note that each row and column of $\tilde X$ has $3b_n$ many nonzero random variables. Using the notation $[m] := \{1,\dots,m\}$ for the discrete interval, we define

$$c_n := 3b_n, \qquad D_i := \frac{1}{\sqrt{c_n}}\tilde D_i,\ i\in[m], \qquad U_i := \frac{1}{\sqrt{c_n}}\tilde U_i,\ i\in[m], \qquad T_i := \frac{1}{\sqrt{c_n}}\tilde T_i,\ i\in[m], \qquad X := \frac{1}{\sqrt{c_n}}\tilde X.$$
(4)
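The construction in Definition 1.3 together with the normalization (4) can be sketched in a few lines (Gaussian entries and the sizes $n = 300$, $b_n = 10$ are illustrative choices). The assembled $\tilde X$ has exactly $3b_n$ nonzero entries in every row and column, and after dividing by $\sqrt{c_n} = \sqrt{3b_n}$ its eigenvalues spread roughly over the unit disk, as in Fig. 1.

```python
import numpy as np

# Assemble the periodic block band matrix of Definition 1.3.  Block row i
# carries T_{i-1} (left), D_i (diagonal), and U_{i+1} (right), with block
# indices wrapping around modulo m = n / b_n; the result is normalized as
# in (4), X = X~ / sqrt(c_n) with c_n = 3 b_n.
def periodic_block_band(n, b, rng):
    assert n % b == 0
    m = n // b
    D = [rng.standard_normal((b, b)) for _ in range(m)]
    U = [rng.standard_normal((b, b)) for _ in range(m)]
    T = [rng.standard_normal((b, b)) for _ in range(m)]
    Xt = np.zeros((n, n))
    for i in range(m):                      # 0-indexed block row i <-> row i+1
        Xt[i*b:(i+1)*b, i*b:(i+1)*b] = D[i]
        j = (i + 1) % m
        Xt[i*b:(i+1)*b, j*b:(j+1)*b] = U[j]          # superdiagonal block
        j = (i - 1) % m
        Xt[i*b:(i+1)*b, j*b:(j+1)*b] = T[j]          # subdiagonal block
    return Xt / np.sqrt(3 * b)

rng = np.random.default_rng(1)
n, b = 300, 10                              # m = 30 diagonal blocks
X = periodic_block_band(n, b, rng)
nonzeros_per_row = (X != 0).sum(axis=1)
eigs = np.linalg.eigvals(X)                 # cf. Fig. 1: roughly the unit disk
```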

One motivation for the periodic block band matrix introduced above comes from theoretical ecology. Population densities and food webs, for example, can be modeled by a system involving a large random matrix.6,59 The eigenvalues of this random matrix play an important role in the analysis of the stability of the system, and the circular law and elliptic law have previously been exploited for this purpose.6 It has been observed that many of these systems correspond to sparse random matrices with block structures (known as “modules” or “compartments”).6,79 The periodic block band matrix introduced above is one such model with a very specific network structure.

Our main result below establishes the circular law for the periodic block band model defined above when bn is genuinely sublinear. To the best of our knowledge, this is the first result to establish the circular law as the limiting spectral distribution for matrices with genuinely sublinear bandwidth.

Theorem 1.4

(Circular law for random block band matrices). There exists $c > 0$ such that the following holds. Let $\xi$ be a complex-valued random variable with mean zero, unit variance, and $\mathbb{E}|\xi|^{4+\epsilon} < \infty$ for some $\epsilon > 0$. Assume that $\tilde X$ is an $n\times n$ periodic block band matrix with atom variable $\xi$ and bandwidth $b_n$, where $cn \ge b_n \ge n^{32/33}\log n$. Then, the empirical spectral measure of $X := \tilde X/\sqrt{3b_n}$ converges in probability as $n\to\infty$ to the uniform probability measure on the unit disk in the complex plane centered at the origin.

We prove Theorem 1.4 by showing that there exist constants c, τ > 0 so that the empirical spectral measure of X converges to the circular law under the assumption that the bandwidth $b_n$ satisfies $cn \ge b_n \ge n^{1-\tau}\log n$. In fact, the proof reveals that τ can be taken to be τ ≔ 1/33, as stated in Theorem 1.4, although this particular value can likely be improved by optimizing some of the exponents in the proof.

A few remarks concerning the assumptions of Theorem 1.4 are in order. First, the restriction on the bandwidth $b_n \ge n^{1-\tau}\log(n)$ with τ = 1/33 is of a technical nature, and we believe that this condition can be significantly relaxed. For instance, we give an exponentially small lower bound on the least singular value of $X - zI$ for $z\in\mathbb{C}$ in Theorem 2.1 below. If this bound could be improved to, say, polynomial in n, then we could improve the value of τ to 1/2. It is possible that other methods could improve this restriction even further. Second, the assumption that the entries have finite 4 + ϵ moments is due to the sublinear bandwidth growth rate. Our calculation requires higher moment assumptions for slower bandwidth growth, as can be seen from the Proof of Theorem 3.1.

A numerical simulation of Theorem 1.4 is presented in Fig. 1.

FIG. 1.

Numerical simulations for the eigenvalues of $X := \tilde X/\sqrt{3b_n}$ when $\tilde X$ is an n × n periodic block band matrix with bandwidth $b_n$ for various atom distributions. (a) $\tilde X$ has Gaussian atom variable with n = 10 000 and $b_n$ = 100. (b) $\tilde X$ has Rademacher atom variable with n = 10 000 and $b_n$ = 100. (c) $\tilde X$ has Gaussian atom variable with n = 10 000 and $b_n$ = 10. (d) $\tilde X$ has Rademacher atom variable with n = 10 000 and $b_n$ = 10.


We use asymptotic notation under the assumption that $n\to\infty$. The notations X = O(Y) and Y = Ω(X) denote the estimate |X| ≤ CY for some constant C > 0 and all $n \ge C$. We write X = o(Y) if $|X| \le c_nY$ for some $c_n$ that goes to zero as n tends to infinity.

For convenience, we do not always indicate the size of a matrix in our notation. For example, to denote an n × n matrix A, we simply write A instead of $A_n$ when the size is clear. We use $b_n$ to denote the size of each block matrix and $c_n := 3b_n$ for the number of non-zero entries per row and column. We let $[n] := \{1,2,3,\dots,n\}$ and $e_1, e_2, \dots, e_n$ be the standard basis elements of $\mathbb{C}^n$. For a matrix A, $a_{ij}$ will be the (i, j)-th entry, $a_k$ will be the kth column, $A^{(k)}$ represents the matrix A with its kth column set to zero, and $H_k$ will be the span of the columns of $A^{(k)}$. Furthermore, $A^*$ is the complex conjugate transpose of the matrix A, and when A is a square matrix, we let

$$A_z := A - zI,$$

where $I$ denotes the identity matrix and $z\in\mathbb{C}$.

For the spectral information of an n × n matrix A, we designate

$$\lambda_1(A), \lambda_2(A), \dots, \lambda_n(A) \in \mathbb{C}$$

to be the eigenvalues of A (counted with algebraic multiplicity) and

$$\mu_A := \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i(A)}$$

to be the empirical measure of the eigenvalues. Here, δz represents a point mass at zC. Similarly, we denote the singular values of A by

$$s_1(A) \ge s_2(A) \ge \cdots \ge s_n(A) \ge 0$$

and the empirical measure of the squared-singular values as

$$\nu_A := \frac{1}{n}\sum_{i=1}^{n}\delta_{s_i^2(A)}.$$

Additionally, we use ‖A‖ to mean the standard $\ell^2 \to \ell^2$ operator norm of A.

For a vector $v\in\mathbb{C}^n$,

$$\|v\| := \Big(\sum_{k=1}^{n}|v_k|^2\Big)^{1/2} \qquad\text{and}\qquad \|v\|_\infty := \max_k|v_k|.$$

Finally, we use the following standard notation from analysis and linear algebra. The set of unit vectors in $\mathbb{C}^n$ will be denoted by $S^{n-1}$, i.e., $S^{n-1} := \{v\in\mathbb{C}^n : \|v\| = 1\}$, and the disk of radius r by $D_r := \{z\in\mathbb{C} : |z| < r\}$. For any set $S\subset\mathbb{C}^n$ and $u\in\mathbb{C}^n$,

$$\operatorname{dist}(u, S) := \inf_{v\in S}\|u - v\|.$$

|S| denotes the cardinality of the finite set S.

The rest of this paper is devoted to the Proof of Theorem 1.4. The proof proceeds via Girko’s Hermitization procedure (see Ref. 20), which is now a standard technique in the study of non-Hermitian random matrices. Following Ref. 54, we study the empirical eigenvalue distribution of $X_zX_z^*$ for $z\in\mathbb{C}$. In particular, we establish a rate of convergence for the Stieltjes transform of the empirical spectral measure of $X_zX_z^*$ to the Stieltjes transform of the limiting measure in Sec. III. The key technical tool in our proof is a lower bound on the least singular value of $X_z$ presented in Sec. II. In Sec. IV, following the method of Bai,10 these two key ingredients are combined and the Proof of Theorem 1.4 is given. The Appendix contains a number of auxiliary results.
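Girko's Hermitization rests on the exact identity $\frac{1}{n}\sum_i\log|\lambda_i(X) - z| = \frac{1}{n}\log|\det(X - zI)| = \frac{1}{n}\sum_i\log s_i(X - zI)$, which converts the unstable eigenvalue problem into a singular value problem; the least singular value bound of Sec. II is what keeps the last sum under control. The identity itself is exact at every $n$ and can be checked directly (the size and the value of $z$ in the sketch are arbitrary):

```python
import numpy as np

# Exact identity behind Girko's Hermitization:
# (1/n) sum_i log|lambda_i(X) - z| = (1/n) sum_i log s_i(X - zI),
# since |det(X - zI)| equals both the product of the |lambda_i - z| and the
# product of the singular values of X - zI.
rng = np.random.default_rng(4)
n = 10
z = 0.4 + 0.2j
X = rng.standard_normal((n, n)) / np.sqrt(n)
lhs = np.mean(np.log(np.abs(np.linalg.eigvals(X) - z)))
rhs = np.mean(np.log(np.linalg.svd(X - z * np.eye(n), compute_uv=False)))
```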

In this section, we present our key least singular value bound, Theorem 2.1. The crucial feature of our result is that the lower bound on the least singular value is only singly exponentially small in m. While this is most likely suboptimal and, indeed, we conjecture that our bound can be substantially improved, it is still significantly better than previous results in the literature. Notably, the work of Cook32 provides lower bounds on the least singular value for more general structured sparse random matrices; however, specialized to our setting, the lower bound there is doubly exponentially small in m [see Eq. (3.8) in Ref. 32], which only translates to a circular law for bandwidth (at best) Ω(n/log n).

We consider the translated periodic block band model $X_z = X - zI$, where $X$ is as defined in (4) and $z\in\mathbb{C}$ is fixed. Recall that $m = n/b_n$. Throughout this section, we will assume that $b_n \ge m \ge m_0$, where $m_0$ is a sufficiently large constant. Recall that for an $n\times n$ matrix $A$, we let $s_1(A) \ge s_2(A) \ge \cdots \ge s_n(A) \ge 0$ denote its singular values.

Theorem 2.1.
Fix $\epsilon, K' > 0$. Suppose that $\tilde X$ is an $n\times n$ periodic block band matrix [as defined in (3)] with atom variable $\xi$ satisfying $\mathbb{E}[\xi] = 0$, $\mathbb{E}[|\xi|^2] = 1$, and $\mathbb{E}[|\xi|^{4+\epsilon}] \le C$ for some absolute constant $C > 0$. Then, for any $z\in\mathbb{C}$ such that $|z| \le K'$,
$$\mathbb{P}\big(s_n(X_z) \le c_n^{-25m}\big) \le \frac{C_\xi}{\sqrt{c_n}},$$
where $C_\xi$ is a constant depending only on $\epsilon$, $C$, and $K'$.

Let us define the event

$$E_K := \Big\{\forall i\in[m]:\ \|U_i\|, \|(D_i)_z\|, \|T_i\| \le K, \text{ and } s_{b_n}(U_i), s_{b_n}(T_i) \ge b_n^{-5}\Big\}.$$

We begin by showing (Lemma 2.4) that $\mathbb{P}(E_K^c) = O(1/c_n)$. This will allow us to restrict ourselves to the event $E_K$ for the remainder of this section.

In order to bound the probability of the event EKc, we will need the following two results on the smallest and largest singular values of (shifts of) complex random matrices with iid entries.

Proposition 2.2
(Theorem 1.1 from Ref. 51). Let $A$ be an $n\times n$ matrix whose entries are iid copies of a complex random variable $\xi$ satisfying $\mathbb{E}[\xi] = 0$ and $\mathbb{E}[|\xi|^2] = 1$. Let $F$ be a fixed $n\times n$ complex matrix whose operator norm is at most $n^{0.51}$. Then, for any $\varepsilon \ge 0$,
$$\mathbb{P}\big(s_n(F + A) \le \varepsilon n^{-5/2}\big) \le C\varepsilon + C\exp\big(-\gamma n^{1/50}\big)$$
for two constants $C > 0$, $\gamma\in(0,1)$ depending only on the distribution of the random variable $\xi$.

The next proposition can be readily deduced from Theorem 5.9 in Ref. 9 along with the standard Chernoff bound.

Proposition 2.3.
Fix $\epsilon > 0$. Let $A$ be an $n\times n$ matrix whose entries are iid copies of a complex random variable $\xi$ satisfying $\mathbb{E}[\xi] = 0$, $\mathbb{E}[|\xi|^2] = 1$, and $\mathbb{E}[|\xi|^{4+\epsilon}] \le M$. Then,
$$\mathbb{P}\big(\|A\| > K\sqrt{n}\big) \le \frac{K}{n^2},$$
where $K > 0$ is a sufficiently large constant depending only on $\xi$ (and hence, also on the parameter $\epsilon > 0$).

Applying the above two propositions [along with the triangle inequality for (Di)z] and using the union bound, we immediately obtain the following:

Lemma 2.4.
There exists a constant $K > 0$ depending only on $|z|$ and the random variable $\xi$ (and hence also on the parameter $\epsilon > 0$) such that
$$\mathbb{P}(E_K^c) \le \frac{K}{b_n}.$$

For the remainder of this section, we will restrict ourselves to the event $E_K$. For any $v\in\mathbb{C}^n$, we let

$$v = \begin{pmatrix} v^{[1]} \\ v^{[2]} \\ \vdots \\ v^{[m]} \end{pmatrix}$$

be the division of the coordinates into $m$ vectors $v^{[i]}\in\mathbb{C}^{b_n}$. We will use $v_i$ to denote the $i$th coordinate of $v$. For convenience, we use the convention that the indices wrap around, meaning, for example, that $v^{[m+1]} = v^{[1]}$.

For α, β ∈ (0, 1), let

$$L_{\alpha,\beta} := \Big\{v\in S^{n-1} : \big|\big\{i\in[n] : |v_i| \ge \beta b_n^{-10m}n^{-1/2}\big\}\big| \ge \alpha n\Big\},$$

i.e., $L_{\alpha,\beta}$ consists of those unit vectors that have sufficiently many large coordinates. For us, $\alpha$ and $\beta$ are constants depending on $K$, which will be specified later. Then, as $s_n(X_z) = \inf_{v\in S^{n-1}}\|X_zv\|$, we can decompose the least singular value problem into two terms,

$$\mathbb{P}\Big(E_K\wedge s_n(X_z) \le tb_n^{-10m}n^{-1/2}\Big) \le \mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}}\|X_zv\| \le tb_n^{-10m}n^{-1/2}\Big) + \mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}^c}\|X_zv\| \le tb_n^{-10m}n^{-1/2}\Big).$$
(5)

We begin with a lemma due to Rudelson and Vershynin, which converts the first term in (5) into a question about the distance of a random vector to a random subspace.

Lemma 2.5
(Lemma 3.5 from Ref. 73). Let $x_1 - ze_1, \dots, x_n - ze_n$ be the columns of $X_z$, and let $H_i$ be the span of all the columns except the $i$-th. Then,
$$\mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}}\|X_zv\| \le tb_n^{-10m}n^{-1/2}\Big) \le \frac{1}{\alpha n}\sum_{k=1}^{n}\mathbb{P}\Big(E_K\wedge \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\Big).$$

Proof.
Let
$$p_k := \mathbb{P}\Big(E_K\wedge \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\Big).$$
By the linearity of expectation, we have
$$\mathbb{E}\Big|\big\{k\in[n] : E_K \text{ holds and } \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\big\}\Big| = \sum_{k=1}^{n}p_k.$$
Therefore, if we let
$$\Xi := E_K\wedge \Big\{\big|\big\{k\in[n] : \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\big\}\big| < \alpha n\Big\},$$
it follows from Markov’s inequality that
$$\mathbb{P}(E_K\wedge\Xi^c) \le \frac{\sum_{k=1}^{n}p_k}{\alpha n}.$$
By definition, any vector $v\in L_{\alpha,\beta}$ has at least $\alpha n$ coordinates with absolute value at least $\beta b_n^{-10m}m^{-1/2}b_n^{-1/2}$. Therefore, on the event $\Xi$, for any $v\in L_{\alpha,\beta}$, there exists some $k\in[n]$ such that $|v_k| \ge \beta b_n^{-10m}m^{-1/2}b_n^{-1/2}$ and $\operatorname{dist}(x_k - ze_k, H_k) > \beta^{-1}t$. Hence, on the event $\Xi$, for all $v\in L_{\alpha,\beta}$,
$$\|X_zv\| \ge |v_k|\operatorname{dist}(x_k - ze_k, H_k) > tb_n^{-10m}m^{-1/2}b_n^{-1/2}.$$
Thus, we see that the probability of the event in the statement of the lemma is at most the probability of $E_K\wedge\Xi^c$, which gives the desired conclusion.□

The distance of $x_k - ze_k$ to $H_k$ can be bounded from below by $|\langle x_k - ze_k, \hat n\rangle|$, where $\hat n$ is a unit vector orthogonal to $H_k$. Our next goal is to obtain some structural information about any vector normal to $H_k$. For convenience of notation, we will henceforth assume that $k = 1$; the same arguments are readily seen to hold for other values of $k$ as well. Moreover, since the distribution of $X_z$ is invariant under transposition, we may also assume that $x_1 - ze_1$ is the first row of $X_z$ and that $H_1$ is the subspace spanned by all the rows except for the first.

Recall that $H_1$ is the subspace generated by all the rows of $X_z$ except for the first row. The next proposition establishes that if $v$ is normal to $H_1$, then there are sufficiently many $v^{[i]}$ with large enough norm. Our approach to lower bounding the coordinates of $v$ is similar to the methods used in Ref. 23; our proof is also similar in spirit to the Proof of Proposition 2.9 in Ref. 30.

Proposition 2.6.
On the event $E_K$, for any vector $v\in S^{n-1}$ that is orthogonal to $H_1$ and for all sufficiently large $n$ (depending on $K$), either
$$\|v^{[i]}\| \ge b_n^{-10m}m^{-1/2} \qquad\text{or}\qquad \|v^{[i+1]}\| \ge b_n^{-10m}m^{-1/2}$$
for all $i\in[m-1]$.

Proof.
By definition, $v$ must satisfy the following collection of equations:
$$\begin{aligned} T_1v^{[1]} + (D_2)_zv^{[2]} + U_3v^{[3]} &= 0,\\ &\ \ \vdots\\ T_{i-1}v^{[i-1]} + (D_i)_zv^{[i]} + U_{i+1}v^{[i+1]} &= 0,\\ &\ \ \vdots\\ T_{m-2}v^{[m-2]} + (D_{m-1})_zv^{[m-1]} + U_mv^{[m]} &= 0,\\ T_{m-1}v^{[m-1]} + (D_m)_zv^{[m]} + U_1v^{[1]} &= 0.\end{aligned}$$
(6)
Moreover, since $v\in S^{n-1}$, there exists a smallest index $j_0\in[m]$ such that $\|v^{[j_0]}\| \ge m^{-1/2}$. If $j_0 \ge 3$, then the equation [which is a part of (6)]
$$T_{j_0-2}v^{[j_0-2]} + (D_{j_0-1})_zv^{[j_0-1]} + U_{j_0}v^{[j_0]} = 0$$
implies that
$$T_{j_0-2}v^{[j_0-2]} + (D_{j_0-1})_zv^{[j_0-1]} = -U_{j_0}v^{[j_0]}.$$
On the event $E_K$, we have from the triangle inequality that
$$\big\|T_{j_0-2}v^{[j_0-2]} + (D_{j_0-1})_zv^{[j_0-1]}\big\| \le K\big(\|v^{[j_0-2]}\| + \|v^{[j_0-1]}\|\big)$$
and
$$\big\|U_{j_0}v^{[j_0]}\big\| \ge b_n^{-5}\|v^{[j_0]}\| \ge b_n^{-5}m^{-1/2}.$$
Therefore, for $n$ sufficiently large compared to $K$, either
$$\|v^{[j_0-2]}\| \ge b_n^{-10}m^{-1/2} \qquad\text{or}\qquad \|v^{[j_0-1]}\| \ge b_n^{-10}m^{-1/2}.$$
(7)
Now, let $j_{-1}$ be the smaller of the two indices $j_0 - 1$ and $j_0 - 2$ that satisfies (7). Recall that, for convenience, we are considering indices modulo $m$. If $j_{-1} \ge 3$, then iterating the argument with $j_{-1}$ and the equation
$$T_{j_{-1}-2}v^{[j_{-1}-2]} + (D_{j_{-1}-1})_zv^{[j_{-1}-1]} + U_{j_{-1}}v^{[j_{-1}]} = 0,$$
we can find $j_{-2}\in\{j_{-1}-1, j_{-1}-2\}$ such that
$$\|v^{[j_{-2}]}\| \ge b_n^{-20}m^{-1/2}.$$
Continuing in this manner, we will generate a sequence of indices $j_0, j_{-1}, \dots, j_{-k}$, $k\le m$, such that $j_{-k}\in\{1,2\}$ and such that for all $i\in[k]$,
$$|j_{-i} - j_{-i+1}| \le 2 \qquad\text{and}\qquad \|v^{[j_{-i}]}\| \ge b_n^{-10i}m^{-1/2}.$$
We may apply a similar argument to handle indices larger than $j_0$. Indeed, if $j_0 \le m-3$, then we have from (6) that
$$T_{j_0}v^{[j_0]} + (D_{j_0+1})_zv^{[j_0+1]} + U_{j_0+2}v^{[j_0+2]} = 0.$$
Once again, on the event $E_K$,
$$\big\|(D_{j_0+1})_zv^{[j_0+1]} + U_{j_0+2}v^{[j_0+2]}\big\| \le K\big(\|v^{[j_0+1]}\| + \|v^{[j_0+2]}\|\big)$$
and
$$\big\|T_{j_0}v^{[j_0]}\big\| \ge b_n^{-5}m^{-1/2}.$$
As mentioned before, this implies that either
$$\|v^{[j_0+1]}\| \ge b_n^{-10}m^{-1/2} \qquad\text{or}\qquad \|v^{[j_0+2]}\| \ge b_n^{-10}m^{-1/2}.$$
By iterating this process as mentioned above, we obtain a sequence of indices $j_0, j_1, \dots, j_{k'}$, $k'\le m$, such that $j_{k'}\in\{m-1,m\}$ and such that for all $i\in[k']$,
$$|j_i - j_{i-1}| \le 2 \qquad\text{and}\qquad \|v^{[j_i]}\| \ge b_n^{-10i}m^{-1/2}.$$
This completes the proof.□

Note that in the above proof, it is not important that v is precisely normal to H1. Indeed, exactly the same proof allows us to obtain a similar conclusion for approximately null vectors as well.

Proposition 2.7.
Restricted to $E_K$, for any vector $v\in S^{n-1}$ such that $\|X_zv\| \le b_n^{-10m}m^{-1/2}$ and for all sufficiently large $n$ (depending on $K$), either
$$\|v^{[i]}\| \ge b_n^{-10m}m^{-1/2} \qquad\text{or}\qquad \|v^{[i+1]}\| \ge b_n^{-10m}m^{-1/2}$$
for all $i\in[m-1]$.

Our next goal is to show that for $\alpha, \beta$ sufficiently small depending on $K$ [indeed, the proof shows that we can take $\alpha < \gamma'/(K^2\log K)$ and $\beta < \gamma'/K$, where $\gamma' > 0$ is a constant depending only on the distribution of the random variable $\xi$], we have

$$\mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}^c}\|X_zv\| \le \gamma b_n^{-10m}m^{-1/2}\Big) \le m\exp(-\gamma b_n),$$
(8)

where γ ∈ (0, 1) is a constant depending only on the distribution of the random variable ξ.

For this, we begin with a standard decomposition of the unit sphere due to Rudelson and Vershynin.73 

Definition 2.8.
For $k\in\mathbb{N}$ and $a, \kappa\in(0,1)$, let $\mathrm{Sparse}_k(a)$ denote the set of sparse vectors $\{v\in S^{k-1} : |\operatorname{supp}(v)| \le ak\}$. We define the set of compressible vectors by
$$\mathrm{Comp}_k(a,\kappa) := \big\{v\in S^{k-1} : \exists u\in\mathrm{Sparse}_k(a) \text{ such that } \|v - u\| \le \kappa\big\}$$
and the set of incompressible vectors by
$$\mathrm{Incomp}_k(a,\kappa) := S^{k-1}\setminus\mathrm{Comp}_k(a,\kappa).$$
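The dichotomy can be made concrete: for a unit vector $v$ and a support of size $s$, the closest unit vector supported there is the normalized restriction of $v$, so the distance to $\mathrm{Sparse}_k(a)$ equals $\sqrt{2 - 2\|v_S\|}$ for the support $S$ of the $\lfloor ak\rfloor$ largest coordinates. In particular, the flat vector $k^{-1/2}(1,\dots,1)$ is at distance exactly $\sqrt{2 - 2\sqrt{a}}$ from the sparse vectors, hence incompressible once $\kappa < \sqrt{2 - 2\sqrt{a}}$. A small sketch (the sizes are illustrative):

```python
import numpy as np

# Distance from a unit vector v to the set of unit vectors supported on at
# most s coordinates: the optimum keeps the s largest entries of v and
# renormalizes, giving distance sqrt(2 - 2*||v restricted||).
def dist_to_sparse(v, s):
    idx = np.argsort(np.abs(v))[-s:]       # best support: s largest entries
    u = np.zeros_like(v)
    u[idx] = v[idx]
    u /= np.linalg.norm(u)
    return np.linalg.norm(v - u)

k, a = 100, 0.09
v_flat = np.ones(k) / np.sqrt(k)           # maximally spread unit vector
d_flat = dist_to_sparse(v_flat, int(a * k))

e = np.zeros(k); e[0] = 1.0                # 1-sparse vector: compressible
d_sparse = dist_to_sparse(e, int(a * k))
```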

Lemma 2.9.
Let $M_i$ denote the $b_n\times c_n$ block matrix given by $\big(T_{i-1}\ (D_i)_z\ U_{i+1}\big)$. There exists a constant $\gamma\in(0,1)$, depending only on the distribution of the random variable $\xi$, such that
$$\mathbb{P}\Big(E_K\wedge \inf_{w\in\mathrm{Comp}_{c_n}(a,\kappa)}\|M_iw\| \le \gamma\Big) \le \exp(-\gamma b_n),$$
where $a = \gamma/\log K$ and $\kappa = \gamma/K$.

Proof.
This is (by now) a standard argument; we include the short proof for the reader’s convenience. We begin with the set $\mathrm{Sparse}_{c_n}(a)$. For any vector $v\in S^{c_n-1}$, there exist positive constants $\gamma, \gamma'$, depending only on the distribution of the entries of $M_i$, such that
$$\mathbb{P}\big(\|M_iv\| \le \gamma\big) \le e^{-\gamma'b_n}$$
(cf. Lemma 2.4 in Ref. 51). Recall that an $\varepsilon$-net of a set $U$ is a subset $N\subseteq U$ such that for any $w\in U$, there exists $w'\in N$ satisfying $\|w - w'\| \le \varepsilon$. By a simple volumetric argument, one can construct an $\varepsilon$-net $N$ of $\mathrm{Sparse}_{c_n}(a)$ with
$$|N| \le \binom{c_n}{\lceil ac_n\rceil}\Big(\frac{3}{\varepsilon}\Big)^{ac_n} \le \exp\big(ac_n\log(e/a) + ac_n\log(3/\varepsilon)\big).$$
We set $\varepsilon = \frac{\gamma}{20K}$. Then, by a union bound,
$$\mathbb{P}\Big(\inf_{v\in N}\|M_iv\| \le \gamma\Big) \le \sum_{v\in N}\mathbb{P}\big(\|M_iv\| \le \gamma\big) \le \exp\big(ac_n\log(e/a) + ac_n\log(3/\varepsilon) - \gamma'b_n\big) \le \exp(-\tilde\gamma b_n),$$
where the last inequality holds for $a < \gamma''/\log K$ (for an absolute constant $\gamma'' > 0$). Let $v\in\mathrm{Sparse}_{c_n}(a)$. Then, by definition, there exists some $v'\in N$ such that $\|v - v'\| \le \varepsilon$. Therefore, on the event $\inf_{v'\in N}\|M_iv'\| > \gamma$, we have on $E_K$ for any $v\in\mathrm{Sparse}_{c_n}(a)$ that
$$\|M_iv\| \ge \|M_iv'\| - \|v - v'\|\,\|M_i\| \ge \gamma - \frac{\gamma}{20K}\cdot 10K = \frac{\gamma}{2}.$$
We can then conclude that
$$\mathbb{P}\Big(E_K\wedge \inf_{v\in\mathrm{Sparse}_{c_n}(a)}\|M_iv\| \le \frac{\gamma}{2}\Big) \le \exp(-\tilde\gamma b_n).$$
To extend this to compressible vectors, we simply choose $\kappa = \frac{\gamma}{40K}$. For any $y\in\mathrm{Comp}_{c_n}(a,\kappa)$, there exists $v\in\mathrm{Sparse}_{c_n}(a)$ such that $\|y - v\| \le \kappa$. Thus, if $\|M_iv\| \ge \gamma/2$, then, on $E_K$,
$$\|M_iy\| \ge \|M_iv\| - \|M_i\|\,\|v - y\| \ge \frac{\gamma}{2} - 10K\cdot\frac{\gamma}{40K} \ge \frac{\gamma}{4}.$$□

We will also need the following lemma from Ref. 73.

Lemma 2.10

(Lemma 3.4 from Ref. 73). If $v\in\mathrm{Incomp}_k(a,\kappa)$, then there exist constants $\gamma_1$, $\gamma_2$, and $\gamma_3$ depending only on $a$ and $\kappa$ such that there are at least $\gamma_1k$ coordinates satisfying $\gamma_3k^{-1/2} \ge |v_i| \ge \gamma_2k^{-1/2}$. In fact, we can take $\gamma_1 = \kappa^2a/2$, $\gamma_2 = \kappa/\sqrt{2}$, and $\gamma_3 = a^{-1/2}$.

Now, we are ready to prove (8). Consider a vector $v\in S^{n-1}$ such that $\|X_zv\| \le tb_n^{-10m}m^{-1/2}$, where $0\le t\le 1$. Then, on the event $E_K$, it follows from Proposition 2.7 that for any $i\in[m]$,

$$\big\|\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)\big\| \ge b_n^{-10m}m^{-1/2}.$$

Moreover, since for every $i\in[m]$,

$$\Big\|\big(T_{i-1}\ (D_i)_z\ U_{i+1}\big)\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T\Big\| \le \|X_zv\| \le tb_n^{-10m}m^{-1/2} \le t\,\big\|\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T\big\|,$$

it follows that

$$\frac{\Big\|\big(T_{i-1}\ (D_i)_z\ U_{i+1}\big)\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T\Big\|}{\big\|\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T\big\|} \le t.$$

Let $E$ denote the event $E_K\wedge\big\{\forall i\in[m] : \inf_{w\in\mathrm{Comp}_{c_n}(a,\kappa)}\|M_iw\| > \gamma\big\}$, where $a$, $\kappa$, and $\gamma$ are as in Lemma 2.9. On the event $E$, if $t\le\gamma$, then

$$\frac{\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T}{\big\|\big(v^{[i-1]}, v^{[i]}, v^{[i+1]}\big)^T\big\|} \in \mathrm{Incomp}_{c_n}(a,\kappa).$$

Therefore, we can conclude from Lemma 2.10 that on the event $E$, any vector $v\in S^{n-1}$ such that $\|X_zv\| \le \gamma b_n^{-10m}m^{-1/2}$ will have at least $\alpha n$ coordinates larger than $\beta b_n^{-10m}m^{-1/2}b_n^{-1/2}$, where $\alpha = \gamma'/(K^2\log K)$, $\beta = \gamma'/K$, and $\gamma' > 0$ is a constant depending only on $\gamma$.

Hence, with this choice of α, β, γ′, the probability of the event in (8) is bounded by

$$\mathbb{P}(E_K\wedge E^c) \le \sum_{i=1}^{m}\mathbb{P}\Big(E_K\wedge \inf_{w\in\mathrm{Comp}_{c_n}(a,\kappa)}\|M_iw\| \le \gamma\Big) \le m\exp(-\gamma b_n),$$

where the last inequality follows by Lemma 2.9. This proves (8).

The next lemma is a direct consequence of Lemma 2.10 and Lemmas 2.5 and 2.7 from Ref. 32.

Lemma 2.11.
Let $\xi_1, \dots, \xi_k$ be independent copies of a complex random variable $\xi$ satisfying $\mathbb{E}[|\xi|^2] = 1$. Then, for any $v\in\mathrm{Incomp}_k(a,\kappa)$ and for all $\varepsilon \ge 0$,
$$\sup_{r\in\mathbb{R}}\ \mathbb{P}\Big(\Big|\sum_{i=1}^{k}v_i\xi_i - r\Big| \le \varepsilon\Big) \le \frac{C}{\kappa^2a}\Big(\varepsilon + \frac{1}{\kappa\sqrt{k}}\Big),$$
where $C$ is a constant depending only on $\xi$.

Proof of Theorem 2.1.
By (5) and (8), it suffices to bound
$$\mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}}\|X_zv\| \le tb_n^{-10m}m^{-1/2}\Big)$$
for $t = b_n^{-11m}$. By Lemma 2.5,
$$\mathbb{P}\Big(E_K\wedge \inf_{v\in L_{\alpha,\beta}}\|X_zv\| \le tb_n^{-10m}m^{-1/2}\Big) \le \frac{1}{\alpha}\max_{k\in[n]}\mathbb{P}\Big(E_K\wedge \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\Big).$$
We will obtain a uniform (in $k$) bound on $\mathbb{P}\big(E_K\wedge \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}t\big)$. For convenience of notation, we show this bound for $k = 1$. In addition, recall from before that we may assume that $x_1 - ze_1$ is the first row of the matrix and that $H_1$ is the span of all the rows except for the first row.
Let $E$ denote the event that
$$\inf_{w\in\mathrm{Comp}_{c_n}(a,\kappa)}\|M_1w\| > \gamma.$$
Then, by Lemma 2.9, $\mathbb{P}(E^c\wedge E_K) \le \exp(-\gamma b_n)$. Let $\hat n$ denote a unit normal vector to $H_1$, let $v := (\hat n^{[m]}, \hat n^{[1]}, \hat n^{[2]})$, and let $\hat v := v/\|v\|$. Since $\hat n$ is orthogonal to every row of the first block row other than the first, we have $|\langle x_1 - ze_1, \hat n\rangle| = \|M_1v\|$. If $\hat v\in\mathrm{Comp}_{c_n}(a,\kappa)$, then on the event $E\wedge E_K$, we have
$$|\langle x_1 - ze_1, \hat n\rangle| = \|M_1v\| = \|M_1\hat v\|\,\|v\| \ge \gamma\|v\| \ge \gamma b_n^{-10m}m^{-1/2}.$$
On the other hand, if $\hat v\in\mathrm{Incomp}_{c_n}(a,\kappa)$, then it follows from Lemma 2.11 that
$$\mathbb{P}\big(|\langle x_1 - ze_1, \hat n\rangle| \le \delta\big) = \mathbb{P}\big(|\langle x_1 - ze_1, v\rangle| \le \delta\big) = \mathbb{P}\big(|\langle x_1 - ze_1, \hat v\rangle| \le \delta/\|v\|\big) \le \frac{C}{\kappa^2a}\Big(\frac{\delta}{\|v\|} + \frac{1}{\kappa\sqrt{b_n}}\Big) \le \frac{C}{\kappa^2a}\Big(\delta b_n^{10m}\sqrt{m} + \frac{1}{\kappa\sqrt{b_n}}\Big).$$
Taking $\delta = \beta^{-1}b_n^{-11m}$ and combining with the compressible case, we may conclude that
$$\mathbb{P}\Big(E_K\wedge \operatorname{dist}(x_1 - ze_1, H_1) \le \beta^{-1}b_n^{-11m}\Big) \le C_K\frac{1}{\sqrt{b_n}}.$$
The same argument can be used to conclude that
$$\max_{k\in[n]}\mathbb{P}\Big(E_K\wedge \operatorname{dist}(x_k - ze_k, H_k) \le \beta^{-1}b_n^{-11m}\Big) \le C_K\frac{1}{\sqrt{b_n}},$$
which completes the proof.□

In this section, we establish a rate of convergence for the Stieltjes transform of the empirical eigenvalue distribution of $X_zX_z^*$.

Theorem 3.1.
Let $\tilde X$ be an $n\times n$ periodic block band matrix as defined in Definition 1.3 with atom variable $\xi$. Take $A > 1$, and let $z\in\mathbb{C}$ be a fixed complex number. Assume that $m_{n,z}(\zeta) = \frac{1}{n}\sum_{i=1}^{n}\big[\lambda_i(X_zX_z^*) - \zeta\big]^{-1}$ is the Stieltjes transform of the empirical spectral measure of $X_zX_z^*$. Suppose that $\xi$ is centered with variance one and $\omega_{4p} := \mathbb{E}[|\xi|^{4p}] < \infty$ for some integer $p\ge 1$. Then, there exists a non-random probability measure $\nu_z$ on $[0,\infty)$ such that for any $\zeta\in\{\zeta\in\mathbb{C} : -A < \Re(\zeta) < A,\ 0 < \Im(\zeta) < 1\}$,
$$\mathbb{E}|m_{n,z}(\zeta) - m_z(\zeta)|^{2p} \le \frac{C(p)A^{2p}\omega_{4p}}{|\Im(\zeta)|^{8p}}\left[\left(\frac{n}{c_n^2}\right)^p + \frac{1}{c_n^{p/2}}\right],$$
where $m_z(\zeta) = \int_{\mathbb{R}}\frac{d\nu_z(x)}{x - \zeta}$ and $C(p) > 0$ is a constant that depends only on $p$. Moreover, $m_z(\zeta)$ is the unique solution to the equation
$$m_z(\zeta) = \left[\frac{|z|^2}{1 + m_z(\zeta)} - (1 + m_z(\zeta))\zeta\right]^{-1},$$
(9)
satisfying $\Im(\zeta m_z(\zeta)) > 0$ and $\Im(m_z(\zeta)) > 0$ when $\Im(\zeta) > 0$.

Remark 3.2.

We state and prove the above theorem under more general conditions than those of Theorem 1.4. In particular, we allow random variables with only four finite moments (the case $p = 1$), although the quantitative estimate improves with the number of existing moments. Furthermore, we do not make use of the lower bound on $c_n$ in Theorem 1.4.

We follow the proof strategy from Ref. 54. This previous work demonstrated the convergence of the Stieltjes transform for band matrices rather than block band matrices, so we necessarily make some adaptations. More significantly, we deduce an explicit rate of convergence, which does not appear in Ref. 54.

Our main object of study will be

$$P_{z,\zeta} := (X_zX_z^*) - \zeta = (X - zI)(X - zI)^* - \zeta I.$$

Define $X_z^{(k)}$ to be the matrix $X_z$ with the $k$th column set to zero. We define

$$P_{z,\zeta}^{(k)} := X_z^{(k)}\big(X_z^{(k)}\big)^* - \zeta I = \big(X_z - (x_k - ze_k)e_k^T\big)\big(X_z - (x_k - ze_k)e_k^T\big)^* - \zeta I = X_zX_z^* - \zeta I - (x_k - ze_k)(x_k - ze_k)^* = P_{z,\zeta} - (x_k - ze_k)(x_k - ze_k)^*.$$

We also denote

$$m_{n,z}^{(k)}(\zeta) := \frac{1}{n}\operatorname{tr}\big[P_{z,\zeta}^{(k)}\big]^{-1}.$$

Additionally, we use the shorthand

$$\alpha_k := 1 + (x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}(x_k - ze_k)$$
(10)

as this term appears repeatedly in our initial calculations.
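Since $P_{z,\zeta} = P_{z,\zeta}^{(k)} + (x_k - ze_k)(x_k - ze_k)^*$, the Sherman–Morrison formula turns quadratic forms in $P_{z,\zeta}^{-1}$ into quadratic forms in $\big[P_{z,\zeta}^{(k)}\big]^{-1}$ divided by $\alpha_k$; in particular, it yields the exact finite-$n$ identity $\zeta m_{n,z}(\zeta) = -\frac{1}{n}\sum_{k=1}^n\alpha_k^{-1}$, derived as (17) below. Because the identity is exact, it can be checked directly; in the sketch, the matrix size and the values of $z$ and $\zeta$ are arbitrary choices.

```python
import numpy as np

# Exact identity zeta * m_{n,z}(zeta) = -(1/n) * sum_k 1/alpha_k, where
# alpha_k = 1 + (x_k - z e_k)^* [P^{(k)}]^{-1} (x_k - z e_k).  This is a
# consequence of the Sherman-Morrison formula and holds for every n.
rng = np.random.default_rng(2)
n = 8
z, zeta = 0.3, -0.7 + 0.5j            # any Im(zeta) != 0 keeps P invertible
X = rng.standard_normal((n, n))
Xz = X - z * np.eye(n)
P = Xz @ Xz.conj().T - zeta * np.eye(n)

m_n = np.trace(np.linalg.inv(P)) / n
alpha = np.empty(n, dtype=complex)
for k in range(n):
    u = Xz[:, k]                       # k-th column of X_z, i.e. x_k - z e_k
    Pk = P - np.outer(u, u.conj())     # P^{(k)}: k-th column of X_z removed
    alpha[k] = 1 + u.conj() @ np.linalg.solve(Pk, u)

lhs = zeta * m_n
rhs = -np.mean(1 / alpha)
```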

For $s_z(\zeta) = m_{n,z}(\zeta)$ or $m_z(\zeta)$, let us define

$$f(s_z) := \left[\frac{|z|^2}{1 + s_z(\zeta)} - (1 + s_z(\zeta))\zeta\right]^{-1}.$$

The motivation for this definition is that $m_z(\zeta)$ is known to be a fixed point of this function when the spectrum obeys the circular law; see Sec. 11.4 in Ref. 9. The Proof of Theorem 3.1 can be divided into several key computations. Since we expect $m_{n,z}(\zeta)$ to also converge to the fixed point of $f$, we first relate $m_{n,z}(\zeta) - m_z(\zeta)$ to $f(m_{n,z}(\zeta)) - m_{n,z}(\zeta)$.
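Before turning to the proof, here is a quick numerical sanity check of the fixed-point property. The sketch uses an iid Gaussian matrix, for which the same limiting measure $\nu_z$ arises, and illustrative values of $n$, $z$, $\zeta$; the empirical Stieltjes transform $m_{n,z}(\zeta)$ should nearly satisfy $m = f(m)$.

```python
import numpy as np

# Check that the empirical Stieltjes transform of X_z X_z^* approximately
# satisfies the fixed-point equation m = f(m), where
# f(s) = [ |z|^2/(1+s) - (1+s)*zeta ]^{-1}.
rng = np.random.default_rng(3)
n = 800
z, zeta = 0.5, -0.5 + 0.8j
X = rng.standard_normal((n, n)) / np.sqrt(n)
Xz = X - z * np.eye(n)
lam = np.linalg.eigvalsh(Xz @ Xz.T)        # spectrum of X_z X_z^* (Hermitian)
m = np.mean(1.0 / (lam - zeta))            # m_{n,z}(zeta)
f_of_m = 1.0 / (abs(z)**2 / (1 + m) - (1 + m) * zeta)
residual = abs(f_of_m - m)
```

The residual shrinks as $n$ grows, in line with the rate quantified by Theorem 3.1.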

Lemma 3.3.
Under the assumptions of Theorem 3.1,
$$m_{n,z}(\zeta) - m_z(\zeta) = \big[1 - r_{n,z}(\zeta)\big]^{-1}\big[m_{n,z}(\zeta) - f(m_{n,z}(\zeta))\big],$$
(11)
where
$$r_{n,z}(\zeta) = f(m_{n,z}(\zeta))f(m_z(\zeta))\left[\frac{|z|^2}{(1 + m_{n,z}(\zeta))(1 + m_z(\zeta))} + \zeta\right].$$
(12)

Proof.
We have that
$$m_{n,z}(\zeta) - m_z(\zeta) = m_{n,z}(\zeta) - f(m_{n,z}(\zeta)) + f(m_{n,z}(\zeta)) - f(m_z(\zeta)),$$
(13)
where we have used the fact that $f(m_z(\zeta)) = m_z(\zeta)$, which is known to characterize the circular law; see Sec. 11.4 and, in particular, (11.4.1) in Ref. 9. On the other hand,
$$\begin{aligned} f(m_{n,z}(\zeta)) - f(m_z(\zeta)) &= f(m_{n,z}(\zeta))f(m_z(\zeta))\left[\frac{1}{f(m_z(\zeta))} - \frac{1}{f(m_{n,z}(\zeta))}\right]\\ &= f(m_{n,z}(\zeta))f(m_z(\zeta))\left[\frac{|z|^2\big(m_{n,z}(\zeta) - m_z(\zeta)\big)}{(1 + m_{n,z}(\zeta))(1 + m_z(\zeta))} + \zeta\big(m_{n,z}(\zeta) - m_z(\zeta)\big)\right]\\ &= \big[m_{n,z}(\zeta) - m_z(\zeta)\big]\,f(m_{n,z}(\zeta))f(m_z(\zeta))\left[\frac{|z|^2}{(1 + m_{n,z}(\zeta))(1 + m_z(\zeta))} + \zeta\right]\\ &=: r_{n,z}(\zeta)\big[m_{n,z}(\zeta) - m_z(\zeta)\big].\end{aligned}$$
Therefore, by (13),
$$m_{n,z}(\zeta) - m_z(\zeta) = \big[1 - r_{n,z}(\zeta)\big]^{-1}\big[m_{n,z}(\zeta) - f(m_{n,z}(\zeta))\big]$$
with $r_{n,z}(\zeta)$ given in (12).□

The strategy of our proof is to control the moments of $m_{n,z}(\zeta) - f(m_{n,z}(\zeta))$ and then provide a deterministic bound for $\big[1 - r_{n,z}(\zeta)\big]^{-1}$.

We begin with the moments of $f(m_{n,z}(\zeta)) - m_{n,z}(\zeta)$.

Lemma 3.4.
Under the assumptions of Theorem 3.1,
$$\mathbb{E}\big[|f(m_{n,z}(\zeta)) - m_{n,z}(\zeta)|^{2p}\big] \le \frac{C(p)\omega_{4p}}{|\Im(\zeta)|^{6p}}\left[\left(\frac{n}{c_n^2}\right)^p + \frac{1}{c_n^{p/2}}\right].$$
(14)

Proof.
We begin by finding a convenient expression to allow us to compute the moments. By the resolvent identity,88
$$f(m_{n,z}(\zeta))I - P_{z,\zeta}^{-1} = f(m_{n,z}(\zeta))\big[P_{z,\zeta} - f(m_{n,z}(\zeta))^{-1}I\big]P_{z,\zeta}^{-1} = f(m_{n,z}(\zeta))\left[(X - zI)(X - zI)^* - \frac{|z|^2}{1 + m_{n,z}(\zeta)}I + \zeta m_{n,z}(\zeta)I\right]P_{z,\zeta}^{-1}.$$
(15)
To simplify this expression, we make the following observation. Since $P_{z,\zeta} = X_zX_z^* - \zeta I$, by Lemma A.1,
$$I + \zeta P_{z,\zeta}^{-1} = X_zX_z^*P_{z,\zeta}^{-1} = \sum_{k=1}^{n}(x_k - ze_k)(x_k - ze_k)^*P_{z,\zeta}^{-1} = \sum_{k=1}^{n}(x_k - ze_k)(x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}\alpha_k^{-1},$$
(16)
where $\alpha_k$ is defined in (10). Taking the normalized trace of (16) yields
$$1 + \zeta m_{n,z}(\zeta) = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{\alpha_k}\operatorname{tr}\Big[(x_k - ze_k)(x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}\Big] = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{\alpha_k}(x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}(x_k - ze_k) = \frac{1}{n}\sum_{k=1}^{n}\frac{\alpha_k - 1}{\alpha_k} = 1 - \frac{1}{n}\sum_{k=1}^{n}\frac{1}{\alpha_k}.$$
From this, we can conclude that
$$\zeta m_{n,z}(\zeta) = -\frac{1}{n}\sum_{k=1}^{n}\frac{1}{\alpha_k}.$$
(17)
Plugging (17) into (15) gives
$$f(m_{n,z}(\zeta))I - P_{z,\zeta}^{-1} = f(m_{n,z}(\zeta))\left[(X - zI)(X - zI)^* - \frac{|z|^2}{1 + m_{n,z}(\zeta)}I - \frac{1}{n}\sum_{k=1}^{n}\frac{1}{\alpha_k}I\right]P_{z,\zeta}^{-1}.$$
Taking the normalized trace of this equation, we find that
$$f(m_{n,z}(\zeta)) - m_{n,z}(\zeta) = \frac{1}{n}f(m_{n,z}(\zeta))\sum_{k=1}^{n}\left[(x_k - ze_k)^*P_{z,\zeta}^{-1}(x_k - ze_k) - \frac{|z|^2}{1 + m_{n,z}(\zeta)}e_k^TP_{z,\zeta}^{-1}e_k - \frac{1}{\alpha_k}m_{n,z}(\zeta)\right].$$
(18)
We will take the 2p-th moment of this expression.
Let us introduce the following notation to organize the terms on the right-hand side of (18). Let
$$\beta_k := x_k^*\big[P_{z,\zeta}^{(k)}\big]^{-1}e_k, \qquad \gamma_k := e_k^T\big[P_{z,\zeta}^{(k)}\big]^{-1}x_k, \qquad \delta_k := e_k^T\big[P_{z,\zeta}^{(k)}\big]^{-1}e_k, \qquad \tau_k := x_k^*\big[P_{z,\zeta}^{(k)}\big]^{-1}x_k.$$
Recall the definition of $\alpha_k$ given in (10). Since
$$\alpha_k = 1 + (x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}(x_k - ze_k) = 1 + \tau_k - z\beta_k - \bar z\gamma_k + |z|^2\delta_k,$$
again by Lemma A.1, we can write
$$(x_k - ze_k)^*P_{z,\zeta}^{-1}(x_k - ze_k) = \alpha_k^{-1}(x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}(x_k - ze_k) = \alpha_k^{-1}\big(\tau_k - z\beta_k - \bar z\gamma_k + |z|^2\delta_k\big).$$
Expanding similarly,
$$\begin{aligned} e_k^TP_{z,\zeta}^{-1}e_k &= e_k^T\big[P_{z,\zeta}^{(k)}\big]^{-1}e_k - \alpha_k^{-1}e_k^T\big[P_{z,\zeta}^{(k)}\big]^{-1}(x_k - ze_k)(x_k - ze_k)^*\big[P_{z,\zeta}^{(k)}\big]^{-1}e_k\\ &= \delta_k - \alpha_k^{-1}(\gamma_k - z\delta_k)(\beta_k - \bar z\delta_k)\\ &= \alpha_k^{-1}\Big[\big(1 + \tau_k - z\beta_k - \bar z\gamma_k + |z|^2\delta_k\big)\delta_k - (\gamma_k - z\delta_k)(\beta_k - \bar z\delta_k)\Big]\\ &= \alpha_k^{-1}\big[(1 + \tau_k)\delta_k - \gamma_k\beta_k\big].\end{aligned}$$
Therefore, (18) can be more succinctly written as
$$\begin{aligned} f(m_{n,z}(\zeta)) - m_{n,z}(\zeta) &= \frac{1}{n}f(m_{n,z}(\zeta))\sum_{k=1}^{n}\frac{1}{\alpha_k}\left[\tau_k - z\beta_k - \bar z\gamma_k + |z|^2\delta_k - \frac{|z|^2}{1 + m_{n,z}(\zeta)}\big((1 + \tau_k)\delta_k - \gamma_k\beta_k\big) - m_{n,z}(\zeta)\right]\\ &= \frac{1}{n}f(m_{n,z}(\zeta))\sum_{k=1}^{n}\frac{1}{\alpha_k}\left[\big(\tau_k - m_{n,z}(\zeta)\big)\left(1 - \frac{|z|^2\delta_k}{1 + m_{n,z}(\zeta)}\right) - z\beta_k - \bar z\gamma_k + \frac{|z|^2}{1 + m_{n,z}(\zeta)}\beta_k\gamma_k\right].\end{aligned}$$
(19)
For any $z_1,\dots,z_n\in\mathbb{C}$ and $\ell\in\mathbb{N}$, by Jensen’s inequality,
$$\left|\frac{1}{n}\sum_{i=1}^{n}z_i\right|^{\ell} \le \frac{1}{n}\sum_{i=1}^{n}|z_i|^{\ell}.$$
(20)
As we plan to invoke this inequality, it suffices for our purposes to bound the moment of each summand in (19). Using Corollary A.4,
$$\begin{aligned}\mathbb{E}|\beta_k|^{2p} &= \mathbb{E}\Big[x_k^*\big[P_{z,\zeta}^{(k)}\big]^{-1}e_ke_k^T\big[P_{z,\zeta}^{(k)*}\big]^{-1}x_k\Big]^p\\ &\le \frac{2^{p-1}}{c_n^{p}}\mathbb{E}\Big|c_nx_k^*\big[P_{z,\zeta}^{(k)}\big]^{-1}e_ke_k^T\big[P_{z,\zeta}^{(k)*}\big]^{-1}x_k - \operatorname{tr}\Big(\big[P_{z,\zeta}^{(k)}\big]^{-1}e_ke_k^T\big[P_{z,\zeta}^{(k)*}\big]^{-1}\Big)\Big|^p + \frac{2^{p-1}}{c_n^{p}}\Big|\operatorname{tr}\Big(\big[P_{z,\zeta}^{(k)}\big]^{-1}e_ke_k^T\big[P_{z,\zeta}^{(k)*}\big]^{-1}\Big)\Big|^p\\ &\le \frac{C(p)\omega_{2p}}{c_n^{p/2}|\Im(\zeta)|^{2p}} \le \frac{C(p)\omega_{4p}}{c_n^{p/2}|\Im(\zeta)|^{2p}},\end{aligned}$$
(21)
where $C(p)$ is a constant that only depends on $p$ and may vary from line to line. An identical computation yields
$$\mathbb{E}[|\gamma_k|^{2p}] \le \frac{C(p)\omega_{4p}}{c_n^{p/2}|\Im(\zeta)|^{2p}}.$$
(22)
By Lemma A.2, we have
$$\left|m_{n,z}(\zeta) - \frac{1}{n}\operatorname{tr}\big[P_{z,\zeta}^{(k)}\big]^{-1}\right| = \frac{1}{n}\left|\operatorname{tr}\Big(P_{z,\zeta}^{-1} - \big[P_{z,\zeta}^{(k)}\big]^{-1}\Big)\right| \le \frac{1}{n|\Im(\zeta)|}.$$
Eτkmn,z(ζ)2p22pEτk1ntrPz,ζ(k)12p+22pn2p|I(ζ)|2p24pEτk1cniIkPz,ζ(k)ii12p+24pE1cniIkPz,ζ(k)ii11ntrPz,ζ(k)12p+22pn2p|I(ζ)|2p.
(23)
We recall that $\tau_k = x_k^*\big[P_{z,\zeta}^{(k)}\big]^{-1}x_k$, where $x_k$ is a band vector whose entries are already scaled by $1/\sqrt{c_n}$. Hence, from Corollary A.4, we can conclude that
$$\mathbb{E}\left|\tau_k - \frac{1}{c_n}\sum_{i\in I_k}\Big[\big[P_{z,\zeta}^{(k)}\big]^{-1}\Big]_{ii}\right|^{2p} \le \frac{C(p)\omega_{4p}}{c_n^{p/2}|\Im(\zeta)|^{2p}},$$
where $I_k$ denotes the set of indices in the support of $x_k$.
To estimate the second term of (23), we use Lemma A.5 to write
$$\mathbb{E}\left|\frac{1}{c_n}\sum_{i\in I_k}\Big[\big[P_{z,\zeta}^{(k)}\big]^{-1}\Big]_{ii} - \frac{1}{n}\operatorname{tr}\big[P_{z,\zeta}^{(k)}\big]^{-1}\right|^{2p} \le 2^{2p}\mathbb{E}\left|\frac{1}{c_n}\sum_{i\in I_k}\big[P_{z,\zeta}^{-1}\big]_{ii} - \frac{1}{n}\operatorname{tr}P_{z,\zeta}^{-1}\right|^{2p} + \frac{2^{2p}}{c_n^{2p}|\Im(\zeta)|^{2p}}.$$
(24)
The first expectation on the right-hand side can be further decomposed as
$$\mathbb{E}\left|\frac{1}{c_n}\sum_{i\in I_k}\big[P_{z,\zeta}^{-1}\big]_{ii} - \frac{1}{n}\operatorname{tr}P_{z,\zeta}^{-1}\right|^{2p} \le \frac{2^{2p}}{c_n^{2p}}\mathbb{E}\left|\sum_{i\in I_k}\Big(\big[P_{z,\zeta}^{-1}\big]_{ii} - \mathbb{E}\big[P_{z,\zeta}^{-1}\big]_{ii}\Big)\right|^{2p} + \frac{2^{2p}}{n^{2p}}\mathbb{E}\Big|\operatorname{tr}P_{z,\zeta}^{-1} - \mathbb{E}\operatorname{tr}P_{z,\zeta}^{-1}\Big|^{2p}.$$
(25)
In the above estimate, we have used the fact that we have a periodic block band matrix with iid entries; therefore, $\mathbb{E}\big[\big[P_{z,\zeta}^{-1}\big]_{ii}\big] = \mathbb{E}\big[\big[P_{z,\zeta}^{-1}\big]_{11}\big]$ for all $1\le i\le n$, which is the conclusion of Lemma A.8. Now, we estimate the first term of (25) via a simple martingale decomposition.
Let $\mathcal{F}_k = \sigma\{x_i : 1\le i\le k\}$ be the sigma algebra generated by the first $k$ columns of $X$. Let us define
$$h(X) = \sum_{i\in I_k}\big[P_{z,\zeta}^{-1}\big]_{ii}.$$
(26)
Then, we have the telescoping sum
$$h(X) - \mathbb{E}[h(X)] = \sum_{k=1}^{n}\Big(\mathbb{E}[h(X)|\mathcal{F}_k] - \mathbb{E}[h(X)|\mathcal{F}_{k-1}]\Big),$$
where $\mathcal{F}_0$ is the trivial sigma algebra. Using Lemma A.5, we have
$$\big|\mathbb{E}[h(X)|\mathcal{F}_k] - \mathbb{E}[h(X)|\mathcal{F}_{k-1}]\big| \le 2/|\Im(\zeta)|.$$
Now, by Corollary A.7,
$$\mathbb{E}\big|h(X) - \mathbb{E}[h(X)]\big|^{2p} \le \frac{C(p)n^p}{|\Im(\zeta)|^{4p}},$$
where $C(p)$ is a constant that depends only on $p$.
As mentioned above, using Lemma A.5 and Result A.6, we estimate the second term of (25) by
$$\mathbb{E}\big|\mathbb{E}\operatorname{tr}P_{z,\zeta}^{-1} - \operatorname{tr}P_{z,\zeta}^{-1}\big|^{2p} \le \frac{C(p)n^p}{|\Im(\zeta)|^{4p}}.$$
Using the above estimates in (23), we obtain
$$\mathbb{E}\big|\tau_k - m_{n,z}(\zeta)\big|^{2p} \le \frac{C(p)\omega_{4p}}{|\Im(\zeta)|^{4p}}\left[\left(\frac{n}{c_n^2}\right)^p + \frac{1}{c_n^{p/2}}\right].$$
(27)

To complete the estimates of (19), we need to lower bound $\big|f(m_{n,z}(\zeta))^{-1}\big|$ and $|\alpha_k|$ [recall that $\alpha_k$ is defined in (10)].

Since $\Im(\zeta) > 0$, it follows that
$$\delta := \int_0^\infty\frac{1}{|\lambda - \zeta|^2}\,d\mu_{X_zX_z^*}(\lambda) > 0.$$
As a result, for any $\zeta\in\mathbb{C}$ with $\Im(\zeta) > 0$,
$$\Im\big(\zeta m_{n,z}(\zeta)\big) = \int_0^\infty\frac{\Im(\zeta)\lambda}{|\lambda - \zeta|^2}\,d\mu_{X_zX_z^*}(\lambda) \ge 0, \qquad \Im\big(m_{n,z}(\zeta)\big) = \int_0^\infty\frac{\Im(\zeta)}{|\lambda - \zeta|^2}\,d\mu_{X_zX_z^*}(\lambda) \ge \Im(\zeta)\delta > 0.$$
Using the above estimates, we have
$$\big|\Im\big(f(m_{n,z}(\zeta))^{-1}\big)\big| = \left|\Im\left(\frac{|z|^2}{1 + m_{n,z}(\zeta)} - (1 + m_{n,z}(\zeta))\zeta\right)\right| = \left|\frac{-|z|^2\,\Im(m_{n,z}(\zeta))}{|1 + m_{n,z}(\zeta)|^2} - \Im(\zeta) - \Im\big(\zeta m_{n,z}(\zeta)\big)\right| \ge |\Im(\zeta)|.$$
Therefore,
$$\big|f(m_{n,z}(\zeta))^{-1}\big| \ge \big|\Im\big(f(m_{n,z}(\zeta))^{-1}\big)\big| \ge |\Im(\zeta)|.$$
(28)
Following a computation similar to (A2), we can also conclude that
$$|\alpha_k| \ge \delta|\Im(\zeta)|.$$
(29)
Finally, plugging (29), (28), (21), (22), and (27) into (19) gives the desired bound (14).□

Next, we provide a deterministic lower bound on |1 − rn,z(ζ)|.

Lemma 3.5.
Under the assumptions of Theorem 3.1,
$$|1-r_{n,z}(\zeta)|\ge\frac{|\Im(\zeta)|}{4A}.$$
(30)

Proof.
Let us denote
$$A_{n,z}(\zeta)\coloneqq 1+m_{n,z}(\zeta),\quad A_{z}(\zeta)\coloneqq 1+m_{z}(\zeta),\quad B_{n,z}(\zeta)\coloneqq|z|^{2}-\zeta A_{n,z}(\zeta)^{2},\quad B_{z}(\zeta)\coloneqq|z|^{2}-\zeta A_{z}(\zeta)^{2},\quad \epsilon_{n,z}(\zeta)\coloneqq m_{n,z}(\zeta)-f(m_{n,z}(\zeta)).$$
Let $m_z(\zeta)$ be the solution of the equation $m_z(\zeta)=A_z(\zeta)/B_z(\zeta)$ satisfying $\Im(\zeta m_z(\zeta))>0$ when $\Im(\zeta)>0$, where we have used the negative real axis as the branch cut of the square root function. The existence of such a solution is well known in the circular law literature (see Sec. 11.4 in Ref. 9).
Observe that, with the above notation, we may write
$$f(m_{n,z}(\zeta))=\frac{A_{n,z}(\zeta)}{B_{n,z}(\zeta)},\qquad m_{n,z}(\zeta)=\frac{A_{n,z}(\zeta)}{B_{n,z}(\zeta)}+\epsilon_{n,z}(\zeta).$$
Using the fact that $|ab|\le\frac{1}{2}(|a|^{2}+|b|^{2})$ for $a,b\in\mathbb{C}$ and employing a calculation similar to the one in Ref. 47, we write
$$|1-r_{n,z}(\zeta)|=\left|1-\frac{|z|^{2}+\zeta A_{z}(\zeta)A_{n,z}(\zeta)}{B_{z}(\zeta)B_{n,z}(\zeta)}\right|\ge 1-\frac{|z|^{2}+|\zeta A_{z}(\zeta)A_{n,z}(\zeta)|}{|B_{z}(\zeta)B_{n,z}(\zeta)|}\ge\frac{1}{2}\left(1-\frac{|z|^{2}+|\zeta A_{z}(\zeta)|^{2}}{|B_{z}(\zeta)|^{2}}\right)+\frac{1}{2}\left(1-\frac{|z|^{2}+|\zeta A_{n,z}(\zeta)|^{2}}{|B_{n,z}(\zeta)|^{2}}\right).$$
(31)
Now, we estimate lower bounds for each expression in (31). We proceed as follows:
$$\begin{aligned}\Im(\zeta A_{n,z}(\zeta))&=\Im(\zeta m_{n,z}(\zeta))+\Im(\zeta)=\Im\!\left(\frac{\zeta A_{n,z}(\zeta)\,\overline{B_{n,z}(\zeta)}}{|B_{n,z}(\zeta)|^{2}}\right)+\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta)\\
&=\Im\!\left(\frac{\zeta A_{n,z}(\zeta)\,\overline{\big(|z|^{2}-\zeta A_{n,z}(\zeta)^{2}\big)}}{|B_{n,z}(\zeta)|^{2}}\right)+\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta)\\
&=\Im\!\left(\frac{|z|^{2}\,\zeta A_{n,z}(\zeta)-|\zeta A_{n,z}(\zeta)|^{2}\,\overline{A_{n,z}(\zeta)}}{|B_{n,z}(\zeta)|^{2}}\right)+\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta)\\
&=\Im(\zeta A_{n,z}(\zeta))\,\frac{|z|^{2}+|\zeta A_{n,z}(\zeta)|^{2}}{|B_{n,z}(\zeta)|^{2}}+\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta).\end{aligned}$$
Consequently, we have
$$1-\frac{|z|^{2}+|\zeta A_{n,z}(\zeta)|^{2}}{|B_{n,z}(\zeta)|^{2}}=\frac{\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta)}{\Im(\zeta A_{n,z}(\zeta))}=\frac{\Im(\zeta\epsilon_{n,z}(\zeta))+\Im(\zeta)}{\Im(\zeta)+\Im(\zeta m_{n,z}(\zeta))}.$$
(32)
Similarly,
$$1-\frac{|z|^{2}+|\zeta A_{z}(\zeta)|^{2}}{|B_{z}(\zeta)|^{2}}=\frac{\Im(\zeta)}{\Im(\zeta A_{z}(\zeta))}=\frac{\Im(\zeta)}{\Im(\zeta)+\Im(\zeta m_{z}(\zeta))}.$$
(33)
Recall that we have chosen the solution $m_z(\zeta)$ such that $\Im(\zeta m_z(\zeta))$ and $\Im(\zeta)$ have the same sign. Therefore,
$$0\le\frac{\Im(\zeta)}{\Im(\zeta)+\Im(\zeta m_{z}(\zeta))}=1-\frac{|z|^{2}+|\zeta A_{z}(\zeta)|^{2}}{|B_{z}(\zeta)|^{2}}=1-\frac{|z|^{2}}{|B_{z}(\zeta)|^{2}}-|\zeta m_{z}(\zeta)|^{2}.$$
As a result,
$$|\zeta m_{z}(\zeta)|\le 1.$$
Using the above estimate in (33) and the fact that $\Im(\zeta m_z(\zeta))$ and $\Im(\zeta)$ have the same sign, we obtain
$$1-\frac{|z|^{2}+|\zeta A_{z}(\zeta)|^{2}}{|B_{z}(\zeta)|^{2}}=\frac{\Im(\zeta)}{\Im(\zeta)+\Im(\zeta m_{z}(\zeta))}=\frac{|\Im(\zeta)|}{|\Im(\zeta)|+|\Im(\zeta m_{z}(\zeta))|}\ge\frac{|\Im(\zeta)|}{|\Im(\zeta)|+1}=\frac{1}{1+|\Im(\zeta)|^{-1}}\ge\frac{1}{1+3A\,|\Im(\zeta)|^{-1}}\ge\frac{|\Im(\zeta)|}{4A},$$
(34)
where the second-to-last inequality follows from the fact that $|\Im(\zeta)|^{-1}\le 3A\,|\Im(\zeta)|^{-1}$ and the last from $|\Im(\zeta)|+3A\le 4A$, both of which are implied by the assumption $\zeta\in\{\zeta\in\mathbb{C}:-A<\Re(\zeta)<A,\ 0<\Im(\zeta)<1\}$ and $A>1$.
Similarly,
$$1-\frac{|z|^{2}+|\zeta A_{n,z}(\zeta)|^{2}}{|B_{n,z}(\zeta)|^{2}}\ge\frac{|\Im(\zeta)|}{4A}.$$
(35)
Using the estimates (35) and (34) in (31), we have
$$|1-r_{n,z}(\zeta)|\ge\frac{|\Im(\zeta)|}{4A}.\qquad\square$$

Theorem 3.1 follows easily from the above calculations.

Proof of Theorem 3.1.
By Lemma 3.3,
$$\mathbb{E}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|^{2p}=\mathbb{E}\left|\frac{1}{1-r_{n,z}(\zeta)}\Big(m_{n,z}(\zeta)-f\big(m_{n,z}(\zeta)\big)\Big)\right|^{2p}.$$
(36)
Therefore, by Lemmas 3.4 and 3.5,
$$\mathbb{E}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|^{2p}\le\frac{C(p)\,A^{p}\,\omega^{4p}}{|\Im(\zeta)|^{8p}}\left(\left(\frac{n}{c_n^{2}}\right)^{p}+\frac{1}{c_n^{p/2}}\right).$$

Since $m_z$ is the Stieltjes transform of $\nu_z$, it is a well-known property [see, for example, Sec. 11.4 and Eq. (11.4.1) in Ref. 9] that $m_z$ is the unique solution of (9) satisfying $\Im(\zeta m_z(\zeta))>0$ and $\Im(m_z(\zeta))>0$ when $\Im(\zeta)>0$.□

Before proving Theorem 1.4, we note the following spectral norm bound on X.

Proposition 4.1

(Spectral norm bound). There exists a constant K > 0 such that ‖X‖ ≤ K with probability 1 − o(1).

Proof.
For any vector $v\in\mathbb{C}^n$, it follows from the block structure of X that
$$\|Xv\|\le C\,\|v\|\left(\max_{1\le i\le m}\|T_i\|+\max_{1\le i\le m}\|U_i\|+\max_{1\le i\le m}\|D_i\|\right),$$
where C > 0 is an absolute constant. The claim then follows from Lemma 2.4.□
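As a numerical aside (not part of the argument), the triangle-inequality mechanism behind this bound can be checked directly: each of the three block diagonals of a periodic block band matrix is a (shifted) block-diagonal matrix, whose spectral norm is the maximum of its block norms. The sketch below uses arbitrary illustrative block sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, b = 5, 4                                # m blocks of size b on each block diagonal
n = m * b
D = [rng.standard_normal((b, b)) for _ in range(m)]   # diagonal blocks
U = [rng.standard_normal((b, b)) for _ in range(m)]   # superdiagonal blocks (periodic)
T = [rng.standard_normal((b, b)) for _ in range(m)]   # subdiagonal blocks (periodic)

X = np.zeros((n, n))
for i in range(m):
    rows = slice(i * b, (i + 1) * b)
    ju, jt = (i + 1) % m, (i - 1) % m      # periodic wraparound of the off-diagonals
    X[rows, i * b:(i + 1) * b] = D[i]
    X[rows, ju * b:(ju + 1) * b] = U[i]
    X[rows, jt * b:(jt + 1) * b] = T[i]

op = lambda M: np.linalg.norm(M, 2)        # spectral norm
# Triangle inequality over the three block diagonals, each of norm max_i ||block_i||
bound = max(map(op, D)) + max(map(op, U)) + max(map(op, T))
assert op(X) <= bound + 1e-9
```

Here the absolute constant C of the proposition can even be taken to be 1 for this periodic block tridiagonal structure.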

In order to complete the Proof of Theorem 1.4, we will use the following replacement principle from Ref. 81. Let $\|A\|_2$ denote the Hilbert–Schmidt norm of the matrix A defined by the formula

$$\|A\|_2\coloneqq\sqrt{\operatorname{tr}(AA^*)}=\sqrt{\operatorname{tr}(A^*A)}.$$

Theorem 4.2

(Replacement principle; Theorem 2.1 from Ref. 81). Suppose for each n that G and X are n × n ensembles of random matrices. Assume that the following holds:

  • The expression
    $$\frac{1}{n}\|G\|_2^{2}+\frac{1}{n}\|X\|_2^{2}$$
    is bounded in probability (respectively, almost surely).
  • For almost all complex numbers z,
    $$\frac{1}{n}\log\big|\det(G-z)\big|-\frac{1}{n}\log\big|\det(X-z)\big|$$
    converges in probability (respectively, almost surely) to zero, and in particular, for fixed z, these determinants are nonzero with probability 1 − o(1) (respectively, almost surely nonzero for all but finitely many n).
    Then,
    $$\mu_G-\mu_X$$
    converges in probability (respectively, almost surely) to zero.

We will apply the replacement principle to the normalized band matrix X, while the other matrix is taken to be $G\coloneqq\frac{1}{\sqrt{n}}\tilde{G}$, where the entries of the n × n matrix $\tilde{G}$ are iid standard Gaussian random variables, i.e., $\tilde{G}$ is a Ginibre matrix. As the limiting behavior of μG is known to be almost surely the circular law,81 it will suffice, in order to complete the Proof of Theorem 1.4, to check the two conditions of Theorem 4.2.
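For intuition, the circular law for the comparison matrix G is easy to observe numerically. The simulation below (with an arbitrary size n = 500 and a real Gaussian matrix, for which the circular law also holds) checks that the spectrum concentrates on the unit disk and that the radial mass matches the uniform measure on the disk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Normalized Gaussian matrix; its empirical spectral distribution approaches
# the uniform distribution on the unit disk as n grows
G = rng.standard_normal((n, n)) / np.sqrt(n)
eig = np.linalg.eigvals(G)

# Almost all eigenvalues lie in a slightly enlarged unit disk...
assert np.mean(np.abs(eig) <= 1.1) > 0.95
# ...and the proportion inside radius 1/2 is close to (1/2)^2 = 1/4,
# as predicted by the uniform measure on the disk
assert abs(np.mean(np.abs(eig) <= 0.5) - 0.25) < 0.1
```

The tolerances are loose on purpose; they only illustrate the limiting statement, not a quantitative rate.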

Condition (i) from Theorem 4.2 follows by the law of large numbers. Thus, it suffices to verify the second condition. To do so, we introduce the following notation inspired by Chapter 11 of Ref. 9. For $z\in\mathbb{C}$, we define the following empirical distributions constructed from the squared singular values of $X_z\coloneqq X-z$ and $G_z\coloneqq G-z$:

$$\nu_{X_z}(\cdot)\coloneqq\frac{1}{n}\sum_{i=1}^{n}\delta_{s_i^{2}(X_z)}(\cdot)$$

and

$$\nu_{G_z}(\cdot)\coloneqq\frac{1}{n}\sum_{i=1}^{n}\delta_{s_i^{2}(G_z)}(\cdot).$$

It follows that

$$\frac{1}{n}\log\big|\det(X_z)\big|-\frac{1}{n}\log\big|\det(G_z)\big|=\frac{1}{2}\int_{0}^{\infty}\log x\,\nu_{X_z}(dx)-\frac{1}{2}\int_{0}^{\infty}\log x\,\nu_{G_z}(dx).$$
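The identity above simply says that log|det| is the sum of the logarithms of the singular values. A minimal numerical check (with an arbitrary complex test matrix and shift z):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
z = 0.3 + 0.4j
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
Xz = X - z * np.eye(n)

s = np.linalg.svd(Xz, compute_uv=False)       # singular values of X - z
lhs = np.log(np.abs(np.linalg.det(Xz))) / n   # (1/n) log |det(X - z)|
rhs = 0.5 * np.mean(np.log(s ** 2))           # (1/2) ∫ log x ν_{X_z}(dx)
assert abs(lhs - rhs) < 1e-8
```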

By Theorem 2.1 and Proposition 4.1, there exists a constant K > 0 (depending on z) such that

$$\int_{0}^{\infty}\log x\,\nu_{X_z}(dx)-\int_{0}^{\infty}\log x\,\nu_{G_z}(dx)=\int_{c_n^{-25m}}^{K}\log x\,\nu_{X_z}(dx)-\int_{c_n^{-25m}}^{K}\log x\,\nu_{G_z}(dx)$$
(37)

with probability 1 − o(1). Here, the largest and smallest singular values of $G-z$ can be controlled by the results in Refs. 80 and 82. We will apply the following lemma.

Lemma 4.3.
For any probability measures μ and ν on R and any 0 < a < b,
$$\left|\int_a^b\log(x)\,d\mu(x)-\int_a^b\log(x)\,d\nu(x)\right|\le 2\big[|\log b|+|\log a|\big]\,\|\mu-\nu\|_{[a,b]},$$
where
$$\|\mu-\nu\|_{[a,b]}\coloneqq\sup_{x\in[a,b]}\big|\mu([a,x])-\nu([a,x])\big|.$$

Proof.
We rewrite
$$\int_a^b\log(x)\,d\mu(x)=\log(b)\,\mu([a,b])-\int_a^b\int_x^b\frac{1}{t}\,dt\,d\mu(x).$$
Applying Fubini’s theorem, we deduce that
$$\int_a^b\int_x^b\frac{1}{t}\,dt\,d\mu(x)=\int_a^b\frac{\mu([a,t])}{t}\,dt.$$
Similarly, the same equalities apply to ν. Thus, we obtain that
$$\left|\int_a^b\log(x)\,d\mu(x)-\int_a^b\log(x)\,d\nu(x)\right|\le|\log(b)|\,\big|\mu([a,b])-\nu([a,b])\big|+\int_a^b\frac{\big|\mu([a,t])-\nu([a,t])\big|}{t}\,dt\le|\log b|\,\|\mu-\nu\|_{[a,b]}+\|\mu-\nu\|_{[a,b]}\int_a^b\frac{1}{t}\,dt,$$
from which the conclusion follows.□
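The integration-by-parts-plus-Fubini step in this proof can be verified numerically for an empirical measure. Below, μ has twenty arbitrary atoms in [a, b], and the t-integral of the step function μ([a, t]) is approximated by a fine Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.5, 4.0
pts = rng.uniform(a, b, size=20)           # atoms of an empirical probability measure μ

lhs = float(np.mean(np.log(pts)))          # ∫_a^b log(x) dμ(x)

# ∫_a^b μ([a, t])/t dt, with μ([a, t]) a step function, via a left Riemann sum
t = np.linspace(a, b, 400001)
F = np.searchsorted(np.sort(pts), t, side='right') / pts.size   # μ([a, t])
integral = float(np.sum((F / t)[:-1] * np.diff(t)))

rhs = np.log(b) * 1.0 - integral           # log(b) μ([a, b]) - ∫ μ([a, t])/t dt
assert abs(lhs - rhs) < 1e-3
```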

Returning to (37) and applying the above lemma, we find that

$$\left|\frac{1}{n}\log\big|\det(X_z)\big|-\frac{1}{n}\log\big|\det(G_z)\big|\right|\le C\,\frac{n}{b_n}\,\log(n)\,\big\|\nu_{X_z}(\cdot)-\nu_{G_z}(\cdot)\big\|_{[0,\infty)}$$
(38)

for a constant C > 0, where

$$\|\mu-\nu\|_{[0,\infty)}\coloneqq\sup_{x\ge 0}\big|\mu((-\infty,x])-\nu((-\infty,x])\big|$$

for any probability measures μ and ν on R. Let νz(·) be the probability measure on [0, ) from Theorem 3.1 (or equivalently, the probability measure defined in Sec. 11.4 of Ref. 9). By the triangle inequality, it suffices to show that

$$\big\|\nu_{X_z}(\cdot)-\nu_z(\cdot)\big\|_{[0,\infty)}=O\!\left(\left(\frac{n\log n}{b_n^{2}}\right)^{1/31}\right)$$
(39)

and

$$\big\|\nu_{G_z}(\cdot)-\nu_z(\cdot)\big\|_{[0,\infty)}=O\!\left(\left(\frac{n\log n}{b_n^{2}}\right)^{1/31}\right)$$
(40)

with probability 1 − o(1). The convergence in (40) follows from Lemma 11.16 from Ref. 9; in fact, the results in Ref. 9 provide a much better error bound, which holds almost surely. Thus, it remains to establish (39), which is a consequence of the following lemma.

Lemma 4.4.
Let X̃ and X be as in Theorem 1.4 with $b_n\ge n^{32/33}\log n$. Then, for any fixed $z\in\mathbb{C}$,
$$\big\|\nu_{X_z}(\cdot)-\nu_z(\cdot)\big\|_{[0,\infty)}=O\!\left(\left(\frac{n\log n}{b_n^{2}}\right)^{1/31}\right)$$
with probability 1 − o(1).

Proof.
Fix $z\in\mathbb{C}$. For notational simplicity, define
$$q_n\coloneqq\frac{n\log n}{b_n^{2}}.$$
Let $m_{n,z}$ be the Stieltjes transform of $\nu_{X_z}(\cdot)$ and $m_z$ be the Stieltjes transform of $\nu_z(\cdot)$. We consider both Stieltjes transforms only on the upper half-plane $\mathbb{C}^{+}$. On the upper half-plane, both Stieltjes transforms are Lipschitz:
$$\big|m_{n,z}(\zeta)-m_{n,z}(\xi)\big|\le\frac{|\zeta-\xi|}{\Im(\zeta)\,\Im(\xi)},\qquad\big|m_{z}(\zeta)-m_{z}(\xi)\big|\le\frac{|\zeta-\xi|}{\Im(\zeta)\,\Im(\xi)}.$$
(41)
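Inequality (41) is the elementary bound |1/(x − ζ) − 1/(x − ξ)| ≤ |ζ − ξ|/(ℑ(ζ)ℑ(ξ)) averaged against the measure. A numerical sketch for an empirical measure on [0, ∞), with arbitrary atoms and test points:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 4.0, size=100)        # atoms of an empirical measure on [0, ∞)

def stieltjes(w):
    # Stieltjes transform m(w) = ∫ (λ - w)^{-1} dν(λ) of the empirical measure
    return np.mean(1.0 / (x - w))

zeta, xi = 1.0 + 0.3j, 2.5 + 0.7j          # two points in the upper half-plane
lip = abs(zeta - xi) / (zeta.imag * xi.imag)
assert abs(stieltjes(zeta) - stieltjes(xi)) <= lip + 1e-12
```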
Fix A > 0 sufficiently large to be chosen later. Define the line segment in the complex plane,
$$L\coloneqq\big\{\zeta=\theta+i\,q_n^{2/31}\in\mathbb{C}^{+}:-A\le\theta\le A\big\}.$$
(42)
Applying Theorem 3.1 and Markov’s inequality, for any ζ ∈ L, we have
$$\mathbb{P}\Big(\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|\ge q_n^{5/31}\Big)\le C\,q_n^{-26/31}\,\frac{n}{b_n^{2}}$$
for a constant C > 0, which depends only on the moments of the atom variable ξ and on A. Let N be a $q_n^{5/31}$-net of L. By a simple covering argument, N can be chosen so that $|N|=O(q_n^{-5/31})$. Thus, by the union bound,
$$\mathbb{P}\left(\sup_{\zeta\in N}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|\ge q_n^{5/31}\right)\le C\,q_n^{-1}\,\frac{n}{b_n^{2}}=\frac{C}{\log n}=o(1).$$
Using the Lipschitz continuity (41), this bound can be extended to all of L, and we obtain
$$\sup_{\zeta\in L}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|=O\big(q_n^{1/31}\big)$$
(43)
with probability 1 − o(1).
To complete the proof, we will use Corollary B.15 from Ref. 9 and (43) to bound $\|\nu_{X_z}(\cdot)-\nu_z(\cdot)\|_{[0,\infty)}$. Indeed, take K > 0 sufficiently large so that $\nu_{X_z}([0,K])=1$ with probability 1 − o(1) and $\nu_z([0,K])=1$. Such a choice is always possible by Proposition 4.1 since $\nu_z$ has compact support (a fact which can also be deduced from Proposition 4.1). Recall the parameter A > 0 used to define the line segment L [see (42)]. Taking A, a > 0 sufficiently large, setting $\eta\coloneqq q_n^{2/31}$, and letting $\zeta\coloneqq\theta+i\eta$, Corollary B.15 from Ref. 9 implies that
$$\big\|\nu_{X_z}(\cdot)-\nu_z(\cdot)\big\|_{[0,\infty)}\le C\left(\int_{-A}^{A}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|\,d\theta+\frac{1}{\eta}\sup_{x}\int_{|y|\le 2\eta a}\big|\nu_z((-\infty,x+y])-\nu_z((-\infty,x])\big|\,dy\right),$$
where C > 0 depends only on the choice of A, K, a. The second term is bounded by Lemma 11.9 from Ref. 9,
$$\frac{1}{\eta}\sup_{x}\int_{|y|\le 2\eta a}\big|\nu_z((-\infty,x+y])-\nu_z((-\infty,x])\big|\,dy\le C'\,\eta$$
for a constant C′ > 0 depending only on a. For the first term, we apply (43) to obtain
$$\int_{-A}^{A}\big|m_{n,z}(\zeta)-m_{z}(\zeta)\big|\,d\theta=O\big(q_n^{1/31}\big)$$
with probability 1 − o(1). Combining the two bounds above, we conclude that, with probability 1 − o(1),
$$\big\|\nu_{X_z}(\cdot)-\nu_z(\cdot)\big\|_{[0,\infty)}=O\big(q_n^{1/31}\big),$$
which completes the proof of the lemma.□

Lemma 4.4 establishes (39). Combining (39) and (40) with (38) and taking $b_n\ge n^{32/33}\log n$ completes the Proof of Theorem 1.4.

The authors thank the anonymous referees for useful feedback and corrections. K. Luh was supported, in part, by the NSF under Grant No. DMS-1702533. S. O’Rourke was supported, in part, by the NSF under Grant Nos. ECCS-1913131 and DMS-1810500.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Lemma A.1
(Sherman–Morrison formula; see Sec. 0.7.4 in Ref. 48). Let A and A + vv* be two invertible matrices, where $v\in\mathbb{C}^n$. Then,
$$v^*(A+vv^*)^{-1}=\frac{v^*A^{-1}}{1+v^*A^{-1}v}.$$
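A direct numerical check of the Sherman–Morrison identity (for the check, A is taken to be positive definite so that both A and A + vv* are guaranteed invertible, although the lemma only needs invertibility):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                    # positive definite, hence invertible
v = rng.standard_normal((n, 1))

Ainv = np.linalg.inv(A)
lhs = v.T @ np.linalg.inv(A + v @ v.T)     # v*(A + vv*)^{-1}
rhs = (v.T @ Ainv) / (1.0 + v.T @ Ainv @ v)
assert np.allclose(lhs, rhs)
```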

Lemma A.2.
Let $\zeta\in\mathbb{C}\setminus\mathbb{R}^{+}$ and A be an n × n non-negative definite matrix. Then, for any $v\in\mathbb{C}^n$,
$$\Big|\operatorname{tr}\big[(A+vv^*-\zeta I)^{-1}-(A-\zeta I)^{-1}\big]\Big|\le\frac{1}{|\Im(\zeta)|}.$$

Proof.
The proof is similar to that of Lemma 2.6 in Ref. 77. Using the resolvent identity and Lemma A.1,
$$\operatorname{tr}\big[(A+vv^*-\zeta I)^{-1}-(A-\zeta I)^{-1}\big]=-\operatorname{tr}\big[(A+vv^*-\zeta I)^{-1}vv^*(A-\zeta I)^{-1}\big]=-v^*(A-\zeta I)^{-1}(A+vv^*-\zeta I)^{-1}v=-\frac{v^*(A-\zeta I)^{-1}(A-\zeta I)^{-1}v}{1+v^*(A-\zeta I)^{-1}v}.$$
(A1)
Let $A=\sum_{i=1}^{n}\lambda_i(A)u_iu_i^*$ be the spectral decomposition of A, where λi(A) ≥ 0 for all 1 ≤ i ≤ n. Then,
$$\big|v^*(A-\zeta I)^{-1}(A-\zeta I)^{-1}v\big|\le\sum_{i=1}^{n}\frac{|u_i^*v|^{2}}{|\lambda_i(A)-\zeta|^{2}},\qquad\big|1+v^*(A-\zeta I)^{-1}v\big|^{2}=\left(1+\sum_{i=1}^{n}\frac{\big(\lambda_i(A)-\Re(\zeta)\big)\,|u_i^*v|^{2}}{|\lambda_i(A)-\zeta|^{2}}\right)^{2}+\left(\sum_{i=1}^{n}\frac{\Im(\zeta)\,|u_i^*v|^{2}}{|\lambda_i(A)-\zeta|^{2}}\right)^{2}\ge|\Im(\zeta)|^{2}\left(\sum_{i=1}^{n}\frac{|u_i^*v|^{2}}{|\lambda_i(A)-\zeta|^{2}}\right)^{2}.$$
(A2)
Plugging the above estimates into (A1), we obtain the result.□
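The conclusion of Lemma A.2 can also be checked numerically. The sketch below uses an arbitrary non-negative definite A, an arbitrary rank-one perturbation, and a nonreal ζ:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T                                # non-negative definite
v = rng.standard_normal((n, 1))
zeta = 0.7 + 0.2j                          # ζ ∈ C \ R⁺ (here: nonreal)

R1 = np.linalg.inv(A + v @ v.T - zeta * np.eye(n))
R0 = np.linalg.inv(A - zeta * np.eye(n))
# Lemma A.2: the trace of the resolvent difference is at most 1/|Im(ζ)|
assert abs(np.trace(R1 - R0)) <= 1.0 / abs(zeta.imag) + 1e-9
```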

Lemma A.3
(Lemma 2.7 from Ref. 11 and Eq. (2.5) in Ref. 71). Let ξ = (ξ1, ξ2, …, ξn) be a random vector whose coordinates ξi are iid complex-valued random variables with $\mathbb{E}[\xi_1]=0$ and $\mathbb{E}[|\xi_1|^{2}]=1$. Then, for any deterministic n × n matrix A,
$$\mathbb{E}\big|\xi^*A\xi-\operatorname{tr}A\big|^{p}\le C_1(p)\Big(\big(\mathbb{E}|\xi_1|^{4}\operatorname{tr}(A^*A)\big)^{p/2}+\mathbb{E}|\xi_1|^{2p}\operatorname{tr}\big[(A^*A)^{p/2}\big]\Big),\qquad\mathbb{E}\big|\xi^*A\xi\big|^{p}\le C_2(p)\Big(\mathbb{E}|\xi_1|^{2p}\operatorname{tr}\big[(A^*A)^{p/2}\big]+|\operatorname{tr}A|^{p}\Big),$$
where C1(p) and C2(p) are constants that depend only on p.

Corollary A.4.
Let $I\subset\{1,2,\dots,n\}$ be a fixed index set and ξ1, ξ2, …, ξn be a set of iid complex-valued random variables with $\mathbb{E}[\xi_1]=0$ and $\mathbb{E}[|\xi_1|^{2}]=1$. Define v = (v1, v2, …, vn), where $v_i\coloneqq\xi_i\mathbf{1}_{i\in I}$. Then, for any fixed n × n deterministic matrix A, we have
$$\mathbb{E}\left|v^*Av-\sum_{i\in I}a_{ii}\right|^{p}\le C(p)\,|I|^{p/2}\,\mathbb{E}\big[|\xi_1|^{2p}\big]\,\|A\|^{p}.$$

Proof.

Let us define an n × n matrix Ã by $(\tilde{A})_{ij}\coloneqq a_{ij}\mathbf{1}_{i\in I}\mathbf{1}_{j\in I}$, where aij = (A)ij. Then, $v^*Av=v^*\tilde{A}v$. In addition, $\operatorname{tr}\tilde{A}=\sum_{i\in I}a_{ii}$. Therefore, using Lemma A.3 and the fact that $\operatorname{tr}(\tilde{A}^*\tilde{A})\le|I|\,\|\tilde{A}\|^{2}\le|I|\,\|A\|^{2}$, the claim of the corollary follows.□

Lemma A.5.
Let P and Q be two n × n non-negative definite matrices. Then, for any $\zeta\in\mathbb{C}\setminus\mathbb{R}^{+}$ and $I\subset\{1,2,\dots,n\}$,
$$\left|\sum_{k\in I}\big[(P-\zeta I)^{-1}\big]_{kk}-\sum_{k\in I}\big[(Q-\zeta I)^{-1}\big]_{kk}\right|\le\frac{2}{|\Im(\zeta)|}\operatorname{rank}(P-Q).$$

Proof.
The above lemma is similar to Lemma C.3 from Ref. 21. For the readers’ convenience, we include the proof here. Using the resolvent identity, we have
$$(P-\zeta I)^{-1}-(Q-\zeta I)^{-1}=(P-\zeta I)^{-1}(Q-P)(Q-\zeta I)^{-1}.$$
Therefore, $r\coloneqq\operatorname{rank}\big[(P-\zeta I)^{-1}-(Q-\zeta I)^{-1}\big]\le\operatorname{rank}(P-Q)$. Let us write the singular value decomposition as
$$(P-\zeta I)^{-1}-(Q-\zeta I)^{-1}=\sum_{i=1}^{r}s_iu_iv_i^*,$$
where s1, s2, …, sr are the at most r non-zero singular values of (P − ζI)−1 − (Q − ζI)−1 and $u_1,u_2,\dots,u_r$, $v_1,v_2,\dots,v_r$ are two sets of orthonormal vectors. Consequently, we may write
$$\big[(P-\zeta I)^{-1}\big]_{kk}-\big[(Q-\zeta I)^{-1}\big]_{kk}=\sum_{i=1}^{r}s_i\,(e_k^{T}u_i)(v_i^*e_k).$$
Using the Cauchy–Schwarz inequality,
$$\left|\sum_{k\in I}\big[(P-\zeta I)^{-1}\big]_{kk}-\sum_{k\in I}\big[(Q-\zeta I)^{-1}\big]_{kk}\right|\le\sum_{i=1}^{r}s_i\sum_{k\in I}\big|(e_k^{T}u_i)(v_i^*e_k)\big|\le\sum_{i=1}^{r}s_i\sqrt{\sum_{k\in I}|e_k^{T}u_i|^{2}}\sqrt{\sum_{k\in I}|v_i^*e_k|^{2}}\le\sum_{i=1}^{r}s_i\,\|u_i\|\,\|v_i\|=\sum_{i=1}^{r}s_i\le\frac{2r}{|\Im(\zeta)|}\le\frac{2}{|\Im(\zeta)|}\operatorname{rank}(P-Q),$$
where the second-to-last inequality follows from the fact that $s_i\le\|(P-\zeta I)^{-1}-(Q-\zeta I)^{-1}\|\le 2/|\Im(\zeta)|$ for all 1 ≤ i ≤ r.□
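A numerical check of Lemma A.5 with an arbitrary non-negative definite P, an arbitrary rank-2 non-negative perturbation, and an arbitrary index set:

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 10, 2
M = rng.standard_normal((n, n))
P = M @ M.T                                # non-negative definite
W = rng.standard_normal((n, r))
Q = P + W @ W.T                            # non-negative definite, rank(P - Q) = r
zeta = 1.5 + 0.4j                          # ζ ∈ C \ R⁺
I = [0, 3, 4, 8]                           # an arbitrary index set

RP = np.linalg.inv(P - zeta * np.eye(n))
RQ = np.linalg.inv(Q - zeta * np.eye(n))
diff = abs(sum(RP[k, k] - RQ[k, k] for k in I))
# Lemma A.5: the diagonal sums differ by at most 2 rank(P - Q)/|Im(ζ)|
assert diff <= 2 * r / abs(zeta.imag) + 1e-9
```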

Result A.6
(Azuma–Hoeffding inequality; see Ref. 60). Let $(\xi_k)_k$ be a martingale with respect to the filtration $(\mathcal{F}_k)_k$ such that |ξk+1 − ξk| ≤ ck almost surely for all k. Then, for any t > 0,
$$\mathbb{P}\big(|\xi_n-\mathbb{E}[\xi_n]|>t\big)\le 2\exp\left(-\frac{t^{2}}{2\sum_{k=1}^{n}c_k^{2}}\right).$$

A simple consequence of the previous concentration inequality is a bound on the moments.

Corollary A.7.
Under the conditions of Result A.6, for $l\in\mathbb{N}$, we have
$$\mathbb{E}\big[|\xi_n-\mathbb{E}\xi_n|^{l}\big]\le C(l)\left(\sum_{k=1}^{n}c_k^{2}\right)^{l/2},$$
whereC(l) is a constant only depending onl.

Proof.
This result can be deduced from a straightforward calculation using Result A.6:
$$\mathbb{E}\big[|\xi_n-\mathbb{E}\xi_n|^{l}\big]=l\int_0^{\infty}t^{l-1}\,\mathbb{P}\big(|\xi_n-\mathbb{E}\xi_n|>t\big)\,dt\le 2l\int_0^{\infty}t^{l-1}\exp\left(-\frac{t^{2}}{2\sum_{k=1}^{n}c_k^{2}}\right)dt=l\,2^{l/2}\left(\sum_{k=1}^{n}c_k^{2}\right)^{l/2}\int_0^{\infty}u^{l/2-1}e^{-u}\,du=l\,\Gamma(l/2)\,2^{l/2}\left(\sum_{k=1}^{n}c_k^{2}\right)^{l/2},$$
where Γ is the gamma function.□
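The substitution $u=t^{2}/(2\sum_k c_k^{2})$ used in this calculation can be verified numerically; the sketch below checks that $2l\int_0^\infty t^{l-1}\exp(-t^2/(2\sum_k c_k^2))\,dt = l\,\Gamma(l/2)\,2^{l/2}(\sum_k c_k^2)^{l/2}$ for the illustrative choice l = 4 and $\sum_k c_k^2 = 1$.

```python
import math
import numpy as np

l = 4
c2 = 1.0                                   # Σ_k c_k², set to 1 for the check

# Left side: 2l ∫_0^∞ t^{l-1} exp(-t²/(2 Σ c_k²)) dt, truncated at t = 30
# (the tail beyond 30 is negligible) and evaluated with the trapezoid rule
t = np.linspace(0.0, 30.0, 400001)
f = 2 * l * t ** (l - 1) * np.exp(-t ** 2 / (2 * c2))
lhs = float(np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(t)))

# Right side: l Γ(l/2) 2^{l/2} (Σ c_k²)^{l/2}
rhs = l * math.gamma(l / 2) * 2 ** (l / 2) * c2 ** (l / 2)
assert abs(lhs - rhs) < 1e-4               # both sides equal 16 here
```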

Our final lemma is a technical observation, which is of use in Sec. III.

Lemma A.8.
We let X be the random matrix from Theorem 1.4 (without the restriction on the bandwidth). We recall the notation from Sec. III. For fixed $z\in\mathbb{C}$ and ζ in the upper half of the complex plane, set
$$P_{z,\zeta}\coloneqq X_zX_z^*-\zeta I=(X-zI)(X-zI)^*-\zeta I.$$
Then, for all 1 ≤ i ≤ n,
$$\mathbb{E}\big[P_{z,\zeta}^{-1}\big]_{ii}=\mathbb{E}\big[P_{z,\zeta}^{-1}\big]_{11}.$$

Proof.
We divide [n] into sets I1, …, Im, where $I_i\coloneqq[(i-1)b_n+1,\,ib_n]\cap\mathbb{N}$. Let Pij denote the n × n permutation matrix that interchanges the ith and jth rows when acting from the left on a matrix. Observe that when i, j ∈ Ik for some k ∈ [m], $P_{ij}X_zP_{ij}^{-1}$ has the same distribution as $X_z$ due to the iid assumption and block structure. Therefore, $X_zX_z^*$ has the same distribution as $P_{ij}X_zP_{ij}^{T}P_{ij}X_z^*P_{ij}^{T}=P_{ij}X_zX_z^*P_{ij}^{T}$. Thus,
$$\big[(X_zX_z^*-\zeta I)^{-1}\big]_{ii}\sim\Big[\big(P_{ij}(X_zX_z^*-\zeta I)P_{ij}^{T}\big)^{-1}\Big]_{ii}=\big[P_{ij}(X_zX_z^*-\zeta I)^{-1}P_{ij}^{T}\big]_{ii}=\big[(X_zX_z^*-\zeta I)^{-1}\big]_{jj},$$
where we use ∼ to denote equality in distribution. This establishes that the expectation is identical for any two indices in the same index block. It remains to show that the expectations for the various blocks are the same. Here, we define a permutation that exploits the block band structure. Let P be the permutation that cyclically shifts Ik to Ik+1, maintaining the order within each block and using the convention that Im+1 = I1. By the structure of the matrix and the iid assumption,
$$X_zX_z^*\sim PX_zX_z^*P^{-1}.$$
Thus,
$$\big[(X_zX_z^*-\zeta I)^{-1}\big]_{11}\sim\Big[\big(PX_zX_z^*P^{-1}-\zeta I\big)^{-1}\Big]_{11}=\big[P(X_zX_z^*-\zeta I)^{-1}P^{-1}\big]_{11}=\big[(X_zX_z^*-\zeta I)^{-1}\big]_{b_n+1,\,b_n+1}.$$
Continuing inductively establishes the equivalence of all the expectations along the diagonal of (XzXz*ζI)1.□
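The key deterministic fact behind this argument, that conjugation by the cyclic block shift maps the periodic block band sparsity pattern onto itself, can be verified directly (with arbitrary small block parameters):

```python
import numpy as np

m, b = 5, 3                                # m diagonal blocks of size b, n = m·b
n = m * b

# 0/1 pattern of the periodic block band matrix: block (i, j) is nonzero
# exactly when j - i ≡ -1, 0, or 1 (mod m)
S = np.zeros((n, n))
for i in range(m):
    for d in (-1, 0, 1):
        j = (i + d) % m
        S[i * b:(i + 1) * b, j * b:(j + 1) * b] = 1.0

# Permutation cyclically shifting block I_k onto I_{k+1}, keeping the order
# within each block
perm = np.roll(np.arange(n), b)
P = np.eye(n)[perm]

# Conjugation by P maps the sparsity pattern onto itself; combined with the
# iid assumption this gives P X_z X_z^* P^{-1} ~ X_z X_z^* as in the lemma
assert np.array_equal(P @ S @ P.T, S)
```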

1. R. Adamczak and D. Chafaï, “Circular law for random matrices with unconditional log-concave distribution,” Commun. Contemp. Math. 17(4), 1550020 (2015).
2. R. Adamczak, D. Chafaï, and P. Wolff, “Circular law for random matrices with exchangeable entries,” Random Struct. Algorithms 48(3), 454–479 (2016).
3. J. Aljadeff, D. Renfrew, and M. Stern, “Eigenvalues of block structured asymmetric random matrices,” J. Math. Phys. 56(10), 103502 (2015).
4. J. Aljadeff, M. Stern, and T. Sharpee, “Transition to chaos in random networks with cell-type-specific connectivity,” Phys. Rev. Lett. 114, 088101 (2015).
5. S. Allesina, J. Grilli, G. Barabás, S. Tang, J. Aljadeff, and A. Maritan, “Predicting the stability of large structured food webs,” Nat. Commun. 6, 7842 (2015).
6. S. Allesina and S. Tang, “The stability–complexity relationship at age 40: A random matrix perspective,” Popul. Ecol. 57(1), 63–75 (2015).
7. J. Alt, L. Erdős, and T. Krüger, “Local inhomogeneous circular law,” Ann. Appl. Probab. 28(1), 148–203 (2018).
8. G. W. Anderson and O. Zeitouni, “A CLT for a band matrix model,” Probab. Theory Relat. Fields 134(2), 283–338 (2006).
9. Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed., Springer Series in Statistics (Springer, New York, 2010).
10. Z. D. Bai, “Circular law,” Ann. Probab. 25(1), 494–529 (1997).
11. Z. D. Bai and J. W. Silverstein, “No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices,” Ann. Probab. 26(1), 316–345 (1998).
12. A. S. Bandeira and R. van Handel, “Sharp nonasymptotic bounds on the norm of random matrices with independent entries,” Ann. Probab. 44(4), 2479–2506 (2016).
13. A. Basak and A. Bose, “Limiting spectral distributions of some band matrices,” Period. Math. Hung. 63(1), 113–150 (2011).
14. A. Basak, N. Cook, and O. Zeitouni, “Circular law for the sum of random permutation matrices,” Electron. J. Probab. 23, 1–51 (2018).
15. S. Belinschi, A. Dembo, and A. Guionnet, “Spectral measure of heavy tailed band and covariance random matrices,” Commun. Math. Phys. 289(3), 1023–1055 (2009).
16. L. V. Bogachev, S. A. Molchanov, and L. A. Pastur, “On the density of states of random band matrices,” Mat. Zametki 50(6), 31–42 (1991).
17. C. Bordenave, P. Caputo, and D. Chafaï, “Spectrum of large random reversible Markov chains: Heavy-tailed weights on the complete graph,” Ann. Probab. 39(4), 1544–1590 (2011).
18. C. Bordenave, P. Caputo, and D. Chafaï, “Spectrum of non-Hermitian heavy tailed random matrices,” Commun. Math. Phys. 307(2), 513–560 (2011).
19. C. Bordenave, P. Caputo, and D. Chafaï, “Circular law theorem for random Markov matrices,” Probab. Theory Relat. Fields 152(3–4), 751–779 (2012).
20. C. Bordenave and D. Chafaï, “Around the circular law,” Probab. Surv. 9, 1–89 (2012).
21. C. Bordenave and A. Guionnet, “Localization and delocalization of eigenvectors for heavy-tailed random matrices,” Probab. Theory Relat. Fields 157(3–4), 885–953 (2013).
22. P. Bourgade, “Random band matrices,” in Proceedings of the International Congress of Mathematicians—Rio de Janeiro 2018. Invited Lectures (World Scientific Publishing, Hackensack, NJ, 2018), Vol. IV, pp. 2759–2784.
23. P. Bourgade, L. Erdős, H.-T. Yau, and J. Yin, “Universality for a class of random band matrices,” Adv. Theor. Math. Phys. 21(3), 739–800 (2017).
24. P. Bourgade, F. Yang, H.-T. Yau, and J. Yin, “Random band matrices in the delocalized phase, II: Generalized resolvent estimates,” J. Stat. Phys. 174(6), 1189–1221 (2019).
25. P. Bourgade, H.-T. Yau, and J. Yin, “Local circular law for random matrices,” Probab. Theory Relat. Fields 159(3–4), 545–595 (2014).
26. P. Bourgade, H.-T. Yau, and J. Yin, “The local circular law II: The edge case,” Probab. Theory Relat. Fields 159(3–4), 619–660 (2014).
27. G. Casati and V. Girko, “Wigner’s semicircle law for band random matrices,” Random Oper. Stochastic Equations 1(1), 15–21 (1993).
28. G. Casati, F. Izrailev, and L. Molinari, “Scaling properties of the eigenvalue spacing distribution for band random matrices,” J. Phys. A: Math. Gen. 24(20), 4755–4762 (1991).
29. G. Casati, L. Molinari, and F. Izrailev, “Scaling properties of band random matrices,” Phys. Rev. Lett. 64(16), 1851–1854 (1990).
30. R. Chaudhuri, V. Jain, and N. S. Pillai, “Universality and least singular values of random matrix products: A simplified approach,” arXiv:2007.03595 (2020).
31. N. Cook, “The circular law for random regular digraphs with random edge weights,” Random Matrices: Theory Appl. 6(3), 1750012 (2017).
32. N. Cook, “Lower bounds for the smallest singular value of structured random matrices,” Ann. Probab. 46(6), 3442–3500 (2018).
33. N. Cook, “The circular law for random regular digraphs,” Ann. Inst. Henri Poincare Probab. Stat. 55(4), 2111–2167 (2019).
34. N. Cook, W. Hachem, J. Najim, and D. Renfrew, “Non-Hermitian random matrices with a variance profile (I): Deterministic equivalents and limiting ESDs,” Electron. J. Probab. 23, 1–61 (2018).
35. N. A. Cook, W. Hachem, J. Najim, and D. Renfrew, “Non-Hermitian random matrices with a variance profile (II): Properties and examples,” arXiv:2007.15438 (2020).
36. G. Dubach and Y. Peled, “On words of non-Hermitian random matrices,” Ann. Probab. 49(4), 1886 (2021).
37. A. Edelman, “The probability that a random real Gaussian matrix has k real eigenvalues, related distributions, and the circular law,” J. Multivar. Anal. 60(2), 203–232 (1997).
38. L. Erdős and A. Knowles, “Quantum diffusion and eigenfunction delocalization in a random band matrix model,” Commun. Math. Phys. 303(2), 509–554 (2011).
39. L. Erdős, A. Knowles, and H.-T. Yau, “Averaging fluctuations in resolvents of random band matrices,” Ann. Henri Poincare 14(8), 1837–1926 (2013).
40. L. Erdős, A. Knowles, H.-T. Yau, and J. Yin, “Delocalization and diffusion profile for random band matrices,” Commun. Math. Phys. 323(1), 367–416 (2013).
41. Y. V. Fyodorov and A. D. Mirlin, “Scaling properties of localization in random band matrices: A σ-model approach,” Phys. Rev. Lett. 67(18), 2405–2409 (1991).
42. J. Ginibre, “Statistical ensembles of complex, quaternion, and real matrices,” J. Math. Phys. 6, 440–449 (1965).
43. V. L. Girko, “The circular law,” Teor. Veroyatn. Primen. 29(4), 669–679 (1984).
44. V. L. Girko, “The elliptic law,” Teor. Veroyatn. Primen. 30(4), 640–651 (1985).
45. V. L. Girko, “The circular law: Ten years later,” Random Oper. Stochastic Equations 2(3), 235–276 (1994).
46. F. Götze, A. Naumov, and A. Tikhomirov, “On a generalization of the elliptic law for random matrices,” Acta Phys. Pol., B 46(9), 1737–1745 (2015).
47. F. Götze and A. Tikhomirov, “The circular law for random matrices,” Ann. Probab. 38(4), 1444–1491 (2010).
48. R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. (Cambridge University Press, Cambridge, 2013).
49. Y. Imry, “Coherent propagation of two interacting particles in a random potential,” Europhys. Lett. 30(7), 405–408 (1995).
50. P. Jacquod and D. L. Shepelyansky, “Hidden Breit–Wigner distribution and other properties of random matrices with preferential basis,” Phys. Rev. Lett. 75, 3501–3504 (1995).
51. V. Jain, “The strong circular law: A combinatorial view,” Random Matrices: Theory Appl. (published online, 2020).
52. I. Jana, “CLT for non-Hermitian random band matrices with variance profiles,” arXiv:1904.11098 (2019).
53. I. Jana, K. Saha, and A. Soshnikov, “Fluctuations of linear eigenvalue statistics of random band matrices,” Theory Probab. Appl. 60(3), 407–443 (2016).
54. I. Jana and A. Soshnikov, “Distribution of singular values of random band matrices; Marchenko–Pastur law and more,” J. Stat. Phys. 168(5), 964–985 (2017).
55. A. Khorunzhy, “On spectral norm of large band random matrices,” arXiv:math-ph/0404017 (2004).
56. L. Li and A. Soshnikov, “Central limit theorem for linear statistics of eigenvalues of band random matrices,” Random Matrices: Theory Appl. 2(4), 1350009 (2013).
57. A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann, and P. Youssef, “Circular law for sparse random regular digraphs,” J. Eur. Math. Soc. 23(2), 467–501 (2021).
58. D.-Z. Liu and Z.-D. Wang, “Limit distribution of eigenvalues for random Hankel and Toeplitz band matrices,” J. Theor. Probab. 24(4), 988–1001 (2011).
59. R. M. May, “Will a large complex system be stable?,” Nature 238, 413–414 (1972).
60. C. McDiarmid, “On the method of bounded differences,” in Surveys in Combinatorics, 1989, London Mathematical Society Lecture Note Series Vol. 141 (Cambridge University Press, Cambridge, 1989), pp. 148–188.
61. M. L. Mehta, Random Matrices and the Statistical Theory of Energy Levels (Academic Press, New York, London, 1967).
62. M. L. Mehta, Random Matrices, 3rd ed., Pure and Applied Mathematics Vol. 142 (Elsevier/Academic Press, Amsterdam, 2004).
63. A. D. Mirlin, Y. V. Fyodorov, F.-M. Dittes, J. Quezada, and T. H. Seligman, “Transition from localized to extended eigenstates in the ensemble of power-law random banded matrices,” Phys. Rev. E 54, 3221–3230 (1996).
64. S. A. Molchanov, L. A. Pastur, and A. M. Khorunzhiĭ, “Limiting eigenvalue distribution for band random matrices,” Theor. Math. Phys. 90, 108–118 (1992).
65. A. Naumov, “Elliptic law for real random matrices,” arXiv:1201.1639 (2012).
66. H. H. Nguyen, “Random doubly stochastic matrices: The circular law,” Ann. Probab. 42(3), 1161–1196 (2014).
67. H. H. Nguyen and S. O’Rourke, “The elliptic law,” Int. Math. Res. Not. 2015(17), 7620–7689.
68. S. Olver and A. Swan, “Evidence of the Poisson/Gaudin–Mehta phase transition for band matrices on global scales,” Random Matrices: Theory Appl. 7(2), 1850002 (2018).
69. S. O’Rourke, D. Renfrew, A. Soshnikov, and V. Vu, “Products of independent elliptic random matrices,” J. Stat. Phys. 160(1), 89–119 (2015).
70. S. O’Rourke and A. Soshnikov, “Products of independent non-Hermitian random matrices,” Electron. J. Probab. 16(81), 2219–2245 (2011).
71. B. Rider and J. W. Silverstein, “Gaussian fluctuations for non-Hermitian random matrix ensembles,” Ann. Probab. 34(6), 2118–2143 (2006).
72. M. Rudelson and K. Tikhomirov, “The sparse circular law under minimal assumptions,” Geom. Funct. Anal. 29(2), 561–637 (2019).
73. M. Rudelson and R. Vershynin, “The Littlewood–Offord problem and invertibility of random matrices,” Adv. Math. 218(2), 600–633 (2008).
74. J. Schenker, “Eigenvector localization for random band matrices with power law band width,” Commun. Math. Phys. 290(3), 1065–1097 (2009).
75. M. Shcherbina, “On fluctuations of eigenvalues of random band matrices,” J. Stat. Phys. 161(1), 73–90 (2015).
76. D. Shlyakhtenko, “Random Gaussian band matrices and freeness with amalgamation,” Int. Math. Res. Not. 1996(20), 1013–1025.
77. J. W. Silverstein and Z. D. Bai, “On the empirical distribution of eigenvalues of a class of large-dimensional random matrices,” J. Multivar. Anal. 54(2), 175–192 (1995).
78. S. Sodin, “The spectral edge of some random band matrices,” Ann. Math. 172(3), 2223–2251 (2010).
79. D. B. Stouffer and J. Bascompte, “Compartmentalization increases food-web persistence,” Proc. Natl. Acad. Sci. U. S. A. 108(9), 3648–3652 (2011).
80. T. Tao and V. Vu, “Random matrices: The circular law,” Commun. Contemp. Math. 10(2), 261–307 (2008).
81. T. Tao and V. Vu, “Random matrices: Universality of ESDs and the circular law,” Ann. Probab. 38(5), 2023–2065 (2010), with an appendix by Manjunath Krishnapur.
82. R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing (Cambridge University Press, Cambridge, 2012), pp. 210–268.
83. E. P. Wigner, “Characteristic vectors of bordered matrices with infinite dimensions,” Ann. Math. 62(3), 548–564 (1955).
84. P. M. Wood, “Universality and the circular law for sparse random matrices,” Ann. Appl. Probab. 22(3), 1266–1300 (2012).
85. H. Xi, F. Yang, and J. Yin, “Local circular law for the product of a deterministic matrix with a random matrix,” Electron. J. Probab. 22, 1–77 (2017).
86. F. Yang and J. Yin, “Random band matrices in the delocalized phase, III: Averaging fluctuations,” Probab. Theory Relat. Fields 179(1–2), 451–540 (2021).
87. J. Yin, “The local circular law III: General case,” Probab. Theory Relat. Fields 160(3–4), 679–732 (2014).
88. For two invertible matrices A and B of the same dimension, the resolvent identity is the observation that $A^{-1}-B^{-1}=A^{-1}(B-A)B^{-1}$.