The Bistritzer–MacDonald (BM) model, introduced by Bistritzer and MacDonald [Proc. Natl. Acad. Sci. U. S. A. **108**, 12233–12237 (2011); arXiv:1009.4203], attempts to capture electronic properties of twisted bilayer graphene (TBG), even at incommensurate twist angles, by using an effective periodic model over the bilayer moiré pattern. Starting from a tight-binding model, we identify a regime where the BM model emerges as the effective dynamics for electrons modeled as wave-packets spectrally concentrated at monolayer Dirac points up to error that can be rigorously estimated. Using measured values of relevant physical constants, we argue that this regime is realized in TBG at the first “magic” angle.

## I. INTRODUCTION

### A. Overview

Twisted bilayer graphene (TBG) consists of two graphene monolayers deposited on top of each other with a relative twist. Although, for generic twist angles, twisted bilayer graphene has no *exact* periodic cell, the atomic structure of twisted bilayer graphene has an *approximate* twist angle-dependent periodic structure known as the bilayer moiré pattern. Bistritzer and MacDonald (BM)^{1,2} proposed a model in 2011 for electronic properties of TBG, which is periodic over the bilayer moiré pattern, and used it to predict a series of “magic” angles where the Fermi velocity would vanish and electron–electron interaction effects might be enhanced. These predictions were dramatically validated by experimental observations of Mott insulation and superconductivity at the first (largest) magic angle *θ* ≈ 1° in 2018.^{3,4} Since these discoveries, superconductivity and other strongly correlated electronic phases have been observed in many other twisted multilayer systems, now often referred to collectively as moiré materials. The ubiquity of moiré scale models in the field of moiré materials makes it important to understand their range of validity. Although compelling formal arguments for the validity of such models exist,^{1,2,5,6} to our knowledge, no mathematically rigorous justification of any of these models exists in the literature.

The present work identifies a parameter regime where the Bistritzer–MacDonald model^{1} can be used to approximate electronic properties of TBG up to error that can be *rigorously estimated*. More specifically, it considers the single-particle time-dependent Schrödinger equation, in the tight-binding approximation, for electrons in TBG, making a two-center approximation for the interlayer hopping function [see (32) and assumption II.1]. We non-dimensionalize the model, identifying its fundamental dimensionless parameters: *ℓ* > 0, the ratio of interlayer distance to the monolayer lattice constant, and *θ* > 0, the twist angle. We introduce sufficiently regular (assumption III.1) wave-packet initial data (92), spectrally concentrated at the monolayer Dirac points, whose spread in momentum space is controlled by a new dimensionless parameter *γ* > 0.^{7} We consider the parameter regime where

where $h\u0302(|\kappa |;\u2113)$ denotes the Fourier transform of the (non-dimensionalized) interlayer hopping function evaluated at the (non-dimensionalized) monolayer ** K** point

**≔**

*κ**a*

**, where**

*K**a*is the monolayer lattice constant. We then prove that in regime (1), the solution evolves as a wave-packet (95) over timescales ∝

*γ*

^{−(1+ν)}for any 0 ≤

*ν*<

*c*for some constant

*c*> 0 depending on the interlayer hopping function (assumption II.1) up to error that → 0 as

*γ*→ 0

*if and only if*

^{8}*the wave-packet envelopes evolve according to a (scaled) Bistritzer–MacDonald model*(Lemma III.2 and Theorem III.1). That BM model is, specifically,

where ** σ** = (

*σ*

_{1},

*σ*

_{2}) denotes the vector of Pauli matrices, $f=f1A,f1B,f2A,f2B\u22a4$ denotes the vector of envelope functions on layers 1, 2 and monolayer sublattices

*A*,

*B*, and

*T*

_{n}, and $sn\u2254asn$,

*n*∈ {1, 2, 3}, denote hopping matrices (56) and (non-dimensionalized) momentum shifts (53). It is straightforward to check that model (2) is invariant (modulo unitary transformations) under translations in the moiré lattice (Lemma III.6) and under arbitrary interlayer displacements (Lemma III.5). All the

*θ*-dependence of model (2) enters through the length of the momentum shifts

*s*_{n}, which sets the moiré scale.

apart from a *θ*-dependence of the monolayer Dirac cones,^{1} which we prove can be neglected in regime (1).^{10} In (3), *ℏ* denotes Planck’s constant, *v*_{D} is the Dirac velocity of monolayer graphene, and $h\u0302(|K|;L)$ denotes the Fourier transform of the interlayer hopping function evaluated at the monolayer ** K** point and interlayer distance

*L*> 0. Using physically realistic values for

*v*and

*w*, Bistritzer and MacDonald numerically computed the band structure of Hamiltonian (3), finding a sequence of “magic” twist angles where the Fermi velocity vanishes. They also gave an

*analytical*prediction of the first (largest) magic angle

*θ*≈ 1° by a formal perturbation theory, and we review this calculation (Sec. V).

We emphasize that, although the analysis of this work is specialized to time-propagation of electronic wave-packets, we view our basic ansatz (95) as probing single-particle electronic properties of TBG more generally. It is natural to ask, then, whether regime (1) is really the relevant regime for understanding TBG’s most interesting physical properties. In particular, one can question as follows: *given the estimated physical value of* $h\u0302(|\kappa |;\u2113)$ *and* *θ* *near to the first magic angle, does there exist a range of* *γ* *values, with* *γ* ≪ 1*, such that both conditions of* *(1)* *hold, with constants on the order of 1?* We verify this as follows. Using the values given in Ref. 1 for the interlayer hopping energy and monolayer graphene *π*-band energy scale,

we find that

On the other hand, the first magic angle is at

The statement now follows immediately: take any *γ* such that $h\u0302(|\kappa |;\u2113)=\lambda 0\gamma $, with *λ*_{0} on the order of 1. Then, we have both *γ* ≪ 1 and *θ* ≤ *λ*_{1}*γ* for all *θ* up to and including the magic angle, with *λ*_{1} also on the order of 1. For such *γ*, our analysis shows that (3) will be a valid model of *single-particle* electronic properties of TBG over the energy scale $\gamma \u210fvDa\u223ch(|K|;L)|\Gamma |$ and the length scale *γ*^{−1}*a*.

We comment finally on the relevance of the BM model (3) to modeling of strongly correlated (many-body) electronic phases in TBG. At the first magic angle, the number of atoms per moiré cell can be estimated by squaring the ratio of the moiré lattice constant $\u2248a\theta $ to the monolayer lattice constant *a* and multiplying by 4 (to capture the layer and sublattice degrees of freedom), giving

The largeness of this number makes it imperative to develop many-body models of TBG at the moiré scale. We now argue, by a simple comparison of energy scales, that the BM model (3) accurately approximates the single-particle part of many-body models (we do not attempt any rigorous analysis in this direction). Fixing *θ* again at the magic angle, the Coulomb e–e interaction in hexagonal boron nitride (hBN, which typically encapsulates TBG in experiments) at the moiré scale defines an energy via

where *ϵ*_{hBN} is the permittivity of hBN $\u22485\u03f50$,^{11}^{,}*ϵ*_{0} is the vacuum permittivity, and *e* is the elementary charge. The smallness of this number relative to the interlayer hopping energy (4) suggests that e–e interaction terms couple only states that are described accurately by the BM model (3). Note that this is not the same as saying that interactions are insignificant. Indeed, they are known to significantly affect the dynamics in TBG’s flat moiré bands, whose width can be as small as $\u224810$ meV,^{12} leading to the wealth of strongly correlated phases seen in experiments to date. Similar considerations apply to lattice-mediated interactions between electrons, which are characterized by a smaller energy scale $\u223c1$ meV.^{13,14}

### B. Related work

We comment on related mathematical work. Our results are closely related to those of Fefferman and Weinstein,^{15} who considered wave-packets spectrally localized at the Dirac points of a graphene *monolayer*, starting from a partial differential equation (PDE) model rather than the tight-binding model we consider here. Under additional assumptions, we recover the same timescale of validity (∼*γ*^{−2+ϵ} for any *ϵ* > 0) as they prove (see Remark III.7).

During the preparation of this work, we were made aware of parallel work deriving Bistritzer–MacDonald-type models for twisted bilayer graphene by Cancès, Garrigue, and Gontier.^{6} Our works developed independently and, indeed, work in different settings and derive different results. While the authors of Ref. 6 worked in the setting of a continuum model, we work in the setting of a discrete tight-binding model. While the authors of Ref. 6 derived a moiré scale model by a formal variational approximation, we identify a parameter scaling limit where the Bistritzer–MacDonald model emerges up to higher-order terms, which can be rigorously estimated. Finally, in contrast to Ref. 6, the Bistritzer–MacDonald model we derive has only *nearest-neighbor* momentum space hopping, consistent with Ref. 1 [see Fig. 4(b) for a discussion of the meaning of “nearest-neighbor” here].

We finally comment on related mathematical work on continuum PDE models of moiré materials. Becker *et al.* introduced a spectral formulation of magic angles in Refs. 16 and 17, while Becker, Kim, and Zhu^{18} obtained expansions of the density of states of TBG in the presence of strong magnetic fields using semiclassical analysis. Becker and Wittsten^{19} proved a semiclassical quantization rule for the one-dimensional moiré-scale model introduced by Timmel and Mele.^{20} Becker, Ge, and Wittsten also studied the transport and spectral properties of this model in Ref. 21. Two of the authors of the present work gave a computer-assisted proof of existence of the first magic angle of the “chiral” approximation to the Bistritzer–MacDonald model in Ref. 22. Bal *et al.*^{23} studied the topological protection of edge states propagating along domain wall interfaces in relaxed twisted bilayer graphene. Numerical methods for computing electronic properties of twisted multilayer materials that exploit similar approximations to those made in the Bistritzer–MacDonald model have been developed.^{24–29}

### C. Conclusions and perspectives

The main results of the present work are (1) the identification of the parameter regime (1) and (2) the identification of sufficient assumptions, most importantly, on the interlayer hopping function (assumption II.1) so that the BM model (3) can be rigorously justified as an effective model of wave-packet propagation in TBG. As alluded to above, however, the BM model (3) is by no means only a model of wave-packet propagation. Rigorously justifying the use of model (3), for example, to calculate the density of states or conductivity of TBG in the parameter regime identified here and under similar assumptions will be the subject of future works.

Another important potential development of the present work would justify various proposed models of TBG, which account for effects of mechanical relaxation (see Remark II.4). Here, it would be especially interesting to identify whether there exists a parameter regime where TBG can be rigorously approximated by the “chiral” model,^{30} a simpler and more symmetric variant of the BM model. A further question would then be how close this regime is to that realized in real TBG. We expect that these analyses could be generalized to essentially arbitrary moiré materials, such as twisted multilayer transition metal dichalcogenides (TMDs) or even twisted heterostructures consisting of layers of distinct 2D materials. Here, an important question is how the distinct dispersion relations of these 2D materials affect the analysis; TMDs, for example, exhibit quadratic dispersion, quite unlike the linear dispersion exhibited by graphene. Finally, it would be very interesting to generalize these analyses to fundamental PDE models of moiré materials, such as the model of TBG studied in Ref. 6.

### D. Structure of paper

The structure of this work is as follows. We will first review tight-binding models of graphene and twisted bilayer graphene in Sec. II. We will then introduce the wave-packet *ansatz* we will use to generate approximate solutions of the tight-binding model Schrödinger equation and explain the strategy of the proof of convergence of these solutions in Sec. III. The strategy reduces the proof to a series of technical lemmas and a single main lemma, which identifies the parameter regime where the BM model represents the dominant effective dynamics, whose proof we give in Sec. IV. In Sec. V, we review the formal derivation of magic angles of Bistritzer and MacDonald. We postpone various detailed calculations and proofs to Appendixes A– E.

## II. THE TIGHT-BINDING MODEL OF TWISTED BILAYER GRAPHENE

The main object of this section is to introduce the tight-binding model of twisted bilayer graphene, which we will take as fundamental in what follows. Before we can do this, we must first recall the tight-binding model of monolayer graphene.

### A. The tight-binding model of graphene

Graphene is a sheet of carbon atoms arranged in a honeycomb lattice, that is, a triangular lattice with two atoms per unit cell. We introduce Bravais lattice vectors as

where *a* > 0 is the lattice constant [see Fig. 1(a)]. The graphene Bravais lattice Λ and a fundamental cell Γ can then be written succinctly as

It is straightforward to check that the area of a fundamental cell $|\Gamma |=32a2$. Within the ** R**th fundamental cell of the lattice, there are two atoms, at positions

**+**

*R*

*τ*^{A}and

**+**

*R*

*τ*^{B}, which we will take as

We model wave functions of electrons in graphene as elements *ψ* of the Hilbert space $Hmono\u2254\u21132(Z2;C2)$, writing $\psi =\psi RR\u2208\Lambda =\psi RA,\psi RBR\u2208\Lambda \u22a4$, where $|\psi R\sigma |2$ represents the electron density on sublattice *σ* ∈ {*A*, *B*} in the ** R**th cell. The graphene tight-binding Hamiltonian with nearest-neighbor hopping acts as

where *t* > 0 is the nearest-neighbor hopping energy.

The Hamiltonian is invariant under lattice translations, so it is natural to pass to the Bloch domain. The reciprocal lattice vectors are defined by the equation *b*_{i} · *a*_{j} = 2*πδ*_{ij} for 1 ≤ *i*, *j* ≤ 2. They are explicitly [see Fig. 1(b)]

We then define the reciprocal lattice Λ* and a fundamental cell of the reciprocal lattice Γ* (known in this context as the Brillouin zone) by

We write wave-functions in the Bloch domain $L2(\Gamma *;C2)$ as $\psi \u0303=\psi \u0303(k)k\u2208\Gamma *=\psi \u0303A(k),\psi \u0303B(k)k\u2208\Gamma *\u22a4$^{31} and the unitary Bloch transform $Hmono\u2192L2(\Gamma *;C2)$ and its inverse by

where |Γ*| denotes the Brillouin zone area. Note that with this convention, Bloch transforms are quasiperiodic with respect to Λ* in the sense that

The Hamiltonian is block diagonal in the Bloch domain, taking the form

The Bloch Hamiltonian *H*(** k**) is easily diagonalized, with eigenpairs

The functions $E\xb1:\Gamma *\u2192R$ are known as the Bloch band functions. The Bloch band functions are degenerate at the Dirac points [see Fig. 1(c)], defined (up to ** G** ∈ Λ*) by

Taylor-expansion around Dirac point ** K** in terms of

**=**

*q***−**

*k***shows that**

*K*^{32}

and hence,

where $\sigma =(\sigma 1,\sigma 2)\u22a4$ is the vector of Pauli matrices and *v*_{D} is the Fermi velocity. Taylor-expansion around Dirac point ** K**′ yields

leading to an identical expansion to (22) other than the sign change in *q*_{1}. We finally record the transformation of *H*(** k**) under the translation of

**in the reciprocal lattice,**

*k*### B. The tight-binding model of twisted bilayer graphene

Twisted bilayer graphene (TBG) consists of two graphene monolayers with a relative twist. Let *R*(*ζ*) denote the matrix that rotates counterclockwise by angle *ζ*,

Then, denoting the twist angle by *θ* > 0, the lattice vectors of each layer are defined by

where the first subscript denotes the layer. We can then define lattices and fundamental cells just as in the monolayer,

The site shifts are similarly rotated, and we allow for a uniform interlayer displacement $d$ so that

There are two especially natural choices of $d$ (see Fig. 2). When $d=0mod\Lambda $, the two untwisted graphene sheets are exactly aligned on top of each other. When $d=\xb1\tau Bmod\Lambda $, the untwisted graphene bilayer is said to be in the “Bernal stacking” configuration. This is, mechanically speaking, the most energetically favorable configuration for the untwisted system.^{27} We will see that the spectral properties of the effective model for the twisted bilayer (*θ* > 0) we derive are independent of $d$; see Lemma III.5. Note that this was already observed in Ref. 1.

We model wave functions of electrons in TBG as elements *ψ* of the Hilbert space $H\u2254\u21132(Z2;C2)\u2295\u21132(Z2;C2)=\u21132(Z2;C2)2$, writing $\psi =\psi 1,\psi 2\u22a4$, where $\psi i=\psi RiRi\u2208\Lambda i=\psi RiA,\psi RiBRi\u2208\Lambda i\u22a4$, *i* ∈ {1, 2}. It will sometimes be convenient to write, instead, $\psi =\psi R1,R2R1,R2\u2208\Lambda 1\xd7\Lambda 2$. The twisted bilayer graphene tight-binding Hamiltonian acts as

where the diagonal (intralayer) part of the Hamiltonian is the monolayer graphene Hamiltonian,

and the off-diagonal (interlayer) part is defined by (the 21 term is similar, so we omit it)

where *R*_{12} ≔ *R*_{1} − *R*_{2} and $\tau 12\sigma \sigma \u2032\u2254\tau 1\sigma \u2212\tau 2\sigma \u2032$. Here, *L* > 0 denotes the interlayer distance, while $h$ is the interlayer hopping function, which arises from the overlap of ionic (or Wannier) orbitals on each layer.^{33,34} In this work, we will always assume that $h$ is radial, i.e., a function of *r* ≔ |** r**| only, and make the standard abuse of notation to write $h(r;L)=h(r;L)$. More specifically, we will always assume that

*h*is a function of the

*three-dimensional*distance between sites so that

for some decaying function $h\u266f$ (see Fig. 3). Roughly speaking, the fact that the interlayer hopping function has form (32) allows us to assume that the two-dimensional Fourier transform of $h$ decays exponentially with a rate proportional to *L*. We postpone a discussion of the precise properties we require of $h$ (assumption II.1) and of the validity of these assumptions (see Proposition II.3 and Examples II.1–II.3) until Sec. II D, when we will nondimensionalize our model. There, we will also discuss possible relaxations of our assumptions on the form of the interlayer hopping function (Remark II.4).

Even though the interlayer hopping breaks translation symmetry^{35} of *H*, it is still useful to transform both layers to the Bloch domain. The reciprocal lattice vectors of each layer are defined by

and the reciprocal lattices and Brillouin zones are similarly given by

The ** K** and

**′ points of each layer are simply given by**

*K*We denote wave functions in the Bloch domain $L2(\Gamma 1*;C2)\u2295L2(\Gamma 2*;C2)$ by $\psi \u0303=\psi \u03031,\psi \u03032\u22a4$, where $\psi \u0303i=\psi \u0303i(ki)ki\u2208\Gamma i*=\psi \u0303iA(ki),\psi \u0303iB(ki)ki\u2208\Gamma i*\u22a4$, *i* ∈ {1, 2}. It will sometimes be convenient to write, instead, $\psi \u0303=\psi \u0303(k1,k2)(k1,k2)\u2208\Gamma 1*\xd7\Gamma 2*$. We then define a unitary Bloch transform $G\u2192L2(\Gamma 1*;C2)\u2295L2(\Gamma 2*;C2)$, and we define its inverse, componentwise, by

where

The Hamiltonian now acts as

where the diagonal terms are, as before, given by

Just as in the monolayer, the formal Taylor-expansion of *H*_{i}(*k*_{i}) around Dirac points *K*_{i} in terms of *q*_{i} = *k*_{i} − *K*_{i} yields effective Dirac operators. Specifically, we have

where

and hence,

where *v*_{D} is the monolayer Fermi velocity (22). The analogous calculation at the ** K**′ points of layer

*i*leads to the same results but with

*q*

_{i,1}replaced everywhere by −

*q*

_{i,1}.

A short calculation, presented in Appendix A, shows that the off-diagonal terms take the form

where we define the Fourier transform pair,^{36}

Note that the off-diagonal part of *H* is not block diagonal with respect to quasimomentum after Bloch transformation, reflecting the model’s lack of translation symmetry. The assumption that $h\u0302$ exponentially decays with rate proportional to *L* (Assumption II.1) because of the form (32) of $h$ will allow us to considerably simplify (44) in what follows.

### C. Moiré lattice, moiré reciprocal lattice, and moiré potential of twisted bilayer graphene

The moiré lattice, moiré reciprocal lattice, and moiré potential of TBG will play a fundamental role in the analysis of Secs. III and IV. The moiré reciprocal lattice vectors are [see Fig. 4(a)]

In terms of the distance between the Dirac points of the layers

the moiré reciprocal lattice vectors have the explicit forms,

We define the moiré reciprocal lattice and Brillouin zone in terms of *B*_{m} ≔ (*b*_{m,1}, *b*_{m,2}) as [see Fig. 4(b)]

The associated moiré lattice vectors, defined by the equation *b*_{m,i} · *a*_{m,j} = 2*πδ*_{ij} for 1 ≤ *i*, *j* ≤ 2, are

For small *θ*, we see that the moiré reciprocal lattice vectors are much longer than the single layer lattice vectors: |*a*_{m,i}| ≫ |*a*_{i}|, *i* ∈ {1, 2}. The moiré lattice and unit cell are then

where *A*_{m} ≔ (*a*_{m,1}, *a*_{m,2}). The monolayer ** K** points are known as the moiré

**and**

*K***′ points,**

*K*We introduce a notation for the momentum difference between the monolayer Dirac points and its rotations^{37} by $2\pi 3$ [see Fig. 4(a)],

These vectors all have length |Δ** K**| and can be written explicitly as

We emphasize that the moiré reciprocal lattice, moiré lattice, and momentum hops (53) all depend on *θ* through |Δ** K**|.

We finally introduce the continuum (matrix-valued) moiré potential, defined for *σ*, *σ*′ ∈ {*A*, *B*} and $r\u2208R2$, by

where

are the hopping matrices. Written out in full, the hopping matrices are

Note that when $d=\u2212\tau B$, we have $b1\u22c5d=b2\u22c5d=\u2212\varphi $ and we recover the hopping matrices appearing in Eq. (7) of Ref. 1. We provide a detailed motivation and derivation of the moiré potential in the Proof of Lemma IV.4. We will see that the $d$-dependent shifts in (57) can always be removed by conjugating by a unitary (Lemma III.5).

The moiré potential has two important and easily verifiable properties. It is periodic (up to a phase) with respect to the moiré lattice,

and its rate of oscillation is controlled by *θ*. The specific property we will require is the existence of a constant *C*_{T} > 0 such that for all *σ*, *σ*′ ∈ {*A*, *B*},

This is, of course, a reflection of the fact that the moiré cell becomes larger for smaller angles.

### D. Non-dimensionalization and assumptions on interlayer hopping function

We close this section by non-dimensionalizing the tight-binding model of twisted bilayer graphene in order to identify fundamental dimensionless parameters. In Secs. III and IV, we will identify the parameter regime where the Bistritzer–MacDonald model represents the dominant effective dynamics. We start from the time-dependent Schrödinger equation for $\psi (\tau ):[0,\u221e)\u2192\u21132(Z2;C2)2$,

It is natural to non-dimensionalize so that *v*_{D} = *a* = *ℏ* = 1. To this end, we introduce dimensionless time and position variables,

With this change, dividing (60) by the energy scale

leaves both sides dimensionless. Recall the definition of the interlayer hopping function (32). We introduce a dimensionless interlayer hopping function *h* by^{38}

where $\u2113\u2254La$ is the dimensionless ratio of the interlayer distance to the monolayer lattice constant and *h*^{♯} is a fixed, dimensionless, decaying function. We finally drop ′s and just write ** r** and

*τ*for the dimensionless spatial and temporal variables. Just as for $h$, we define the Fourier transform pair,

In what follows, we will often use a notation defined in Secs. II A–II C for dimensionful quantities, such as vectors in the monolayer lattice ** R** ∈ Λ, for the

*dimensionless*quantities obtained by setting

*a*= 1. We do this wherever it is unlikely to cause confusion. The exceptions are our notations for the non-dimensionalized Dirac points

**≔**

*κ*

*K**a*and non-dimensionalized momentum shifts $sn\u2254sna$, where we introduce a separate notation so that we can write expressions such as (5) and compare (2) and (3) without ambiguity.

The non-dimensionalized model in real space is, then, given by

where *H* is as in (29), with

In momentum space, the model is as in (38), where the diagonal terms are

and the off-diagonal terms are

Note that in the final model, there are just two dimensionless parameters.

*ℓ*, the ratio of interlayer distance to the monolayer lattice constant, and*θ*, the twist angle.

We will assume in what follows that these parameters are positive, i.e.,

This assumption on *ℓ* is natural. We make the assumption on *θ* without loss of generality since it is straightforward to generalize all our results to the case where *θ* < 0. We avoid the case *θ* = 0, where the moiré lattice is not well-defined. In this case, the system is straightforward to analyze since it is periodic with respect to the monolayer graphene lattice (see, e.g., Ref. 39).

We now discuss the assumptions we require on the interlayer hopping function *h*, starting with the following proposition, which follows immediately from (63).

*Let*

*r*≔ |

**|**

*r**and*

*ξ*≔ |

**|**

*ξ**, and the nondimensionalized interlayer hopping function*

*h*

*has the form*

*(63)*

*. Then, the Fourier transform of*

*h*

*is radial and can be calculated through the Hankel transform,*

*where*

*J*

_{0}(

*ζ*)

*is the Bessel function of the first kind,*

The two-dimensional Fourier transform of the interlayer hopping function can alternatively be calculated as the one-dimensional inverse Fourier transform of the three-dimensional Fourier transform of the interlayer hopping function.

*Let*

*r*,

*ξ*

*, and*

*h*

*be as in Proposition II.1. It is straightforward to see that the three-dimensional (with respect to*

*r**and*

*ℓ*

*) Fourier transform of*$h\u266fr2+\u21132$

*is*

*We then have that*

We now make our main assumptions on *h*.

*Let*

*r*≔ |

**|**

*r**and*

*ξ*≔ |

**|**

*ξ**. We assume that*

*h*

*has the form*

*(63)*

*for some*

*h*

^{♯}

*and that the Fourier transform*

*(64)*

*exists. We assume that*$h\u0302$

*can be bounded above and below by decaying exponentials, more specifically, that there exist constants*

*C*

_{1}

*,*

*C*

_{2}

*,*

*D*

_{1}

*,*

*D*

_{2}

*,*

*ℓ*

_{0}> 0

*, with*

*D*

_{1}≥

*D*

_{2}

*, such that*

*We assume further that the constants*

*D*

_{1}

*and*

*D*

_{2}

*satisfy*

^{40}*We finally assume that*$h\u0302$

*is Lipschitz continuous with a Lipschitz constant, which exponentially decays, more specifically, that there exists a constant*

*C*

_{3}> 0

*such that*

*where*$d0,\xi ,\xi \u2032$

*denotes the minimum distance between the straight line connecting*

*ξ**and*

**′**

*ξ**and the origin (see Remark II.2).*

We discuss briefly the importance of each part of assumption II.1. Radial symmetry of $h\u0302$ allows us to simplify the moiré potential (55) [see (D10)] and simplifies the Proof of Lemma IV.4. We expect that this assumption can be relaxed; see Remark II.4.

The lower bound in (76), together with the upper bound in (76) and inequality (77), guarantees that the interlayer terms corresponding to the momentum space hops (53) are rigorously larger, in regime (1), than all other momentum space hops arising from (70). To see this, note that the interlayer terms corresponding to the momentum space hops (53) are proportional to $h\u0302(|\kappa |;\u2113)$, while the next largest interlayer terms are bounded by $C2e\u22122D2\u2113|\kappa |$. Combining (76) with (77) ensures that

For more details, see Sec. IV, in particular the Proof of Lemma IV.4. The lower bound in (76) may fail in some more realistic models; see Remark II.4.

Finally, estimate (78) allows for bound (126) in the Proof of Lemma IV.4. This bound justifies an approximation where the momentum space interlayer hopping strength $h\u0302$, which is, in general, wavenumber-dependent, is treated as wavenumber-independent. The wavenumber-dependence of the interlayer hopping strength can be seen clearly in (44), where $h\u0302$ depends on *k*_{1}. The approximation can be seen clearly in (D5), where the un-approximated interlayer hopping is on the left-hand side and the approximated interlayer hopping is the first term on the right-hand side. This approximation is known in the physics literature as the “local” approximation; see, for example, Ref. 41.

Using a table of Hankel transform pairs,^{42} we find that assumption II.1 can be verified for some specific *h*. Note that the exponential decay of the interlayer hopping function in real space is to be expected when the tight-binding model has been defined through a basis of exponentially localized Wannier functions^{43,44} (see also Remark II.3).

*The function*

*satisfies assumption II.1, with*

*D*

_{2}= 1

*, any*

*ℓ*

_{0}> 0,

*and any*1 <

*D*

_{1}≤ 2

*. Its Fourier transform is*

Recall that hopping functions such as *h* arise from overlap integrals of ionic or Wannier orbitals.^{33,34} The following form of *h* is physically realistic assuming exponentially decaying orbitals; see Remark II.3.

*For any*

*α*> 0

*, the function*

*satisfies assumption II.1 for any*

*ℓ*

_{0}> 0

*and any pair*

*D*

_{1}> 1

*and*0 <

*D*

_{2}< 1

*such that*

*(77)*

*holds. Its Fourier transform is*

*For any*

*α*> 0

*, the function*

*satisfies assumption II.1 for any*

*ℓ*

_{0}> 0

*and any pair*

*D*

_{1}> 1

*and*0 <

*D*

_{2}< 1

*such that*

*(77)*

*holds. Its Fourier transform is*

Note that in all Examples II.1–II.3, we can take *D*_{1} = 1 + *ϵ* and *D*_{2} = 1 − *ϵ* for *any* *ϵ* > 0 so that (77) holds comfortably.

Examples II.1–II.3 suggest that assumption II.1 may hold for fairly general *h*^{♯}, with *D*_{1} = 1 + *ϵ* and *D*_{2} = 1 − *ϵ* for any *ϵ* > 0. However, we can prove this only for the upper bound in (76) by a straightforward application of Cauchy’s theorem (although see Remark II.1). Although the following proposition is surely well-known, we include a proof for completeness especially since it is so simple.

*Assume that*

*h*

^{♯}(

*ζ*)

*is analytic in*

*ζ*

*, except possibly at the origin, and satisfies the bound*

*for some fixed*

*C*,

*R*,

*δ*> 0

*. Then, for any fixed*0 <

*ϵ*< 1

*, there exists a constant*

*C*

_{ϵ}> 0

*such that*

**= (**

*ξ**ξ*,0)

^{⊤}, with

*ξ*> 0, in the Fourier transform formula (64), resulting in

*x*integral below the real axis using Cauchy’s theorem [using (86) to control the “ends” of the rectangle], we have that

*h*

^{♯}(86), the double integral converges, and we are done.□

The alternative formulation given in Proposition II.2 allows for the following.

*Assume that*$h\u030a\u266f(\zeta )$

*is analytic in*

*ζ*

*, except possibly at the origin, and satisfies the bound*

*for some fixed*

*C*,

*R*,

*δ*> 0

*. Then, for any fixed*0 <

*ϵ*< 1

*, there exists a constant*

*C*

_{ϵ}> 0

*such that*

The proof is exactly analogous to that of Proposition II.3, except that we push the *ω* integral up to $R+i(1\u2212\u03f5)\xi $.□

*The argument given in the Proof of Proposition II.3 could possibly be extended under additional assumptions on* *h*^{♯} *to give lower bounds or even to evaluate* $h\u0302(\xi ;\u2113)$ *explicitly. For example, when* *h* *is as in Example II.1, the integral over* *y* *in* *(88)* *is explicit, and the resulting integrand in* *x* *has a simple pole at* −*iℓ* *so that pushing the integral below the real axis leads immediately to* *(81)**. More generally, the integrand of* *(89)* *has a branch point at* −*iℓ**. In this case, one can fix the branch cut to point downward and then further deform the* *x* *integral of* *(89)* *into the integral either side of the branch cut. Such investigations do not seem to lead in a straightforward way to general conditions on* *h*^{♯} *guaranteeing lower bounds, so we do not consider this further in this work.*

*Note that* *(78)* *follows easily if* $h\u0302$ *is continuously differentiable everywhere and that the derivative decays exponentially with the distance from the origin (with rate proportional to* *ℓ**). We require the slightly more obscure statement* *(78)* *to allow for* $h\u0302$ *merely Lipschitz, as in* *(81)**.*

*We remark on rigorous results on the decay of orbitals (and hence of the interlayer hopping function). Precise exponential asymptotics of the hopping function in real space have been obtained by Helffer and Sjöstrand.*^{45} *Exponential upper and lower bounds on the hopping function have been proved in the context of a continuum model of graphene by Fefferman, Lee-Thorp, and Weinstein.*^{34} *These bounds were generalized to the setting of a strong magnetic field by Fefferman, Shapiro, and Weinstein.*^{46,47} *Exponential decay of Wannier orbitals has been established rigorously for time-reversal symmetric systems with an energy gap.*^{48–50} *These assumptions are satisfied for the lowest two bands of monolayer graphene, which are separated from higher bands by a gap. With the caveat that radial symmetry is certainly an idealization (see Remark II.4), these observations suggest Example II.2 as a physically reasonable form for the interlayer hopping function.*

*The assumption that* *h* *is radial, and, more specifically, a function of the three-dimensional distance* *(63)**, is very much an approximation to the physical* *h**. It would be more realistic to assume that* *h* *is a general decaying function, which respects the symmetries of monolayer graphene, i.e.,* $2\pi 3$*-rotation, inversion* (** r** ↦ −

**)**

*r**, and complex conjugation. Our results will go through, but will be more complicated to state and prove, as long as bounds*

*(76)–(78)*

*still hold for such*

*h*

*. Interestingly, the Fourier transforms of some realistic choices of*

*h*

*, especially those that account for mechanical relaxation of the bilayer, actually have zeros so that the lower bound in*

*(76)*

*fails (see Fig. 3 of Ref.*

*51*

*). It is actually unclear, in such cases, whether a similar reduction to effective dynamics is possible and if it is what the effective dynamics should be.*

*The lattice structure,* Λ_{i}*, for each layer has been observed to be significantly reconstructed at small twist angles to reduce its elastic energy,*^{27,52,53} *and the Hamiltonian must thus be appropriately modified*^{51,52,54} *to predict electronic properties accurately. It follows from the results in Ref.* *51 * *that for small twist angles, the decay rates* *D*_{i} ∼ *θ*/(*d*_{i} + *θ*) *for* *d*_{i} > 0*, but we can continue to expect that* 1 ≤ *D*_{1}/*D*_{2} < 2*. We will see in* Sec. 5*V* *that the unrelaxed Bistritzer–MacDonald model is sufficiently accurate to predict the first magic angle at* *θ* ∼ 1° *although a relaxed model is necessary to predict the correct bandgap between the nearly flat bands at the Fermi energy and the other moiré bands.*^{51,52,54} *Generalizing our rigorous theory to allow for lattice relaxation is the subject of ongoing work.*

## III. APPROXIMATE SOLUTIONS BY A SYSTEMATIC MULTIPLE SCALE ANALYSIS

In this section, we will generate approximate solutions of the time-dependent Schrödinger equation by a systematic multiple scale analysis. In Sec. IV, we will prove the convergence of these approximate solutions to exact solutions.

### A. Wave-packet *ansatz*

In this section, we will introduce the basic “wave-packet” *ansatz* we will use to generate approximate solutions. The non-dimensionalized time-dependent Schrödinger equation is (65). We assume the following “wave-packet” initial condition:

where *γ* > 0 is a dimensionless small parameter, which can be interpreted in a few (equivalent) ways. Most obviously, small *γ* enforces a separation of scales between the spatial variation of the envelope and plane wave parts of the wave-packet. Equivalently, *γ* measures the concentration of the wave-packet, in momentum space, relative to the monolayer ** K** points (see Appendix B 1). We can also, however, using the linear relationship between momentum relative to the

**point and energy defined by the monolayer Dirac dispersion relation (22), think of**

*K**γ*as measuring the spectral width of the wave function relative to the dimensionless energy scale 1 [which, in physical units, corresponds to $E$ (62)].

*In considering initial data of the form* *(92 ), we are restricting attention to initial data concentrated at the monolayer*

*K**points. This is natural because data concentrated at the*

*K**points are the most relevant for understanding the electronic properties of twisted bilayer graphene. The reason is that, at charge neutrality (i.e., in the absence of imposed gate voltage), the Fermi level*

^{55}*in twisted bilayer graphene lies exactly at the monolayer Dirac point energy. Thus, initial data concentrated at such points model low-energy excitations of the twisted bilayer graphene Fermi sea.*

*We expect that initial data concentrated elsewhere in the monolayer Brillouin zone [i.e., with* *κ*_{i} *in* *(92)* *replaced by other points in* $\Gamma i*$*] could be treated similarly, but would likely evolve according to quite different effective dynamics compared with* *(2)**. This is not surprising because it is already well-known that effective dynamics of wave-packets, even in periodic media, depend strongly on where the wave-packet is concentrated in momentum space (see, e.g., Refs.* *15 **,* *33 **,* *56 **, and* *57 **). Note further that if such an analysis was carried out for all points in the monolayer Brillouin zone, the dynamics of* non-concentrated *initial data could then be understood by superposing the dynamics of concentrated initial data.*

We make the following assumption on the initial amplitudes *f*_{i,0}, which, in particular, ensures that $\psi 0\u2208H$.

*We assume that the initial amplitudes have a bounded eighth Sobolev norm,*

*for some constant*$Cf0$

*.*

*Our results require bounded first and second Sobolev norms because the proofs of Lemmas IV.4 and IV.2 involve estimating first and second order Taylor series remainders in momentum space. Requiring bounded fourth Sobolev norms allows for estimate* *(D25)* *in the proof of estimate* *(126)* *of Lemma IV.4. We require higher Sobolev norms to be bounded in order to estimate the* $G\u03031\u22600$ *terms in* *(B16)* *by* *(B20)* *(and other similar terms); see also Remark IV.1.*

where

is the wave-packet *ansatz*,^{59} and *η*(*τ*) is the corrector. To summarize, the additional parameter introduced by our *ansatz* is

*γ*, the momentum space spread of the wave-packet relative to the inverse lattice constant.

In what follows, when we wish to suppress the sublattice degree of freedom, we will adopt the obvious shortened notations $fi=fiA,fiB\u22a4$, $fi,0=fi,0A,fi,0B\u22a4,i\u2208{1,2}$. When we also want to suppress the layer degree of freedom, we will write $f=f1,f2\u22a4$ and $f0=f1,0,f2,0\u22a4$. We will also simply evaluate the multiscale variables ** X**,

*T*, etc., whenever there is no danger of confusion.

Assuming that

the corrector satisfies

where *r* is known as the residual.

Recall that we denote the electronic wave function Hilbert space $\u21132(Z2;C2)2$ of the twisted bilayer by $H$. We will denote by $\Vert \u22c5\Vert H$ the standard $H$-norm, i.e.,

The following lemma shows that the $H$-norm of the corrector *η* can be bounded in terms of the $H$-norm of the residual *r*.

*Let*

*η*

*satisfy the IVP*

*(97)*

*. Then,*

The proof is a standard calculation using self-adjointness of *H*; see, for example, Sec. 3 C of Ref. 60.□

Lemma III.1 obviously implies that

and hence, the time-scale on which the wave-packet *ansatz* Ψ (95) approximates the solution of (65) *ψ* in the $H$-norm is controlled by the $H$-norm of the residual *r*. The key lemma of this work is the following because it identifies a parameter regime where the Bistritzer–MacDonald model emerges as a necessary condition for smallness of *r* beyond a certain order and, hence, as a necessary condition for convergence of Ψ to *ψ* over a longer time scale.

*Let assumptions II.1 and III.1 hold. Assume further that*

*for fixed positive constants*

*λ*

_{0},

*λ*

_{1}> 0

*. Introduce the scaled moiré potential [recall the definition of the moiré potential*

*(55)*]

*and scaled Bistritzer–MacDonald Hamiltonian,*

^{61}*Then, there exist constants*

*γ*

_{0},

*C*,

*c*> 0

*depending only on*$h\u0302$

*, the monolayer operator*

*(18)*

*, and the constants*

*λ*

_{0}

*and*

*λ*

_{1}

*such that for all*

*γ*≤

*γ*

_{0}

*, the residual defined by*

*(97)*

*satisfies*

*where*

*We remark on the origin of the scalings* *(101)**. The scaling of* *ℓ* *with respect to* *γ* *comes from balancing the leading terms of* *(121)* *and* *(125)**. This scaling forces the logarithmic growth of* *ℓ* *with respect to* *γ* *(132)**, which leads to a rapid decay of interlayer hopping terms* *(70)**, and finally a Bistritzer–MacDonald model* *(102)* *with only nearest-neighbor hopping in momentum space. The scaling of* *θ* *with respect to* *γ* *is essential so that the regularity of solutions of the BM model is preserved as* *γ* → 0 *(see Lemma III.3*)*. Informally, this requires that the scaled moiré potential* $T$ *appearing in* *(102)* *varies on scale* 1 *(as opposed to, for example, scale* *γ**). To see how the scaling of* *θ* *with respect to* *γ* *enforces this, note that for small* *θ*, *we have* |*s*_{n}| ≈ *θ* *so that we can consider* *T*_{θ}(** r**)

*as a function of*

*θ*

*r**so that*$T\theta X\gamma $

*can be considered a function of*$\theta X\gamma $

*. It is then clear that the condition for*$T$

*to vary at least over scale*1

*is precisely boundedness of the ratio*$\theta \gamma $

*.*

*The scaling*

*(101)*

*can be stated simply (but a little imprecisely) in terms of length-scales. Approximating*$h\u0302(|\kappa |;\u2113)\u2248e\u2212\u2113|\kappa |$

*(note that this is essentially what happens for each of Examples II.1–II.3), the scaling can be written as*

*where*$1\gamma $

*denotes the lengthscale of variation of the wave-packet envelope and*$1\theta $

*denotes the lengthscale of the moiré pattern.*

The key point of Lemma III.2 is that, formally at least, *the* $H$*-norm of the first term of (III.2) is proportional to* *γ**, while the* $H$*-norm of the second is* *o*(*γ*) *as* *γ* → 0*.* In particular, the residual is *o*(*γ*) as *γ* → 0 in regime (101) *if and only if the wave-packet envelopes evolve according to the Bistritzer–MacDonald model.* This is summed up by the following corollary.

*Under the hypotheses of Lemma III.2, if the wave-packet envelope functions*

*f*

*appearing in*

*(95)*

*evolve according to the Bistritzer–MacDonald model,*

*where*

*H*

_{BM}

*is as in*

*(102)*

*, then there exist constants*

*γ*

_{0},

*C*,

*c*> 0

*depending only on*$h\u0302$

*, the monolayer operator*

*(18)*

*, and the constants*

*λ*

_{0}

*and*

*λ*

_{1}

*such that for all*

*γ*≤

*γ*

_{0}

*, the residual appearing in*

*(97)*

*satisfies*

Note that starting from the scaled BM model (102) with (106), it is straightforward to derive the more familiar form of the BM model (2) as follows: First, change variables from ** X** and

*T*back to the dimensionless, unscaled, position, and time variables [recall (61)]

**and**

*r**τ*as

Then, multiply both sides of the equation by *γ* and use $h\u0302(|\kappa |;\u2113)=\lambda 0\gamma $ (101).

The argument so far is purely formal since we have not discussed the existence and uniqueness theory for (106) nor whether the Sobolev norms of solutions can be controlled. These issues are addressed by the following lemma.

*Consider the general initial value problem for the BM PDE*

*(106)*

*, i.e.,*

*where*$f0\u2208H1(R2;C4)$

*. Then, the following facts hold:*

*The IVP**(109)**has a unique solution**f**satisfying*(110)$\Vert f(\u22c5,T)\Vert L2(R2;C4)=\Vert f0\Vert L2(R2;C4),\u2200T\u22650.$*The IVP**(109)**preserves the regularity of**f*_{0}*in the sense that there exist constants**C*_{s}> 0*depending on**s*,*λ*_{0}*, and**λ*_{1}*such that*(111)$\Vert f(\u22c5,T)\Vert Hs(R2;C4)\u2264Cs\Vert f0\Vert Hs(R2;C4),\u2200T>0,$*for each positive integer**s**.*

See Appendix E.□

We emphasize that Eq. (111) of Lemma III.3 relies on the following fact. Indeed, it is the reason why *θ* must be scaled with *γ* in (101) for our results to hold.

The result follows immediately from (59).□

We can now state and prove the main theorem of this work.

*Under the hypotheses of Lemma III.2, the solution*

*ψ*

*of*

*(65)*

*, where*

*ψ*

_{0}

*is as in*

*(92)*

*, satisfies*

*where*

*i*∈ {1, 2},

*σ*∈ {

*A*,

*B*}

*, and the envelopes*

*f*

*evolve according to the BM model*

*(106)*

*, and there exist constants*

*C*,

*γ*

_{0}> 0

*such that for all*

*γ*<

*γ*

_{0}

*and*

*c*> 0

*as in Lemma III.2,*

*It follows that for any fixed*0 ≤

*ν*<

*c*

*and*

*C*

_{1}> 0

*, the corrector also satisfies*

*Fefferman and Weinstein*^{15} *considered wave-packet propagation in a graphene monolayer modeled by a continuum PDE, obtaining the timescale of validity* *τ* = *O*(*γ*^{−2+ν}) *for any* *ν* > 0*. Recalling Remark III.6, when* *h* *is as in Examples II.1–II.3, we recover this timescale of validity.*

We finish this section by proving two simple lemmas, which establish that (equivalent) BM models (102), (2), and (3) are invariant under arbitrary interlayer displacements $d$ (28) (see Fig. 2) and under translation by moiré lattice vectors (51). In the first case, we will prove that for any fixed interlayer displacement $d\u22600$, there exists a unitary operator $U(d)$, which conjugates the Hamiltonian to the same model with $d=0$. Note that a similar calculation was already given by Bistritzer and MacDonald.^{1}

*Let*

*H*

_{BM}

*be as in*

*(102)*

*, where the momentum hops, moiré potential, and hopping matrices are as in*

*(53)*

*,*

*(55)*

*, and*

*(57)*

*, respectively. Let*

*S*

_{w}

*f*(

**) ≔**

*X**f*(

**−**

*X***)**

*w**and*$w\u2254\gamma (b1\u22c5d)am,1+(b2\u22c5d)am,2$

*. Then,*

*where*$HBM\u0303$

*is exactly as in*

*(102)*

*,*

*(53)*

*,*

*(55)*

*, and*

*(57)*

*, but with*$d=0$

*.*

The second lemma, whose proof is a straightforward calculation using (58), establishes that the BM model Hamiltonian commutes with translation operators in the moiré lattice as long as the translation operators are modified by appropriate phase shifts. The origin of these phase shifts is the non-zero distance between the monolayer ** K** points and the fact that the monolayer

**points are chosen as the momentum space origins in both layers. We include the lemma for completeness and because of its importance in allowing the spectrum of**

*K**H*

_{BM}to be expressed in terms of “moiré” Bloch bands.

^{1}

The translation and rotation symmetries of *H*_{BM} are also discussed in the supplementary material of Ref. 22.

*We remark on potential generalizations of our results other than relaxing assumption II.1, as discussed in Remark II.4.*

*First, it should not be necessary to restrict to the nearest-neighbor tight-binding model of graphene* *(12)**. The Dirac points of graphene are protected by symmetry,*^{62,63} *and so it should be possible to carry through our derivation with* *(12)* *replaced by a general Hamiltonian with exponentially decaying (in real space) hopping, which respects* $2\pi 3$*-rotation, inversion* (** r** ↦ −

**)**

*r**, and complex conjugation symmetries.*

*Second, it should be possible to push the time-scale of the validity of the wave-packet approximation to higher orders in* *γ*^{−1} *by including corrections to ansatz* *(95)* *in the form of a formal series in powers of* *γ**. It would be interesting to understand the effective dynamics over these longer time-scales. We expect that they will involve longer range momentum space hopping than the nearest-neighbor momentum shifts* *(53)**.*

## IV. PROOF OF LEMMA III.2

In this section, we prove Lemma III.2. The structure of the proof is as follows. In Sec. IV A, we state a sequence of lemmas, whose proofs are given in Appendixes B– D. The lemmas write *r* = −(*i∂*_{τ} − *H*)Ψ as a sum of (formally) “leading order” and “higher order” terms. In Sec. IV B, we identify the parameter regime where the leading order terms balance and every higher order term is smaller in order to complete the Proof of Lemma III.2.

### A. Simplification of residual *r* = −(*i∂*_{τ} − *H*)Ψ

We start with the action of *i∂*_{τ}, which is very simple. Because of the scaling of the wave-packet envelope, the time derivative acting on the wave-packet ends up being proportional to *γ*.

We now consider the action of *H* on the wave-packet. The following lemma decomposes the action of the diagonal parts of *H* into a leading order part, proportional to *γ* and with the form of a Dirac operator, and a higher-order term *r*^{I}, which is formally $O\gamma 2$. This term is simply the remainder from Taylor-expanding the monolayer Bloch Hamiltonian up to linear order at the Dirac points.

*Let*

*H*

*be as in*

*(66)*

*and*Ψ

*be as in*

*(95)*

*. Then, for*

*i*∈ {1, 2}

*and*

*σ*∈ {

*A*,

*B*}

*,*

*where*

*θ*

_{i}

*is as in*

*(42)*

*, and there exists a constant*

*C*> 0

*, depending only on the monolayer operator*

*(18)*

*, such that*

See Appendix B.□

*The higher-order term appearing in* *(122)* *arises from estimating the* $G\u03031\u22600$ *terms in* *(B16)* *by* *(B20)**. Note that similar terms appear in* *(124)**,* *(126)**, and* *(127)**, with very similar origins. Under stronger regularity assumptions, these terms can be made arbitrarily high order in* *γ* *[for example, the estimate in* *(122)* *can be improved to* $\gamma 4+s\Vert fi(\u22c5,\gamma \tau )\Vert H6+2s(R2;C2)$ *for any* *s* ≥ 0*].*

The following lemma shows that the diagonal part of *H* can be further decomposed into a *θ*-independent part proportional to *γ* and a *θ*-dependent part, which is *O*(*γθ*).

*Let*

*H*

*be as in*

*(66)*

*and*Ψ

*be as in*

*(95)*

*. Then, for*

*i*∈ {1, 2}

*and*

*σ*∈ {

*A*,

*B*}

*,*

*where*

*r*

^{I}

*is as in Lemma IV.2, and there exists a constant*

*C*> 0

*such that*

See Appendix C.□

We now consider the action of the off-diagonal terms of *H*. The following lemma decomposes this action into a leading order part and higher order terms *r*^{III} and *r*^{IV}. The leading order part is proportional to $h\u0302(|\kappa |;\u2113)$ and describes interlayer coupling through the moiré potential. The higher order terms *r*^{III} and *r*^{IV} come from Taylor-expanding the interlayer hopping function about the wave-packet center and neglecting momentum space coupling terms other than three dominant terms, respectively. Note that we consider *H*_{12}Ψ_{2} without loss of generality since *H*_{21}Ψ_{1} could be simplified by exactly the same steps.

*Let*

*H*

*be as in*

*(66)*

*and*Ψ

*be as in*

*(95)*

*. Recall the hopping matrices*

*T*

_{j}

*(56)*

*and the momentum hops*$sj$

*(53),*$1\u2264j\u22643$

*. Then, for*

*σ*∈ {

*A*,

*B*}

*,*

*where for any*

*ℓ*

_{1}> 0

*and*

*M*

_{0}> 1

*, there exist constants*

*C*

_{1}> 0

*and*

*C*

_{2}> 0

*depending only on*

*ℓ*

_{1},

*M*

_{0}

*, and*$h\u0302$

*such that for all*

*ℓ*>

*ℓ*

_{1}

*and*

*M*>

*M*

_{0}

*,*

See Appendix D.□

### B. Identification of parameter regime where the BM model is dominant effective dynamics

In this section, we complete the Proof of Lemma III.2. We start by identifying the regime where the leading-order terms in (120), (121), and (125) balance. This occurs when

for some fixed, positive constant *λ*_{0} > 0. When (128) holds, the leading-order terms in (120), (121), and (125) are all proportional to *γ*, and the residual can be written as

where

where we write *A*^{†} for the conjugate transpose of a matrix *A* and *r*^{I}, *r*^{III}, and *r*^{IV} are as in (122) and (125), respectively. Assuming for a moment that all Sobolev norms of the envelope functions $fi\sigma $ can be bounded independently of all parameters and *τ*, the orders of these terms are

We will choose *M* > 0 so that these errors balance. First, note that (76), together with (128), implies that

It then follows easily that

and that

We can roughly balance these errors by taking

Using (77), we then have the existence of constants *C* > 0 and *c* > 0 such that

for all *γ* sufficiently small.

Estimate (136), together with (129), brings us close to the result of Lemma III.2. However, the result, so far, is not sufficient for our purposes. Specifically, it is easy to see that, without further assumptions, part (111) of Lemma III.3 will not hold uniformly in *γ* when *H*_{BM} is replaced by $HBM0$ [as in (130)]. The problem is that for fixed *θ* and *γ* → 0, the derivatives of the scaled moiré potential grow without bound [cf. (59)],

To avoid this, we assume that

introducing another positive constant *λ*_{1} > 0. However, now, we can use Lemma IV.3 to drop the *θ*-dependence of the intralayer terms in $HBM0$ (130) so that we can now write the residual as

where *H*_{BM} is as in (102), and

*It is worth noting that*

*(136)*

*and*

*(141)*

*are worst-case estimates. In fact, for each of Examples II.1–II.3, the constants*

*D*

_{1},

*D*

_{2}

*can be taken such that*

*D*

_{1}= 1 +

*ϵ*

*and*

*D*

_{2}= 1 −

*ϵ*

*for any*0 <

*ϵ*

*. In such cases, we have the significantly better bounds,*

*and*

*for any*

*c*′ > 0

*.*

## V. BISTRITZER–MACDONALD’S FORMAL DERIVATION OF THE FIRST MAGIC ANGLE

In this section, we review the formal, analytical, derivation of the first “magic angle” of Bistritzer and MacDonald of TBG.^{1} We start by recalling the Fourier transform pair,

acting componentwise on vector functions. Recall the BM Hamiltonian, in physical units,

where *v* ≔ *ℏv*_{D} and $w\u2254h\u0302(|K|;L)|\Gamma |$. We can write $HBM$ in the Fourier domain as

where $Swf\u0302(\xi )\u2254f\u0302(\xi \u2212w)$ denotes the shift operator. The idea is to consider a drastically truncated version of (106), defined by

The operator (147) can be understood as $H\u0302BM$ projected onto a basis consisting of modes with momentum ** ξ** on layer 1, together with modes at momenta

**−**

*ξ*

*s*_{j},

*j*∈ {1, 2, 3}, on layer 2.

We start by considering the limit without interlayer coupling, i.e., *w* = 0. In this case, we have

so that the spectrum as a function of ** ξ** is the union of cones centered at

*s*_{j},

*j*∈ {1, 2, 3}, i.e.,

In particular, 0 is exactly twofold degenerate when ** ξ** =

**0**.

We now turn on interlayer coupling, setting *w* > 0. We will first show that the two-fold degeneracy of the eigenvalue 0 at ** ξ** =

**0**is unbroken by the perturbation by explicitly constructing the 0 eigenspace. Let $\Phi \u2254\Phi 0,\Phi 1,\Phi 2,\Phi 3\u22a4$. The equation

is a system of four equations. Three of these equations have the solutions

It is straightforward to check that

Remarkably, substituting (151) into the fourth equation of (150) yields a trivial equation for Φ_{0} because direct calculation shows that

We conclude that 0 is a twofold degenerate eigenvalue of $H\u0302BM,trunc$.

We now study how the twofold degenerate 0-eigenspace of $H\u0302BM,trunc$ is perturbed by turning on non-zero ** ξ**. We start by fixing an (unnormalized) basis of this degenerate eigenspace as

for *i* ∈ {1, 2}. Direct calculation shows that

so that a normalized basis of the degenerate eigenspace is given by $\Phi i\u2032\u2254\Phi i1+6\alpha 2$, *i* ∈ {1, 2}. We now calculate the projection of $H\u0302BM,trunc$ onto the degenerate eigenspace as

Further direct calculation shows that (156) can be simplified to the form of the monolayer Dirac Hamiltonian (22), with a re-scaled Fermi velocity

Bistritzer and MacDonald called *θ* such that *ρ* vanishes a “magic angle.” Using the physical values^{1}

and the approximation

we have that *θ* is a magic angle if

## ACKNOWLEDGMENTS

A.B.W.’s research was supported, in part, by NSF DMREF under Award No. 1922165. M.L.’s research was supported, in part, by NSF under Award No. DMS-1906129. The research of M.L., A.H.M., and T.K. was supported, in part, by the Simons Targeted Grant under Award No. 896630. The authors thank Eric Cancès, Diyi Liu, and Max Engelstein for stimulating discussions.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Alexander B. Watson**: Conceptualization (lead); Investigation (lead); Methodology (lead); Visualization (equal); Writing – original draft (lead); Writing – review & editing (equal). **Tianyu Kong**: Investigation (equal); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). **Allan H. MacDonald**: Conceptualization (equal); Funding acquisition (lead); Investigation (equal); Project administration (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal). **Mitchell Luskin**: Conceptualization (equal); Funding acquisition (lead); Investigation (equal); Project administration (lead); Supervision (lead); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available within the article.

### APPENDIX A: DERIVATION OF EQ. (44)

By definition,

Writing $h$ as an inverse Fourier transform and exchanging the order of summation and integration, (A1) simplifies to

Using the Poisson summation formula,

we have

Exchanging the order of summation and integration again and carrying out the integration, (A4) simplifies to

from which (44) immediately follows using $|\Gamma *|(2\pi )2=1|\Gamma |$.

### APPENDIX B: PROOF OF LEMMA IV.2

We organize the Proof of Lemma IV.2 as follows. First, we will review formulas for the wave-packet *ansatz* (95) in Appendix B 1. Then, we will separate the momentum space expression for *H*_{11}Ψ_{1} into “leading” and “higher-order” terms in Appendix B 2. We will then estimate the higher-order terms in Appendix B 3 and transform the leading terms back to real space in Appendix B 4.

#### 1. Wave-packet *ansatz* in momentum space

In this section, we record formulas for the wave-packet *ansatz* in momentum space, which will be important in the remainder of this appendix and in Appendixes C and D,

With respect to *q*_{i} = *k*_{i} − *κ*_{i}, the initial condition and *ansatz* take the following simpler forms:

where *q*_{i} varies over the translated Brillouin zones $\Gamma i*\u2212\kappa i,i\u2208{1,2}$. Note that we will generally simply evaluate the multiscale variables $Ki$ and $Qi$ when there is no danger of confusion.

*Note that we work with Fourier transforms of wave-packet amplitudes, rather than with their continuum Bloch transforms. Although this leads to formulas involving sums over reciprocal lattice vectors, we find this less cumbersome than introducing a continuum Bloch transform with respect to reciprocal lattice vectors scaled by* *γ**.*

#### 2. Separation of terms into leading and higher order in momentum space

Since simplifying *H*_{22}Ψ_{2} is so similar, we concentrate on simplifying *H*_{11}Ψ_{1}. This is equivalent to simplifying the momentum space expression,

The idea is to Taylor expand *H*_{1}(*κ*_{1} + *q*_{1}) in *q*_{1} and then transform back to real space. First, recall that, using (24), we have

It follows that

where $r\u03031I$ is exactly

We then define

#### 3. Estimation of “higher-order” terms

We now prove (122). Note, first, that for any $G1\u2208\Lambda 1*$, we have

where *C* > 0 is two times the supremum of the second derivatives of *H*_{1}, which are uniformly bounded because *H*_{1} is smooth and periodic in *q*_{1}. We can now estimate using the triangle inequality,

and then (B11),

Writing $q1\u2212G1\u2032=q1\u2212G1+G1\u2212G1\u2032$ and then changing variables in the sum over $G1\u2032$ to $G\u03031=G1\u2032\u2212G1$, we obtain

Now, we can replace the sum over *G*_{1} with an integral over all $R2$ as

Changing variables in the integral gives

We now split the sum into the $G\u03031=0$ and $G\u03031\u22600$ terms,

We now claim that the sum of the $G\u03031\u22600$ terms is higher-order in *γ*, assuming sufficient regularity of envelope functions. For fixed $G\u03031\u22600$, we can partition $R2$ into half-planes $R+2G\u03031\gamma $ and $R\u22122G\u03031\gamma $, where

by splitting $R2$ by the line that bisects $G\u03031\gamma $ (i.e., the line that passes through $G\u030312\gamma $ perpendicular to $G\u03031$). We now consider the integral over one of these regions (the other is similar),

Multiplying and dividing by |*q*_{1}|^{4} and using the first inequality of (B18), we obtain

where we used the Cauchy–Schwarz theorem for the final inequality. Combining this with the analogous estimate for the integral over $R\u22122$, we have

Simplifying this, we end up with the estimate

from which (122) follows via the equivalence of finite-dimensional norms.

#### 4. Transforming leading terms back to real space

back to real space. Taking the inverse Bloch transform and inserting the Fourier transform formula, the expression becomes

Now, noting that $eik1\u22c5R1=ei(k1\u2212G1)\u22c5R1$ for any *R*_{1} ∈ Λ_{1} and $G1\u2208\Lambda 1*$, we can write all dependence on *k*_{1} in terms of dependence on *k*_{1} − *G*_{1} and then re-write the sum over *G*_{1} and integral over *k*_{1} as an integral over $R2$ as

Now, using the identity

and then integrating over ** X** yield the leading term of (121).

### APPENDIX C: PROOF OF LEMMA IV.3

The proof is straightforward. Substituting

Following the same steps as in Appendix B 3 together with the Taylor-expansion of exponentials *e*^{±iθ/2} with respect to *θ* results in

Just as in Appendix B 3, the leading term comes from the $G\u03031=0$ term and the $G\u03031\u22600$ terms provide a correction,

from which (124) follows.

### APPENDIX D: PROOF OF LEMMA IV.4

We organize the Proof of Lemma IV.4 into sections just like the Proof of Lemma IV.2. First, we will separate the momentum space expression for *H*_{12}Ψ_{2} into “leading” and “higher-order” terms in Appendix D 1. We will then estimate the higher-order terms in Appendix D 2 and transform the leading terms back to real space in Appendix D 3.

#### 1. Separation of terms into leading and higher order in momentum space

In momentum space, $(H12\Psi 2)\sigma $ takes the form [recall (70)]

Substituting the wave-packet *ansatz* (B2), we have

Replacing the sum over *G*_{2} with a sum over $G\u03032=G2+G2\u2032$ and then dropping the tilde, we have

However, now, we can replace the sum over $G2\u2032$ and integral with respect to *q*_{2} over $\Gamma 2*\u2212\kappa 2$ with an integral over $R2$. Carrying out this integral, we have

Making the substitution

puts the off-diagonal terms into the form

where

Now, recall that $h\u0302$ decays (assumption II.1). It follows that the dominant terms in sum (D6) are those that minimize |*κ*_{2} + *G*_{2}|, i.e., those with *G*_{2} = **0**, *G*_{2} = *b*_{2,2}, and *G*_{2} = −*b*_{2,1}, which all give |*κ*_{2} + *G*_{2}| = |** κ**|. It is straightforward to see that the next largest term will have |

*κ*_{2}+

*G*_{2}| = 2|

**|. Hence, we can further simplify the off-diagonal terms as**

*κ*where

We now simplify further. First, using the radial symmetry of $h\u0302$ (assumption II.1), we have

Then, note that we can replace *G*_{1} by −*G*_{1}, *b*_{1,2} − *G*_{1}, and −*b*_{1,1} − *G*_{1} in the first, second, and third terms of (D8), respectively. Each term can then be written simply in terms of the vectors (53) as

This expression can be written more simply in terms of the hopping matrices (56) as

We define

#### 2. Estimation of “higher-order” terms

We now prove (126), starting from

Using the triangle inequality, we have

Now, using assumption II.1, we have

Taking the sup over *σ*′, *σ*″, writing $q1+G1\u2032=q1+G1+G\u03031$, $G\u03031\u2254G1\u2032\u2212G1$, and then combining the sum over *G*_{1} with the integral over *q*_{1} yield

Changing variables in the integral $q1\u2192\kappa 1\u2212\kappa 2\u2212G2+q1\gamma $ and introducing $G\u03032\u2254G2\u2032\u2212G2$ yield

We now split the sum over $G\u03031$ as $\u2211G\u03031\u2208\Lambda 1*,|G\u03031\u2212G\u03032|\u2264L+\u2211G\u03031,|G\u03031\u2212G\u03032|>L$ for some fixed *L* > 0 fixed sufficiently small such that the first sum has at most one term. We start by bounding this term. The expression to be estimated is

Using the Cauchy–Schwarz theorem, this expression can be estimated by

Note that the summand is independent of $G\u03031$, so we can drop this sum, and note that, by changing variables in the sum over $G\u03032$, we can write the expression as a square,

Let us now consider the *L*^{2} norm in (D21),

We can split the integral into integrals over the regions $|q1|\u22641\gamma \kappa 2+G2M$ and $|q1|>1\gamma \kappa 2+G2M$ where *M* > 1 but is otherwise arbitrary and will be fixed later to optimize the estimate. In the first integration region, we have

and so, using (78), we have

where *C* > 0 is a constant depending only on $h\u0302$ and *D*_{2} is as in assumption II.1. The second integral can be bounded using the decay of $f\u03022\sigma \u2032$ as

We can thus bound (D21) by

and, then, using the equivalence of finite-dimensional norms, by

Bounding the sums by

we can finally bound (D19) by

It remains to bound the other terms in (D18), i.e., to bound the sum

The idea to bound these terms is just as in Eq. (B18). We split the plane into two half-planes as

By this procedure, we can bound (D30) by

Again, the summand is independent of $G\u03031$, so we can subsume that sum into the constant *C* > 0. We are then back to the setting of (D20). Following the same steps, we can bound (D33) by

It will be convenient in what follows to simplify this expression and then use assumption II.1 to factor out $h\u0302(|\kappa |;\u2113)2$ to obtain

from which (126) follows.

We now prove (127). After applying the triangle inequality, the quantity to bound is

We follow similar steps as in Appendixes B and C. We replace the integral over $\Gamma 1*$ and sum over *G*_{1} by an integral over $R2$, introduce $G\u03031\u2254G1\u2032\u2212G1$, and then change variables $q1\u2192\kappa 1\u2212\kappa 2\u2212G2+q1\gamma $ to obtain