In this paper, we consider the defocusing Hartree nonlinear Schrödinger equation on T³ with a real-valued and even potential V whose Fourier multiplier decays like |k|^{−β}. Relying on the method of random averaging operators [Deng et al., arXiv:1910.08492 (2019)], we show that there exists β₀, less than but close to 1, such that for β > β₀, we have invariance of the associated Gibbs measure and global existence of strong solutions in its statistical ensemble. In this way, we extend Bourgain’s seminal result [J. Bourgain, J. Math. Pures Appl. 76, 649–702 (1997)], which requires β > 2 in this case.

In this paper, we study the invariant Gibbs measure problem for the nonlinear Schrödinger (NLS) equation on T³ with Hartree nonlinearity. Such an equation takes the form
(i∂ₜ + Δ)u = (V * |u|²)u,  u: R × T³ → C,  (1.1)
where V is a convolution potential. We will assume that it satisfies the following properties:

  1. V is real-valued and even, and so is V̂.

  2. (1.1) is defocusing, i.e., V ≥ 0.

  3. V acts like β antiderivatives, i.e., V̂(0) = 1 and |V̂(k)| ≲ ⟨k⟩^{−β} for some β ≥ 0.

A typical example of such V is the Bessel potential ⟨∇⟩^{−β} of order β on T³, which can be written in the form (for some c > 0) V(x) = c|x|^{−(3−β)} + K(x) for 0 < β < 3 and x ∈ T³\{0}, where K is a real-valued smooth function on T³. Note that when V is the δ function (and β = 0), we recover the usual cubic NLS equation. Our main result (see Theorem 1.3) establishes invariance of the Gibbs measure for (1.1) when V is the Bessel potential of order β, where β < 1 is close enough to 1, greatly improving the previous result of Bourgain,7 which assumes that β > 2 (see also Remark 1.4).

Equation (1.1) can be viewed as a regularized or tempered version of the cubic NLS equation, and both naturally arise in the limit of quantum many-body problems for interacting bosons (see, e.g., Refs. 21 and 33 and references therein). An important question, both physically and mathematically, is to study the construction and dynamics of the Gibbs measure for (1.1), which is a Hamiltonian system.

1. Gibbs measure construction

The Gibbs measure, which we henceforth denote by dν, is formally expressed as
dν(u) = Z^{−1} e^{−H[u]} ∏_{x∈T³} du(x),  (1.2)
where H[u] is the renormalization of the Hamiltonian,
H[u] = ∫_{T³} |∇u|² dx + (1/2) ∫_{T³} :|u|²(V * |u|²): dx.
Rigorously making sense of (1.2) is closely linked to the construction of the Φ⁴₃ measure in quantum field theory, which has attracted much interest since the 1970s and 1980s1,20,22,26,27,31 and in recent years.3,4,21,32 In the case of (1.1), the answer actually depends on the value of β. When β > 1/2 (see Ref. 36), the measure dν can be defined as a weighted version of the Gaussian measure dρ, namely,
dν(u) = Z^{−1} exp(−(1/2) ∫_{T³} :|u|²(V * |u|²): dx) dρ(u),
where :|u|²(V * |u|²): is a suitable renormalization of the nonlinearity [see (1.12) for a precise definition] and the Gaussian free field dρ is defined as the law of distribution of the random variable (see Ref. 37)
f(ω) = Σ_{k∈Z³} (g_k(ω)/⟨k⟩) e^{ik·x},
with {g_k(ω)} being i.i.d. normalized centered complex Gaussians. On the other hand, if 0 < β ≤ 1/2, then dν is a weighted version of a shifted Gaussian measure, which is singular with respect to dρ. These results were proved recently by Bringmann11 and by Oh, Okamoto, and Tolomeo,28 adapting the variational method of Barashkov and Gubinelli.3

We remark that, in either case mentioned above, it can be shown that the Gibbs measure dν is supported in H^{−1/2−}(T³), the same space as dρ. In particular, the typical element in the support of dν has infinite mass, which naturally leads to the renormalizations in the construction of dν alluded to above; see Sec. I B. From the physical point of view, it is also worth mentioning that, in the same way (1.1) is derived from quantum many-body systems, the Gibbs measure dν, with the correct renormalizations, can also be obtained by taking the limit of thermal states of such systems, at least when V is sufficiently regular (see Refs. 21 and 32).
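As a quick numerical sanity check on these support claims, one can tabulate the lattice sums behind them: the expected mass E‖f_N‖²_{L²} = Σ_{⟨k⟩≤N}⟨k⟩^{−2} diverges as N grows, while the expected squared H^{−1/2−ε} norm Σ_{⟨k⟩≤N}⟨k⟩^{−3−2ε} converges. The following sketch (ours, purely illustrative) computes both partial sums:

```python
import numpy as np

def lattice_sums(N, eps=0.1):
    """Partial sums over k in Z^3 with <k> <= N of <k>^(-2) (the expected
    mass E||f_N||_{L^2}^2) and <k>^(-3-2*eps) (the expected squared
    H^{-1/2-eps} norm), where <k> = sqrt(1 + |k|^2)."""
    r = np.arange(-N, N + 1)
    kx, ky, kz = np.meshgrid(r, r, r, indexing="ij")
    jap = np.sqrt(1.0 + kx**2 + ky**2 + kz**2)
    jap = jap[jap <= N]
    mass = float(np.sum(jap ** -2.0))
    sobolev = float(np.sum(jap ** (-3.0 - 2.0 * eps)))
    return mass, sobolev

m1, s1 = lattice_sums(8)
m2, s2 = lattice_sums(16)
# The mass sum roughly doubles when N doubles (linear divergence),
# while the H^{-1/2-eps} sum changes little (convergent tail).
assert m2 > 1.5 * m1
assert s2 - s1 < 0.4 * s1
```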

2. Gibbs measure dynamics and invariance

Of the same importance as the construction of the Gibbs measure is the study of its dynamics and rigorous justification of its invariance under the flow of (1.1). The question of proving invariance of Gibbs measures for infinite dimensional Hamiltonian systems, with interest from both mathematical and physical aspects, has been extensively studied over the last few decades. In fact, it is the works of Refs. 5, 6, and 26—which attempted to answer this question in some special cases—that mark the very beginning of the subject of random data partial differential equations (PDEs).

The literature is now extensive, so we will only review those related to NLS equations. After the construction of Gibbs measures in Ref. 26, the first invariance result was due to Bourgain,5 which applies in one dimension for focusing sub-quintic equations and for defocusing equations with any power nonlinearity. Bourgain6 then extended the defocusing result to two dimensions, but only for the cubic equation; the two-dimensional case with arbitrary (odd) power nonlinearity was recently solved by the authors.17 For the case of Hartree nonlinearity (1.1) in three dimensions, Bourgain7 obtained invariance for β > 2. We also mention the works of Tzvetkov34,35 and of Bourgain and Bulut,8,9 which concern the NLS equation inside a disk or ball, the construction of non-unique weak solutions by Oh and Thomann30 following the scheme in Refs. 2, 13, and 15, and the relevant works on wave equations.11,12,14,28,29 In particular, the recent work of Bringmann12 established Gibbs measure invariance for the wave equation with the Hartree nonlinearity (1.1) for arbitrary β > 0.

The main mathematical challenge in proving invariance of the Gibbs measure is the low regularity of the support of the measure, especially in two or more dimensions. For example, for the two-dimensional NLS equation with power nonlinearity, the support of the Gibbs measure dν lies in the space of distributions H^{0−}(T²), while the scaling critical space is H^{1/2}(T²) for the quintic equation and approaches H^{1}(T²) for equations with high power nonlinearities. This gap is a major reason why the two-dimensional quintic and higher cases have remained open for so many years. In the case of (1.1), a similar gap is present, namely, between the support of dν at H^{−1/2−}(T³) and the scaling critical space H^{(1−β)/2}(T³), which is higher than H^{0}(T³) when β < 1.

On the other hand, it has been known since the pioneering work of Bourgain6 that with random initial data, one can go below the classical scaling critical threshold and obtain almost-sure well-posedness results. In the recent works17,18 of the authors, an intuitive probabilistic scaling argument was performed. This leads to the notion of the probabilistic scaling critical index s_pr ≔ −1/(p − 1), which is much lower than the classical scaling critical index s_cr ≔ (d/2) − 2/(p − 1) in the case of pth power nonlinearity in d dimensions. In Ref. 18, we proved that almost-sure local well-posedness indeed holds in H^s in the full probabilistically subcritical range s > s_pr, in any dimension and for any (odd) power nonlinearity.

For the case of (1.1), a similar argument as in Refs. 17 and 18 yields that the probabilistic scaling critical index for (1.1) is s_pr = (−1 − β)/2, which is lower than −1/2, so it is reasonable to expect that almost-sure well-posedness holds. However, the situation here is somewhat different from that in Refs. 17 and 18 due to the asymmetry of the nonlinearity in (1.1) compared to the power case, which leads to interesting modifications of the methods in these previous works, as we will discuss in Sec. I C.
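The numerology just described can be packaged in a few lines; the helper names below are ours, and the formulas are exactly the indices quoted above (classical s_cr = (1 − β)/2 for the Hartree equation, probabilistic s_pr = (−1 − β)/2, with Gibbs data just below regularity −1/2):

```python
# Scaling indices for the Hartree equation (1.1) on T^3, as discussed
# above; the data regularity -1/2 sits between the two thresholds.
def s_cr(beta):
    return (1.0 - beta) / 2.0   # classical scaling critical index

def s_pr(beta):
    return (-1.0 - beta) / 2.0  # probabilistic scaling critical index

for beta in (0.25, 0.5, 0.9):
    assert s_cr(beta) > -0.5  # data at -1/2 is classically supercritical
    assert s_pr(beta) < -0.5  # ... but probabilistically subcritical
```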

3. Probabilistic methods

The first idea in proving almost-sure well-posedness was due to Bourgain6 and Da Prato and Debussche,15 the latter in the setting of parabolic stochastic partial differential equations (SPDEs); it can be described as a linear–nonlinear decomposition. Namely, the solution is decomposed into a linear random evolution (or noise) term and a nonlinear term that has strictly higher regularity, thanks to probabilistic cancellations in the nonlinearity. If the linear term has regularity close to scaling criticality, then the nonlinear term can usually be bounded subcritically; hence, a fixed point argument applies. However, this idea has its limitations in that the nonlinear term may not be smooth enough; in practice, it is usually limited to slightly supercritical cases (relative to deterministic scaling) and does not give optimal results.

In Ref. 17, inspired partly by the regularity structure theory of Hairer and the para-controlled calculus by Gubinelli, Imkeller, and Perkowski in the parabolic SPDE setting, we developed the theory of random averaging operators. The main idea is to take the high–low interaction, which is usually the worst contribution in the nonlinear term described above, and express it as a para-product-type linear operator—called the random averaging operator—applied to the random initial data. Moreover, this linear operator is independent of the initial data it applies to and has a randomness structure, which includes the information of the solution at lower scales; see Sec. I C. This structure is then shown to be preserved from low to high frequencies by an induction on scales argument and eventually leads to improved almost-sure well-posedness results. We refer the reader to Ref. 33 for an example of a recent application of the method of random averaging operators of Ref. 17 to weakly dispersive NLS.

In Ref. 18, the random averaging operators are extended to the more general theory of random tensors. In this theory, the linear operators are extended to multilinear operators, which are represented by tensors, and whole algebraic and analytic theories are then developed for these random tensors. For NLS equations with odd power nonlinearity, this theory leads to the proof of optimal almost-sure well-posedness results; see Ref. 18. We remark that, while the theory of random tensors is more powerful than random averaging operators, the latter has a simpler structure, is less notation-heavy, and is already sufficient in many situations (especially if one is not very close to probabilistic criticality).

Finally, we would like to mention other probabilistic methods, developed in the recent works of Gubinelli, Koch, and Oh,25 Bringmann,10,12 and Oh, Okamoto, and Tolomeo.28 These methods also go beyond the linear–nonlinear decomposition and are partly inspired by the parabolic theories. They have important similarities and differences compared to our methods in Refs. 17 and 18, but they mostly apply for wave equations instead of Schrödinger equations, so we will not further elaborate here but refer the reader to the above papers for further explanation.

We start by fixing the i.i.d. normalized (complex) Gaussian random variables {gk(ω)}kZ3 so that Egk=0 and E|gk|2=1. Let
f(ω) = Σ_{k∈Z³} (g_k(ω)/⟨k⟩) e^{ik·x},
and it is easy to see that f(ω) ∈ H^{−1/2−}(T³) almost surely. Let V: T³ → R be a potential such that V̂ is even and non-negative, with V̂(0) = 1 and |V̂(k)| ≲ ⟨k⟩^{−β}, as described above. Here and below, we will use u_k to denote the Fourier coefficients of u and use û to represent the time Fourier transform only. In this paper, we fix β < 1 sufficiently close to 1 (this is a specific value, which we do not track below). Let N ∈ 2^{Z≥0} ∪ {0} be a dyadic scale, define projections Π_N by (Π_N u)_k = 1_{⟨k⟩≤N} u_k and Δ_N = Π_N − Π_{N/2}, and define

f_N(ω) = Π_N f(ω),  F_N(ω) = Δ_N f(ω) = f_N(ω) − f_{N/2}(ω).
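The sharp truncations Π_N and Δ_N act diagonally on the Fourier coefficients, so on a finite grid they are one-liners; the following toy sketch (our own discretization, with an arbitrary cutoff K) checks the identity f_N = f_{N/2} + F_N:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8  # arbitrary cutoff for the toy grid of frequencies |k_i| <= K
r = np.arange(-K, K + 1)
k = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1)

def jap(k):  # Japanese bracket <k> = sqrt(1 + |k|^2)
    return np.sqrt(1.0 + np.sum(k**2, axis=-1))

# Coefficients f_k = g_k / <k> with complex standard Gaussians g_k.
g = (rng.standard_normal(k.shape[:-1])
     + 1j * rng.standard_normal(k.shape[:-1])) / np.sqrt(2)
f = g / jap(k)

def Pi(N, u):  # sharp truncation (Pi_N u)_k = 1_{<k> <= N} u_k
    return np.where(jap(k) <= N, u, 0)

def Delta(N, u):  # Delta_N = Pi_N - Pi_{N/2}
    return Pi(N, u) - Pi(N / 2, u)

fN, FN = Pi(4, f), Delta(4, f)
assert np.allclose(Pi(2, f) + FN, fN)  # f_{N/2} + F_N = f_N
```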

We introduce the following truncated and renormalized versions of (1.1), with the truncated random initial data:


Here, in (1.7), we fix
σ_N ≔ E‖f_N(ω)‖²_{L²} = Σ_{⟨k⟩≤N} ⟨k⟩^{−2},
and CN is a Fourier multiplier,
(C_N)_k = Σ_{⟨k′⟩≤N} V̂(k − k′) ⟨k′⟩^{−2}.
Note that (u_N)_k is supported in ⟨k⟩ ≤ N for all time. The first counterterm in (1.7), namely, −σ_N u_N, corresponds to the standard Wick ordering, where one fixes k₁ = k₂ in the expression,
((V * |u|²)u)_k = Σ_{k₁−k₂+k₃=k} V̂(k₁ − k₂) u_{k₁} ū_{k₂} u_{k₃},  (1.10)
plugs in u = f_N(ω), and takes expectations. The second counterterm C_N u_N corresponds to fixing k₂ = k₃, which is present due to the asymmetry of the nonlinearity (V * |u|²)u. Note that (C_N)_k is uniformly bounded in N when β > 1 (in particular, in the case of Bourgain7), so this term is then unnecessary; if β < 1, it becomes a divergent term that needs to be subtracted.
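To illustrate why the second counterterm only matters for β < 1, one can tabulate the lattice sums behind σ_N and (C_N)_k at k = 0, taking V̂(k) = ⟨k⟩^{−β}. The explicit formulas in the sketch below are a plausible reconstruction for illustration, not a quotation of (1.7):

```python
import numpy as np

def counterterm_sums(N, beta):
    """sigma-type sum sum_{<k><=N} <k>^(-2) and the k = 0 value of a
    C_N-type multiplier sum_{<k'><=N} Vhat(k') <k'>^(-2), taking
    Vhat(k) = <k>^(-beta); illustrative reconstruction only."""
    r = np.arange(-N, N + 1)
    kx, ky, kz = np.meshgrid(r, r, r, indexing="ij")
    jap = np.sqrt(1.0 + kx**2 + ky**2 + kz**2)
    jap = jap[jap <= N]
    sigma = float(np.sum(jap ** -2.0))
    C0 = float(np.sum(jap ** (-2.0 - beta)))
    return sigma, C0

sig1, Cs1 = counterterm_sums(8, 0.5)    # beta < 1
sig2, Cs2 = counterterm_sums(16, 0.5)
_, Cl1 = counterterm_sums(8, 1.5)       # beta > 1
_, Cl2 = counterterm_sums(16, 1.5)

assert sig2 > 1.5 * sig1      # sigma_N diverges (roughly linearly in N)
assert Cs2 / Cs1 > Cl2 / Cl1  # the C_N sum keeps growing only for beta < 1
```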

Equation (1.7) is a finite dimensional Hamiltonian equation with the Hamiltonian


where γ_N = (1/2) Σ_{⟨k⟩≤N} V̂_k ⟨k⟩^{−2}.

Remark 1.1.
In fact, the Hamiltonian H_N[u] can also be expressed as ∫_{T³} |∇u|² + :|u|²(V * |u|²):, where the suitably renormalized nonlinearity :|u|²(V * |u|²): is defined as
Note that ∫_{T³} σ_N(V * |u|²) = ∫_{T³} σ_N |u|² since V̂(0) = 1.
We can define the corresponding truncated and renormalized Gibbs measures, namely,
where Z_N > 0 is a normalization constant making dν_N a probability measure. Clearly, dν_N is invariant under the finite dimensional flow (1.7). Note that we can also write
where Z_N^* is another positive constant, dρ_N is the law of distribution of the linear Gaussian random variable f_N(ω) ≔ Π_N f(ω), and H_N^{pot}[u] represents the potential energy, given by
Now, define Π_N^⊥ = 1 − Π_N. Let V_N and V_N^⊥ be the ranges of the projections Π_N and Π_N^⊥, and let dρ and dρ_N^⊥ be the laws of distribution of f(ω) and Π_N^⊥ f(ω), respectively. Then, we have dρ = dρ_N × dρ_N^⊥; moreover, we define

dν_N = dη_N × dρ_N^⊥ = G_N(u) dρ,  G_N(u) ≔ (1/Z_N^*) e^{−H_N^{pot}[Π_N u]}.
We have the following result. Recall that in this paper, we are fixing β < 1 close enough to 1; in particular, β > 1/2.

Proposition 1.2.

Suppose that V is the Bessel potential of order β with β > 1/2; then, G_N(u) converges to a limit G(u) in L^q(dρ) for all 1 ≤ q < ∞, and the sequence of measures dν_N converges to a probability measure dν in total variation. The measure dν is called the Gibbs measure associated with system (1.1).


This is proved in the recent works of Bringmann11 and Oh, Okamoto, and Tolomeo.28 Strictly speaking, these works deal with the case of real-valued u (as they are concerned with the wave equation), but the proof can be readily adapted to the complex-valued case here.□

Now, we can state our main theorem.45 

Theorem 1.3.
Let V be the Bessel potential of order β, with β < 1 close enough to 1. There exists a Borel set Σ ⊂ H^{−1/2−}(T³) such that ν(Σ) = 1, and the following holds. For any u_in ∈ Σ, let u_N(t) be defined by (1.7); then, the limit u(t) = lim_{N→∞} u_N(t)
exists in C_t⁰H_x^{−1/2−}(R × T³), and u(t) ∈ Σ for all t ∈ R. This u(t) solves (1.1) with a suitably renormalized nonlinearity and defines a mapping Φ_t: Σ → Σ for each t ∈ R. These mappings satisfy the group property Φ_{t+s} = Φ_t ∘ Φ_s and keep the Gibbs measure dν invariant, namely, ν(E) = ν(Φ_t(E)) for any t ∈ R and Borel set E ⊂ Σ.

Remark 1.4.

The condition that V is the Bessel potential of order β in Theorem 1.3 is required only because of the assumption of Proposition 1.2, which is proved in Refs. 11 and 28. In fact, the proofs in this paper mainly concern almost-sure local-in-time well-posedness and only require that V satisfies the following: (1) V is real-valued and even, and so is V̂; (2) V ≥ 0; and (3) V̂(0) = 1 and |V̂(k)| ≲ ⟨k⟩^{−β}.

Remark 1.5.

As in Refs. 17 and 18, the sequence {uN} can be replaced by other canonical approximation sequences, for example, with the sharp truncations ΠN on the initial data replaced by smooth truncations or with the projection ΠN onto the nonlinearity in (1.7) omitted. The limit obtained does not depend on the choice of such sequences, and the proof will essentially be the same.

1. Regarding the range of β

The range of β obtained in Theorem 1.3 is clearly not optimal. In fact, Eq. (1.1) with Gibbs measure data is probabilistically subcritical as long as β > 0, and one should expect the same result at least when β > 1/2 (so that the Gibbs measure is absolutely continuous with respect to the Gaussian free field).

The purpose of this paper, however, is to provide an example where the method of random averaging operators17 is applied so that one can significantly improve the existing probabilistic results (β close to but smaller than 1 vs β > 2 in Ref. 7) while keeping the presentation relatively short. In order to treat β > 1/2, one would need to adapt the more sophisticated theory of random tensors,18 which would considerably increase the length of this work, so we leave this part to a subsequent paper.

As for the case 0 < β < 1/2, one would need to deal with the mutual singularity between the Gibbs measure and the Gaussian free field [if one instead studies the local well-posedness problem with Gaussian initial data, as in (1.5), which is different from the Gibbs data, then a modification of the random tensor theory18 would also likely work for all β > 0]. The recent work of Bringmann12 provides a nice example where this issue is solved in the context of wave equations, and it would be interesting to see whether this can be extended to Schrödinger equations. Finally, the case β = 0, which is the famous Gibbs measure invariance problem for the three-dimensional cubic NLS equation, remains an outstanding open problem as of now. It is probabilistically critical, and solving it would presumably require completely new techniques.

Due to the absolute continuity of the Gibbs measure in Proposition 1.2, in order to prove Theorem 1.3, we only need to consider the initial data distributed according to dρ for (the renormalized version of) (1.1) and the initial data distributed according to dρN for (1.7). In other words, we may assume that u(0) = f(ω) for (1.1) and uN(0) = fN(ω) for (1.7).

1. Random averaging operators

Let us focus on (1.7); for simplicity, we will ignore the renormalization terms. The approach of Bourgain and of Da Prato and Debussche corresponds to decomposing
u_N(t) = e^{itΔ} f_N(ω) + v(t),
where fN is as in (1.6) and v(t) is the nonlinear evolution. In particular, this v(t) contains a trilinear Gaussian term,


This term turns out to only have H^{0−} regularity, which is not regular enough for a fixed point argument (note that the classical scaling critical threshold is H^{(1−β)/2}). Therefore, this approach does not work.

Nevertheless, one may observe that the only contribution to v* with the worst (H^{0−}) regularity occurs when the first two input factors are at low frequency and the third factor is at high frequency, such as


for N′ ≪ N and F_N as in (1.6). Moreover, this low frequency component f_{N′} may also be replaced by the corresponding nonlinear term at frequency N′, so it makes sense to separate out the low–low–high interaction term ψ_N defined by


as the singular part of y_N ≔ u_N − u_{N/2}, so that y_N − ψ_N has higher regularity.

The idea of considering high–low interactions is consistent with the para-controlled calculus in Refs. 23–25. However, in those works, the singular term ψ_N and the regular term y_N − ψ_N are characterized only by their regularity (for example, one is constructed via a fixed point argument in H^{0−} and the other in H^{1/2−}), which, as pointed out in Ref. 17, is not enough in the context of Schrödinger equations. Instead, it is crucial to study the operator, referred to as the random averaging operator in Ref. 17, which maps z to the solution of the following equation:


Note that the kernel of this operator, which we denote by H^N = (H^N)_{kk′}(t), is a Borel function of {g_k(ω)}_{⟨k⟩≤N/2} and is independent of F_N(ω). Moreover, this H^N encodes the whole randomness structure of u_{N/2}, which is captured in two particular matrix norm bounds for H^N. Essentially, they involve the ℓ²_k → ℓ²_k operator norm and the ℓ²_{kk′} Hilbert–Schmidt norm for fixed time t (or fixed Fourier variable λ); see Sec. II B 2 for details.

This is the main idea of the random averaging operators in Ref. 17. Basically, it allows one to fully exploit the randomness structure of the solution at all scales, which is necessary for the proof in the setting of Schrödinger equations, in the absence of any smoothing effect.
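The two matrix norms mentioned above can be contrasted on a toy random matrix: the Hilbert–Schmidt (kk′) norm always dominates the ℓ² → ℓ² operator norm, and for a matrix of i.i.d. entries of size n^{−1/2} the operator norm stays O(1) while the Hilbert–Schmidt norm is of size n^{1/2}. A minimal sketch (ours, not from Ref. 17):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
# Toy "kernel": i.i.d. entries of size n^(-1/2), applied to an
# independent Gaussian data vector z.
H = rng.standard_normal((n, n)) / np.sqrt(n)
z = rng.standard_normal(n)

op_norm = np.linalg.norm(H, 2)      # ell^2 -> ell^2 operator norm
hs_norm = np.linalg.norm(H, "fro")  # kk' Hilbert-Schmidt norm

# Hilbert-Schmidt dominates the operator norm, and for random matrices
# it is much larger (about sqrt(n) versus O(1)):
assert op_norm <= hs_norm
assert hs_norm > 2.5 * op_norm
# Applying H to independent data is controlled by the operator norm:
assert np.linalg.norm(H @ z) <= op_norm * np.linalg.norm(z) + 1e-9
```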

2. The special term ρ_N: A “critical” component

In addition to the ansatz introduced in Sec. I C 1, it turns out that an extra term is necessary due to the structure (especially the asymmetry) of the nonlinearity (1.1). Recall that (V * |u|²)u can be expressed as in (1.10); for simplicity, we will ignore any resonances (which are canceled by the renormalizations), i.e., assume that k₂ ∉ {k₁, k₃} in (1.10). Here, if |k₁ − k₂| ≳ N^ε for some small constant ε, then the potential V̂(k₁ − k₂), which is bounded by ⟨k₁ − k₂⟩^{−β}, translates into a gain of derivatives, which allows one to close easily using the random averaging operator ansatz in Sec. I C 1.

However, suppose that |k₁ − k₂| is very small, say, |k₁ − k₂| ∼ 1 in (1.10); then, the potential does not lead to any gain of derivatives, and we will see that this particular term, in fact, exhibits a (probabilistically) “critical” feature. To see this, let us define 𝒩 to be this portion of the nonlinearity (and the corresponding multilinear expression),


and note the Π₁ projector restricting to |k₁ − k₂| ∼ 1. Then, if we define the iteration terms


it follows from simple calculations that u^{(0)} has regularity H^{−1/2−}, while each u^{(m)}, where m ≥ 1, has regularity exactly H^{1/2−}. Therefore, although u^{(1)} is indeed more regular than u^{(0)}, the higher order iterations do not get smoother, despite all input functions [which are F_N(ω)] having the same (and high) frequency. This is in contrast with the “genuinely (probabilistically) subcritical” situations (for the standard NLS equation) in Ref. 18, where for fixed positive constants ε and c, the mth iteration u^{(m)}, assuming that all input frequencies are the same, has increasing and positive regularity in H^{εm−c} as m grows large. Similarly, one may consider the linear operator,


with 𝒩 as in (1.18); in typical subcritical cases, the norm of this operator from a suitable X^{s,b} space to itself would be N^{−α} for some α > 0; see Refs. 17 and 18. However, here (for Hartree), one can check that the corresponding norm is, in fact, ∼1 and may even exhibit a logarithmic divergence if one adds up different scales.

Therefore, it is clear that the contribution 𝒩 as in (1.18) needs a special treatment in addition to the ansatz in Sec. I C 1. Fortunately, this term does not depend on the value of β and was already treated in the work of Bourgain.7 In this work, we introduce an extra term ρ_N, which corresponds to the term treated by Bourgain,7 by defining ξ_N such that


and defining ρ_N = ξ_N − ψ_N, where Π̃_{N^ε} is a smooth truncation at frequency N^ε for some small ε. This term is then measured at regularity H^{s′} for some s′ < 1/2, while the remainder term z_N ≔ y_N − ξ_N, where y_N = u_N − u_{N/2}, is measured at regularity H^s for some s < s′ < 1/2. See Sec. III A 3 for the solution ansatz and Proposition 3.1 for the precise formulations.
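In summary, the definitions above decompose the increment y_N into the singular, critical, and remainder pieces (this is the ansatz made precise in Sec. III A 3):

```latex
\rho_N := \xi_N - \psi_N, \qquad z_N := y_N - \xi_N
\quad\Longrightarrow\quad
y_N = u_N - u_{N/2} = \psi_N + \rho_N + z_N .
```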

3. Additional remark

Note that the precise definitions of the equations satisfied by ψ_N and ξ_N [see (3.2) and (3.8)] involve the projection Δ_N applied to the right-hand sides; this is to make sure that (ψ_N)_k and (ξ_N)_k are exactly supported in N/2 < ⟨k⟩ ≤ N so that one can exploit the cancellation due to the unitarity of the matrices H^N (corresponding to ψ_N), as well as the matrices M^N that correspond to the term ξ_N. This unitarity comes from the mass conservation property of the linear equations defining these matrices and already plays a key role in the work of Bourgain.7 See Sec. III B for details.

We start with system (1.7) with initial data u_N(0) = f_N(ω). Clearly, (u_N)_k is supported in ⟨k⟩ ≤ N. If we denote the right-hand side of (1.7) by Π_N𝒩(u_N), then in Fourier space, we have


We will extend 𝒩(u), which is a cubic polynomial of u, to an R-trilinear operator 𝒩(u, v, w) in the standard way. Note that the mass
M[u_N] ≔ Σ_{⟨k⟩≤N} |(u_N)_k|²
is conserved under the flow (1.7), and we may remove the second term on the right-hand side of (2.1) by a gauge transform,


If we further define the profile vN by
v_N(t) ≔ e^{−itΔ} u_N(t),
then v_N will satisfy the integral equation,




Below, we will focus on systems (2.3) and (2.4).

We set up some basic notations and norms needed later in the proof.

1. Notations

As denoted above, we will use v_k to denote Fourier coefficients, and (ℱv)_k = v̂_k = v̂_k(λ) denotes the Fourier transform in time. For a finite index set A, we will write k_A = (k_j : j ∈ A), where each k_j ∈ Z³, and denote by h_{k_A} a tensor h: (Z³)^A → C. We may also define tensors involving λ variables, where λ ∈ R.

We fix the parameters, to be used in the proof, as follows. Let ε > 0 be a sufficiently small absolute constant. Let ε₁ and ε₂ be fixed such that ε₂ ≪ ε₁ ≪ ε. Let β < 1 be such that 1 − β ≪ ε₂, and choose δ such that δ ≪ 1 − β and κ such that κ ≫ δ^{−1}. We use θ to denote a generic small positive constant such that θ ≪ δ (which may be different at different places). Let b = 1/2 + κ^{−1}, so 1 − b = 1/2 − κ^{−1}. Finally, let τ be sufficiently small compared to all the above-mentioned parameters, and denote J = [−τ, τ]. Fix a smooth cutoff function χ(t), which equals 1 for |t| ≤ 1 and equals 0 for |t| ≥ 2, and define χ_τ(t) ≔ χ(τ^{−1}t). We use C to denote any large absolute constant and C_θ for any large constant depending on θ. If some event happens with probability at least 1 − C_θe^{−A^θ}, where A is a large parameter, we say that this event happens A-certainly.

2. Norms

If (B, C) is a partition of A, namely, B ∩ C = ∅ and B ∪ C = A, we define the norm ‖h‖_{k_B→k_C} such that
‖h‖_{k_B→k_C} ≔ sup{‖Σ_{k_B} h_{k_A} z_{k_B}‖_{ℓ²_{k_C}} : ‖z‖_{ℓ²_{k_B}} = 1}.
The same notation also applies to tensors involving the λ variables. For functions u = u_k(t) and h = h_{kk′}(t) and 0 < c < 1, we also define the norms


For any interval I, define the corresponding localized norms
‖u‖_{X^c(I)} ≔ inf{‖v‖_{X^c} : v(t) = u(t) for all t ∈ I},
and similarly define Y^c(I) and Z^c(I). By abusing notation, we will call the above v an extension of u, though it is actually an extension of the restriction of u to I.

Remark 2.1.

Note that the Y^c norm defined above follows the Y^c norm defined in our recent work,18 which is more convenient for the purpose of this paper than the Y^c norm defined in our earlier work.17

Here, we record some basic estimates. Most of them are standard or are in our previous works.17,18

1. Linear estimates

Define the original and truncated Duhamel operators,


Lemma 2.2.
We have the formula
where the kernel I satisfies that


See Ref. 16, Lemma 3.1; by a similar proof, one can also prove (2.10) for the derivatives of the kernel in (λ, λ′).□

Proposition 2.3
(Short time bounds). Let φ be any Schwartz function, and recall that φ_τ(t) = φ(τ^{−1}t) for τ ≪ 1. Then, for any u = u_k(t), we have
provided that either 0 < c < c₁ < 1/2, or u_k(0) = 0 and 1/2 < c < c₁ < 1. The same result also holds if u = u(t) is measured in norms other than ℓ², so (2.11) is true with X replaced by Y or Z.


See Ref. 18, Lemma 4.2.□

Lemma 2.4
(Suitable extensions). Suppose that f(x, t) is a function defined for t ∈ [−τ, τ] = J, with τ ≪ 1. Define
For any Schwartz function φ, we have
provided that either 0 < b < b₁ < 1/2 or 1/2 < b < b₁ < 1. When 1/2 < b < b₁ < 1, we have


We only need to bound, locally in time, the function f*(t), which equals f(0) for t ≥ 0 and f(t) for t < 0; in fact, g is obtained by performing the transformation from f to f* twice, first at center τ and then at center −τ.

We can decompose f into two parts: f₁, which is smooth and equals f(0) near 0, and f₂ such that f₂(0) = 0. Clearly, we only need to consider f₂, so that f* equals f₂ multiplied by a smooth truncation of 1_{[0,+∞)}, with f₂(0) = 0.

We may replace 1_{[0,+∞)} by the sign function and then apply Proposition 2.3; note that for an even smooth cutoff function χ,
where Δ_N are the standard Littlewood–Paley projections. Moreover, Δ_N(χ · sgn)(x) can be viewed as a rescaled Schwartz function of the same form as in Proposition 2.3 with τ = N^{−1} (due to the expression of the Fourier transform of sgn and simple calculations), so the desired result follows from Proposition 2.3.□

2. Counting estimates

Here, we list some counting estimates and the resulting tensor norm bounds.

Lemma 2.5.
  1. Let R = Z or Z[i]. Then, given 0 ≠ m ∈ R and a₀, b₀ ∈ C, the number of choices for (a, b) ∈ R² that satisfy

m = ab,  |a − a₀| ≤ M,  |b − b₀| ≤ N

is O(M^θN^θ), with the constant depending only on θ > 0.
  2. For dyadic numbers N₁, N₂, N₃, R > 0 and some fixed number Ω₀,

and then, S_R^k is the set of (k, k₁, k₂, k₃) ∈ S_R with k fixed. We have the following counting estimates:


(1) This is the same as part (1) of Lemma 4.3 in Ref. 17. (2) We consider |S_R|. First, the number of choices of k₁ and k₃ is ≲ N₁³N₃³. After fixing the choice of k₁ and k₃, to count (k, k₂) it is equivalent to count k₂ satisfying the restriction |k₂|² + |k₂ + c₁|² = c₂, or to count k satisfying the restriction |k|² + |k + c₃|² = c₄, for some fixed numbers c₁, …, c₄; hence, we have |S_R| ≲ N₁³N₃³ min(N₂, N)^{1+θ}. Similarly, if we first fix k and k₂, we have |S_R| ≲ N³N₂³ min(N₁, N₃)^{1+θ}. In addition, if we fix k₂ first, then counting (k, k₁, k₃) is equivalent to counting (k₁, k₃) with the restriction (k₂ − k₁) · (k₂ − k₃) = c for some fixed number c. By fixing the first two components of (k₁, k₃) and using part (1), we have |S_R| ≲ N₂³RN₃^{2+θ}. Similarly, we also have |S_R| ≲ N₂³RN₁^{2+θ}. The proofs of (2.18)–(2.24) are similar.□
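Part (1) over R = Z can be checked by brute force: any solution a of ab = m divides m, so the count in any box is at most the number of signed divisors of m, which is O(m^θ). A small script (ours) confirming this:

```python
def count_solutions(m, a0, b0, M, N):
    """Count (a, b) in Z^2 with ab = m, |a - a0| <= M, |b - b0| <= N."""
    count = 0
    for a in range(a0 - M, a0 + M + 1):
        if a != 0 and m % a == 0:
            b = m // a
            if abs(b - b0) <= N:
                count += 1
    return count

def signed_divisors(m):
    """Number of divisors of m in Z (positive and negative)."""
    m = abs(m)
    return 2 * sum(1 for d in range(1, m + 1) if m % d == 0)

for m in (12, 36, 97, 360):
    assert count_solutions(m, 0, 0, 400, 400) <= signed_divisors(m)
```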

3. Probabilistic and tensor estimates

Proposition 2.6
(Proposition 4.11 in Ref. 18). Consider two tensors h^{(1)}_{k_{A₁}} and h^{(2)}_{k_{A₂}}, where A₁ ∩ A₂ = C. Let A₁ Δ A₂ = A, and define the semi-product
Then, for any partition (X, Y) of A, let X ∩ A₁ = X₁ and Y ∩ A₁ = Y₁; we have

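In the simplest instance A₁ = {i, c}, A₂ = {c, j}, C = {c}, the semi-product is just matrix multiplication, and the merging bound (2.26) reduces to submultiplicativity of the ℓ² → ℓ² operator norm; the following toy check (ours) illustrates this special case:

```python
import numpy as np

rng = np.random.default_rng(2)
# A1 = {i, c}, A2 = {c, j}: the shared index set is C = {c} and the
# symmetric difference is A = {i, j}; the semi-product contracts c.
h1 = rng.standard_normal((10, 7))    # h^{(1)}_{ic}
h2 = rng.standard_normal((7, 12))    # h^{(2)}_{cj}
h = np.einsum("ic,cj->ij", h1, h2)   # semi-product h_{ij}

def op_norm(mat):  # ell^2 -> ell^2 operator norm
    return np.linalg.norm(mat, 2)

# With X = {i} and Y = {j}, the bound reduces to submultiplicativity:
assert op_norm(h) <= op_norm(h1) * op_norm(h2) + 1e-9
```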
Proposition 2.7
(Proposition 4.12 in Ref. 18). Let A_j (1 ≤ j ≤ m) be index sets such that any index appears in at most two of the A_j, and let h^{(j)} = h^{(j)}_{k_{A_j}} be tensors. Let A = A₁Δ⋯ΔA_m be the set of indices that belong to only one A_j and C = (A₁ ∪⋯∪ A_m) \ A be the set of indices that belong to two different A_j. Define the semi-product
Let (X, Y) be a partition of A. For 1 ≤ j ≤ m, let X_j = X ∩ A_j and Y_j = Y ∩ A_j, and define
and then, we have
For the proofs of Propositions 2.6 and 2.7, see Ref. 18. In that work, the full power of (2.26) and (2.29) is needed, but here, we only need some specific cases, mainly those of the following form (where q ≤ r):
where (k_A, k_B) is a partition of the variables (k₁, …, k_q, k_{q+1}, …, k_r) and (k_{A′}, k_{B′}) is the corresponding partition of the variables (k′₁, …, k′_r), where each k_j (1 ≤ j ≤ q) is replaced by k′_j in (k_A, k_B).

Proposition 2.8
(Proposition 4.14 in Ref. 18). Let A be a finite set and h_{bck_A} = h_{bck_A}(ω) be a tensor, where each k_j ∈ Z^d and (b, c) ∈ (Z³)^q for some integer q ≥ 2. Given signs ζ_j ∈ {±}, we also assume that ⟨b⟩, ⟨c⟩ ≲ M and ⟨k_j⟩ ≲ M for all j ∈ A, where M is a dyadic number, and that in the support of h_{bck_A}, there is no pairing in k_A. Define the tensor
where we restrict k_j ∈ E in (2.31), with E being a finite set such that {h_{bck_A}} is independent of {η_k : k ∈ E}. Then, τ^{−1}M-certainly, we have
where (B, C) runs over all partitions of A. The same results hold if we do not assume ⟨b⟩, ⟨c⟩ ≲ M, but instead that (i) b, c ∈ Z³, |b − c| ≲ M, and ||b|² − |c|²| ≲ M^{κ³} and (ii) h_{bck_A} can be written as a function of b − c, |b|² − |c|², and k_A.

For the proof of Proposition 2.8, see Ref. 18, Propositions 4.14 and 4.15.

Proposition 2.9
(Weighted bounds). Suppose that the matrices h = h_{kk′}, h^{(1)} = h^{(1)}_{kk′}, and h^{(2)} = h^{(2)}_{kk′} satisfy
and h^{(1)}_{kk′} is supported in |k − k′| ≲ L; then, we have

For the proof of Proposition 2.9, see Ref. 17, Proposition 2.5 or Ref. 18, Lemma 4.3 (there are different versions of this bound, but the proofs are the same).

Start with systems (2.3) and (2.4). Let y_N = v_N − v_{N/2}; then, y_N satisfies the integral equation,


1. The term ψ_{N,L}

For any L ≤ N/2, consider the linear equation for Ψ = Ψ_k(t),


where we define, with δ ≪ 1,


and also define M_> ≔ M − M_<. If (3.2) has initial data Ψ_k(0) = (Δ_Nϕ)_k, then the solution may be expressed as


where H^{N,L} = H^{N,L}_{kk′} is the kernel of a linear operator (i.e., a matrix). Define also


and similarly,


Note that when L = 1, we replace L/2 by 0, so, for example, (ψ_{N,0})_k(t) = (F_N)_k. For simplicity, denote

H^N ≔ H^{N,N/2}  and  ψ_N ≔ ψ_{N,N/2}.

Note that each h^{N,L} and H^{N,L} is a Borel function of (g_k(ω))_{⟨k⟩≤N/2} and is thus independent of the Gaussians in F_N.

2. The terms ξ_N and ρ_N

Next, similar to (3.2), we consider the linear equation,


where M is defined by


If the initial data are Ξ_k(0) = (Δ_Nϕ)_k, then we may write the solution as


which defines the matrix M^N = M^N_{kk′}. We then define ξ_N and ρ_N by


3. The ansatz

Now, we introduce the ansatz
v_N = v_{N/2} + ψ_N + ρ_N + z_N,
where z_N is a remainder term. We can calculate that z_N solves the following equation (recall y_N = v_N − v_{N/2}):


The following properties of H and M will play a fundamental role. This idea goes back to Bourgain.7 Recall that for L ≤ N/2, the matrix H^{N,L} is defined by (3.2) and (3.4). Note that if Ψ solves (3.2), then Ψ_k(t) is supported in N/2 < ⟨k⟩ ≤ N; recalling that V̂_k = V̂_{−k} = conj(V̂_k) (i.e., V̂ is real-valued and even), we have


The sum on the right-hand side may be split into two terms, namely, S1, where we only require k1 ≠ k2 in the summation, and S2, where we require k1 ≠ k2 and k2 = k3 in the summation. For S1, by swapping (k, k1, k2, k3) ↦ (k3, k2, k1, k), we also see that S1 ∈ R and, hence, Im(S1) = 0; moreover,


which is also real-valued by swapping (k, k2) ↦ (k2, k). This means that ∑k |Ψk(t)|2 is conserved in time. Therefore, for each fixed t, the matrix HN,L = Hkk′N,L is unitary; hence, we get the identity


with δk1k2 being the Kronecker delta. This, in particular, holds for L = N/2. In the same way, the matrix MN defined by (3.8) and (3.10) also satisfies (3.15).
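This unitarity mechanism can be checked in a finite-dimensional toy model (purely illustrative; the Hermitian matrix A below stands in for the self-adjoint linearized operator in (3.2), and the dimension is arbitrary): the flow of dΨ/dt = −iAΨ preserves the ℓ2 norm, so the propagator is a unitary matrix satisfying the analog of (3.15).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # toy dimension standing in for the frequency range N/2 < <k> <= N

# Hermitian "Hamiltonian": plays the role of the self-adjoint linearized operator
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2

# Propagator U = exp(-iA), i.e., the flow map of dPsi/dt = -i A Psi at time t = 1
w, V = np.linalg.eigh(A)
U = V @ np.diag(np.exp(-1j * w)) @ V.conj().T

# ell^2 conservation: |U psi| = |psi| for any initial datum
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(np.linalg.norm(U @ psi), np.linalg.norm(psi))

# Unitarity: sum_k U[k, k1] * conj(U[k, k2]) = delta_{k1 k2}, the analog of (3.15)
gram = U.conj().T @ U
assert np.allclose(gram, np.eye(n))
```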

We now state the main a priori estimate and prove that this implies Theorem 1.3.

Proposition 3.1.

Let 0 < τ ≪ 1 and J = [−τ, τ]. Recall the parameters defined in Sec. II B. For any M, consider the following statements, which we call Local(M):

  1. For the operators hN,L, where L < M and N > L is arbitrary, we have

as well as
  2. For the terms ρN and zN, where N ≤ M, we have

  3. For any L1, L2 < M, the operator defined by

has an extension, which we still denote by L for simplicity. The kernel Lkk′(t, t′) has Fourier transform L̂kk′(λ, λ′), which satisfies
where L = max(L1, L2).
Now, with the above definition, we have that

Proof of Theorem 1.3.
By Proposition 3.1, in particular, we know that τ−1-certainly, the event Local(M) happens for any M. By (3.4), (3.11), and (3.12), we have
Exploiting independence between hN,L and FN and using Proposition 2.8 combined with (3.16), we can show that ‖ζN,L‖Xb(J) ≲ NδL−1/3. Summing over L and noting that ζN,L is supported in N/2 < ⟨k⟩ ≤ N, we see that
for any γ > 0. Using also (3.18), we can see that the sequence {vN − fN} converges in Ct0Hx0(J). Hence, {vN} converges in Ct0Hx−1/2−γ(J), and so does the original sequence {uN}.

Therefore, the solution uN to (1.7) converges to a unique limit as N → ∞ up to an exceptional set, with probability at least 1 − Cθe−τ−θ. This proves the almost-sure local well-posedness of (1.1) with Gibbs measure initial data. Since the truncated Gibbs measure dηN defined by (1.13) is invariant under (1.7) and the truncated Gibbs measures converge strongly to the Gibbs measure dν as in Proposition 1.2, we can apply the standard local-to-global argument of Bourgain, where the a priori estimates in Proposition 3.1 allow us to prove the suitable stability bounds needed in the process, in exactly the same way as in Ref. 17. The almost-sure global existence and invariance of the Gibbs measure then follow.□

From now on, we will focus on the Proof of Proposition 3.1 and assume that the bounds involved in Local(M/2) are already true. The goal is to recover (3.16)–(3.18) and (3.20)–(3.21) for M. Before proceeding, we want to remark on a few simplifications that we will make in the proof below. These are either standard or the same as in Refs. 17 and 18, and we will not detail these arguments in the proof below.

  1. In proving these bounds, we will use the standard continuity argument, which requires a smallness factor. The localized norm Xc([0, T]) is continuous in T if the function is smooth, which enables the continuity argument. Here, the smallness factor is provided by the short time τ ≪ 1. In particular, we can gain a positive power τθ by using Proposition 2.3,38 at the price of changing the c exponent in the Xc (or Yc or Zc) norm by a little (e.g., from 1 − b to b). It can be checked in the proof below that all the estimates allow for some room in c, so this is always possible.

  2. In each proof below, we can actually gain an extra power Mδ/10 compared to the desired estimate, so any loss that is ≲ MCκ1 will be acceptable. In fact, in the proof below, we will frequently encounter losses of at most MCκ1 due to manipulations of the c exponent in various norms as in (1) and due to the application of probabilistic bounds such as Proposition 2.8, where we lose a small θ power.

  3. In the course of the proof, we will occasionally need to bound quantities of the form supλ∈I G(λ), where λ ranges in an interval I, and for each fixed λ, the quantity |G(λ)| can be bounded apart from a small exceptional set; moreover, here, G will be differentiable, and G′(λ) will satisfy a weaker but unconditional bound. Then, we can apply the meshing argument in Refs. 17 and 18, where we divide the interval into a large number of subintervals, approximate G on each small interval by a sample (or an average), control the error term using G′, and add up the exceptional sets corresponding to the samples in each interval. In typical cases, where M-certainly |G(λ)| ≤ Mθ for each fixed λ, |I| ≤ MC, and |G′(λ)| ≤ MC unconditionally, we can deduce that M-certainly, supλ∈I |G(λ)| ≲ Mθ because the number of subintervals is O(MC), so the total probability of the union of exceptional sets is still sufficiently small.
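In the typical case just described, the meshing argument can be summarized as follows (a sketch; the choice of the number of subintervals is illustrative). Dividing I into n = M2C equal subintervals with sample points λi, one has

```latex
\sup_{\lambda \in I} |G(\lambda)|
  \le \max_{1 \le i \le n} |G(\lambda_i)|
      + \frac{|I|}{n} \sup_{\lambda \in I} |G'(\lambda)|
  \le M^{\theta} + M^{C} \cdot \frac{M^{C}}{M^{2C}},
```

so the deterministic error term is O(1), while the union of the n exceptional sets (one per sample point) still has acceptably small probability since n is only a power of M.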

We start by proving (3.20) and (3.21) for L = M/2. We need to construct an extension of L defined in (3.19). This is done first using Lemma 2.4 to find extensions of each component of yL1 and yL2 [note that max(L1, L2) = M/2] such that these extensions satisfy (3.16)–(3.18) with the localized Xb(J), …, norms replaced by the global Xb, …, norms, at the expense of slightly worse exponents. The change in the values of the exponents will play no role in the proof below, so we will omit it. Then, by attaching to L a factor χ(τ−1t) and using Lemma 2.3 (see Sec. III D), we can gain a smallness factor τθ at the price of further worsening the exponents. These operations are standard, so we will not repeat them below.

Note that the extension defined in Lemma 2.4 preserves the independence between the matrices hLj,Rj and FLj for Rj ≤ Lj/2.

Recall that L̂kk′(λ, λ′) is the Fourier transform of the kernel Lkk′(t, t′) of L, and we have


Now, we consider the different cases.

  1. Suppose in (3.19) we replace yLj by ρLj + zLj for j ∈ {1, 2}; then, in particular, we may assume that ‖yLj‖Xb ≲ Lj−1/2+ε1+ε2 due to (3.18). By (3.19) and (4.1), we have


where Ω = |k|2 − |k1|2 + |k2|2 − |k′|2 and I = I(λ, μ) is as in (2.10); we will omit the factor η((k1 − k2)/N1−δ) in the definition of M< in (3.3) as it does not play a role. We may also assume that |k1 − k2| ∼ R ≲ L. In the above expression, let μ ≔ λ − (Ω + λ1 − λ2 + λ′); in particular, we have |I| ≲ ⟨λ⟩−1⟨μ⟩−1 by (2.10). By a routine argument, in proving (3.20), we may assume that |λj| ≤ L100 and |μ| ≲ L100; in fact, if, say, |λ1| is the maximum of these values and |λ1| ≥ L100 (the other cases being similar), then we may fix the values of kj and, hence, of k − k′, at a loss of at most L12, and reduce to estimating


with |λ1| ∼ K ≥ L100 and ‖⟨λj⟩bŵj‖L2 ≲ 1 for each j. By estimating w1 in the unweighted L2 norm, we can gain a power K−1/2, and using the Lλ21 integrability of ŵ2, which follows from the weighted L2 norm, we can fix the value of λ2. In the end, this leads to


and hence,


which is more than enough, because the operator norm of L̂ equals supk,k′|L̂kk′| when L is supported where k − k′ is constant.

Now, we may assume |λj| ≤ L100 for j ∈ {1, 2} and |μ| ≤ L100; we may also assume |λ| + |λ′| ≤ Lκ3 as, otherwise, we gain from the weights ⟨λ⟩2(1−b) and ⟨λ′⟩−2b in (3.20). Similarly, in proving (3.21), we may assume |λj| ≤ N100 for j ∈ {1, 2}, |μ| ≤ N100, and |λ| + |λ′| ≤ N100 (otherwise, we may also fix (k, k′) and argue as above). Therefore, in proving (3.21), we may replace the unfavorable exponents ⟨λ⟩2b⟨λ′⟩−2(1−b) by the favorable ones ⟨λ⟩2(1−b)⟨λ′⟩−2b at a price of NCκ1; this will be acceptable since in the proof, we will be able to gain a power Nδ/2. We remark that in the proof below (though not here), we may use the Y1−b norm as in (3.16) for the matrices in the decomposition of yLj; using the bounds on λj as above, we may replace the exponent 1 − b by b (which then implies Lλj1 integrability), again at a loss of either LCκ1 or NCκ1 depending on whether we are proving (3.20) or (3.21), which is acceptable. See also Sec. III D.

This then allows us to fix the values of λj in (4.2) using the Lλj1 integrability coming from the weighted norms; moreover, by using the bound |I| ≲ ⟨λ⟩−1⟨μ⟩−1, the upper bounds for λ and μ mentioned above, and the weights in (3.20) and (3.21), we may also fix the values of λ, λ′, and ⌊μ⌋ and reduce to estimating the following quantity:


where the tensor (which we call the base tensor)


with some value Ω0 determined by λj, λ, λ′, and ⌊μ⌋. Here, we also assume |kj| ≲ Lj and |k1 − k2| ∼ R ≲ L and ‖wj‖ℓ2 ≲ Lj−1/2+ε1+ε2.

Now, (4.3) is easily estimated, using Proposition 2.7, as


which is enough for (3.20) [namely, we multiply this by the factor ⟨λ−1 coming from I and the weight ⟨λ1−bλ′⟩b in (3.20) and then take the L2 norm in λ and λ′ to get (3.20); the same happens below]. In particular, the norm hbkk2k1k is bounded as


by Schur’s bound, where Sk1kR and Skk2R are defined similarly as in Lemma 2.5.

For the Qkk norm, we have


which is enough for (3.21). Note that all the bounds for hb we use here follow from Schur’s bound and Lemma 2.5.

  2. Suppose that yL1 is replaced by ρL1 + zL1 and yL2 is replaced by ψL2. We may further decompose ψL2 into ζL2,R2 for R2 ≤ L2/2 (including the case R2 = 0, by which we mean ζL2,0 = FL2) and perform the same arguments as above, fixing the λ variables, and reduce (this reduction step actually involves a meshing argument, as the estimate for Q is probabilistic; see Sec. III D) to estimating the following quantity:


where ‖w1‖ℓ2 ≲ L1−1/2+ε1+ε2 and h(2) is independent of FL2 and is either the identity matrix or satisfies ‖h(2)k2k2′‖ ≲ R2−1/2+3ε1 and ‖h(2)k2k2′‖ ≲ L2−1+δR21/2+2ε1. We then estimate (4.4) by


using Propositions 2.6 and 2.8, which is enough for (3.20). Note that here hb depends on k and k′ only via k − k′ and |k|2 − |k′|2 and that ||k|2 − |k′|2| ≲ Lκ3, given the assumptions, so Proposition 2.8 is applicable. Similarly, for the kk2 norm, we have


which is enough for (3.21).

  3. Suppose that yLj is replaced by ψLj for j ∈ {1, 2}. In this case, we will start from (3.19) and expand


for j ∈ {1, 2}. There are then two cases, namely, when k1′ = k2′ or otherwise.

If k1′ ≠ k2′, then we can repeat the above argument [including further decomposing ψLj into ζLj,Rj using (3.6) and (3.7)], fix the time Fourier variables, and reduce to estimating a quantity


where h(j) is independent of FLj and is either the identity matrix or satisfies ‖h(j)kjkj′‖ ≲ Rj−1/2+3ε1 and ‖h(j)kjkj′‖ ≲ Lj−1+δRj1/2+2ε1. Since k1′ ≠ k2′, we can apply Proposition 2.8 either in (k1′, k2′) jointly (if L1 = L2) or first in k1′ and then in k2′ (if, say, L1 ≥ 2L2) and get that


which is enough for (3.20). As for the kk2 norm, we have


which is enough for (3.21).

Finally, assume that k1′ = k2′; then, L1 = L2 = L. In (3.19), the summation in k1′ = k2′ gives


Using the cancellation (3.15), since k1 ≠ k2, we can replace the factor 1/⟨k1′⟩2 in the above expression by 1/⟨k1′⟩2 − 1/⟨k1⟩2; then, by further decomposing HLj into hLj,Rj by (3.7) and repeating the above arguments, we can reduce to estimating the following quantity:


where h(j) is either the identity matrix or satisfies ‖h(j)kjkj′‖ ≲ Rj−1/2+3ε1 and ‖h(j)kjkj′‖ ≲ Lj−1+δRj1/2+2ε1. Note that we may assume |kj − kj′| ≲ RjLδ using the bound (3.17), so, in particular, we have


up to a loss of L (which is acceptable, as in this case, we can gain at least Lε2). Using these, we estimate, assuming without loss of generality that R1 ≤ R2,


This completes the proof for (3.20) and (3.21).

We now prove (3.16) and (3.17). Let LN,L be the linear operator defined by


and we also extend its kernel in the same way as we did for L in Sec. IV A. Let L̃N,L ≔ LN,L − LN,L/2; then, by the induction hypothesis and the proof in Sec. IV A, we know that L̃N,L also satisfies estimates (3.20) and (3.21). Clearly, (3.20) implies that ‖L̃N,L‖Xb→X1−b ≲ L−1/2+3ε1−ε2; moreover, it is easy to see that


and hence, ‖LN,L‖X0→X1 ≲ L12, and the same holds for L̃N,L. By interpolation, we obtain that ‖L̃N,L‖Xα→Xα ≲ L−1/2+3ε1 for α ∈ {b, 1 − b} (note that we can always gain a positive power of τ using Lemma 2.3; see Sec. III D). Moreover, consider the kernel (FL̃N,L)kk′(λ, λ′); then, we also have the following bound:


which follows from (3.20). If we replace the factor ⟨λ′⟩b by 1, then a simple argument shows that


(and the same for L̃N,L) by using that


and then fixing the Fourier modes of vL. Interpolating again, we get that


A similar interpolation gives


Clearly, LN,L satisfies (4.12) and (4.13) with right-hand sides replaced by 1.

Now, let

H  N,L ≔ (1 − LN,L)−1 = ∑n=0∞ (LN,L)n,

and then, it is easy to see that H  N,L − 1 satisfies the same bounds (4.12) and (4.13) with right-hand sides replaced by 1; for example, (4.12) follows from iterating the following bound:


provided that


and (4.13) is proved similarly. Define further

H  ̃N,L ≔ H  N,L − H  N,L/2 = ∑n=1∞ (−1)n−1 (H  N,L L̃N,L)n H  N,L.

By iterating the XαXα bounds and using also (3.21) for L̃N,L, we can show that

∫R2 ⟨λ⟩2b ⟨λ′⟩2(1−b) ‖(FH  ̃N,L)kk′(λ, λ′)‖kk′2 dλ dλ′ ≲ N−2+2δL1+4ε1.

The weighted bound

∫R2 ⟨λ⟩2b ⟨λ′⟩2(1−b) (1 + |k − k′|/min(L, N1−δ))κ ‖(FH  ̃N,L)kk′(λ, λ′)‖kk′2 dλ dλ′ ≲ N3

is shown in the same way but using Proposition 2.9.

In addition, we can also show that

∫R2 ⟨λ⟩2(1−b) ⟨λ′⟩2b ‖(FH  ̃N,L)kk′(λ, λ′)‖kk′2 dλ dλ′ ≲ L−1+6ε1.

This can be proved using (4.12)–(4.13), by iterating the following bounds:


and similarly,


assuming (4.14).
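As a finite-dimensional sanity check of the Neumann-series inversion behind H  N,L = (1 − LN,L)−1 above (a toy computation; the dimension n and the norm 1/2 below are arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # toy dimension

# A "small" operator playing the role of L^{N,L}: rescale so its spectral norm is 1/2
L = rng.standard_normal((n, n))
L *= 0.5 / np.linalg.norm(L, 2)

# Neumann series (1 - L)^{-1} = sum_{m >= 0} L^m, convergent since ||L|| = 1/2 < 1
H = np.zeros_like(L)
P = np.eye(n)
for _ in range(200):  # the tail beyond 200 terms is of size 2^{-200}, negligible
    H += P
    P = P @ L

H_exact = np.linalg.inv(np.eye(n) - L)
assert np.allclose(H, H_exact)

# H - 1 inherits the smallness of L: ||H - 1|| <= ||L|| / (1 - ||L||) = 1
assert np.linalg.norm(H - np.eye(n), 2) <= 1.0 + 1e-9
```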

Now, we can finally prove (3.16) and (3.17). In fact, by definition of H  N,L and H  ̃N,L, there exists an extension of hN,L such that

(ĥN,L)kk′(λ) = ∫R (FH  ̃N,L)kk′(λ, λ′) χ̂(λ′) dλ′,

so the Y1−b and Zb bounds in (3.16), as well as (3.17), can be deduced directly from (4.15)–(4.17). The bound on supt ‖hN,L(t)‖kk′ is also easily controlled by ‖H  ̃N,L‖Xb→Xb using the embedding Xb ↪ Lt∞L2. This completes the proof of (3.16) and (3.17).

In this section, we prove the first bound in (3.18) regarding ρN, assuming N = M. Recall that from (3.2), (3.4), and (3.8), we deduce that ρN satisfies the following equation:


with initial data (ρN)k(0) = 0. Let LN,L be defined as in (4.11), and denote LN ≔ LN,N/2; from Sec. IV B, we know that H  N ≔ (1 − LN)−1 is well-defined and has kernel (H  N)kk′(t, t′) in physical space and (FH  N)kk′(λ, λ′) in Fourier space. Then, (5.1) can be reduced to

(ρN)k(t) = ∑k′ ∫0t (H  N)kk′(t, t′) Wk′(t′) dt′,



Here, in (5.3), we assume for j ∈ {1, 2} that wj ∈ {ψNj, ρNj, zNj}, where max(N1, N2) = N, and that w3 ∈ {ψN, ρN}.

In order to prove the bound for ρN in (3.18), we will apply a continuity argument, namely, assuming (3.18) and then improving it with a smallness factor. This can be done as long as we bound


since from Sec. IV B, we know that H  N is bounded from Xb(J) to Xb(J). In fact, we will prove (5.4) with an extra gain Nε2/2, which will allow us to ignore any possible N loss in the process. The smallness factor τθ will be provided by Lemma 2.3 as in Sec. III D, so we will not worry about it below. We divide the right-hand side of (5.3) into three terms:

  • Term I: when w3 = ρN.

  • Term II: when w3 = ψN and zN′ ∈ {w1, w2} for some N′ ≥ N/2.

  • Term III: when w3 = ψN and w1, w2 ∈ {ψN, ρN, ψN/2, ρN/2}.

Note that these are the only possibilities since if (say) N1 = N, w1 ∈ {ψN, ρN}, and N2 ≤ N/2, then we must have N2 = N/2 due to the support condition for ψN and ρN, as well as the restriction |k1 − k2| ≲ Nε in M. Moreover, the estimate of term I follows from the operator norm bound,


which is proved by repeating the arguments in Sec. IV A (the proof that works for M< certainly also works for M). In Secs. V A and V B, we will deal with terms II and III, respectively.

For simplicity, we assume that w1 = zN′ (the proof in the case w2 = zN′ is similar). There are then two cases to consider: when w2 ∈ {ρN2, zN2} or when w2 = ψN2.

1. The case w2 ∈ {ρN2, zN2}

If w2 ∈ {ρN2, zN2}, then we, in particular, have ‖w2‖Xb(J) ≲ N2−1/2+ε1+ε2. By Lemma 2.4, we may fix extensions of w1 and w2 that satisfy the same bounds but with Xb(J) replaced by Xb; moreover, they satisfy the same measurability conditions as w1 and w2. For simplicity, we will still denote them by w1 and w2. The same is done for w3 = ψN, as well as for the corresponding matrices.

Now, by (5.3) and Lemma 2.2, we can find an extension of II, which we still denote by II for simplicity such that


where Ω = |k|2 − |k1|2 + |k2|2 − |k3|2 and I = I(λ, μ) is as in (2.10). In the above expression, let μ ≔ λ − (Ω + λ1 − λ2 + λ3); in particular, we have |I| ≲ ⟨λ⟩−1⟨μ⟩−1 by (2.10). By a routine argument, we may assume that |λ| ≤ N100 and similarly for μ and each λj; in fact, if, say, |λ1| is the maximum of these values and |λ1| ≥ N100, then we may fix the values of k and all kj at a loss of at most N12 and reduce to estimating (with the value of Ω fixed)


with |λ1| ∼ K ≥ N100 and ‖⟨λj⟩bŵj‖L2 ≲ 1 for each j. By estimating w1 in the unweighted L2 norm, we can gain a power K−1/2, and using the L1 integrability of ŵj that follows from the weighted L2 norms, we can fix the values of λj for j ∈ {2, 3}. In the end, this leads to


and hence, ‖⟨λ⟩bÎI‖L2 ≲ K−1/3N30, which is more than enough for (3.18).

Now, with |λ| ≤ N100, …, we may apply the bounds (3.16)–(3.18), but for the extensions and global norms, and replace the Y1−b norm (if any) by the Yb norm at a loss of NCκ1, which will be neglected as stated above. Similarly, as |λ| ≤ N100, we also only need to estimate II in the X1−b instead of the Xb norm, again at a loss of NCκ1. Then, using L1 integrability in λj (together with a meshing argument; see Sec. III D), provided by the weighted bounds (3.16)–(3.18), and the (almost) summability in μ due to the ⟨μ⟩−1 factor in (2.10), we may fix the values of λ, λj (1 ≤ j ≤ 3), and ⌊μ⌋ (and hence, the value of Ω ∈ Z) and reduce to estimating the ℓ2k norm of the following quantity:


Here, in (5.7), we assume that |k1| ≤ N, |k2| ≤ N2, |k1 − k2| ≲ Nε, N/2 < ⟨k3⟩, ⟨k3′⟩ ≤ N, and Ω0 ∈ Z is fixed, and the inputs satisfy that


To estimate Q, we may assume |k1 − k2| ∼ R ≲ Nε and define the base tensor


with also the restrictions on kj as mentioned above. Then, we have


and hence,


By Proposition 2.8 and the independence between Hk3k3′ and (FN)k3′, we get that


N-certainly. By the definition of hb, using Schur's bound and the counting estimates in Lemma 2.5, and noting that |k2| ≤ N2 and |k1 − k2| ≲ R, we can bound


Since also ‖Hk3k3′‖ ≲ 1, we conclude that


which is enough for (3.18). This concludes the proof for term II when w2 ∈ {ρN2, zN2}. Note that the above argument also works for the case when w1 = ρN and w2 = ρN2 because here, we must have N2 ≥ N/2 due to the support condition of ρN and the assumption |k1 − k2| ≲ Nε, and the above arguments give the same (in fact, better) estimates.

2. The case w2 = ψN2

In this case, by repeating the first part of the arguments in Sec. V A 1, we can reduce to estimating the ℓ2k norm of the following quantity:


Here, in (5.9), we assume that |k1| ≤ N, |k2| ≤ N2, |k1 − k2| ∼ R ≲ Nε, N2/2 < ⟨k2⟩, ⟨k2′⟩ ≤ N2, N/2 < ⟨k3⟩, ⟨k3′⟩ ≤ N, and Ω0 ∈ Z is fixed, and the inputs satisfy that


Moreover, this H(j) is such that either H(j) = Id or ‖H(j)kjkj′‖ ≲ Nj−1+δ, with N3 = N. The sum in (5.9) can be decomposed into a term where k2′ ≠ k3′ and a term where k2′ = k3′.

Case 1: k2′ ≠ k3′. Let hb = hkk1k2k3b be defined as mentioned above; it then suffices to estimate the ℓ2k1 → ℓ2k norm of the tensor


by using the ℓ2 norm of w1. If N2 = N, then the tensors hb, H(2), and H(3) are independent of (FN)k2′ and (FN)k3′ and k2′ ≠ k3′, so we can apply Proposition 2.8; if N2 ≤ N/2, then hb, H(2), H(3), and (FN2)k2′ are all independent of (FN)k3′, and moreover, hb and H(2) are independent of (FN2)k2′, so we can apply Proposition 2.8 iteratively, first for the sum in (k3, k3′) and then for the sum in (k2, k2′). In either case, by applying Proposition 2.8, combining it with Proposition 2.7, and estimating H(j) in the kjkj′ norm, we obtain N-certainly that the desired ℓ2k1 → ℓ2k norm of the tensor is bounded by


Using the fact that |k2| ≤ N2 and |k3| ≤ N in the support of hb, together with Lemma 2.5 as above, we can show that


and hence,


which is enough for (3.18).

Case 2: k2′ = k3′. In this case, we must have N2 = N, and we can reduce (5.9) to the following expression:39 




As k2 ≠ k3 in (5.9) due to the definition of M, we know that either H(2) or H(3) must not be the identity, and hence, we have ‖H̃k2′k3′‖ℓ2 ≲ N−1+δ. By (5.11), we then simply estimate


using Lemma 2.5, noting that |k1 − k2| ≲ R and |k3| ≤ N. This completes the proof for term II.

Here, we assume w3 = ψN and w1, w2 ∈ {ψN, ρN, ψN/2, ρN/2}. We consider two possibilities: when w1, w2 ∈ {ψN, ψN/2}, which we call term IV, and when wj ∈ {ρN, ρN/2} for some j ∈ {1, 2}, which we call term V.

1. Term IV

Suppose w1, w2 ∈ {ψN, ψN/2}. We may also decompose them into ψNj,Lj for Lj ≤ Nj/2 and reduce to


where N1, N2 ∈ {N, N/2} and N3 = N. In (5.13), we consider two cases, depending on whether there is a pairing k1′ = k2′ or k2′ = k3′ or not.

Case 1: no-pairing. Assume that k2′ ∉ {k1′, k3′}; then, we take the Fourier transform in the time variable t and repeat the first part of the arguments in Sec. V A 1 to reduce to estimating the ℓ2k norm of the following quantity:


In (5.14), we assume that |kj| ∼ N and |k1 − k2| ∼ R ≲ Nε and that the matrices h(j) are either identity or satisfy


and moreover, we may assume that h(j) is supported in |kj − kj′| ≲ LjNδ by inserting a cutoff exploiting (3.17). The ℓ2k norm of Qk can then be estimated using Proposition 2.8 in the same way as in Sec. V A 2, either jointly in (k1, k2, k3) if each Nj = N, or first in those kj with Nj = N and then in those kj with Nj = N/2, so that N-certainly we have (with the base tensor hb defined as in Secs. V A 1 and V A 2)


using Lemma 2.5, which is enough for (3.18).

Case 2: pairing. We now consider the cases k1′ = k2′ and k2′ = k3′. First, if k2′ = k3′, then we can apply the reduction arguments mentioned above and reduce to




Note that h(2) and h(3) cannot both be the identity, as k2 ≠ k3. Now, if max(L2, L3) ≤ N/2, then due to independence, applying similar arguments as before, we can estimate N-certainly that


using the constraint |k1k2| ≲ Nε, which is enough for (3.18); if max(L2, L3) = N, then we can gain a negative power of this value and view FN1 simply as an H−1/2− function (without considering randomness) and bound


which is also enough for (3.18).

Finally, consider the case k1′ = k2′, so, in particular, N1 = N2. We will sum over L1 and L2 in order to exploit the cancellation (3.15) (as k1 ≠ k2); this leads to the following expression:


where again we have replaced |gk1′|2 by 1 as mentioned before. Since k1 ≠ k2, by (3.15), we may replace ⟨k1′⟩−2 in the above expression by ⟨k1′⟩−2 − ⟨k1⟩−2. Then, decomposing in L1 and L2 again, taking the Fourier transform in t, and repeating the reduction steps as before, we arrive at the following quantity:




with h(j) as mentioned above. Note that we may assume |k1 − k1′| ≲ Nδ min(L1, Nε + L2) ≲ Nε+δ min(L1, L2) in view of |k1 − k2| ≲ Nε, and it is easy to show, assuming min(L1, L2) = L, that


Since max(L1, L2) ≤ N/2, using independence and arguing as mentioned before, we can estimate that N-certainly,


which is enough for (3.18). This completes the estimate for term IV.

2. Properties of the matrix MN − HN

Before studying term V, we first establish some properties of the matrix QN ≔ MN − HN = ((QN)kk′(t)) such that


Lemma 5.1.
Let ε′ be such that (ε1 ≪) ε ≪ ε′ ≪ 1. Then, we have
Moreover, we can decompose QN = QN,≪ + QN,rem such that ‖QN,rem‖Zb(J) ≲ N−1/2+2ε1−ε′/4 and that
Moreover, QN,≪ can be decomposed into at most NCε terms. For each term Q, there exist vectors ℓ*, m* such that |ℓ*|, |m*| ≲ Nε and that (Q̂)kk′(λ) is a linear combination (in the form of some integral40), with summable coefficients, of expressions of the form
where Y is independent of (FN)k, |Y| ≲ 1, and R(k) depends only on m*·k; moreover, we have

Remark 5.2.

Lemma 5.1 plays an important role in Sec. V B 3 when estimating term V. In particular, we will exploit the one-dimensional extra independence of R(k) from (FN)k since R(k) depends only on m*·k instead of on k.

By definition of ξN and ψN in (3.11) and (5.1)–(5.3), as well as the associated matrices, we have the following identity:
(QN)kk′(t) = ∑k1 ∫0t (H  N M)kk1(t, t1) (HN − QN)k1k′(t1) dt1,
and hence, we have
(QN)kk′(t) = ∑n=1∞ (−1)n−1 ∑k1 ∫0t {(H  N M)n}kk1(t, t1) (HN)k1k′(t1) dt1,
where H  N = H  N,N/2 is defined in Sec. IV B and M denotes the following operator:
The bounds in (5.18) then follow from iterating as in Sec. IV B, using the bounds (4.12)–(4.17) (together with the Xα → Xα bounds) for the operators H  N and M, where the bounds for M are proved in the same way as in Secs. IV A and IV B. Moreover, in (5.22), if we assume n ≥ 2 or replace H  N by H  N − H  N,Nε (or HN by HN − HN,Nε), then the corresponding bounds can be improved by Nε′/4, and the resulting terms can be put in QN,rem.41 As for the remaining contribution, we can write
(QN,≪)kk′(t) = ∑k1,k2 ∫∫ (H  N,Nε)kk1(t, t1) Mk1k2(t1, t2) (HN,Nε)k2k′(t2) dt1 dt2,
and hence,
(Q̂N,≪)kk′(λ) = ∑k1,k2 ∫∫ (FH  N,Nε)kk1(λ, λ1) (FM)k1k2(λ1, λ2) (FHN,Nε)k2k′(λ2) dλ1 dλ2.
We may assume that |k − k1| ≲ Nε and the same for k1 − k2 (using the definition of M) and k2 − k′, so at a loss of NCε, we may fix the values of k − k1, k1 − k2, and k2 − k′. Note that the matrices FH  N,Nε and FM satisfy bounds (4.15)–(4.17); moreover, in (4.17), we may replace the unfavorable exponents ⟨λ⟩2(1−b)⟨λ′⟩−2b by the favorable ones ⟨λ⟩2b⟨λ′⟩−2(1−b), at the price of replacing the right-hand side by a small positive power NCκ1, by repeating the interpolation argument in Sec. IV B. Using these bounds, we then see that the integral (5.24) provides the required linear combination. Here, summability of the coefficients follows from the estimate
and the improved versions of (4.15)–(4.17). Recall that k − k1, k1 − k2, and k2 − k′ are all fixed. We set ℓ* ≔ (k1 − k) + (k2 − k1) + (k′ − k2) = k′ − k and m* ≔ k1 − k2. Finally, for fixed (λ1, λ2),42 we set Yℓ*,m*(k, λ) ≔ (FH  N,Nε)kk1(λ, λ1)(FHN,Nε)k2k′(λ2) and Rℓ*,m*(k) ≔ (FM)k1k2(λ1, λ2). The factors coming from HN,Nε and H  N,Nε are independent of (FN)k, while the factor coming from M depends on k1 only via the quantity |k1|2 − |k2|2 in view of the definition (5.23); hence, the desired decomposition is valid because |k1|2 − |k2|2 equals m*·k plus a constant once the above-mentioned difference vectors are all fixed. In addition, the bounds on R and Y in (5.21) can be easily proved from the above definitions of R and Y together with bounds (4.12)–(4.17) (together with the Xα → Xα bounds) for the operators H  N and M.□
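For the reader's convenience, the algebra behind the last claim is as follows (with m* = k1 − k2 and the difference k1 − k fixed):

```latex
|k_1|^2 - |k_2|^2
  = m_* \cdot (k_1 + k_2)
  = 2\, m_* \cdot k_1 - |m_*|^2
  = 2\, m_* \cdot k + \mathrm{const},
```

so, up to the harmless factor 2 and an additive constant, this quantity is indeed a function of m*·k only.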

3. Term V

Now, let us consider term V as defined in the introduction of Sec. V B. In the following estimate of term V, we will fully use the cancellation in (3.15) together with Lemma 5.1. We may assume that N1 = N2 = N because if N1 ≠ N2, then in later expansions, we must have k1′ ≠ k2′ [so the cancellation in (3.15) is not needed], and the proof will go in the same way; if N1 = N2 = N/2, then the same cancellation holds, and again, we have the same proof. Now, recall that ρN = ξN − ψN and that


as in (3.5) and (3.11), and that both MN and HN satisfy the identity (3.15). Using this cancellation (when k1′ = k2′ in the expansion) in the same way as in Sec. V B 1 and by repeating the reduction steps mentioned before, we can reduce to estimating the quantity that is either






Here, in (5.27) and (5.28), the matrix Q comes from QN, where Qk1k1′ = (Q̂N)k1k1′(λ) for some fixed λ; similarly, P comes from either QN or hN,L2, and h(3) comes from hN,L3 in the same way.

First, we consider (5.28). By losing a power NCε, we may fix the values of k1 − k2 and k − k3, and then, we will estimate Q using ‖hb‖k1k2→kk3 ≲ N2+Cε, and we have the following bounds:


(with L2 = N if P comes from QN), where the first bound above follows from Proposition 2.8 for each fixed k3 and the second follows from estimating ‖h̃k1k2‖L2 ≲ N3‖Qk1k1′‖‖Pk2k2′‖. This leads to


which is enough.

Now, we consider (5.27). If P comes from QN, then in (5.27), we may remove the condition k1 ≠ k2, reducing essentially to the expression in (5.3) with both w1 and w2 replaced by ρN, which is estimated in the same way as in Sec. V A 1. On the other hand, the term where k1 = k2 can be estimated in the same way as (5.28). The same argument applies if P comes from hN,L2 and max(L2, L3) ≥ Nε, where we can gain a power Nε′/4 from either L2 or L3, or if Q comes from QN,rem, where we can gain an extra power Nε′/4 using Lemma 5.1.

Finally, consider (5.27), assuming that max(L2, L3) ≤ Nε and that Q comes from QN,≪ in Lemma 5.1. By losing at most NCε, we may fix the values of k1 − k2, k − k3, k2 − k2′, and k3 − k3′ and consider one single component of QN,≪ as described in Lemma 5.1. Then, there are only two independent variables, namely, k and k1, and we essentially reduce (5.27) to


Here, |A| ≲ 1 is a non-probabilistic factor, |ℓ|, |ℓ*| ≲ Nε are fixed vectors, Y = Y(k1) and R = R(k1) are as in Lemma 5.1, and P = Pk2k2′ is defined as above. Moreover, we know that Y and P are independent of gk1 and gk̃, that R(k1) depends only on m*·k1 for some fixed vector |m*| ≲ Nε, and that |P| ≲ NO(ε), |Y| ≲ NO(ε), and ‖R‖ℓ2 ≲ N1/2+O(ε) (after fixing λ as before). Finally, gk̃ in (5.29) is ∑k3′ hk3k3′(3)(FN3)k3′, bounded by |gk̃| ≲ N1.

Since R(k1) depends only on m*·k1, if we fix the value of m*·k1 in the above summation, then R(k1) can be extracted as a common factor, and for the rest of the sum, we can apply independence (using Proposition 2.8) and get


where R(a) = R(k1) for any k1 with k1·m* = a and Sa,k ≔ {k1 ∈ Z3 : ℓ·(k1 + k) = Ω0, k1·m* = a}. Note that in the above estimate, we are dividing the set of possible k1's into subsets Sa,k, where ℓ·k1 equals some constant and m*·k1 equals another constant, and that Sa,k is either empty or has cardinality ≥ N1−. When Sa,k = ∅, |Qk| = 0. When Sa,k ≠ ∅, we have |Sa,k| ≥ N1−, and hence,


Then, using Schur’s bound, we get that


which is enough for (3.18). This completes the proof for ρN.
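Schematically, the extraction-plus-Schur step above takes the form (a sketch, with normalizations suppressed):

```latex
\sum_{k_1} R(k_1)\, F(k_1)
  = \sum_{a} R(a) \sum_{k_1 \in S_{a,k}} F(k_1),
\qquad
\Big| \sum_{a} R(a)\, G(a) \Big|
  \le \| R \|_{\ell^2_a}\, \| G \|_{\ell^2_a},
```

where F(k1) collects the factors in (5.29) that are independent of R and G(a) ≔ ∑k1∈Sa,k F(k1); the inner sum is estimated by Proposition 2.8 using the independence, and the gain comes from the averaging over each level set Sa,k, whose cardinality is ≥ N1−.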

For the purpose of Sec. VI, we need an improvement for the ρN bound in (3.18), namely, the following.

Proposition 5.3.
Let N = M, let Y ∈ R be any constant, and consider ρ* defined by
and then, N-certainly, we can improve (3.18) to ‖ρ*‖Xb(J) ≲ N−1/2+ε1/2. Note that this bound is better than the bound for zN in (3.18) [which is better than the bound for ρN in (3.18)].


We only need to examine terms I–V in the above proof. For terms I, IV, and V (hence also III), the above proof already gives bounds better than N−1/2+ε1/2, so these terms are acceptable, and we only need to study term II. Note that the definition of ρ* restricts k to a set E of cardinality ≤ N1+ by the standard divisor counting bound.

Let hb = hkk1k2k3b be the base tensor, which is supported in |kj| ≲ Nj ≤ N and |k1 − k2| ∼ R, such that in the support of hb, we have k − k1 + k2 − k3 = 0 and |k|2 − |k1|2 + |k2|2 − |k3|2 = Ω0. There are three cases in term II that need consideration:
  1. The case in Sec. V A 1: Here, bound (5.8) suffices unless max(N2, R) ≤ N; if this happens, note that in the above proof, (5.8) follows from the estimate

assuming that max(N2, R) ≤ N. However, if we further require k ∈ E, then the right-hand side of the above bound can be improved to |E|1/2 = N1/2+, which leads to the desired improvement of (3.18).
  2. Case 1 in Sec. V A 2: Here, bound (5.10) suffices unless R ≤ N; if this happens, note that (5.10) follows from the estimate

assuming that R ≤ N. However, if we further require k ∈ E, then the right-hand side can be improved to |E|1/2N2 = N1/2+N2, which allows for the improvement.
  3. Case 2 in Sec. V A 2: Here, (5.12) follows from the estimate ‖hb‖kk2k3→k1 ≲ R−βNR. However, if we further require k ∈ E, then the right-hand side can be improved to R−β|E|1/2R = R−βN1/2+R, which allows for the improvement. This finishes the proof.

Now, we will prove the zN part of bound (3.18), assuming that N = M. We will prove it by a continuity argument [see part (1) of Sec. III D for more details], so we may assume (3.18) and only need to improve it using Eq. (3.13); note that the smallness factor is automatic as long as we use (3.13), as explained before. As such, we can assume that each input factor wj on the right-hand side of (3.13) has one of the following four types, where in all cases, we have Nj ≤ N:

  • (i) Type (G), where we define Lj = 1 and

  • (ii)

    Type (C), where


with hkjkj′(j)(λj, ω) supported in the set {Nj/2 < ⟨kj⟩ ≤ Nj, Nj/2 < ⟨kj′⟩ ≤ Nj} and BLj measurable for some Lj ≤ Nj/2 and satisfying the following bounds (where in the first bound, we first fix λj, then take the operator norm, and then take the L2 norm in λj):


Moreover, using (3.17), we may assume that h(j) is supported in |kj − kj′| ≲ NδLj. Note that if wj is of type (G), then (ŵj)kj(λj) can also be expressed in the same form as (6.2) but with hkjkj′(j) = 1kj=kj′ χ̂(λj), except that the second equation in (6.3) is not true in this case.

  • (iii) Type (L), where $\widehat{(w_j)}_{k_j}(\lambda_j)$ is supported in $\{|k_j| \sim N_j\}$ and satisfies


Also such wj is a solution to Eq. (5.1).

  • (iv) Type (D), where $\widehat{(w_j)}_{k_j}(\lambda_j)$ is supported in $\{|k_j| \lesssim N_j\}$ and satisfies


Also such wj is a solution to Eq. (3.13).
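For orientation, in the random averaging operator framework, a type (C) input as in (6.2) plausibly takes the following schematic form, with $g_{k'}$ the Gaussians of the random initial data; this is only a sketch meant to fix the roles of the indices $k_j$ and $k_j'$, and the precise normalization and cutoffs are those of (6.2):

```latex
% Schematic type (C) expansion (an assumption consistent with the remark
% that type (G) corresponds to h^{(j)}_{k_j k_j'} = 1_{k_j = k_j'} \hat{\chi}(\lambda_j)):
\[
\widehat{(w_j)}_{k_j}(\lambda_j)
  \;=\; \sum_{k_j'} h^{(j)}_{k_j k_j'}(\lambda_j,\omega)\,
        \frac{g_{k_j'}(\omega)}{\langle k_j'\rangle},
\qquad\text{type (G):}\ \
h^{(j)}_{k_j k_j'}(\lambda_j) = \mathbb{1}_{k_j = k_j'}\,\widehat{\chi}(\lambda_j).
\]
```

With this convention, the randomness of a type (C) input sits entirely in the Gaussians $g_{k_j'}$ and in the $\mathcal{B}_{L_j}$-measurable matrix $h^{(j)}$, which is what the independence arguments below exploit.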

Now, let the multilinear forms $\mathcal M$, $\mathcal M_<$, $\mathcal M_>$, and $\mathcal M_\sim$ be as in (2.4), (3.3), and (3.9). The terms on the right-hand side of (3.13), apart from the first term in the second line of (3.13), which is trivially bounded, are as follows:

  1. Term I:


where $w_j$ can be of any type and $\max(N_1, N_2, N_3) = N$.

  2. Term II:


where $w_j$ can be of any type and $\max(N_1, N_2) = N$.

  3. Term III:


where $w_j$ can be of any type and $\max(N_1, N_2, N_3) \leq N/2$.

  4. Term IV:


where $w_j$ can be of any type, $\max(N_1, N_2) \leq N/2$, and $N_3 = N$.

  5. Term V:


where $w_j$ can be of any type and $\max(N_1, N_2) = N_3 = N$.

  6. Term VI:


where $w_1$ and $w_2$ can be of any type, $w_3$ has type (D), $\max(N_1, N_2) \leq N/2$, and $N_3 = N$.

  7. Term VII:


where $w_1$ and $w_2$ can be of any type, $w_3$ has type (D), and $\max(N_1, N_2) = N_3 = N$.

  8. Term VIII represents the last two lines of the right-hand side of (3.13).

Our goal is to recover the bound for $z_N$ in (3.18) for each of the terms I–VIII mentioned above. In doing so, we will consider two cases. First is the no-pairing case: if $w_1$ and $w_2$ are of type (C) or (G) and, hence, expanded as in (6.2), then we assume that $k_1' \neq k_2'$; similarly, if $w_2$ and $w_3$ are of type (C) or (G), then we assume that $k_2' \neq k_3'$. The second case is the pairing case, which is when $k_1' = k_2'$ or $k_2' = k_3'$ (the over-pairing case where $k_1' = k_2' = k_3'$ is easy, and we shall omit it). We will deal with the no-pairing case for terms I–VII in Secs. VI A–VI C, the pairing case for these terms in Sec. VI D, and term VIII in Sec. VI E.

We start with the no-pairing case.

1. Preparation of the proof

We start with some general reductions in the no-pairing case. Recall, as in Sec. III D, that we can always gain a smallness factor from the short time $\tau \ll 1$ and can always ignore losses of $(N^*)^{C\kappa^{-1}}$, provided that we can gain a power $N^{\varepsilon/10}$ (which will be clear in the proof). We will consider $\widehat{\mathcal I\chi\,\mathcal M^{(\star)}(w_1,w_2,w_3)}_k(\lambda)$, where $\mathcal M^{(\star)}$ can be one of $\Pi\mathcal M$, $\Pi\mathcal M_<$, $\Pi\mathcal M_>$, $\Pi\mathcal M_\sim$, and $\Pi(\mathcal M_< - \mathcal M_\sim)$, with $\Pi$ being a general notation for one of the projections $\Pi_N$, $\Pi_{N/2}$, and $\Delta_N$,


where $\Omega = |k|^2 - |k_1|^2 + |k_2|^2 - |k_3|^2$ and $\sum_{(\star)}$ is defined directly from the definitions of $\mathcal M$, $\mathcal M_<$, $\mathcal M_>$, and $\mathcal M_\sim$ and the choice of $\Pi$. For example, if $\mathcal M^{(\star)}$ is $\Pi_N\mathcal M_>$, then there will be two more restrictions $|k| \leq N$ and $\langle k_1 - k_2\rangle > N^{1-\delta}$ in the sum $\sum_{(\star)}$. The other $\sum_{(\star)}$ are defined in similar ways.
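Concretely, assembling the ingredients just listed (the convolution constraint from the cubic nonlinearity, the potential factor $V_{k_1-k_2}$, and the modulation $\Omega$), the expression under study plausibly takes the schematic form below; this is only a sketch, and the precise kernel and cutoffs are those of (6.6):

```latex
\[
\widehat{\mathcal{I}\chi\,\mathcal{M}^{(\star)}(w_1,w_2,w_3)}_k(\lambda)
 = \int_{\mathbb{R}^3}\ \sum_{(\star)}
   \mathcal{I}\big(\lambda,\Omega+\lambda_1-\lambda_2+\lambda_3\big)\,
   V_{k_1-k_2}\,
   \widehat{(w_1)}_{k_1}(\lambda_1)\,
   \overline{\widehat{(w_2)}_{k_2}(\lambda_2)}\,
   \widehat{(w_3)}_{k_3}(\lambda_3)\;
   d\lambda_1\,d\lambda_2\,d\lambda_3,
\]
```

where $\sum_{(\star)}$ runs over $k_1 - k_2 + k_3 = k$ together with the restrictions dictated by the choice of $\mathcal{M}^{(\star)}$, and $\Omega = |k|^2 - |k_1|^2 + |k_2|^2 - |k_3|^2$ as above.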

Before going into the different estimates for I–VII, we first make a few remarks.

  • If an input $w_j$ has type (L) or (D), then in most cases, we only need to consider type (L) terms since (6.5) is stronger than (6.4); there are exceptions that will be treated separately later.

  • A $w_j$ of type (G) can be considered as a special case of type (C) with $h^{(j)}_{k_jk_j'}(\lambda_j) = \mathbb{1}_{N_j/2 < |k_j| \leq N_j}\mathbb{1}_{k_j = k_j'}\widehat{\chi}(\lambda_j)$; if we avoid using the $\ell^2_{k_jk_j'}$ norm in (6.3), then we only need to consider type (C) terms.

  • Term I can be estimated in the same way as term II. In fact, the definition of $\mathcal M_>$ implies $\max(N_1, N_2) \geq N^{1-\delta}$, so we are essentially in a (special case of) term II up to a possible loss $N^{\delta}$, which will be negligible compared to the gain. Moreover, term V can be estimated similarly to term IV; see Sec. VI C.

  • Terms VI and VII are readily estimated using the $X^{\alpha} \to X^{\alpha}$ bounds for the linear operator (3.19) proved in Secs. IV A and IV B.

Based on these remarks, from now on, we will consider terms II–IV (and VIII at the end), where the possible cases for the types of (w1, w2, w3) are (a) (C, C, C), (b) (C, C, L), (c) (C, L, C), (d) (L, C, C), (e) (L, L, C), (f) (C, L, L), (g) (L, C, L), and (h) (L, L, L).

In Sec. VI B, we will estimate term II, which can be understood as high–high interactions in view of $\max(N_1, N_2) = N$: note that if $k$ is the high frequency, then either $k_3$ is also a high frequency or $|k_1 - k_2|$ must be large. In Sec. VI C, we will estimate terms III and IV by using a counting technique in a special situation called the $\Gamma$ condition [see (6.18)]. In Sec. VI D, we consider the pairing case.

B. Estimate of term II

We will estimate term II in this subsection. First, we can repeat the arguments for $\lambda$, $\lambda_j$, and the Duhamel operator $\mathcal I$ in (6.6) as in Secs. IV and V. Namely, we first restrict to $|\lambda_j| \leq N^{100}$ and $|\lambda|, |\mu| \leq N^{100}$, where $\mu = \lambda - (\Omega + \lambda_1 - \lambda_2 + \lambda_3)$, and replace the unfavorable exponents ($1 - b$ or $b$, depending on the context) by the favorable ones ($b$ or $1 - b$) and then exploit the resulting integrability in $\lambda_j$ to fix the values of $\lambda$, $\lambda_j$, and $\lfloor\mu\rfloor$. Then, we reduce to the following expression, where $\Omega_0$ is a fixed integer:


where $h^b$ is the base tensor that contains the factors


We assume that $h^b$ is supported in the set where $|k_j| \leq N_j$ and $\langle k_1 - k_2\rangle \sim R$, where $R$ is a dyadic number. Moreover, we assume that $R$ and the support of $h^b$ satisfy the conditions associated with the definition of some $\mathcal M^{(\star)}$. In view of the factor $|V_{k_1 - k_2}| \lesssim R^{-\beta}$ in $h^b$, we also define $h^{R,(\star)} \coloneqq R^{\beta} \cdot h^b$, which is essentially the characteristic function of the set


possibly with extra conditions determined by the definition of $\mathcal M^{(\star)}$. We also define $S^R_k$ to be the set of $(k, k_1, k_2, k_3) \in S^R$ with fixed $k$ and, similarly, define $S^R_{k_1k_2}$. Note that when $w_j$ has type (G), (C), or (L), we can further assume that $|k_j| > N_j/2$ in the definition of $S^R$.
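Collecting the support conditions stated above, the set defining $h^{R,(\star)}$ (denote it $S^R$, consistent with the notations $S^R_k$ and $S^R_{k_1k_2}$) should essentially be the following; additional restrictions coming from the particular $\mathcal{M}^{(\star)}$ may be added:

```latex
\[
S^{R}=\Big\{(k,k_1,k_2,k_3)\in(\mathbb{Z}^3)^4:\ k-k_1+k_2-k_3=0,\ \
|k|^2-|k_1|^2+|k_2|^2-|k_3|^2=\Omega_0,\ \ |k_j|\le N_j,\ \
\langle k_1-k_2\rangle\sim R\Big\}.
\]
```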

The goal now is to bound the norm $\|X\|_{\ell^2_k}$, abbreviated $\|X\|_k$, assuming that the $w_j$ satisfy bounds (6.1)–(6.5) but without the $\lambda_j$ component; for example, (6.5) becomes $\|w_j\|_{k_j} \lesssim N_j^{-1/2+\varepsilon_1}$.

1. Case (a): (C, C, C)

In this case, we have


where $h^{(j)}_{k_jk_j'} = h^{(j)}_{k_jk_j'}(\omega)$ satisfies (6.3) with some $N_j$ and $L_j \leq N_j/2$ for $1 \leq j \leq 3$ and $h^{R,(\star)}_{kk_1k_2k_3}$ is defined as mentioned above.

To estimate $\|X\|_k$, we would like to apply Proposition 2.8 and then Proposition 2.7. As in Secs. IV A and V, the way we apply Proposition 2.8 depends on the relative sizes of the $N_j$ ($1 \leq j \leq 3$). For example, if $N_1 = N_2 = N_3$, we shall apply Proposition 2.8 jointly in the $(k_1', k_2', k_3')$ summation in (6.9); if $N_1 = N_3 > N_2$, we will first apply Proposition 2.8 jointly in the $(k_1', k_3')$ summation and then apply it in the $k_2'$ summation; and if $N_3 > N_1 > N_2$, we will apply it first in the $k_3'$ summation, then in the $k_1'$ summation, and then in the $k_2'$ summation. The result in the end is the same in all cases, so, for example, we will consider the case $N_3 > N_1 > N_2$. Now, we have




By the independence between $g_{k_3'}$ and $\widetilde H_{kk_3}h^{(3)}_{k_3k_3'}$, since $N_3 > N_1 > N_2$, we apply Propositions 2.8 and 2.7 and get $\tau^{-1}N^*$-certainly that


Similarly, by the independence between $g_{k_1'}$ and $\widetilde{\widetilde H}_{kk_1k_3}h^{(1)}_{k_1k_1'}$, since $N_1 > N_2$, and also by the independence between $g_{k_2'}$ and $h^{R,(\star)}_{kk_1k_2k_3}\overline{h^{(2)}_{k_2k_2'}}$, we can once again apply Propositions 2.8 and 2.7, to $\widetilde H_{kk_3}$ and then to $\widetilde{\widetilde H}_{kk_1k_3}$. As a consequence, we have $\tau^{-1}N^*$-certainly that


In the other cases, we get the same bound. Without loss of generality, we may assume that $N_1 = N$; then, using Lemma 2.5, we can estimate


which implies that $\|X\|_k \lesssim N^{-1+C\delta}N_3^{1/2} \leq N^{-1/2+C\delta}$, which is enough for (3.18).
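The mechanism behind the probabilistic gains from Propositions 2.7 and 2.8 is square-root cancellation for Gaussian sums: a sum of $n$ independent unit Gaussians is typically of size $\sqrt{n}$ rather than $n$. A minimal numerical sketch of this phenomenon (illustrative only; the propositions themselves concern operator norms of random tensors, and the function name below is ours):

```python
import math
import random

def gaussian_sum_size(n, trials=500, seed=0):
    """Monte Carlo average of |sum of n i.i.d. standard complex Gaussians|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        re = sum(rng.gauss(0, 1) for _ in range(n))
        im = sum(rng.gauss(0, 1) for _ in range(n))
        total += math.hypot(re, im)
    return total / trials

# The typical size is ~ sqrt(n) (square-root cancellation), far below
# the trivial triangle-inequality bound of size ~ n.
n = 400
avg = gaussian_sum_size(n)
assert 0.3 * math.sqrt(n) < avg < 3 * math.sqrt(n)
```

The assertions compare the Monte Carlo average against the square-root scale; the deterministic bound would only give size $n$.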

2. Case (b): (C, C, L)

In this case, we have


where $h^{(j)}_{k_jk_j'} = h^{(j)}_{k_jk_j'}(\omega)$ satisfies (6.3) with some $N_j$ and $L_j \leq N_j/2$ for $1 \leq j \leq 2$ and the base tensor $h^{R,(\star)}_{kk_1k_2k_3}$ is defined as before. Clearly, $\|X\|_k$ can be bounded by $N_3^{-1/2+\varepsilon_1+\varepsilon_2}$ times the norm


By applying Propositions 2.8 and 2.7 again, in the same manner as in Sec. VI B 1, we get that the above norm is bounded by


By Lemma 2.5, we can conclude that


and hence, we easily get $\|X\|_k \lesssim N^{-1+C\varepsilon_1}$, which is enough for (3.18).

3. Case (c): (C, L, C) and case (d): (L, C, C)

The estimates of cases (c) and (d) are similar to case (b), so we will state them without proof. In case (c), we get

$\|X\|_k \lesssim N_2^{-1/2+\varepsilon_1+\varepsilon_2}R^{-\beta}(N_1N_3)^{-1} \times \max\big(\|h^{R,(\star)}\|_{kk_1\to k_2k_3},\ \|h^{R,(\star)}\|_{kk_1k_2\to k_3},\ \|h^{R,(\star)}\|_{kk_3\to k_1k_2},\ \|h^{R,(\star)}\|_{kk_1k_3\to k_2}\big),$

and in case (d), we get a similar bound, but with subindices 1 and 2 switched.

Now, by Lemma 2.5, we can obtain that


In the first case, we directly get that


which is enough for (3.18) since $\max(N_1, N_2) = N$ and $N^{1-\delta} \gtrsim R \gtrsim N^{\varepsilon}$ (then, we have $N_1 \sim N_2 \sim N$) in view of the definition of $\mathcal M_< - \mathcal M_\sim$. In the second case, we get


which is also enough for (3.18) since $\max(N_1, N_2) = N$ and $R \gtrsim N^{\varepsilon}$. In the third case, we get


which is also enough for (3.18). By switching indices 1 and 2, we also get the same estimates in case (d).

4. Case (e): (L, L, C)

In this case, we have


where $h^{(3)}_{k_3k_3'} = h^{(3)}_{k_3k_3'}(\omega)$ satisfies (6.3) with some $N_3$ and $L_3 \leq N_3/2$ and the base tensor $h^{R,(\star)}_{kk_1k_2k_3}$ is defined as before. By symmetry, we may assume $N_1 \leq N_2$; then, by the same argument as above, using Propositions 2.7 and 2.8, we can bound


By Lemma 2.5, both tensor norms are bounded by $\min(N_1, R)N_3$; as $N_1 \leq N_2$ (and hence, $N_2 = N$) and $R \gtrsim N^{\varepsilon}$, it is easy to check that this bound is enough for (3.18).

5. Case (f): (C, L, L) and case (g): (L, C, L)

The estimates of cases (f) and (g) are similar to case (e), so we will state them directly. Again, the two cases only differ by switching indices 1 and 2, so we only consider case (f). As in case (e), we get two bounds,




Now, if $N_3 \leq N^{\varepsilon_2}$, we will apply the first bound and use that


so the factor $N_1^{-1}N_2^{-1/2+\varepsilon_1+\varepsilon_2}\min(N_1, N_2)$, together with $N_3^{-1/2+\varepsilon_1+\varepsilon_2}$, where $N_3 \leq N^{\varepsilon_2}$, provides a bound that is enough for (3.18). Moreover, the same bound also works if $N_2 \leq N_1^{\varepsilon_2}$ (since in this case, $N_1 = N$).

If $N_3 \geq N^{\varepsilon_2}$ and $N_2 \geq N_1^{\varepsilon_2}$, we will apply the second bound and use that


assuming that $N_3 \geq N^{\varepsilon_2}$. This is also enough for (3.18) assuming that $N_2 \geq N_1^{\varepsilon_2}$ and $R \gtrsim N^{\varepsilon}$.

6. Case (h): (L, L, L)

In this case, we have


where the base tensor $h^{R,(\star)}_{kk_1k_2k_3}$ is defined as before. Then, simply using Proposition 2.7, we get


By Lemma 2.5, we have $\|h^{R,(\star)}\|_{kk_2\to k_1k_3} \lesssim (R\min(N_1, N_2))^{1/2}$, which implies that


which is enough for (3.18) because $\max(N_1, N_2) = N$ and $R \gtrsim N^{\varepsilon}$.

C. Estimates of terms III and IV

In this section, we estimate terms III and IV. These two terms are similar, and the key property that they satisfy is the so-called $\Gamma$ condition. Namely, due to the projections and assumptions on the inputs in terms III and IV, we have

$|k|^2 \geq \Gamma \geq |k_3|^2$ for all $(k, k_1, k_2, k_3) \in S$, or $|k|^2 \leq \Gamma \leq |k_3|^2$ for all $(k, k_1, k_2, k_3) \in S$  (6.18)

for some real number $\Gamma$, where $S$ is the support of the base tensor $h^b$ [note that in term IV, we may assume that $w_3$ is not of type (D) since, otherwise, the bound follows from what we have already done; hence, here we may choose $\Gamma = (N/2)^2 - 1$].

To proceed, we return to $\widehat{\mathcal I\chi\,\mathcal M^{(\star)}(w_1,w_2,w_3)}_k(\lambda)$ in (6.6), where $\Omega = |k|^2 - |k_1|^2 + |k_2|^2 - |k_3|^2$ and $\mu = \lambda - (\Omega + \lambda_1 - \lambda_2 + \lambda_3)$; then, we have $|\mathcal I| \lesssim \langle\lambda\rangle^{-1}\langle\mu\rangle^{-1}$ by (2.10). Following the same reduction steps as before, we can assume that $|\lambda|, |\lambda_j|\ (j = 1, 2, 3), |\mu| \leq N^{100}$ and may replace the unfavorable exponents by the favorable ones. Now, instead of fixing each $\lambda_j$, $\lambda$, and $\lfloor\mu\rfloor$, we do the following.

Without loss of generality, we may assume that $|\lambda_3|$ is the maximum of all the parameters $|\lambda_j|$, $|\lambda|$, and $|\mu|$; the other cases are treated similarly. We may fix a dyadic number $K$ and assume that $|\lambda_3| \sim K$. Then, we may fix $\lambda_j$ ($j \neq 3$), $\lambda$, and $\lfloor\mu\rfloor$, again using integrability in these variables, exploit the weight $\langle\lambda_3\rangle^{b}$ in the weighted norms in which $w_3$ is bounded, and reduce to an expression


where $\widetilde{(w_3)}_{k_3}(\lambda_3) = K^{b}\widehat{(w_3)}_{k_3}(\lambda_3)$ and $h^{R,K,(\star)}_{kk_1k_2k_3}(\lambda_3)$ is essentially the characteristic function of the set (with possibly more restrictions according to the definition of $\mathcal M^{(\star)}$)


where $\Omega_0$ is a fixed number such that $|\Omega_0| \lesssim K$. We also define $S^{R,K}_k$ to be the set of $(k_1, k_2, k_3, \lambda_3)$ such that $(k, k_1, k_2, k_3, \lambda_3) \in S^{R,K}$ for fixed $k$. Note that when $w_j$ is of type (C), (G), or (L), we can further assume that $N_j/2 < |k_j| \leq N_j$.

The idea in estimating (6.19) is to view $(k_3, \lambda_3)$ as a whole [say, denote it by $\widetilde{k_3}$], which will allow us to gain from the $\Gamma$ condition in estimating the norms of the base tensor $h^{R,K,(\star)}$. Although our tensors here involve the variable $\lambda_3 \in \mathbb{R}$, it is clear that Propositions 2.6 and 2.7 still hold for such tensors, and Proposition 2.8 can also be proved by using a meshing argument (see Sec. III D; the derivative bounds in $\lambda_3$ are easily proved as all the relevant functions are compactly supported in physical space). Moreover, by the induction hypothesis and the manipulation mentioned above (for example, with the $Y^{1-b}$ norm replaced by the $Y^{b}$ norm), we can also deduce the corresponding bounds for $\widetilde{(w_3)}_{\widetilde{k_3}}$ and the corresponding matrices such as $\widetilde h^{(3)}_{\widetilde{k_3}k_3'}$, for example, $\|\widetilde h^{(3)}\|_{k_3'\to\widetilde{k_3}} \lesssim L_3^{1/2+3\varepsilon_1}$. Because of this, in the proof below, we will simply write $\sum_{\widetilde{k_3}}$, while we actually mean $\sum_{k_3}\int d\lambda_3$, so the proof has the same format as the previous ones.

We now consider the input functions. In term III, clearly, $\max(N_1, N_2, N_3) \gtrsim N$; if $N_3 \ll N$, then we must have $\max(N_1, N_2) \gtrsim N$ and $|k_1 - k_2| \gtrsim N$, and hence, this term can be treated in the same way as term II. Therefore, we may assume that $N_3 \sim N$, and clearly, the same happens for term IV. If $\max(N_1, N_2) \gtrsim N$, then again using the term II estimate, we only need to consider the case where $|k_1 - k_2| \lesssim N^{\varepsilon}$. This term can be treated using similar arguments as below and is much easier due to the smallness of $|k_1 - k_2|$, so we will only consider the case $\max(N_1, N_2) \ll N$. In the same way, we will not consider term V here. Finally, if $w_3 = z_{N_3}$ with $N_3 \sim N$, then (3.18) directly follows from the linear estimate proved in Sec. IV A, and the $\Gamma$ condition is not needed.

There are two cases: when $w_3$ has type (L) or when $w_3$ has type (C) [or (G)]. In the latter case, there are four further cases for the types of $w_1$ and $w_2$, which we will discuss below.

1. The type (L) case

Suppose that $w_3$ has type (L). Clearly, if $\max(N_1, N_2) \leq N^{100\varepsilon_2}$, then (3.18) also follows from the linear estimates in Sec. IV A [because the difference between the $\rho_N$ bound and the $z_N$ bound in (3.18) is at most $N^{\varepsilon_2}$], so we may assume that $\max(N_1, N_2) \geq N^{100\varepsilon_2}$. Then, in (6.19), we may further fix the values of $(k_1, k_2)$ at the price of $N^{C\varepsilon_2}$, and hence, we may write


and by definition, it is easy to see that $\|h\|_{k\widetilde{k_3}} \lesssim 1$. Then, (3.18) follows, using the bound for $w_3$, if $K \geq N^{\varepsilon_1^2}$. Finally, if $K \leq N^{\varepsilon_1^2}$, then we have $|\Omega| \lesssim N^{\varepsilon_1^2}$, where $\Omega = |k|^2 - |k_1|^2 + |k_2|^2 - |k_3|^2$. Using the $\Gamma$ condition (6.18), we conclude that $|k_3|^2$ belongs to an interval of length $N^{O(\varepsilon_1^2)}$, so we can apply Proposition 5.3 to gain a power $N^{\varepsilon_1/2}$, which covers the loss $N^{O(\varepsilon_2 + \varepsilon_1^2)}$ and is enough for (3.18).

2. The type (C, C, C) case

Now, suppose that $(w_1, w_2, w_3)$ have type (C, C, C). By symmetry, we may assume that $N_1 \leq N_2$. Then, by the same argument as in Sec. VI B 1, we obtain that


The last three factors are easily bounded by 1, so it suffices to bound the relevant tensor norm of $h^{R,K,(\star)}$.

By definition, this is equivalent to counting the number of lattice points $(k, k_1, k_2, k_3)$ such that $k_1 - k_2 + k_3 = k$ (and also satisfying the inequalities listed above) and $|\Omega| \lesssim K$. Note that


so when $K \leq K_1$, by the $\Gamma$ condition, $|k|^2$ has at most $K_1$ choices, and hence, $k$ has at most $K_1N$ choices. Once $k$ is fixed, the number of choices for $(k_1, k_2, k_3)$ is at most $KN_1^2R^2$, which leads to the bound


If, instead, $K \geq K_1$, then $k$ has at most $KN$ choices, and once $k$ is fixed, the number of choices for $(k_1, k_2, k_3)$ is at most $N_1^3R^3$, so we get


Either way, we get


which is enough for (3.18) as $\max(R, N_1) \lesssim N_2$.
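The counting in this case rests on the standard fact that in a ball of radius $M$ in $\mathbb{Z}^3$, a window of $K$ values of $|k|^2$ contains $O(KM)$ lattice points (about $M$ points per level on average), which is what underlies "$k$ has at most $K_1N$ choices" above. A small brute-force sanity check with illustrative parameters of our choosing:

```python
import math

# Count lattice points k in Z^3, |k| <= M, whose |k|^2 lies in a window
# of length K; the shell-volume heuristic predicts roughly 2*pi*K*M points.
M, K = 20, 8
t0 = M * M // 2                        # a generic level around |k|^2 ~ M^2/2
count = sum(
    1
    for x in range(-M, M + 1)
    for y in range(-M, M + 1)
    for z in range(-M, M + 1)
    if t0 <= x * x + y * y + z * z < t0 + K
)
heuristic = 2 * math.pi * K * M        # volume of the spherical shell

# The count is positive and within a constant factor of K*M, matching
# the "at most K*N choices of k" counting used in the text.
assert 0 < count < 10 * heuristic
```

The constant 10 is a generous illustrative margin; the point is only the $K \cdot M$ scaling, not a sharp constant.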

3. The type (L, L, C) case

Now, suppose that $(w_1, w_2, w_3)$ have type (L, L, C). First, assume that $N_1 \leq N_2$. The same arguments as in Sec. VI B 4 yield


The second norm above is easily bounded by $K^{1/2}RN_1$ using Lemma 2.5, which is clearly enough for (3.18); for the first norm, there are two ways to estimate it.

The first way is to use Lemma 2.5 directly, without using the Γ condition, to get


The second way is to use the $\Gamma$ condition: first fix the value of $|k|^2$, then fix $k$, and then count $(k_1, \widetilde{k_3})$. This yields


assuming $K \leq K_1$, and a better bound assuming $K \geq K_1$. Now, plugging in the second bound yields


which can be shown to be $\lesssim N^{-1/2}$ using the fact that $\max(R, N_1) \leq N_2$ and by considering whether $R \leq N_1$ or $R \geq N_1$. Moreover, the same estimate can be checked to work if $N_1 \leq N_2^{1.1}$. If $N_1 \geq N_2^{1.1}$, we can switch subscripts 1 and 2, in which case, we have the weaker bound,


without the 1/2 power in the last factor; however, this is still $\lesssim N^{-1/2}$, provided that $N_1 \geq N_2^{1.1}$.

4. The type (L, C, C) and (C, L, C) cases

Now, suppose that $(w_1, w_2, w_3)$ have type (L, C, C); the case (C, L, C) is treated similarly. Here, the same arguments as in Sec. VI B 3 imply

$\|X\|_k \lesssim N_1^{-1}N_2^{-1/2+\varepsilon_1+\varepsilon_2}N^{-1}R^{-\beta}K^{-b} \times \max\big(\|h^{R,K,(\star)}\|_{kk_1\widetilde{k_3}\to k_2},\ \|h^{R,K,(\star)}\|_{kk_1\to k_2\widetilde{k_3}},\ \|h^{R,K,(\star)}\|_{kk_1k_2\to\widetilde{k_3}},\ \|h^{R,K,(\star)}\|_{k\widetilde{k_3}\to k_1k_2}\big).$

The two norms $\|\cdot\|_{kk_1\to k_2\widetilde{k_3}}$ and $\|\cdot\|_{kk_1k_2\to\widetilde{k_3}}$ can be estimated by $K^{1/2}R\min(N_1, N_2)$, using Lemma 2.5 only and without the $\Gamma$ condition, which is clearly enough for (3.18). For the $\|\cdot\|_{kk_1\widetilde{k_3}\to k_2}$ norm, we can use the estimates in Sec. VI C 3 and get