This work opens a series of papers where we develop a general quasi-optical theory for mode-converting electromagnetic beams in plasma and implement it in a numerical algorithm. Here, the basic theory is introduced. We consider a general quasimonochromatic multicomponent wave in a weakly inhomogeneous linear medium with no sources. For any given dispersion operator that governs the wave field, we explicitly calculate the approximate operator that governs the wave envelope *ψ* to the second order in the geometrical-optics parameter. Then, we further simplify this envelope operator by assuming that the gradient of *ψ* transverse to the local group velocity is much larger than the corresponding parallel gradient. This leads to a parabolic differential equation for *ψ* (“quasioptical equation”) on the basis of the geometrical-optics polarization vectors. Scalar and mode-converting vector beams are described on the same footing. We also explain how to apply this model to electromagnetic waves in general. In the next papers of this series, we report successful quasioptical modeling of radio frequency wave beams in magnetized plasma based on this theory.

## I. INTRODUCTION

### A. Motivation

Describing the propagation of waves in inhomogeneous media is a classic problem with a long history.^{1–3} It is particularly important in fusion research, where quasistationary beams of electromagnetic (EM) radiation are commonly used for many purposes and need to be modeled with fidelity.^{4} Full-wave modeling, which involves solving complete Maxwell's equations, can be impractical at short (cm and mm) wavelengths, especially when multidimensional simulations with complex geometries are required, such as those of tokamak and stellarator plasmas. Hence, reduced methods have been widely used in practice. These methods are rooted in geometrical optics (GO)^{1} and include conventional ray tracing,^{5–8} complex ray tracing,^{9,10} beam tracing,^{11–14} and variations thereof.^{15} There are also other “quasioptical” models, such as in Refs. 16–19, which resolve the evolution of the beam transverse structure without adopting any particular ansatz for the intensity profile. Still, they assume that only one branch of the dispersion relation is excited in each given case,^{20} i.e., mode conversion does not occur.^{21} As a result, today's simulations of mode-converting beams mainly rely on full-wave codes^{19} and thus have to compromise the fidelity by reducing the number of dimensions resolved.^{8}

However, simulating mode conversion does not actually require the full-wave approach; in fact, it can be done within the quasioptical approach too if the latter is generalized properly. Developing the corresponding algorithms would facilitate not only fusion applications. For example, advanced quasioptical models could find use in optics and general relativity, where mode-converting beams are also possible and have been attracting much attention lately.^{22,23} Hence, it is potentially beneficial to approach the problem of quasioptical modeling as a general-physics problem, without restricting it to fusion applications.

### B. General idea

Mode conversion can be described quite generally, particularly without restricting it to narrow regions in space, within “extended geometrical optics” (XGO), which was developed recently.^{24–28} XGO is a theory which calculates the leading-order correction *U* to the GO dispersion operator of a *general* vector wave and shows^{26} that this correction is analogous to (and a generalization of) the Stern–Gerlach Hamiltonian of a quantum spin-1/2 electron. Accordingly, *U* is responsible for two effects simultaneously: (i) it modifies the ray equations just like spin–orbital interactions affecting the electron motion and (ii) it also governs mode conversion, which appears as a direct analog of spin up–down transitions.^{29} An examination of the quasioptical algorithms such as those in Refs. 12 and 16 shows that they already involve calculations of terms similar to *U*. Hence, adding the mode-conversion capability to quasioptical codes should not be burdensome and is not expected to slow down the codes considerably. However, formulating the corresponding theory is easier to do using the abstract quantumlike formalism of XGO. One only needs to upgrade the existing XGO by adding diffraction, for which it also helps to introduce more general coordinates with curved metric.^{30}

In this series of papers (Papers I–III), we propose such an upgrade of XGO and apply it to numerical simulations. Our goal is to develop a framework which later could be applied to a broad variety of problems both in plasma physics and beyond. Within this framework to be presented, one can assume general dispersion, diffraction, and polarization effects, including mode conversion of not just two but many arbitrary resonant waves. Also importantly, our formulation below is not restricted to EM waves since we do not specify the dispersion operator in the governing wave equation.

Our series of papers is organized as follows. In Paper I, we introduce the basic theory of waves that diffract and mode-convert simultaneously. In Paper II^{31} and Paper III,^{32} we apply this theory to perform quasioptical modeling of radiofrequency-wave beams in magnetized plasma as an example. In particular, we consider applications to mode conversion caused by magnetic shear in edge plasma,^{29} which, for example, is a known problem^{33,34} in the Large Helical Device.^{35,36}

### C. Outline

In this first paper of our series, we consider an arbitrary quasimonochromatic multicomponent wave in a weakly inhomogeneous linear medium. Supposing that $D\u0302$ is some dispersion operator governing the wave dynamics, we simplify $D\u0302$ and obtain an approximate operator that governs the wave envelope *ψ*. Then, we derive a parabolic differential equation for *ψ* (“quasioptical equation”) by assuming that the gradient of the wave envelope transverse to the local group velocity is much larger than the corresponding parallel gradient. The resulting theory applies to both scalar and mode-converting vector beams. At the end of the paper, we also discuss how this model can be applied to EM waves in particular. However, readers who are mainly interested in simulations (as opposed to the general theory) are encouraged to proceed straight to Paper II, where our key equations are overviewed in a simplified form and without derivations.

This paper is organized as follows: In Sec. II, we introduce the general problem. In Sec. III, we formalize the concept of the envelope dispersion operator. In Sec. IV, we derive an approximation for the envelope dispersion operator for scalar waves, and we also derive its quasioptical approximation. In Sec. V, we extend this model to vector waves. In Sec. VI, we explain how to apply the resulting theory to EM waves in particular. In Sec. VII, we present our main conclusions. In Appendix A, we summarize some of our notations. In Appendix B, some auxiliary calculations are presented. Our paper also contains the supplementary material. There, we overview the Weyl calculus on a curved configuration space, which is used in this work.

## II. GENERAL PROBLEM

Consider a wave propagating on an *n*-dimensional configuration space *M ^{n}* with coordinates $x\u2261{x0,x1,\u2026,xn\u22121}$ and some general metric tensor $g(x)$. For simplicity, assume that

*M*is diffeomorphic to $\mathbb{R}n$, i.e., the

^{n}*n*-dimensional Euclidean space or pseudo-Euclidean space with the same metric signature as

*M*. (For more information on why the diffeomorphism with $\mathbb{R}n$ is needed, see Sec. IV A 2.) Suppose that the wave field $\Psi \u2261\Psi (x)$, which may have multiple components, is governed by a linear equation with no source terms,

^{n}where $D\u0302$ is a differential or, more generally, integral dispersion operator. (For vector waves, $D\u0302$ is a matrix whose elements are operators; see Sec. V.) We shall assume that the GO parameter *ϵ* is small, namely,

(the symbol $\u2009\u2250\u2009$ denotes definitions), where *λ* is the characteristic wave period, or wavelength, and *L* is the least characteristic scale among those of the wave envelope and of the medium, including the metric.^{37} Below, we propose a systematic reduction of Eq. (1) using the smallness of *ϵ* and eventually obtain a quasioptical model based on this equation. The idea of quasioptical modeling will be formalized later (Secs. IV E and V F).

As a side note, we emphasize that the theory to be developed does not describe wave transformations near cutoffs, where the GO parameter (2) is not small. It is possible to waive this limitation, but formulating such generalized theory is left to future publications.

## III. ENVELOPE DISPERSION OPERATOR

As the first step, let us introduce a unitary variable transformation

(Other $U\u0302$ may also be justified in some cases, e.g., for dealing with caustics or quasiperiodic media,^{38} but we shall not consider this possibility in the present work.) The phase *θ*, which we call the “reference phase,” serves as a gauge potential. It is a real function such that

is the wave vector identical or close to that predicted by the GO approximation. This implies that the envelope *ψ* and also **k** are slow functions [and that the wavelength substituted in Eq. (2) is $\lambda \u22482\pi /k$]. Then, Eq. (1) becomes

where the “envelope dispersion operator” is $D\u0302\u2009\u2250\u2009U\u0302\u2020D\u0302U\u0302$ (the dagger denotes the adjoint, as usual) or, more explicitly,

The reference phase *θ* is treated as a prescribed function. As it will become clear later, knowing *θ per se* is not actually needed for our purposes; instead, it is **k** that matters. The latter can be calculated on some “reference rays” using the conventional ray equations,^{1}

(*τ* is any parameter along the ray), which also leads to

Note that, in general, *θ* and **k** are defined uniquely only to the leading order, and so there exists some freedom in choosing reference rays and their Hamiltonian *H*. This means that more than one $D\u0302$ is possible. Still, envelope equations that (slightly) differ in the choice of **k** are equivalent in the sense that the *total* field Ψ that they describe is the same by construction. We shall discuss this in more detail in Secs. V E and V G.

Also note that our approach equally applies to stationary and nonstationary waves. In the case of a stationary wave, we assume that **x** is a coordinate in physical space (“spatial problem”) and the wave frequency *ω* serves as a constant parameter. In the case of a nonstationary wave, we assume that **x** is a coordinate in spacetime (“spacetime problem”), and then, *ω* is a part of **k**. In spacetime problems, we assume coordinates such that $x0=ct$, where *c* is the speed of light, *t* is the time, and the metric signature is $(\u2212,+,+,\u2026)$; then, $k0=\u2212\omega /c$ (in the case of the Minkowski metric, this implies that $k0=\omega /c$), and so $\omega =\u2212\u2202t\theta $, as usual. In other respects, spatial and spacetime problems are described on the same footing and will be distinguished only in Sec. VI.

Having defined this terminology, we shall now discuss how $D\u0302$ can be asymptotically expanded in *ϵ* for any $D\u0302$.

## IV. SCALAR WAVES

Our asymptotic theory of mode-converting beams is based on the well-known phase-space methods in quantum and classical wave theory. (For recent overviews, see Refs. 1 and 24. Specific key papers include, but are not limited to, Refs. 39–44. In particular, Ref. 43 is one of the first papers which explicitly discuss applications of the Weyl calculus to plasma problems; although also see Refs. 45–47.) In order to present our framework and notation in a self-contained manner, we restate some basics in Sec. IV A and the supplementary material. Also note that our calculations in Sec. IV B are similar in spirit to those in Ref. 43, except that we perform a higher-order expansion and assume a more general metric. We shall also comment throughout this paper on how our equations reproduce other known results in the corresponding limits.

### A. Weyl calculus

#### 1. Basic definitions

Until Sec. V, we shall assume that Ψ is a *scalar* function. Any given operator $A\u0302$ acting on it maps Ψ to a new scalar function $A\u0302\Psi $ which can be expressed as follows:

Here, the integral is taken over $\mathbb{R}n$ (and so are all integrals below, up to dimensions), $g\u22c4\u2009\u2250\u2009|det\u2009g\u2009|$, and $A$ is some kernel function that determines $A\u0302$. Also, consider a family of all unitary operators $A\u0302u$, which is a subset of all possible $A\u0302$. For a given Ψ, all image functions $A\u0302u\Psi $ are mutually equivalent up to an isomorphism, and so their family ${A\u0302u\Psi}$ can be viewed as a single object, a “state vector” $|\Psi \u27e9$, which belongs to a Hilbert space $H1n$ with the inner product

Then, Eq. (9) can be viewed as the “**x** representation” of $A\u0302$, while the operator itself can be understood more generally as a transformation of $|\Psi \u27e9$, i.e., of the whole family ${A\u0302u\Psi}$. Using this invariant notation, one can formulate a machinery, called the Weyl calculus,^{48} which allows efficient asymptotic approximation of operators using the smallness of *ϵ*. Below, we overview the key theorems of the Weyl calculus which are used in our paper. Readers who are interested in details and proof of these theorems can find them in the supplementary material.

#### 2. Coordinate and momentum operators

We start by defining the coordinate and momentum (wave-vector) operators,

such that the **x** representations of $x\u0302\mu $ and $p\u0302\mu $ are as follows:

Here, $\u2202\mu \u2009\u2250\u2009\u2202/\u2202x\mu ,\u2009x\mu $ and *p _{μ}* are the corresponding eigenvalues, and the factors $g\u22c4\xb11/4$ are introduced to keep $p\u0302$ self-adjoint under the inner product (10).

^{49}(This would not be the case if

*M*were not diffeomorphic to $\mathbb{R}n$.

^{n}^{50}) Since

and *q _{ν}* commutes with $x\u0302\mu $, the usual commutation relation $[x\u0302\mu ,p\u0302\nu ]=i\delta \nu \mu $ is satisfied.

Let us consider the eigenvectors $|x\u27e9$ and $|p\u27e9$ of the coordinate and momentum operators, which are defined as

Since the operators are self-adjoint, these eigenvectors can be chosen as mutually orthogonal, and we shall assume the following normalization:

Here, we introduced

The function $g\xaf\u22c4$ can be chosen arbitrarily as long as it is kept positive. It plays a role similar to that of $g\u22c4$ in the Weyl calculus, but note that this is just a normalization factor, and we introduced it only to maintain the symmetry between $x\u0302$ and $p\u0302$. One can show then that our original function Ψ and the envelope *ψ* can be expressed through the corresponding state vectors as

The **p** representations of these state vectors are introduced similarly as $\u27e8p|\Psi \u27e9$ and $\u27e8p|\psi \u27e9$. One can also show that $\u27e8x1|A\u0302|x2\u27e9=A(x1,x2)$ and

Here, $p\xb7x\u2009\u2250\u2009p\mu x\mu $, and summation over repeated indices is assumed from now on.

#### 3. Wigner–Weyl transform

The set of all eigenvalues of the coordinate and momentum operators form a 2*n*-dimensional “phase space” $z\u2261(x,p)$. For every given *z*, we introduce the so-called *Wigner operator* $\Delta \u0302z$, which is self-adjoint and defined as

(Note that *G *=* *1 if the **x** space is Euclidean or pseudo-Euclidean.) Using $\Delta \u0302z$, we define the *Wigner–Weyl transform*$Wz:A\u0302\u21a6A$, which maps a given operator $A\u0302$ on $H1n$ to a function *A* (“Weyl symbol”) on the *z* space. Specifically, the Weyl symbol of a given operator $A\u0302$ is $A(x,p)\u2009\u2250\u2009tr(\Delta \u0302zA\u0302)$ (“tr” stands for trace); i.e.,

As can be seen easily from this definition, if a given operator is self-adjoint, then its Weyl symbol is real. This also leads to the following corollary. Consider splitting a given $A\u0302$ as $A\u0302=A\u0302H+iA\u0302A$, where the subscripts denote the Hermitian and anti-Hermitian parts,

Both $A\u0302H$ and $A\u0302A$ (not to be confused with $iA\u0302A$) are self-adjoint by definition. Thus, the corresponding Weyl images *A _{H}* and

*A*are real.

_{A}We also define the *inverse Wigner–Weyl transform* $W\u22121:A\u21a6A\u0302$, which maps a given function *A* to the corresponding operator $A\u0302$ via

The direct and inverse transforms set the “Weyl correspondence” between operators and functions on the $(x,p)$ space, $A\u0302\u21d4A(x,p)$. As can be checked by a direct calculation, for any function *f*, one has

However, the Weyl symbols of operators that cannot be represented as $f1(x\u0302)+f2(p\u0302)$ are generally more complicated. In particular, one can show that (see the supplementary material)

and also

Overall, the Weyl symbol of an operator, that is, *any* given combination $f(x\u0302,p\u0302)$ of $x\u0302$ and $p\u0302$, approaches $f(x,p)$ in the GO limit, when $[x\u0302\mu ,p\u0302\nu ]$ is negligible. However, in general, $f(x\u0302,p\u0302)$ does not map simply to $f(x,p)$.

### B. Approximate $D\u0302$

Using the notation introduced in Sec. IV A, one can express Eq. (1) in the following invariant form:

Accordingly, Eq. (5) becomes

where the invariant form of the envelope dispersion operator [Eq. (6)] is as follows:

We shall now approximate $D\u0302$ in three steps: (i) we map the right-hand side of Eq. (34) onto a function, or symbol, using the Wigner–Weyl transform; (ii) we approximate this symbol using the smallness of *ϵ* [Eq. (2)]; and (iii) we produce an operator out of the approximated symbol using the inverse Wigner–Weyl transform.

The first two steps can be done by expressing the Weyl symbol of $D\u0302$ as $D(x,p)=e\u2212i\theta (x)\u2009\u22c6\u2009D(x,p)\u2009\u22c6\u2009ei\theta (x)$, where *D* is the Weyl symbol of $D\u0302$ and $\u22c6$ is the Moyal product (see the supplementary material). By expanding $\u22c6$ in the GO parameter, one can then obtain an approximate symbol of $D\u0302$ formally. But here we adopt a different approach, which is more transparent. We start by expressing $D\u0302$ through *D* explicitly,

where *G* is a metric factor given by Eq. (24). Using the fact that $\theta (x\u0302)|x\xb1s/2\u27e9=\theta (x\xb1s/2)|x\xb1s/2\u27e9$, one can rewrite Eq. (34) as follows:

Consider a formal Taylor expansion of the reference phase *θ* in **s**,

This gives

and so

This leads to the following expression for the envelope dispersion operator:

where we integrated by parts and introduced an “effective dispersion function,”

The first term on the right-hand side of Eq. (41) is *O*(1) because at $\u03f5\u21920$, it becomes the GO dispersion function, which is assumed order-one.^{51} (At nonzero *ϵ*, the Weyl symbol *D* generally does *not* coincide with the dispersion function that governs the wave in a homogeneous medium; see Sec. IV A 3.) The second term can be estimated by assuming $\u2202/\u2202x\u223c1/L$ and $\u2202/\u2202p\u223c1/k$, since we are interested in $D\u2032(x,p)$ at **p** close to **k**. Then,

meaning that the second term in Eq. (41) is roughly *ϵ*^{2} times smaller than the first one. This fact will be used below.

The inverse Wigner–Weyl transform will turn the coordinate **p** into the operator $p\u0302$. The latter will act on the wave envelope [Eq. (33)], which is considered slow in the coordinate representation, and so $p\u0302|\psi \u27e9=O(\u03f5)$. In this sense, $p=O(\u03f5)$, and so by Taylor-expanding $D\u2032$ in **p**, we are effectively expanding $D\u0302$ in *ϵ*. It is sufficient for our purposes to adopt the second-order expansion,

where we introduced $D\u2032(x)\u2009\u2250\u2009D\u2032(x,k(x))$ and

By properties of the inverse Wigner–Weyl transform $W\u22121$ (Sec. IV A 3), $D\u0302$ can be written as follows:

The inverse Wigner–Weyl transforms here can be calculated using Eqs. (28)–(31) and $\Theta \mu \nu =\Theta \nu \mu $. This leads to

One can further simplify Eq. (47) as follows. Suppose that the medium, including the metric, is characterized by some parameters $P$. Within the accuracy of the theory developed in this paper, we shall neglect terms involving $\u22022P$ and $(\u2202P)(\u2202\psi )$ while generally retaining terms such as $\u22022\psi $ (as to be explained Sec. IV E 2), where *∂* denotes a generic spatial derivative. Hence, we can ignore the difference between *D* and $D\u2032$; we can also omit $\u2202\mu \nu 2\Theta \mu \nu $ in Eq. (47), and so $D\u0302$ can be expressed as follows:

Also, $V\mu $ and $\Theta \mu \nu $ can be calculated through the zeroth-*ϵ* limit of *D*, denoted *D*_{0}, which is just the dispersion function of the homogeneous medium,

Importantly, $D0(x,k)$ can depend only on **k** and on local parameters of the medium (but not on their derivatives). As a gradient of a scalar [Eq. (4)], **k** is a true covector; hence, $D0(x,k)$ is a true scalar. This makes $V\mu $ a vector and $\Theta \mu \nu $ a tensor within the assumed accuracy.

As a side remark, note that vectors like $V\mu $, which are denoted with upper indices, belong to the space tangent to *M ^{n}*. (Such vectors must not be confused with multicomponent fields denoted with Latin indices in Sec. V.) Similarly, covectors like

*k*, which are denoted with lower indices, belong to the space cotangent to

_{μ}*M*. For any given

^{n}*X*, one does not have to distinguish

*X*and

^{μ}*X*only if the metric is Euclidean. In the case of a general metric

_{μ}*g*, one has

_{μν}where the matrix $g\mu \nu $ is the inverse of $g\mu \nu $. Similar rules apply to manipulating tensors, as usual.

### C. Approximate envelope equation

Let us derive the **x** representation of Eq. (48), which we need to obtain an explicit form of Eq. (5). From the **x** representations of $x\u0302$ and $p\u0302$ given by Eqs. (13) and (14), terms like $p\u0302f(x\u0302)$ (for any *f*) must be interpreted as

where *g* and *ψ* are also functions of **x**. Then,

where all terms are functions of **x** and $;\mu $ denotes the covariant derivative with respect to *x ^{μ}*, so $X\mu ;\mu $ (for any given vector

*X*) is the divergence, namely,

^{μ}Except for *Dψ*, all terms in Eq. (51) are covariant (i.e., invariant with respect to coordinate transformations) within the accuracy of our theory. Hence, the symbol *D* is also covariant within the accuracy of our theory [this includes not only the zeroth-order dispersion function *D*_{0} but also the leading-order correction $D\u2212D0=O(\u2202P)$ that may appear for inhomogeneous media], even though this may not be obvious directly from the definition of the Weyl transform (Sec. IV A). To emphasize this fact, and also to ensure consistency of notation with the vector-wave theory to be discussed in Secs. V and VI, we denote this covariant function as $D$, and we also adopt the notation

Then, the approximate form of the envelope equation (5) can be summarized as follows:

Here, $D\u0302H$ and $D\u0302A$ are the Hermitian and anti-Hermitian parts of the approximated $D\u0302$, respectively, namely,

[from now on, we do not emphasize the approximate nature of Eq. (51), and so ≈ will be replaced with =], and

where we assume a standard notation

Note that $,\mu $ is the “full” derivative in the sense that, for any given $f(x,k(x))$, it applies to both the first and the second argument of *f*; namely, the chain rule leads to

The true partial derivatives are introduced as follows:

The derivatives $,\mu $ and $|\mu $ are equivalent for functions that depend only on **x**. In particular, $k\nu \u2261\u2202\nu \theta (x)$ satisfies

### D. Leading-order approximation: Geometrical optics

For the envelope approximation to hold, i.e., for $\psi (x)$ to remain a slow function, $k(x)$ must be chosen such that $D(x,k(x))\psi \u2272O(\u03f5)$. [The other terms in Eq. (5) are automatically small on the account of Eq. (2).] Let us adopt the usual GO ordering^{52}

or more rigorously, $DA\u2009\u2272\u2009O(\u03f5)$; i.e., $DA$ much smaller than *O*(*ϵ*) is also allowed.^{53} Then, one can define $k(x)$ such that

This can be achieved by calculating **k** as discussed in Sec. III with the ray Hamiltonian $H=DH$ and the initial condition [at any chosen $x(0)$ and $k(0)$] such that

To the first order in *ϵ*, Eq. (5) is

and as a corollary,

At zero $DA$, the dynamics is conservative, and Eq. (67) reflects conservation of the wave action, or quanta.^{54} Accordingly, $DA$ determines the dissipation rate, $J\mu $ can be identified (at least up to a constant factor) as the action flux density, and *V ^{μ}* is proportional to the group velocity (in space or in spacetime, depending on the problem). Equations (64)–(67) coincide with the equations of the traditional GO theory.

^{1,54}Below, we extend them by retaining the second-order terms neglected in Eq. (66).

### E. Quasioptical model

#### 1. Ray-based coordinates

Let us start by introducing ray-based coordinates as follows.^{55} Suppose multiple rays launched with different $x(0)$ within the beam. Their trajectories can be found using the ray equations as discussed in Sec. IV D; this determines the path *ζ* along each ray as a function of the initial coordinate. We treat *ζ* as the longitudinal coordinate along the wave beam, and each of its isosurfaces is considered as the transverse space $M\u22a5n\u22121(\zeta )$. The coordinates on this space, $\u03f1\u2261{\u03f11,\u03f12,\u2026,\u03f1n\u22121}$, can be introduced arbitrarily (yet they will be specified below). Then, we define *n* – 1 independent vector fields $e\sigma $ via

and another vector $e0$ such that it is linearly independent of all $e\sigma $. Hence, any $dx$ can be decomposed as follows:

and summation over repeated indices *σ* (and $\sigma \u0303$) is henceforth assumed from 1 to *n* – 1.

Let us also construct the dual basis ${e\mu}$. (We treat vectors and covectors on the same footing; namely, $X\xb7Y=X\mu Y\mu =X\mu Y\mu $ and $X=Y$ equally means $X\mu =Y\mu $ and $X\mu =Y\mu $.) By definition,

and so we adopt $e0\u2009\u2250\u2009\u2207\zeta $. Then, the general form of $e0$ is

where the first sign is determined by the metric signature and $\alpha \u2009\u2250\u2009|e0\xb7e0|\u22121/2,\u2009\beta \u2009\u2250\u2009e\sigma \beta \sigma $, and $\beta \sigma $ are arbitrary coefficients. This leads to the following metric representation in the coordinates ${\zeta ,\u03f1}$:

Here, ** h** is the matrix with elements $h\sigma \sigma \u0303\u2009\u2250\u2009e\sigma \xb7e\sigma \u0303$ and $T$ denotes the transposition. By using a known theorem for the determinant of a block matrix,

^{56}one obtains

Let us also introduce the transverse projection of $d\u03f1$,

where $1$ is a unit matrix and $e0e0$ is a dyad formed out of $e0$. Then, one obtains from Eq. (69) that

and so $h\sigma \sigma \u0303$ serves as the transverse metric. Below, we assume $\beta =0$, and so $\u03f1\u22a5\sigma $ and $\u03f1\sigma $ do not need to be distinguished, and we can define the inner product on $M\u22a5n\u22121$ as follows:

We also choose transverse coordinates specifically such that $\u03f1=const$. In other words, the coordinate $\u03f1$ of a given point in space is the initial location of the ray that arrives at this point from the initial transverse surface $M\u22a5n\u22121(0)$. Then, the transverse group velocity is zero,

#### 2. Quasioptical equation

Suppose a wave beam such that its longitudinal scale $L||$, which is defined via $\psi ,\zeta \u223c\psi /L||$, is much larger than its perpendicular scale $L\u22a5$, which is defined via $\psi ,\sigma \u223c\psi /L\u22a5$ and may or may not be the beam global width. (We assume the notation $\psi ,\zeta \u2009\u2250\u2009\u2202\psi /\u2202\zeta $ and $\psi ,\sigma \u2009\u2250\u2009\u2202\psi /\u2202\u03f1\sigma $.) We introduce two GO parameters,

[so the original parameter (2) is $\u03f5=\u03f5\u22a5$], and we shall neglect terms smaller than $O(\u03f5||\u03f5\u22a5)$ from now on. Also, we require that either *α* is chosen as independent of $\u03f1$, or, more generally, the relative variation of *α* across the wave beam is small enough such that $\alpha ,\sigma $ is negligible. We also assume

where the derivatives can be taken over the longitudinal or transverse coordinates. (Having $g\alpha \beta ,\mu \u223c1/L||$ is indeed common in typical applications.^{30})

Here, $V\u2009\u2250\u2009V0$, and we used $V\mu \psi ,\mu =V\psi ,\zeta $ due to Eq. (77). The operator $G\u0302$ is self-adjoint under the inner product (76) and given by

(We kept only the transverse derivatives in this term because the longitudinal ones are beyond the assumed accuracy.) Similarly, Eq. (56) becomes

where the higher-order terms are neglected because $DA$ is assumed small [Eq. (63)]. Although the last term in Eq. (82) is negligible within the assumed accuracy, it is retained anyway to make $D\u0302A$ (not to be confused with $iD\u0302A$) exactly self-adjoint under the inner product (76). Then, Eq. (5) becomes

and, as a corollary,

where we used the fact that $G\u0302$ and $D\u0302A$ are self-adjoint. In particular, for zero $DA$, Eq. (84) predicts conservation of the wave-action flux through the beam cross section,

Equation (84) indicates that $D\u0302A$ cannot be much larger than $\u2202/\u2202\zeta $, and so the correct scaling to assume for the dissipation term is $DA\u2009\u2272\u2009O(\u03f5||)$. Then, Eq. (83) leads to the “quasioptical scaling”

Hence, the difference between $D\u0302A$ and $DA$ in Eq. (82) is $\u2272\u2009O(\u03f53)$, and so it is beyond the accuracy of our theory, which is $O(\u03f52)$. For this reason, we henceforth adopt^{57}

which will also simplify our notation. [Also, as a reminder, $\u2202\mu g\alpha \beta \u2272\u2009O(\u03f5||)\u223c\u03f5\u22a52$, and the parameters of the medium vary similarly.] This leads to

Equation (88) is the general quasioptical equation for a scalar-wave beam in an inhomogeneous medium. Since it is a *parabolic* equation (it contains only the first-order derivative with respect to *ζ*), Eq. (88) is much easier to solve than Eq. (5) with the original expressions for $D\u0302H$ and $D\u0302A$ given by Eqs. (55) and (56).

#### 3. Simplified equations

A typical metric of interest for quasioptical modeling has the form $g\mu \nu =\delta \mu \nu +R\mu \nu \sigma \u03f1\sigma $ with $R\mu \nu \sigma =O(L||\u22121)$.^{30} This satisfies the assumed scaling (79) and corresponds to $ln\u2009g\u22c4=O(L\u22a5/L||)$. Then,

which is negligible. [In contrast, $(ln\u2009g\u22c4),\sigma =O(\u03f5||)$.] The derivatives of $g\u22c4$ in Eq. (81) are negligible too. Hence, the metric factors in the quasioptical equation (88) can be replaced with unity. (However, the effect of the metric on the dispersion matrix may remain important in this case; see Sec. VI.) Then, Eq. (88) becomes

Using the variable transformation $\varphi \u2009\u2250\u2009V\psi $, this equation can be further simplified as follows:

which is just a dissipative Schrödinger equation with

[In Eq. (91), we used that $V,\sigma \varphi ,\sigma \u0303$ is negligible compared to $V\varphi ,\sigma \u0303\sigma =O(\u03f5\u22a5)$.] Also, Eq. (84) becomes

Note that Σ can also be expressed alternatively as follows. Suppose one finds the group velocity and constructs a ray-based metric (as described in Sec. IV E 1) in the vicinity of a given **x**. Then, Eq. (64) can be considered as an equation for the local *k*_{0} as a function of the transverse wave-vector components $k\sigma $ (and **x**) in this prescribed metric; i.e.,

By differentiating this with respect to $k\sigma $, we obtain $k0|\sigma =\u2212DH|\sigma /DH|\zeta $. By differentiating the latter formula once again, now with respect to $k\sigma \u0303$, we also obtain

Recall that $DH|\sigma =V\sigma $ and $V\sigma =0$ [Eq. (77)]. Then, $k0|\sigma =0$, and by comparing Eq. (95) with Eq. (92) and using Eq. (57), one finds that $k0|\sigma \sigma \u0303=\u2212\Sigma \sigma \sigma \u0303$. Thus, Eq. (91) can be rewritten as

In the case of a spacetime problem, $k0|\sigma \sigma \u0303$ can also be expressed through the wave frequency in the laboratory frame. Then, the familiar Schrödinger equation for the wave envelope^{58} can be reproduced. We do not discuss it further because this subject is not essential to our paper.

## V. VECTOR WAVES

### A. Hilbert space for vector waves

Now, let us generalize the above results to vector waves. Suppose an *m*-component wave field $\Psi \u2261\Psi (x)$, so

Here, $|\Psi \u27e9$ is a vector on some Hilbert space $Hmn$ for which we assume the following general inner product:

Here, the Latin indices that characterize the field components span from 1 to *m*. (This is in contrast to the Greek indices introduced earlier, which characterize the components of **x** and span from 0 to *n* – 1.) The matrix $\gamma (x)$ can be any *m *×* m* symmetric matrix. It can be considered as an additional metric. This metric does not have to be the same as *g*, as seen already from the fact that *m* and *n* do not have to be the same. (For example, Ref. 27 describes a six-dimensional field on a three-dimensional spacetime; also see Sec. VI B.) This means that the space to which $\Psi (x)$ belongs is not necessarily the space tangent to *M ^{n}*. With that said, having $\gamma =g$ is possible as a special case; see Secs. VI A and VI C.

Using *γ* as a metric, we introduce the standard rules for manipulating the Latin indices,

where $\gamma ab$ are the elements of $\gamma \u22121$. In particular, in the **x** representation, one has

and we shall also use the following local dot product:

Under this dot product, a matrix *A* with mixed-index elements $Aab$ is self-adjoint [$(A\Psi )\xb7\Phi =\Psi \xb7(A\Phi )$] if the matrix with the corresponding lower-index elements, *A _{ab}*, is Hermitian. The difference between Hermitian and self-adjoint matrices can be ignored if the metric

*γ*is Euclidean or pseudo-Euclidean.

### B. Operators on $Hmn$

Any operator $A\u0302$ on $Hmn$ can be understood as an *m *×* m* matrix with elements $A\u0302ab$ which are operators on $H1n$. For any given $A\u0302ab$, we introduce $A\u0302\xafab\u2009\u2250\u2009\gamma acA\u0302cb$, which is also an operator on $H1n$. Correspondingly, two types of Weyl images can be defined, namely, $Wz[A\u0302ab]$ and $Wz[A\u0302\xafab]$. Below, we assume the notation

Also, *A* will be an index-free notation of the matrix with elements $Aab$, and $A\xaf$ will be an index-free notation of the matrix with elements $A\xafab$. Importantly, $Aab$ should not be confused with $Wz[A\u0302ab]$ because the Weyl symbols do not transform as tensors,

except for those which are independent of **p**. Symbols $Wz[A\u0302ab]$ will not be used below, and so no special notation is introduced for them.

Note that by Eq. (98), the following equality holds for any $\Phi ,\u2009\Psi $, and $A\u0302$ on $Hmn$:

where we invoked the definition of the inner product on $H1n$ [Eq. (10)]. By the definition of the adjoint operator $A\u0302\u2020$,

In order to express $A\u0302\u2020$ through $A\u0302$, let us represent $A\u0302\xafab$ in terms of the corresponding Weyl image $A\xafab$ and the Wigner operator $\Delta \u0302z$ on $H1n$, with $z\u2261(x,p)$ (Sec. IV A 3). Then, Eq. (105) leads to

where we used that $\Delta \u0302z\u2032$ is self-adjoint on $H1n$. By comparing this with Eq. (105), one finds that, on one hand,

On the other hand, by Eq. (27), for the inverse Wigner–Weyl transform, one has

Hence, the Weyl image of $A\u0302\u2020$ is simply the conjugate transpose of the Weyl image of $A\u0302$ in the sense that

In other words, *for operators with both indices lowered*, the operations $Wz$ and $\u2020$ commute.

Equation (109) implies that, if $A\u0302$ is self-adjoint on $Hmn$, then $A\xaf$ is a Hermitian matrix, and thus, the corresponding matrix *A* is self-adjoint (Sec. V A). In particular, this means that the Hermitian and anti-Hermitian parts [Eq. (26)] of an operator are determined by the Hermitian and anti-Hermitian parts of its lower-index Weyl symbol, respectively. We summarize this as follows:

For lower-index matrices, the Hermitian and anti-Hermitian parts are defined as usual,

### C. Weyl expansion of the dispersion operator

Now, let us consider the dispersion operator $D\u0302$ in particular. According to the above definition, it is viewed as an $m\xd7m$ matrix with elements $D\u0302ab$. By lowering the index in the envelope equation (1), one can write this equation as follows:

where $D\u0302\xafab\u2009\u2250\u2009\gamma acD\u0302cb$. Equivalently, this can be written as the envelope equation,

Then, the approximate operators $D\u0302\xafab$ can be expressed just like $D\u0302$ for scalar waves (Sec. IV B),

Here, the coefficients are **x**-dependent matrices; specifically, $D\xafab(x)\u2250\u2009D\xafab(x,k(x))$, and

One can also multiply Eq. (115) by $\gamma \u22121$ to raise the first index. Then, one obtains

Note that if $D\u0302$ is self-adjoint, then $D\xafab$ is Hermitian; then, $D\u0302ab$ is self-adjoint too.

Unlike in scalar waves (Sec. IV C), $D\xafab(x,k)$ is not a covariant object, i.e., in this case, not a true tensor. (For example, see Sec. VI.) It is convenient to split it as

where $D\xaf$*is* a true tensor and $C\xaf=O(\u2202P)$ is a small remainder. [Here, $\u2202P$ means the same as in Sec. IV C, and so $C=O(\u03f5)$ under the GO ordering and $C=O(\u03f5\u2225)$ under the quasioptical ordering.] The splitting of Eq. (120) is not unique and is a matter of convenience; for example, one can define $D\xaf(x,k)$ simply as the zeroth-order dispersion tensor. Hence, we can redefine

which is equivalent to the above definitions within the accuracy of our theory. Then, assuming the same orderings as in the scalar-wave case (also see below), the following approximation is enough both for the first-order theory and for the quasioptical theory:

Here, $D\u2261D(x)$ and $D=\gamma \u22121D\xaf$, and so $Dab(x)=\gamma acD\xafcb(x)$; similar conventions are assumed for *C* and *D*. Accordingly, the matrices $DH$ and *D _{A}* are self-adjoint and so are the operators $D\u0302H1$ and $D\u0302H2$, which are given by

Using $[\gamma \u22121,p\u0302\mu ]=i(\gamma \u22121),\mu $ and $(\gamma \u22121\gamma ),\mu =0$ to obtain

one can also express these as

(The term $\u221d\u2113\mu $ must be kept in $D\u0302H1$, but a similar term in $D\u0302H2$ can be neglected.) Also,

and since *γ* is **k**-independent, this leads to

### D. Active and passive modes

#### 1. Basis from the eigenvectors of $DH$

Let us assume the GO ordering as in Eq. (63), except that $DH$ and $DA$ are now matrices. Then, $D\u0302=O(\u03f5)$, and so the envelope equation [Eq. (5)] acquires the form

Hence, it is convenient to express *ψ* through the eigenvectors of $DH(x)$. Let us denote these eigenvectors as $\eta s(x)$ and the corresponding eigenvalues as $\Lambda s(x)$, and so

[Note that $\eta s(x)=\eta s(x,k(x))$, where $\eta s(x,p)$ are the eigenvectors of $DH(x,p)$. Likewise, $\Lambda s(x)=\Lambda s(x,k(x))$, where $\Lambda s(x,p)$ are the eigenvalues of $DH(x,p)$.] Since $DH$ is self-adjoint, it has *m* eigenvectors *η _{s}* which form a complete basis ${\eta s}$. Let us also introduce the corresponding dual basis ${\eta s}$ as usual, i.e., such that

Then, we can represent *ψ* as

where *a ^{s}* serves as the components of

*ψ*in the basis ${\eta s}$. [Remember that the dot product (101) includes conjugation.] Since $DH$ is self-adjoint, it is always possible to make the basis ${\eta s}$ orthonormal, and this choice is assumed below. Then, $\eta s\xb7\eta s\u2032=\delta ss\u2032$; thus, $\eta s=\eta s$, i.e.,

Also, for any two fields $\psi =\eta sas$ and $\varphi =\eta sbs$, the inner product (98) can be written as follows:

where $as\u2009\u2250\u2009\delta ss\u2032as\u2032$. The matrix *δ* with elements $\delta ss\u2032$ serves as a Euclidean metric for manipulating the mode indices *s* and $s\u2032$. Those should not to be confused with the coordinate indices denoted with Greek letters and also with other Latin indices that are manipulated by the metric *γ* [Eq. (99)].

#### 2. Amplitude vectors

The physical meaning of the expansion coefficients *a ^{s}* is understood as follows. Consider multiplying Eq. (133) by

*η*from the left. This gives $\Lambda sas=O(\u03f5)$, where no summation over

^{s}*s*is assumed. This shows that, for a given

*s*, there are two possibilities: either

*a*is small or Λ

^{s}_{s}is small. In the first case, the polarization

*η*does not correspond to a propagating wave mode

_{s}*per se*; the small nonzero projection of

*ψ*on

*η*is only due to the fact that the wave field is not strictly sinusoidal in inhomogeneous medium. We call such “modes” passive. In the second case, the local $k(x)$ approximately satisfies the local dispersion relation $\Lambda s(x,k(x))\u22480$. Then,

_{s}*a*can be understood as the local scalar amplitude of an actual GO mode so that $as=O(1)$ is allowed. We call such modes active, and mode conversion occurs when more than one active mode exists. In other words, for a given boundary or initial conditions, active modes are those that are excited resonantly, while passive modes are those that are nonresonant and, thus, adiabatically isolated. (However, this does not mean that the passive-mode amplitudes are simply negligible; see below.)

^{s}Assuming that there are $N\u22651$ active modes, we shall order them such that they correspond to $s=1,2,\u2026,N$ and the remaining $N\xaf=m\u2212N$ modes with $s=(N+1),(N+2),\u2026,m$ are passive. Let us also adopt the notation

($s=1,2,\u2026,N\xaf$) for the passive-mode amplitudes, polarizations, and eigenvalues, respectively. Then, it is convenient to introduce the “amplitude vectors”

and the corresponding row vectors that are dual to the amplitude vectors under the Euclidean complex dot product, namely,

Below, we seek to derive an approximate envelope equation in terms of these amplitude vectors.

#### 3. Polarization matrices

Before we proceed, let us introduce the following notation. First, consider the “polarization matrices”

These are nonsquare matrices that have active- and passive-mode polarizations as their columns, namely,

Using these, Eq. (136) can be rewritten compactly as

Also, consider the auxiliary polarization matrices

which have $\eta s*$ as their rows,

Since

one obtains

However, note that in general, $+$ does *not* mean to represent the adjoint in the common sense [cf. Eq. (154)].

Next, notice that

where the indices *s* and $s\u2032$ span from 1 to *N*. Analogous formulas apply to $\Xi \xaf$. Then,

and similarly for $\Xi +\Xi \xaf$. In other words, one has

where $1$ is the unit square matrix and $0$ is the zero square matrix. (The dimensions of $1$ and $0$ are different in different equations.) Hence, if one multiplies Eq. (145) by $\Xi +$ and, separately, by $\Xi \xaf+$, one obtains concise formulas for the amplitude vectors *a* and $a\xaf$,

Another property that we shall use later on is

and similarly for $\Xi \xaf+$. Here, $H$ denotes the conjugate transpose ($\Xi H\u2009\u2250\u2009\Xi T*$) and $\delta \u22121$ is the Kronecker matrix with upper-index elements $\delta ss\u2032$. Hence,

The factor $\delta \u22121$ can be omitted if one ignores the difference between upper and lower indices *s* and $s\u2032$ that refer to the mode number. If *γ* is Euclidean, one can also ignore the difference between upper and lower coordinate indices; then, $\Xi +=\Xi H$.

As a side remark, note that Ξ and $\Xi \xaf$ are generally nonsquare, and so $\Pi \u2009\u2250\u2009\Xi \Xi +$ and $\Pi \xaf\u2009\u2250\u2009\Xi \xaf\Xi \xaf+$ are not unit matrices but rather *projectors*. Indeed, due to Eqs. (152), one has $\Pi 2=\Pi $ and $\Pi \xaf2=\Pi \xaf$. Also, by applying Π and $\Pi \xaf$ to Eq. (145), one obtains

Thus, $\Pi \psi $ is the projection of *ψ* on the active-mode space and $\Pi \xaf\psi $ is the projection of *ψ* on the passive-mode space.

#### 4. Equation for the active modes

Using the eigendecomposition theorem and Eq. (152), one obtains

where Λ and $\Lambda \xaf$ are the diagonal eigenvalue matrices,

These and Eq. (152) also lead to

Hence, the envelope equation (5) can be written as

where we used Eq. (159). Let us multiply this by $\Xi +$ and, separately, by $\Xi \xaf+$. Then, due to Eq. (152), one obtains

Let us also use Eq. (162) to express $a\xaf$ through *a* and substitute the result into Eq. (161). Since both $a\xaf$ and $D\u0302$ are of order *ϵ*, it is sufficient to solve Eq. (162) for $a\xaf$ approximately to the leading order,

Then, Eq. (161) becomes an equation for just the *N*-dimensional active-mode amplitude vector,

As a reminder, this equation is valid up to $O(\u03f52)$.

### E. Leading-order approximation: Extended geometrical optics

Upon substituting Eqs. (124)–(126) into Eq. (164), we obtain the following equation to lowest (first) order in *ϵ*:

where we introduced

As shown in Appendix B 1,

where Ξ is considered as a function of **x**. [As a reminder, $\Xi H\u2009\u2250\u2009\Xi T*$ is the conjugate transpose of Ξ; $V\xafH\mu $ is given by Eq. (121); and $\delta \u22121$ is a unit matrix which only raises the mode index.] Alternatively, Ξ can be considered as a function of $(x,k)$. Then, as shown in Appendix B 2,

where the partial derivatives $|\mu $ and $|\mu $ are defined in Sec. IV C. The term *U* is the Stern–Gerlach Hamiltonian mentioned in the introduction (Sec. I), except here it is generalized to an arbitrary metric. This term causes polarization-driven bending of the ray trajectories, which is missed in traditional GO; also, it causes mode conversion, if more than one active mode is present.^{24–28} These effects are discussed in further detail in Sec. V G.

More explicitly, Eq. (165) can be written as

This generalizes the XGO equation derived in Refs. 24–29 to dissipative waves and curved coordinates. Since the vector *a* belongs to the space where the metric is Euclidean by definition (Sec. V D 2), the elements of this vector are covariant, i.e., invariant with respect to coordinate transformations in the physical space. The same applies to the eigenvalues comprising Λ; thus, the whole left-hand side of Eq. (172) is covariant too. This means that the operator on the right-hand side is also covariant, which includes its Hermitian and anti-Hermitian parts separately. Hence, *U* and $\Gamma $ are covariant.

and so Eq. (173) can be viewed as a generalization of Eq. (67). At zero $\Gamma $, the dynamics is conservative, and Eq. (173) reflects conservation of the wave action or quanta.^{54} Accordingly, $\Gamma $ determines the dissipation rate, $J\mu $ can be identified (up to a constant factor) as the action flux density summed over all active modes (for example, see Sec. VI B), and the elements of the diagonal matrix $\Lambda |\mu $ are proportional to the group velocities of the corresponding active modes. We shall call them the group velocities (without “proportional to”) for brevity. We also emphasize that this model applies even when the group velocities of the active modes are very different, unlike the quasioptical model discussed in Sec. V F.

For scalar waves studied in Sec. IV, we had $\Lambda =DH$ and we defined $k(x)$ such that this term be zero [Eq. (64)]. Now, Λ is a diagonal *matrix* with $N\u22651$ nonzero elements, and so it cannot be zeroed entirely by imposing just one scalar constraint on $k(x)$. Hence, there can be more than one natural way to define $k(x)$. One esthetically pleasing and convenient^{29} option is to require that Λ or $\Lambda \u2212U$ be *traceless*. This amounts to choosing the reference-ray Hamiltonian in Sec. III as $H=tr\u2009\Lambda /N$ or $H=tr\u2009(\Lambda \u2212U)/N$, respectively. [Arbitrary constant factors can be introduced instead of *N*, for that only redefines *τ* in Eq. (7).] Another option is to adopt $\Lambda s=0$ for some single $s\u2264N$, since all active modes have close wave vectors anyway; then, $H=\Lambda s$. As mentioned in Sec. III, all such choices of $k(x)$ are equally justified. Although they lead to slightly different envelope equations, those equations are equivalent in the sense that they all describe the same total field by construction. (One might call this a gauge freedom.) However, for the case *N *=* *1, choosing $H=\Lambda \u2212U$ is preferable, as discussed in Sec. V G 2.

### F. Quasioptical model

#### 1. Quasioptical equation

Suppose that a wave propagates largely as a single beam. This implies that all active modes have group velocities close to their average group velocity $Vavr$, which can be defined via $NVavr\mu \u2009\u2250\u2009tr\u2009\Lambda |\mu $. Then,

where $\Delta V\mu $ are matrices with eigenvalues much smaller than $Vavr$. Let us assume a ray-based coordinate system aligned with $Vavr$ with the inner product on the transverse space $M\u22a5n\u22121$ defined similar to Eq. (76) [cf. also Eq. (138)], namely,

Let us also adopt the quasioptical ordering as we did for scalar waves in Sec. IV E. In particular, we allow $\Delta V\mu =O(\u03f5\u22a5)$. Then, Eq. (164) becomes

Here, $K\u0302$ is defined as in Eq. (167) and can be approximated as follows:

where $V\u2009\u2250\u2009Vavr0$ (we used the fact that $Vavr\sigma =0$ by definition of the ray-based coordinates), $U=O(\u03f5||)$ is given by Eq. (171), and $\Gamma $ is given by Eq. (166). Finally,

As shown in Appendix B 3, $G\u0302$ can also be simplified as

and so it is also self-adjoint under the inner product (176).

In summary, the quasioptical equation for vector waves can be written as follows:

Equation (183) is the main result of our paper. Like Eq. (172), this equation is covariant within the accuracy of our theory. As a reminder, *a* is generally a vector ($dim\u2009a=N\u22651$), whose elements are the scalar amplitudes of active modes (Sec. V D 2); Λ is the diagonal matrix formed by the eigenvalues of $DH$; $\Gamma $ is given by Eq. (166); $V=N\u22121tr\u2009\Lambda |\zeta $ is a scalar; $|\zeta \u2009\u2250\u2009\u2202/\u2202k0$; $G\u0302$ is given by Eq. (182); and $\Delta K\u0302$ is given by Eq. (179). There, *U* is given by Eq. (171) and $\Delta V$ is given by Eq. (180). The derivatives of $h\u22c4$ could be neglected within the assumed accuracy, but a purist might want to retain them in order to keep $\Delta K\u0302$ and $G\u0302$ precisely self-adjoint under the inner product (176).

Like in Sec. IV E 2, we require the *α* parameter of the ray-based coordinates to be defined as $\u03f1$-independent. Then, Eq. (183) has the following corollary:

which is similar to Eq. (84). For the case of zero $\Gamma $, Eq. (184) predicts conservation of the wave-action flux through the beam cross section,

Unlike in Eq. (85), this is the flux of all active modes combined, and the fluxes of individual active modes may not be conserved separately.

#### 2. Simplified equations

If the metric is nearly Euclidean (or pseudo-Euclidean) as in Sec. IV E 3, we can drop the metric factors and rewrite Eq. (183) as follows:

Using the variable transformation $\varphi \u2009\u2250\u2009Va$, this equation can be further simplified as

Here, $\chi \u0302$ is an operator self-adjoint under the inner product (176) and given by

Also, *Q*, Σ, and $\u03d2$ are the self-adjoint matrices given by

Accordingly, Eq. (184) becomes

and so $\u27e8\varphi |\varphi \u27e9N\u22a5$ is conserved if $\u03d2$ is zero.

### G. Summary and discussion

The quasioptical model proposed above is equally applicable to scalar beams, single-mode vector beams (*N *=* *1), and multimode vector beams (*N *>* *1).

#### 1. Scalar beams

In the case of a scalar beam (Sec. IV), one has $\Xi =1$, and so

Then, the equations from Sec. IV E are reproduced. In particular, Λ is made zero by the choice of **k** [Eq. (64)]. This implies that the ray Hamiltonian is $H=\Lambda $, which leads to the following ray equations:

#### 2. Single-mode vector beams

In the case of a single-mode vector beam (*N *=* *1), one has $\Xi =\eta $, where *η* is the polarization vector. Then,

and so they are scalars. Also, $\Delta V\sigma =0$, but *U* is generally a nonzero scalar function. There are two natural ways to define **k** in this case. One is to require that $\Lambda =0$, i.e., to adopt $H=\Lambda $. In this case, *U* is left in the envelope equation and can intensify the envelope inhomogeneity in the direction perpendicular to the group velocity. This may eventually result in the violation of the envelope approximation. (For multimode beams, this is less of a concern because they typically split into single-mode beams before the issue becomes significant.) Hence, it is potentially advantageous to define **k** by requiring $\Lambda \u2212U=0$, i.e., by adopting $H=\Lambda \u2212U$. In this case, the amplitude equation is identical to that of a scalar wave, while the effect of *U* is absorbed in the reference phase, leading to modified ray equations,

The effect of *U* on the ray trajectories is known as the (spin) Hall effect of light in optics.^{22,23,61–64} In plasma physics, this effect is typically neglected. (To our knowledge, all existing ray-tracing codes ignore *U* entirely.)

Finally, note that the ray dynamics can also be represented in an alternative form. Since Λ is a scalar, one can rewrite Eq. (171) for $U(x,k)$ as follows:

Since there is only one mode, there is no mode index to manipulate, and so the “metric” factor *δ* has been dropped. Likewise, the anti-Hermitian parts are simply the imaginary parts in this case. We also substituted the GO ray equations (191), where we replaced $d/d\tau $ with dots, and adopted

Then, the phase-space Lagrangian of a ray, $L[x,k]=k\mu x\u0307\mu \u2212H$, has a noncanonical structure, namely,

The equations originating from a special case of this noncanonical phase-space Lagrangian were used, for example, in Ref. 62 to study the Hall effect for light propagating in a nonbirefringent material. A comparison of the resulting noncanonical ray equations with the canonical ray equations (193) can be found in Ref. 27. Also notably, the terms $A\mu (x)$ and $A(k)\mu $ are known as the Berry connection and are linked to the Berry phase or the geometrical phase of light.^{27,44,62}

#### 3. Multimode vector beams

In the case of multiple active modes (*N *>* *1), *a* is an *N*-dimensional vector, and the coefficients in the amplitude equation are matrices (except for *V*, which is a scalar). In particular, the matrices $\Gamma ,\u2009\Delta V\sigma $, and *U* are generally nondiagonal, and so they can cause mode conversion. In the simplest case, when both the transverse gradients and dissipation are negligible, this process is most transparently described by Eq. (187), which then becomes

with *Q* being a self-adjoint matrix. The modes decouple when *Q* is close to diagonal. More generally, the dynamics governed by Eq. (198) is similar to that of an *N*-level quantum system with a Hamiltonian *Q* (and to the dynamics of *N* coupled classical oscillators whose parameters do not change too rapidly with *ζ*^{65}). Alternatively, it can be mapped to a precession equation for a real $(N2\u22121)$-dimensional “spin” vector.^{26} This approach is particularly intuitive at $N=2$, when the spin is three-dimensional. A detailed analysis of mode conversion for this case can be found in Ref. 29.

As a reminder, the model presented here (Sec. V F) relies on the assumption that the group velocities of the active modes are close to each other. Otherwise, beam splitting occurs rapidly, and the assumption that $a,\zeta \u226aa,\sigma $ does not hold. The quasioptical description is inapplicable to such beams; however, the first-order XGO model described in Sec. V E can be used.

## VI. ELECTROMAGNETIC WAVES

### A. Covariant formulation

Here, we shall explain how the above theory applies to EM waves. We start with Maxwell's equation^{68}

where $F\alpha \beta $ is the EM tensor, namely,

*A* is the vector potential, and *J* is the current density. This equation can also be represented as follows:

Here, $D\u0302vac(A)$ is the vacuum dispersion operator,

which is self-adjoint under the inner product (98). By assuming $J=c\u03d1\u0302A$, where $\u03d1\u0302$ is some linear operator, one can cast the equation for *A* in the form (1), namely,

Hence, the theory developed in the previous sections readily applies, specifically, with the dimension of the configuration space being *n *=* *4, the dimension of the vector field *A* being *m *=* *4 (so *m *=* n*), and $\gamma =g$. The corresponding Weyl symbols are calculated as follows. It can be shown straightforwardly that $[D\xaf\u0302vac(A)]\alpha \beta \u2009\u2250\u2009g\alpha \gamma [D\u0302vac(A)]\gamma \beta $ is approximately given by

Here, we omitted second-order derivatives of *g*, which are negligible within the accuracy of our theory. Also, *q* is defined as in Eq. (15), and $\u2113\mu \u2009\u2250\u2009g\u22121g,\mu $, which is in agreement with Eq. (127) because $g=\gamma $. Finally, in the form introduced in Sec. V C, the above result is

where $(\u03d1\xaf0)\alpha \beta $ is the GO limit of $\u03d1\xaf,\u2009(\Delta \u03d1\xaf)\alpha \beta $ is the remaining part of $\u03d1\xaf$, and $\u03d1\xaf$ itself is the symbol of $\u03d1\u0302\xaf$.

### B. Noncovariant formulation

As a special case, suppose a metric of the form

with time-independent spatial metric ** h**. Then, the above equations can be simplified as follows. Let us assume the Weyl gauge ($A0=0$). Then, it is sufficient to consider just the spatial part of the four-dimensional Eq. (203) and replace the vector potential with the electric field,

(As usual, $p\u03020\u2009\u2250\u2009\u2212ic\u22121\u2202t$, which is a self-adjoint operator. Assuming we work with wave fields that have zero time average, $p\u03020$ can also be considered reversible.) Unlike earlier, we now use the Latin indices to denote spatial components, and we shall adopt the bold font for spatial vectors and matrices when using the index-free notation. Specifically, one obtains

where we also multiplied the equation by $(p\u03020)\u22121$. This has the form (1) with *n *=* *4, *m *=* *3, and $\gamma =h$.

Let us introduce the conductivity operator $\sigma \u0302$ via $J=\sigma \u0302E$ and notice that $4\pi \sigma \u0302=icp\u03020\chi \u0302$ by the definition of the susceptibility operator $\chi \u0302$. Then, $4\pi \u03d1\u0302=p\u03020\chi \u0302p\u03020$, and so $D\u0302(E)$ can be expressed as follows:

or equivalently, $D\u0302(E)=(p\u03020)\u22122D\u0302vac(A)+\chi \u0302$. The corresponding Weyl image is $D\xafab(E)=(p0)\u22122[D\xafvac(A)]ab+\chi \xafab$, or

where $\epsilon \xafab\u2009\u2250\u2009hab+(\chi \xaf0)ab$ serves as the dielectric tensor and $\chi \xaf0$ is the GO limit of $\chi \xaf$.

Using the fact that the dispersion operator is defined only up to a constant factor, let us introduce an additional factor $1/(16\pi )$. Then, Eq. (210) becomes

The quantity $C\xafab(A)$ was derived in Sec. VI A, and the vector *q* [Eq. (15)] and the matrices $\u2113a$ [Eq. (127)] can be written as

and so they both can be of order $\u03f5||$; hence, $C=O(\u03f5||)$. Note that such *C* is generally non-negligible even for near-Euclidean metrics such as those discussed in Sec. IV E 3.

From Eq. (174), we obtain

which acts as the true action-flux density.^{54} As a reminder, the action density $I\u2009\u2250\u2009J0$ can be written as

where we used $[D\xafH(E)]abEb\u22480$. Using the latter and Faraday's law $B\u2248ck\xd7E/\omega $ for the magnetic field **B**, one can also rewrite $I$ in another common form,^{4}

One can also show^{54} that the spatial component of the action flux density (215) can be expressed as

where **S** is the (time- or space-averaged) Poynting vector,

The GO equation (173) can be written as

The corresponding density of the wave energy is $\omega I$, and the density of the energy flux is $\omega J$, as flows from the variational principle.^{54}

### C. Stationary waves

For stationary waves with fixed frequency, one has $p\u03020=\u2212\omega /c$ and *ω* can be treated as a constant parameter. Such waves can be studied on the three-dimensional configuration space, namely, the physical space **x**. In this case, $n=m=3$ and $\gamma =h$. If there is no spatial dispersion, then $\chi \u0302=\chi (x\u0302)$, where the dependence on *ω* is assumed but not emphasized. Assuming that the underlying space is flat (which is always the case in laboratory applications), the symbol $\chi ab$ is then identical to the susceptibility of the homogeneous medium. (Other formulations of the susceptibility have also been proposed in this case, e.g., for light in nondispersive dielectric media^{27} and for waves in cold plasmas.^{24,25,67})

These properties can also be extended, approximately, to media with *weak* spatial dispersion, i.e., such that

For example, in plasma, $\chi \xafab(k)$ is proportional to the temperature, and for EM waves, thermal effects often can be considered as small perturbations.^{4} Although the Weyl symbol $\chi \xafab(k)$ is, strictly speaking, different from the corresponding part of the susceptibility in homogeneous medium, the difference is of order $\u03f5\u2225\chi \xafab(k)$, and so it is much smaller than $\u03f5\u2225$ and thus can be neglected. In other words, one can simply replace the small $\chi \xafab(k)$ with its GO limit, which is usually well known.^{4}

In principle, our general method is also applicable to waves in media with strong spatial dispersion. However, $\chi \u0302$ may be hard to calculate in that case unless a medium is homogeneous, and no extrapolation of a known homogeneous-medium model is guaranteed to yield the true general $\chi \u0302$. A related discussion can also be found in Ref. 43.

### D. Waves in flat space

Let us also discuss the special case when the configuration space is flat and Eq. (220) applies, as in typical laboratory applications. Although the ray-based metric ** h** is still non-Euclidean in this case (unless the reference rays are straight), one can use the flatness of the configuration space in the following sense. In the Euclidean laboratory coordinates $x\u0303$, the spatial metric has the form $h\u0303ab=\delta ab$ and $C\u0303$ vanishes. (Here and further, the tildes denote that the corresponding quantities are evaluated in the laboratory coordinates. Note that Papers II and III assume a different notation; there, tildes are quantities in the ray-based coordinates, and nontilded are those in the laboratory coordinates.) This simplifies the expression for the dispersion matrix and makes

*D*a true tensor (modulo the insignificant corrections caused by weak spatial dispersion discussed in Sec. VI C). Then, since

_{H}*U*and $\Gamma $ are frame-invariant (Sec. V E), one can calculate them in the laboratory coordinates using

where both $\delta \u22121$ and $\gamma \u0303$ are omitted for simplicity, since they are unit matrices. This also extends to the quasioptical model. Since calculations in the laboratory coordinates are generally easier than the corresponding calculations in the ray-based coordinates, this simplifies applications of our theory to numerical simulations, as discussed further in Paper II.

Specific applications of this theory, including those to waves in fusion plasmas, will be discussed in future publications. Here, we only note that our formulation relies on the GO ordering and thus is inapplicable to wave transformations near caustics, for example, as in the O–X conversion caused by the density gradient in dense magnetized plasmas.^{68} In contrast, the O–X conversion caused by the magnetic shear in dilute plasmas^{29} can be modeled naturally, as also demonstrated explicitly in Paper III. Any adiabatic transformations of waves along continuous dispersion curves can also be modeled naturally, specifically, within the single-mode approximation summarized in Sec. V G 2. (In special cases like the X–B transformation,^{4} when the group velocity becomes zero on a ray, while *k* remains large, one may want to replace the coordinate *ζ* with *τ*, where $d\tau =d\zeta /V$, to remove the singularity in the amplitude equation.) In all these cases, our theory is expected to yield results which agree with full-wave modeling up to local errors of order $O(\u03f5\u2225)$, because this is the accuracy within which the passive-mode contribution is calculated (Sec. V D 4).

## VII. CONCLUSIONS

In summary, we propose a quasioptical theory of mode-converting wave beams in inhomogeneous media such as plasma. This includes the following. For any given dispersion operator $D\u0302$ that governs the original wave field Ψ, we explicitly calculate the approximate operator $D\u0302$ that governs the wave envelope *ψ* to the second order in the GO parameter *ϵ*. Then, we further simplify this envelope operator by assuming that the gradient of *ψ* transverse to the local group velocity is much larger than the corresponding parallel gradient. This leads to a parabolic differential equation for *ψ* (“quasioptical equation”) on the basis of the GO polarization vectors [Eq. (136)]. Our main results can be found in Sec. V F, which includes the general quasioptical equation (183) and its special case (187). Scalar and mode-converting vector beams are described on the same footing (Sec. V G). We also explain how to apply this model to EM waves considered as a special case (Sec. VI). In the follow-up papers, we report successful quasioptical modeling of radio frequency wave beams in magnetized plasma based on this theory.

## SUPPLEMENTARY MATERIAL

See the supplementary material for an overview of the Weyl calculus on a curved configuration space. There, we present proofs of the statements summarized in Sec. IV A.

## ACKNOWLEDGMENTS

The first author (I.Y.D.) acknowledges the support and hospitality of the National Institute of Fusion Science, Japan. The authors are also thankful to M. A. Oancea for stimulating discussions regarding the XGO generalization to non-Euclidean metrics. This work was supported by JSPS KAKENHI Grant No. JP17H03514. This work was also supported by the U.S. DOE through Contract No. DE-AC02-09CH11466 and by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. DOE National Nuclear Security Administration under Contract No. DE-NA-0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in this paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

### APPENDIX A: SUMMARY OF THE SELECTED NOTATIONS

Here, we present a summary of the selected notations used in the main text.

##### 1. Basic symbols

• $\u2009\u2250\u2009$ denotes a definition.

• ^{̂} denotes an operator.

• $T$ denotes the matrix transpose.

• $H=T*$ denotes the conjugate matrix transpose.

• $+$ is used only in $\Xi +$ and $\Xi \xaf+$ defined in Eqs. (146).

• _{H} and _{A} denote the Hermitian and anti-Hermitian parts of an operator [Eqs. (26)] or those of a matrix (Sec. V B). When applied to a scalar, _{H} and _{A} denote the real and imaginary parts, respectively.

• The underline notation $_$ is explained in Sec. V B and is used for matrices with lower indices only.

• $Wz$ and $W\u22121$ denote the Wigner–Weyl transform and its inverse (Sec. IV A 3), respectively.

##### 2. Index manipulation

On the *n*-dimensional configuration space *M ^{n}*, we assume a general metric tensor

*g*with components $g\mu \nu $. For (co)vector fields on the space (co)tangent to

*M*, the indices are manipulated as usual, namely, via

^{n}(Summation over repeated indices is always assumed unless specified otherwise.) For vector waves, we also introduce an additional vector space that may or may not be the same as the space tangent to *M ^{n}* (Sec. V A). On that space, an additional metric

*γ*is introduced, and the indices are manipulated as follows:

We also introduce the Euclidean metric *δ* on the space of amplitude vectors *a* (Sec. V D 2),

For active modes, which are of our primary interest, the mode indices *s* and $s\u2032$ range from 1 to *N*. The distinction between *a ^{s}* and

*a*is made only for esthetic reasons (consistency of notation) and can be neglected otherwise.

_{s}##### 3. Derivatives

For spatial derivatives, we use the notation that is standard, for example, in general relativity.^{69} In particular, $X\mu ;\mu $ is the divergence, namely,

Here, $g\u22c4\u2009\u2250\u2009|det\u2009g\u2009|$,

For functions of the form $f(x,k)$, we also introduce the following partial derivatives:

Since $k=\u2207\theta (x)$, one has

##### 4. Inner products

The inner product for scalar waves on *M ^{n}* is

and for *m*-dimensional vector waves on *M ^{n}*,

In particular, in the **x** representation,

We also use the dot product

The inner product on the transverse space $M\u22a5n\u22121$ is defined for scalar fields as

and for the *N*-dimensional “active” parts of the vector fields (Sec. V D 2) as

### APPENDIX B: AUXILIARY CALCULATIONS

Here, we present some auxiliary calculations whose results are used in Sec. V.

##### 1. Calculation of $K\u0302$ and $U(x)$

We start with Eq. (167) for $K\u0302$. This equation can be rewritten as follows:

where we used Eq. (52) and introduced

where, at the end, we used $\Lambda =O(\u03f5)$ (Sec. V D 4), so that the last two terms in Eq. (B2) can be neglected.

From Eq. (154), we have $(\Xi +),\mu =\delta \u22121\Xi ,\mu H\gamma +\Xi +\u2113\mu $. Using this and also Eq. (130), one obtains

where the subscript *A* denotes the anti-Hermitian part, as usual.

##### 2. Calculation of $U0(x,k)$

Let us also derive an alternative expression for *U*_{0} [Eq. (B4)] in terms of the partial derivatives $|\mu $ instead of the full derivatives $,\mu $. Using Eq. (59), one obtains $\Xi ,\mu =\Xi |\nu k\nu ,\mu +\Xi |\mu $. Then, Eq. (B4) can be rewritten as follows:

By differentiating Eq. (159) with respect to *k _{μ}*, we obtain

where we used that Λ (but not its partial derivatives) is small. Let us multiply this by *γ* to obtain

where the latter equality is the complex conjugate of the former. Using these and $k\nu ,\mu =\theta ,\nu \mu =\theta ,\mu \nu =O(\u03f5)$, we further obtain, to the leading order,

where we neglected terms of the second order in *ϵ*. We also used the fact that $\Lambda s(x,k(x))\u2272O(\u03f5)$ by the definition of active modes, in addition, Λ_{s} are changing *slowly*, and so $\Lambda ,\mu \u2272O(\u03f52)$. (Note that this is only true for $\Lambda ,\mu $ but not for $\Lambda |\mu $.) Using Eqs. (B7), we also obtain similarly that

Then,

As a reminder, the subscript *A* denotes the anti-Hermitian part, and $\delta \u22121$ is a unit matrix which only raises the mode index. Equation (B10) is similar to the corresponding result derived in Refs. 24–26. Extra terms are included in those papers in the expressions for *U*_{0}, but they are of the second order in *ϵ* and could have been neglected.

##### 3. Calculation of $G\u0302$

Consider Eq. (181) for $G\u0302$ derived under the assumption of the quasioptical ordering. Here, we simplify that equation and derive the explicit formula (182). First, note that

Here, in the first line, we substituted Eq. (128) for $D\u0302H1$ and also Eq. (14) for $p\u0302\mu $. Then, we only kept the terms involving derivatives of *a* along the perpendicular direction of the wave beam. The parentheses in the indices denote symmetrization, namely, for any *A* and *B*,

The metric factor $h\u22c4$ has been added to keep the operator self-adjoint under the inner product (176). (Accounting for the inhomogeneity of ** h** in $G\u0302$ is beyond the accuracy of our theory. However, keeping $G\u0302$ self-adjoint is convenient and physically meaningful, for it is known that the exact operator is not responsible for dissipation.) Hence,

In order to simplify the expression for $\Phi \sigma \sigma \u0303$, note that

where we used $\Xi \xaf+\Xi =0$ (and thus $\Xi \xaf+\Xi |\sigma \u0303=\u2212\Xi \xaf+|\sigma \u0303\Xi $) and neglected terms proportional to $\Lambda =O(\u03f5||)$. Hence,

Using Eqs. (152) for Ξ and $\Xi \xaf$ like we did above, Eq. (156) for $DH$, and the fact that $\Lambda =O(\u03f5)$, we finally obtain

where we substituted $\Lambda |\sigma \u2248V\sigma $ [Eq. (B2)]. The quasioptical approximation implies that $V\sigma \u0303$ is close to a scalar matrix, and so the commutator $[\Xi +\Xi (|\sigma ,V\sigma \u0303)]$ can be neglected. Hence, $\Phi \sigma \sigma \u0303\u2248\Lambda |\sigma \sigma \u0303$.

## References

It is not a purpose of this paper to provide a comprehensive list of existing approaches to reduced modeling of EM waves in plasmas, so the above reference list is not intended as exhaustive.

This does not extend to ray-tracing codes. For example, a ray-tracing code raycon^{1} captures mode conversion by using an approximate yet somewhat generic analytic solution that applies in the limit of a narrow region of resonant coupling. A related ray-tracing model with mode conversion was also proposed in Refs. 24 and 25.

Mode conversion is understood here as the exchange of action between near-resonant but nonintersecting branches of the dispersion relation.^{1} Mathematically, this process is similar to quantum tunneling. Adiabatic transformations of a wave mode along continuous dispersion curves, such as the X–B transformation,^{4} are not mode conversion in the true sense of the term and can also be simulated using already existing reduced models.

This also facilitates the aforementioned applications to related problems in general relativity,^{23} as will be discussed in future publications.

Eventually (Sec. V E), we shall choose $k(x)$ such that $D(x,k(x))$ be small. But that will characterize $k(x)$ rather than the function *D* itself, which is considered order-one when evaluated on general $(x,p)$.

Strictly speaking, Eq. (1) can always be multiplied by a constant matrix to ensure that the new dispersion operator satisfies the ordering (63) at a given location. Hence, the only nontrivial requirement that we actually impose here is that such $DA$ does not grow too much as the wave propagates.

In principle, $D\u0302A$ produces effects qualitatively different from those produced by $D\u0302H$, so one might argue that the accuracy of its approximation does not necessarily need to equal that of $D\u0302H$. Besides, $DA$ is often more abrupt than $DH$, so retaining $VA\sigma $ in Eq. (82) may in fact lead to a better model in principle. However, in such cases, the Weyl symbol *D* of the original dispersion operator is typically known only to the zeroth order (Sec. VI C), and thus so is $DA$. Hence, retaining first-order corrections the damping coefficient does not seem warranted.