This work opens a series of papers where we develop a general quasi-optical theory for mode-converting electromagnetic beams in plasma and implement it in a numerical algorithm. Here, the basic theory is introduced. We consider a general quasimonochromatic multicomponent wave in a weakly inhomogeneous linear medium with no sources. For any given dispersion operator that governs the wave field, we explicitly calculate the approximate operator that governs the wave envelope ψ to the second order in the geometrical-optics parameter. Then, we further simplify this envelope operator by assuming that the gradient of ψ transverse to the local group velocity is much larger than the corresponding parallel gradient. This leads to a parabolic differential equation for ψ (“quasioptical equation”) on the basis of the geometrical-optics polarization vectors. Scalar and mode-converting vector beams are described on the same footing. We also explain how to apply this model to electromagnetic waves in general. In the next papers of this series, we report successful quasioptical modeling of radio frequency wave beams in magnetized plasma based on this theory.
Describing the propagation of waves in inhomogeneous media is a classic problem with a long history.1–3 It is particularly important in fusion research, where quasistationary beams of electromagnetic (EM) radiation are commonly used for many purposes and need to be modeled with fidelity.4 Full-wave modeling, which involves solving complete Maxwell's equations, can be impractical at short (cm and mm) wavelengths, especially when multidimensional simulations with complex geometries are required, such as those of tokamak and stellarator plasmas. Hence, reduced methods have been widely used in practice. These methods are rooted in geometrical optics (GO)1 and include conventional ray tracing,5–8 complex ray tracing,9,10 beam tracing,11–14 and variations thereof.15 There are also other “quasioptical” models, such as in Refs. 16–19, which resolve the evolution of the beam transverse structure without adopting any particular ansatz for the intensity profile. Still, they assume that only one branch of the dispersion relation is excited in each given case,20 i.e., mode conversion does not occur.21 As a result, today's simulations of mode-converting beams mainly rely on full-wave codes19 and thus have to compromise the fidelity by reducing the number of dimensions resolved.8
However, simulating mode conversion does not actually require the full-wave approach; in fact, it can be done within the quasioptical approach too if the latter is generalized properly. Developing the corresponding algorithms would facilitate not only fusion applications. For example, advanced quasioptical models could find use in optics and general relativity, where mode-converting beams are also possible and have been attracting much attention lately.22,23 Hence, it is potentially beneficial to approach the problem of quasioptical modeling as a general-physics problem, without restricting it to fusion applications.
B. General idea
Mode conversion can be described quite generally, particularly without restricting it to narrow regions in space, within “extended geometrical optics” (XGO), which was developed recently.24–28 XGO is a theory which calculates the leading-order correction U to the GO dispersion operator of a general vector wave and shows26 that this correction is analogous to (and a generalization of) the Stern–Gerlach Hamiltonian of a quantum spin-1/2 electron. Accordingly, U is responsible for two effects simultaneously: (i) it modifies the ray equations just like spin–orbital interactions affecting the electron motion and (ii) it also governs mode conversion, which appears as a direct analog of spin up–down transitions.29 An examination of the quasioptical algorithms such as those in Refs. 12 and 16 shows that they already involve calculations of terms similar to U. Hence, adding the mode-conversion capability to quasioptical codes should not be burdensome and is not expected to slow down the codes considerably. However, formulating the corresponding theory is easier to do using the abstract quantumlike formalism of XGO. One only needs to upgrade the existing XGO by adding diffraction, for which it also helps to introduce more general coordinates with curved metric.30
In this series of papers (Papers I–III), we propose such an upgrade of XGO and apply it to numerical simulations. Our goal is to develop a framework which later could be applied to a broad variety of problems both in plasma physics and beyond. Within this framework to be presented, one can assume general dispersion, diffraction, and polarization effects, including mode conversion of not just two but many arbitrary resonant waves. Also importantly, our formulation below is not restricted to EM waves since we do not specify the dispersion operator in the governing wave equation.
Our series of papers is organized as follows. In Paper I, we introduce the basic theory of waves that diffract and mode-convert simultaneously. In Paper II31 and Paper III,32 we apply this theory to perform quasioptical modeling of radiofrequency-wave beams in magnetized plasma as an example. In particular, we consider applications to mode conversion caused by magnetic shear in edge plasma,29 which, for example, is a known problem33,34 in the Large Helical Device.35,36
In this first paper of our series, we consider an arbitrary quasimonochromatic multicomponent wave in a weakly inhomogeneous linear medium. Supposing that is some dispersion operator governing the wave dynamics, we simplify and obtain an approximate operator that governs the wave envelope ψ. Then, we derive a parabolic differential equation for ψ (“quasioptical equation”) by assuming that the gradient of the wave envelope transverse to the local group velocity is much larger than the corresponding parallel gradient. The resulting theory applies to both scalar and mode-converting vector beams. At the end of the paper, we also discuss how this model can be applied to EM waves in particular. However, readers who are mainly interested in simulations (as opposed to the general theory) are encouraged to proceed straight to Paper II, where our key equations are overviewed in a simplified form and without derivations.
This paper is organized as follows: In Sec. II, we introduce the general problem. In Sec. III, we formalize the concept of the envelope dispersion operator. In Sec. IV, we derive an approximation for the envelope dispersion operator for scalar waves, and we also derive its quasioptical approximation. In Sec. V, we extend this model to vector waves. In Sec. VI, we explain how to apply the resulting theory to EM waves in particular. In Sec. VII, we present our main conclusions. In Appendix A, we summarize some of our notations. In Appendix B, some auxiliary calculations are presented. Our paper also contains the supplementary material. There, we overview the Weyl calculus on a curved configuration space, which is used in this work.
II. GENERAL PROBLEM
Consider a wave propagating on an n-dimensional configuration space Mn with coordinates and some general metric tensor . For simplicity, assume that Mn is diffeomorphic to , i.e., the n-dimensional Euclidean space or pseudo-Euclidean space with the same metric signature as Mn. (For more information on why the diffeomorphism with is needed, see Sec. IV A 2.) Suppose that the wave field , which may have multiple components, is governed by a linear equation with no source terms,
where is a differential or, more generally, integral dispersion operator. (For vector waves, is a matrix whose elements are operators; see Sec. V.) We shall assume that the GO parameter ϵ is small, namely,
(the symbol denotes definitions), where λ is the characteristic wave period, or wavelength, and L is the least characteristic scale among those of the wave envelope and of the medium, including the metric.37 Below, we propose a systematic reduction of Eq. (1) using the smallness of ϵ and eventually obtain a quasioptical model based on this equation. The idea of quasioptical modeling will be formalized later (Secs. IV E and V F).
As a side note, we emphasize that the theory to be developed does not describe wave transformations near cutoffs, where the GO parameter (2) is not small. It is possible to waive this limitation, but formulating such generalized theory is left to future publications.
III. ENVELOPE DISPERSION OPERATOR
As the first step, let us introduce a unitary variable transformation
(Other may also be justified in some cases, e.g., for dealing with caustics or quasiperiodic media,38 but we shall not consider this possibility in the present work.) The phase θ, which we call the “reference phase,” serves as a gauge potential. It is a real function such that
is the wave vector identical or close to that predicted by the GO approximation. This implies that the envelope ψ and also k are slow functions [and that the wavelength substituted in Eq. (2) is ]. Then, Eq. (1) becomes
where the “envelope dispersion operator” is (the dagger denotes the adjoint, as usual) or, more explicitly,
The reference phase θ is treated as a prescribed function. As it will become clear later, knowing θ per se is not actually needed for our purposes; instead, it is k that matters. The latter can be calculated on some “reference rays” using the conventional ray equations,1
(τ is any parameter along the ray), which also leads to
Note that, in general, θ and k are defined uniquely only to the leading order, and so there exists some freedom in choosing reference rays and their Hamiltonian H. This means that more than one is possible. Still, envelope equations that (slightly) differ in the choice of k are equivalent in the sense that the total field Ψ that they describe is the same by construction. We shall discuss this in more detail in Secs. V E and V G.
Also note that our approach equally applies to stationary and nonstationary waves. In the case of a stationary wave, we assume that x is a coordinate in physical space (“spatial problem”) and the wave frequency ω serves as a constant parameter. In the case of a nonstationary wave, we assume that x is a coordinate in spacetime (“spacetime problem”), and then, ω is a part of k. In spacetime problems, we assume coordinates such that , where c is the speed of light, t is the time, and the metric signature is ; then, (in the case of the Minkowski metric, this implies that ), and so , as usual. In other respects, spatial and spacetime problems are described on the same footing and will be distinguished only in Sec. VI.
Having defined this terminology, we shall now discuss how can be asymptotically expanded in ϵ for any .
IV. SCALAR WAVES
Our asymptotic theory of mode-converting beams is based on the well-known phase-space methods in quantum and classical wave theory. (For recent overviews, see Refs. 1 and 24. Specific key papers include, but are not limited to, Refs. 39–44. In particular, Ref. 43 is one of the first papers which explicitly discuss applications of the Weyl calculus to plasma problems; although also see Refs. 45–47.) In order to present our framework and notation in a self-contained manner, we restate some basics in Sec. IV A and the supplementary material. Also note that our calculations in Sec. IV B are similar in spirit to those in Ref. 43, except that we perform a higher-order expansion and assume a more general metric. We shall also comment throughout this paper on how our equations reproduce other known results in the corresponding limits.
A. Weyl calculus
1. Basic definitions
Until Sec. V, we shall assume that Ψ is a scalar function. Any given operator acting on it maps Ψ to a new scalar function which can be expressed as follows:
Here, the integral is taken over (and so are all integrals below, up to dimensions), , and is some kernel function that determines . Also, consider a family of all unitary operators , which is a subset of all possible . For a given Ψ, all image functions are mutually equivalent up to an isomorphism, and so their family can be viewed as a single object, a “state vector” , which belongs to a Hilbert space with the inner product
Then, Eq. (9) can be viewed as the “x representation” of , while the operator itself can be understood more generally as a transformation of , i.e., of the whole family . Using this invariant notation, one can formulate a machinery, called the Weyl calculus,48 which allows efficient asymptotic approximation of operators using the smallness of ϵ. Below, we overview the key theorems of the Weyl calculus which are used in our paper. Readers who are interested in details and proof of these theorems can find them in the supplementary material.
2. Coordinate and momentum operators
We start by defining the coordinate and momentum (wave-vector) operators,
such that the x representations of and are as follows:
Here, and pμ are the corresponding eigenvalues, and the factors are introduced to keep self-adjoint under the inner product (10).49 (This would not be the case if Mn were not diffeomorphic to .50) Since
and qν commutes with , the usual commutation relation is satisfied.
Let us consider the eigenvectors and of the coordinate and momentum operators, which are defined as
Since the operators are self-adjoint, these eigenvectors can be chosen as mutually orthogonal, and we shall assume the following normalization:
Here, we introduced
The function can be chosen arbitrarily as long as it is kept positive. It plays a role similar to that of in the Weyl calculus, but note that this is just a normalization factor, and we introduced it only to maintain the symmetry between and . One can show then that our original function Ψ and the envelope ψ can be expressed through the corresponding state vectors as
The p representations of these state vectors are introduced similarly as and . One can also show that and
Here, , and summation over repeated indices is assumed from now on.
3. Wigner–Weyl transform
The set of all eigenvalues of the coordinate and momentum operators form a 2n-dimensional “phase space” . For every given z, we introduce the so-called Wigner operator , which is self-adjoint and defined as
(Note that G = 1 if the x space is Euclidean or pseudo-Euclidean.) Using , we define the Wigner–Weyl transform, which maps a given operator on to a function A (“Weyl symbol”) on the z space. Specifically, the Weyl symbol of a given operator is (“tr” stands for trace); i.e.,
As can be seen easily from this definition, if a given operator is self-adjoint, then its Weyl symbol is real. This also leads to the following corollary. Consider splitting a given as , where the subscripts denote the Hermitian and anti-Hermitian parts,
Both and (not to be confused with ) are self-adjoint by definition. Thus, the corresponding Weyl images AH and AA are real.
We also define the inverse Wigner–Weyl transform , which maps a given function A to the corresponding operator via
The direct and inverse transforms set the “Weyl correspondence” between operators and functions on the space, . As can be checked by a direct calculation, for any function f, one has
However, the Weyl symbols of operators that cannot be represented as are generally more complicated. In particular, one can show that (see the supplementary material)
Overall, the Weyl symbol of an operator, that is, any given combination of and , approaches in the GO limit, when is negligible. However, in general, does not map simply to .
Accordingly, Eq. (5) becomes
where the invariant form of the envelope dispersion operator [Eq. (6)] is as follows:
We shall now approximate in three steps: (i) we map the right-hand side of Eq. (34) onto a function, or symbol, using the Wigner–Weyl transform; (ii) we approximate this symbol using the smallness of ϵ [Eq. (2)]; and (iii) we produce an operator out of the approximated symbol using the inverse Wigner–Weyl transform.
The first two steps can be done by expressing the Weyl symbol of as , where D is the Weyl symbol of and is the Moyal product (see the supplementary material). By expanding in the GO parameter, one can then obtain an approximate symbol of formally. But here we adopt a different approach, which is more transparent. We start by expressing through D explicitly,
Consider a formal Taylor expansion of the reference phase θ in s,
This leads to the following expression for the envelope dispersion operator:
where we integrated by parts and introduced an “effective dispersion function,”
The first term on the right-hand side of Eq. (41) is O(1) because at , it becomes the GO dispersion function, which is assumed order-one.51 (At nonzero ϵ, the Weyl symbol D generally does not coincide with the dispersion function that governs the wave in a homogeneous medium; see Sec. IV A 3.) The second term can be estimated by assuming and , since we are interested in at p close to k. Then,
meaning that the second term in Eq. (41) is roughly ϵ2 times smaller than the first one. This fact will be used below.
The inverse Wigner–Weyl transform will turn the coordinate p into the operator . The latter will act on the wave envelope [Eq. (33)], which is considered slow in the coordinate representation, and so . In this sense, , and so by Taylor-expanding in p, we are effectively expanding in ϵ. It is sufficient for our purposes to adopt the second-order expansion,
where we introduced and
By properties of the inverse Wigner–Weyl transform (Sec. IV A 3), can be written as follows:
The inverse Wigner–Weyl transforms here can be calculated using Eqs. (28)–(31) and . This leads to
One can further simplify Eq. (47) as follows. Suppose that the medium, including the metric, is characterized by some parameters . Within the accuracy of the theory developed in this paper, we shall neglect terms involving and while generally retaining terms such as (as to be explained Sec. IV E 2), where ∂ denotes a generic spatial derivative. Hence, we can ignore the difference between D and ; we can also omit in Eq. (47), and so can be expressed as follows:
Also, and can be calculated through the zeroth-ϵ limit of D, denoted D0, which is just the dispersion function of the homogeneous medium,
Importantly, can depend only on k and on local parameters of the medium (but not on their derivatives). As a gradient of a scalar [Eq. (4)], k is a true covector; hence, is a true scalar. This makes a vector and a tensor within the assumed accuracy.
As a side remark, note that vectors like , which are denoted with upper indices, belong to the space tangent to Mn. (Such vectors must not be confused with multicomponent fields denoted with Latin indices in Sec. V.) Similarly, covectors like kμ, which are denoted with lower indices, belong to the space cotangent to Mn. For any given X, one does not have to distinguish Xμ and Xμ only if the metric is Euclidean. In the case of a general metric gμν, one has
where the matrix is the inverse of . Similar rules apply to manipulating tensors, as usual.
C. Approximate envelope equation
Let us derive the x representation of Eq. (48), which we need to obtain an explicit form of Eq. (5). From the x representations of and given by Eqs. (13) and (14), terms like (for any f) must be interpreted as
where g and ψ are also functions of x. Then,
where all terms are functions of x and denotes the covariant derivative with respect to xμ, so (for any given vector Xμ) is the divergence, namely,
Except for Dψ, all terms in Eq. (51) are covariant (i.e., invariant with respect to coordinate transformations) within the accuracy of our theory. Hence, the symbol D is also covariant within the accuracy of our theory [this includes not only the zeroth-order dispersion function D0 but also the leading-order correction that may appear for inhomogeneous media], even though this may not be obvious directly from the definition of the Weyl transform (Sec. IV A). To emphasize this fact, and also to ensure consistency of notation with the vector-wave theory to be discussed in Secs. V and VI, we denote this covariant function as , and we also adopt the notation
Then, the approximate form of the envelope equation (5) can be summarized as follows:
Here, and are the Hermitian and anti-Hermitian parts of the approximated , respectively, namely,
[from now on, we do not emphasize the approximate nature of Eq. (51), and so ≈ will be replaced with =], and
where we assume a standard notation
Note that is the “full” derivative in the sense that, for any given , it applies to both the first and the second argument of f; namely, the chain rule leads to
The true partial derivatives are introduced as follows:
The derivatives and are equivalent for functions that depend only on x. In particular, satisfies
D. Leading-order approximation: Geometrical optics
For the envelope approximation to hold, i.e., for to remain a slow function, must be chosen such that . [The other terms in Eq. (5) are automatically small on the account of Eq. (2).] Let us adopt the usual GO ordering52
or more rigorously, ; i.e., much smaller than O(ϵ) is also allowed.53 Then, one can define such that
This can be achieved by calculating k as discussed in Sec. III with the ray Hamiltonian and the initial condition [at any chosen and ] such that
To the first order in ϵ, Eq. (5) is
and as a corollary,
At zero , the dynamics is conservative, and Eq. (67) reflects conservation of the wave action, or quanta.54 Accordingly, determines the dissipation rate, can be identified (at least up to a constant factor) as the action flux density, and Vμ is proportional to the group velocity (in space or in spacetime, depending on the problem). Equations (64)–(67) coincide with the equations of the traditional GO theory.1,54 Below, we extend them by retaining the second-order terms neglected in Eq. (66).
E. Quasioptical model
1. Ray-based coordinates
Let us start by introducing ray-based coordinates as follows.55 Suppose multiple rays launched with different within the beam. Their trajectories can be found using the ray equations as discussed in Sec. IV D; this determines the path ζ along each ray as a function of the initial coordinate. We treat ζ as the longitudinal coordinate along the wave beam, and each of its isosurfaces is considered as the transverse space . The coordinates on this space, , can be introduced arbitrarily (yet they will be specified below). Then, we define n – 1 independent vector fields via
and another vector such that it is linearly independent of all . Hence, any can be decomposed as follows:
and summation over repeated indices σ (and ) is henceforth assumed from 1 to n – 1.
Let us also construct the dual basis . (We treat vectors and covectors on the same footing; namely, and equally means and .) By definition,
and so we adopt . Then, the general form of is
where the first sign is determined by the metric signature and , and are arbitrary coefficients. This leads to the following metric representation in the coordinates :
Here, h is the matrix with elements and denotes the transposition. By using a known theorem for the determinant of a block matrix,56 one obtains
Let us also introduce the transverse projection of ,
where is a unit matrix and is a dyad formed out of . Then, one obtains from Eq. (69) that
and so serves as the transverse metric. Below, we assume , and so and do not need to be distinguished, and we can define the inner product on as follows:
We also choose transverse coordinates specifically such that . In other words, the coordinate of a given point in space is the initial location of the ray that arrives at this point from the initial transverse surface . Then, the transverse group velocity is zero,
2. Quasioptical equation
Suppose a wave beam such that its longitudinal scale , which is defined via , is much larger than its perpendicular scale , which is defined via and may or may not be the beam global width. (We assume the notation and .) We introduce two GO parameters,
[so the original parameter (2) is ], and we shall neglect terms smaller than from now on. Also, we require that either α is chosen as independent of , or, more generally, the relative variation of α across the wave beam is small enough such that is negligible. We also assume
where the derivatives can be taken over the longitudinal or transverse coordinates. (Having is indeed common in typical applications.30)
Here, , and we used due to Eq. (77). The operator is self-adjoint under the inner product (76) and given by
(We kept only the transverse derivatives in this term because the longitudinal ones are beyond the assumed accuracy.) Similarly, Eq. (56) becomes
where the higher-order terms are neglected because is assumed small [Eq. (63)]. Although the last term in Eq. (82) is negligible within the assumed accuracy, it is retained anyway to make (not to be confused with ) exactly self-adjoint under the inner product (76). Then, Eq. (5) becomes
and, as a corollary,
where we used the fact that and are self-adjoint. In particular, for zero , Eq. (84) predicts conservation of the wave-action flux through the beam cross section,
which will also simplify our notation. [Also, as a reminder, , and the parameters of the medium vary similarly.] This leads to
Equation (88) is the general quasioptical equation for a scalar-wave beam in an inhomogeneous medium. Since it is a parabolic equation (it contains only the first-order derivative with respect to ζ), Eq. (88) is much easier to solve than Eq. (5) with the original expressions for and given by Eqs. (55) and (56).
3. Simplified equations
A typical metric of interest for quasioptical modeling has the form with .30 This satisfies the assumed scaling (79) and corresponds to . Then,
which is negligible. [In contrast, .] The derivatives of in Eq. (81) are negligible too. Hence, the metric factors in the quasioptical equation (88) can be replaced with unity. (However, the effect of the metric on the dispersion matrix may remain important in this case; see Sec. VI.) Then, Eq. (88) becomes
Using the variable transformation , this equation can be further simplified as follows:
which is just a dissipative Schrödinger equation with
Note that Σ can also be expressed alternatively as follows. Suppose one finds the group velocity and constructs a ray-based metric (as described in Sec. IV E 1) in the vicinity of a given x. Then, Eq. (64) can be considered as an equation for the local k0 as a function of the transverse wave-vector components (and x) in this prescribed metric; i.e.,
By differentiating this with respect to , we obtain . By differentiating the latter formula once again, now with respect to , we also obtain
In the case of a spacetime problem, can also be expressed through the wave frequency in the laboratory frame. Then, the familiar Schrödinger equation for the wave envelope58 can be reproduced. We do not discuss it further because this subject is not essential to our paper.
V. VECTOR WAVES
A. Hilbert space for vector waves
Now, let us generalize the above results to vector waves. Suppose an m-component wave field , so
Here, is a vector on some Hilbert space for which we assume the following general inner product:
Here, the Latin indices that characterize the field components span from 1 to m. (This is in contrast to the Greek indices introduced earlier, which characterize the components of x and span from 0 to n – 1.) The matrix can be any m × m symmetric matrix. It can be considered as an additional metric. This metric does not have to be the same as g, as seen already from the fact that m and n do not have to be the same. (For example, Ref. 27 describes a six-dimensional field on a three-dimensional spacetime; also see Sec. VI B.) This means that the space to which belongs is not necessarily the space tangent to Mn. With that said, having is possible as a special case; see Secs. VI A and VI C.
Using γ as a metric, we introduce the standard rules for manipulating the Latin indices,
where are the elements of . In particular, in the x representation, one has
and we shall also use the following local dot product:
Under this dot product, a matrix A with mixed-index elements is self-adjoint  if the matrix with the corresponding lower-index elements, Aab, is Hermitian. The difference between Hermitian and self-adjoint matrices can be ignored if the metric γ is Euclidean or pseudo-Euclidean.
B. Operators on
Any operator on can be understood as an m × m matrix with elements which are operators on . For any given , we introduce , which is also an operator on . Correspondingly, two types of Weyl images can be defined, namely, and . Below, we assume the notation
Also, A will be an index-free notation of the matrix with elements , and will be an index-free notation of the matrix with elements . Importantly, should not be confused with because the Weyl symbols do not transform as tensors,
except for those which are independent of p. Symbols will not be used below, and so no special notation is introduced for them.
Note that by Eq. (98), the following equality holds for any , and on :
where we invoked the definition of the inner product on [Eq. (10)]. By the definition of the adjoint operator ,
where we used that is self-adjoint on . By comparing this with Eq. (105), one finds that, on one hand,
On the other hand, by Eq. (27), for the inverse Wigner–Weyl transform, one has
Hence, the Weyl image of is simply the conjugate transpose of the Weyl image of in the sense that
In other words, for operators with both indices lowered, the operations and commute.
Equation (109) implies that, if is self-adjoint on , then is a Hermitian matrix, and thus, the corresponding matrix A is self-adjoint (Sec. V A). In particular, this means that the Hermitian and anti-Hermitian parts [Eq. (26)] of an operator are determined by the Hermitian and anti-Hermitian parts of its lower-index Weyl symbol, respectively. We summarize this as follows:
For lower-index matrices, the Hermitian and anti-Hermitian parts are defined as usual,
C. Weyl expansion of the dispersion operator
Now, let us consider the dispersion operator in particular. According to the above definition, it is viewed as an matrix with elements . By lowering the index in the envelope equation (1), one can write this equation as follows:
where . Equivalently, this can be written as the envelope equation,
Then, the approximate operators can be expressed just like for scalar waves (Sec. IV B),
Here, the coefficients are x-dependent matrices; specifically, , and
One can also multiply Eq. (115) by to raise the first index. Then, one obtains
Note that if is self-adjoint, then is Hermitian; then, is self-adjoint too.
where is a true tensor and is a small remainder. [Here, means the same as in Sec. IV C, and so under the GO ordering and under the quasioptical ordering.] The splitting of Eq. (120) is not unique and is a matter of convenience; for example, one can define simply as the zeroth-order dispersion tensor. Hence, we can redefine
which is equivalent to the above definitions within the accuracy of our theory. Then, assuming the same orderings as in the scalar-wave case (also see below), the following approximation is enough both for the first-order theory and for the quasioptical theory:
Here, and , and so ; similar conventions are assumed for C and D. Accordingly, the matrices and DA are self-adjoint and so are the operators and , which are given by
Using and to obtain
one can also express these as
(The term must be kept in , but a similar term in can be neglected.) Also,
and since γ is k-independent, this leads to
D. Active and passive modes
1. Basis from the eigenvectors of
Hence, it is convenient to express ψ through the eigenvectors of . Let us denote these eigenvectors as and the corresponding eigenvalues as , and so
[Note that , where are the eigenvectors of . Likewise, , where are the eigenvalues of .] Since is self-adjoint, it has m eigenvectors ηs which form a complete basis . Let us also introduce the corresponding dual basis as usual, i.e., such that
Then, we can represent ψ as
where as serves as the components of ψ in the basis . [Remember that the dot product (101) includes conjugation.] Since is self-adjoint, it is always possible to make the basis orthonormal, and this choice is assumed below. Then, ; thus, , i.e.,
Also, for any two fields and , the inner product (98) can be written as follows:
where . The matrix δ with elements serves as a Euclidean metric for manipulating the mode indices s and . Those should not to be confused with the coordinate indices denoted with Greek letters and also with other Latin indices that are manipulated by the metric γ [Eq. (99)].
2. Amplitude vectors
The physical meaning of the expansion coefficients as is understood as follows. Consider multiplying Eq. (133) by ηs from the left. This gives , where no summation over s is assumed. This shows that, for a given s, there are two possibilities: either as is small or Λs is small. In the first case, the polarization ηs does not correspond to a propagating wave mode per se; the small nonzero projection of ψ on ηs is only due to the fact that the wave field is not strictly sinusoidal in inhomogeneous medium. We call such “modes” passive. In the second case, the local approximately satisfies the local dispersion relation . Then, as can be understood as the local scalar amplitude of an actual GO mode so that is allowed. We call such modes active, and mode conversion occurs when more than one active mode exists. In other words, for a given boundary or initial conditions, active modes are those that are excited resonantly, while passive modes are those that are nonresonant and, thus, adiabatically isolated. (However, this does not mean that the passive-mode amplitudes are simply negligible; see below.)
Assuming that there are active modes, we shall order them such that they correspond to and the remaining modes with are passive. Let us also adopt the notation
() for the passive-mode amplitudes, polarizations, and eigenvalues, respectively. Then, it is convenient to introduce the “amplitude vectors”
and the corresponding row vectors that are dual to the amplitude vectors under the Euclidean complex dot product, namely,
Below, we seek to derive an approximate envelope equation in terms of these amplitude vectors.
3. Polarization matrices
Before we proceed, let us introduce the following notation. First, consider the “polarization matrices”
These are nonsquare matrices that have active- and passive-mode polarizations as their columns, namely,
Using these, Eq. (136) can be rewritten compactly as
Also, consider the auxiliary polarization matrices
which have as their rows,
However, note that in general, does not mean to represent the adjoint in the common sense [cf. Eq. (154)].
Next, notice that
where the indices s and span from 1 to N. Analogous formulas apply to . Then,
and similarly for . In other words, one has
where is the unit square matrix and is the zero square matrix. (The dimensions of and are different in different equations.) Hence, if one multiplies Eq. (145) by and, separately, by , one obtains concise formulas for the amplitude vectors a and ,
Another property that we shall use later on is
and similarly for . Here, denotes the conjugate transpose () and is the Kronecker matrix with upper-index elements . Hence,
The factor can be omitted if one ignores the difference between upper and lower indices s and that refer to the mode number. If γ is Euclidean, one can also ignore the difference between upper and lower coordinate indices; then, .
As a side remark, note that Ξ and are generally nonsquare, and so and are not unit matrices but rather projectors. Indeed, due to Eqs. (152), one has and . Also, by applying Π and to Eq. (145), one obtains
Thus, is the projection of ψ on the active-mode space and is the projection of ψ on the passive-mode space.
4. Equation for the active modes
Using the eigendecomposition theorem and Eq. (152), one obtains
where Λ and are the diagonal eigenvalue matrices,
These and Eq. (152) also lead to
Hence, the envelope equation (5) can be written as
Then, Eq. (161) becomes an equation for just the N-dimensional active-mode amplitude vector,
As a reminder, this equation is valid up to .
E. Leading-order approximation: Extended geometrical optics
where we introduced
As shown in Appendix B 1,
where Ξ is considered as a function of x. [As a reminder, is the conjugate transpose of Ξ; is given by Eq. (121); and is a unit matrix which only raises the mode index.] Alternatively, Ξ can be considered as a function of . Then, as shown in Appendix B 2,
where the partial derivatives and are defined in Sec. IV C. The term U is the Stern–Gerlach Hamiltonian mentioned in the introduction (Sec. I), except here it is generalized to an arbitrary metric. This term causes polarization-driven bending of the ray trajectories, which is missed in traditional GO; also, it causes mode conversion, if more than one active mode is present.24–28 These effects are discussed in further detail in Sec. V G.
More explicitly, Eq. (165) can be written as
This generalizes the XGO equation derived in Refs. 24–29 to dissipative waves and curved coordinates. Since the vector a belongs to the space where the metric is Euclidean by definition (Sec. V D 2), the elements of this vector are covariant, i.e., invariant with respect to coordinate transformations in the physical space. The same applies to the eigenvalues comprising Λ; thus, the whole left-hand side of Eq. (172) is covariant too. This means that the operator on the right-hand side is also covariant, which includes its Hermitian and anti-Hermitian parts separately. Hence, U and are covariant.
and so Eq. (173) can be viewed as a generalization of Eq. (67). At zero , the dynamics is conservative, and Eq. (173) reflects conservation of the wave action or quanta.54 Accordingly, determines the dissipation rate, can be identified (up to a constant factor) as the action flux density summed over all active modes (for example, see Sec. VI B), and the elements of the diagonal matrix are proportional to the group velocities of the corresponding active modes. We shall call them the group velocities (without “proportional to”) for brevity. We also emphasize that this model applies even when the group velocities of the active modes are very different, unlike the quasioptical model discussed in Sec. V F.
For scalar waves studied in Sec. IV, we had and we defined such that this term be zero [Eq. (64)]. Now, Λ is a diagonal matrix with nonzero elements, and so it cannot be zeroed entirely by imposing just one scalar constraint on . Hence, there can be more than one natural way to define . One esthetically pleasing and convenient29 option is to require that Λ or be traceless. This amounts to choosing the reference-ray Hamiltonian in Sec. III as or , respectively. [Arbitrary constant factors can be introduced instead of N, for that only redefines τ in Eq. (7).] Another option is to adopt for some single , since all active modes have close wave vectors anyway; then, . As mentioned in Sec. III, all such choices of are equally justified. Although they lead to slightly different envelope equations, those equations are equivalent in the sense that they all describe the same total field by construction. (One might call this a gauge freedom.) However, for the case N = 1, choosing is preferable, as discussed in Sec. V G 2.
F. Quasioptical model
1. Quasioptical equation
Suppose that a wave propagates largely as a single beam. This implies that all active modes have group velocities close to their average group velocity , which can be defined via . Then,
where are matrices with eigenvalues much smaller than . Let us assume a ray-based coordinate system aligned with with the inner product on the transverse space defined similar to Eq. (76) [cf. also Eq. (138)], namely,
Here, is defined as in Eq. (167) and can be approximated as follows:
As shown in Appendix B 3, can also be simplified as
and so it is also self-adjoint under the inner product (176).
In summary, the quasioptical equation for vector waves can be written as follows:
Equation (183) is the main result of our paper. Like Eq. (172), this equation is covariant within the accuracy of our theory. As a reminder, a is generally a vector (), whose elements are the scalar amplitudes of active modes (Sec. V D 2); Λ is the diagonal matrix formed by the eigenvalues of ; is given by Eq. (166); is a scalar; ; is given by Eq. (182); and is given by Eq. (179). There, U is given by Eq. (171) and is given by Eq. (180). The derivatives of could be neglected within the assumed accuracy, but a purist might want to retain them in order to keep and precisely self-adjoint under the inner product (176).
Unlike in Eq. (85), this is the flux of all active modes combined, and the fluxes of individual active modes may not be conserved separately.
2. Simplified equations
Using the variable transformation , this equation can be further simplified as
Here, is an operator self-adjoint under the inner product (176) and given by
Also, Q, Σ, and are the self-adjoint matrices given by
Accordingly, Eq. (184) becomes
and so is conserved if is zero.
G. Summary and discussion
The quasioptical model proposed above is equally applicable to scalar beams, single-mode vector beams (N = 1), and multimode vector beams (N > 1).
1. Scalar beams
In the case of a scalar beam (Sec. IV), one has , and so
2. Single-mode vector beams
In the case of a single-mode vector beam (N = 1), one has , where η is the polarization vector. Then,
and so they are scalars. Also, , but U is generally a nonzero scalar function. There are two natural ways to define k in this case. One is to require that , i.e., to adopt . In this case, U is left in the envelope equation and can intensify the envelope inhomogeneity in the direction perpendicular to the group velocity. This may eventually result in the violation of the envelope approximation. (For multimode beams, this is less of a concern because they typically split into single-mode beams before the issue becomes significant.) Hence, it is potentially advantageous to define k by requiring , i.e., by adopting . In this case, the amplitude equation is identical to that of a scalar wave, while the effect of U is absorbed in the reference phase, leading to modified ray equations,
The effect of U on the ray trajectories is known as the (spin) Hall effect of light in optics.22,23,61–64 In plasma physics, this effect is typically neglected. (To our knowledge, all existing ray-tracing codes ignore U entirely.)
Finally, note that the ray dynamics can also be represented in an alternative form. Since Λ is a scalar, one can rewrite Eq. (171) for as follows:
Since there is only one mode, there is no mode index to manipulate, and so the “metric” factor δ has been dropped. Likewise, the anti-Hermitian parts are simply the imaginary parts in this case. We also substituted the GO ray equations (191), where we replaced with dots, and adopted
Then, the phase-space Lagrangian of a ray, , has a noncanonical structure, namely,
The equations originating from a special case of this noncanonical phase-space Lagrangian were used, for example, in Ref. 62 to study the Hall effect for light propagating in a nonbirefringent material. A comparison of the resulting noncanonical ray equations with the canonical ray equations (193) can be found in Ref. 27. Also notably, the terms and are known as the Berry connection and are linked to the Berry phase or the geometrical phase of light.27,44,62
3. Multimode vector beams
In the case of multiple active modes (N > 1), a is an N-dimensional vector, and the coefficients in the amplitude equation are matrices (except for V, which is a scalar). In particular, the matrices , and U are generally nondiagonal, and so they can cause mode conversion. In the simplest case, when both the transverse gradients and dissipation are negligible, this process is most transparently described by Eq. (187), which then becomes
with Q being a self-adjoint matrix. The modes decouple when Q is close to diagonal. More generally, the dynamics governed by Eq. (198) is similar to that of an N-level quantum system with a Hamiltonian Q (and to the dynamics of N coupled classical oscillators whose parameters do not change too rapidly with ζ65). Alternatively, it can be mapped to a precession equation for a real -dimensional “spin” vector.26 This approach is particularly intuitive at , when the spin is three-dimensional. A detailed analysis of mode conversion for this case can be found in Ref. 29.
As a reminder, the model presented here (Sec. V F) relies on the assumption that the group velocities of the active modes are close to each other. Otherwise, beam splitting occurs rapidly, and the assumption that does not hold. The quasioptical description is inapplicable to such beams; however, the first-order XGO model described in Sec. V E can be used.
VI. ELECTROMAGNETIC WAVES
A. Covariant formulation
Here, we shall explain how the above theory applies to EM waves. We start with Maxwell's equation68
where is the EM tensor, namely,
A is the vector potential, and J is the current density. This equation can also be represented as follows:
Here, is the vacuum dispersion operator,
Hence, the theory developed in the previous sections readily applies, specifically, with the dimension of the configuration space being n = 4, the dimension of the vector field A being m = 4 (so m = n), and . The corresponding Weyl symbols are calculated as follows. It can be shown straightforwardly that is approximately given by
Here, we omitted second-order derivatives of g, which are negligible within the accuracy of our theory. Also, q is defined as in Eq. (15), and , which is in agreement with Eq. (127) because . Finally, in the form introduced in Sec. V C, the above result is
where is the GO limit of is the remaining part of , and itself is the symbol of .
B. Noncovariant formulation
As a special case, suppose a metric of the form
with time-independent spatial metric h. Then, the above equations can be simplified as follows. Let us assume the Weyl gauge (). Then, it is sufficient to consider just the spatial part of the four-dimensional Eq. (203) and replace the vector potential with the electric field,
(As usual, , which is a self-adjoint operator. Assuming we work with wave fields that have zero time average, can also be considered reversible.) Unlike earlier, we now use the Latin indices to denote spatial components, and we shall adopt the bold font for spatial vectors and matrices when using the index-free notation. Specifically, one obtains
where we also multiplied the equation by . This has the form (1) with n = 4, m = 3, and .
Let us introduce the conductivity operator via and notice that by the definition of the susceptibility operator . Then, , and so can be expressed as follows:
or equivalently, . The corresponding Weyl image is , or
where serves as the dielectric tensor and is the GO limit of .
Using the fact that the dispersion operator is defined only up to a constant factor, let us introduce an additional factor . Then, Eq. (210) becomes
and so they both can be of order ; hence, . Note that such C is generally non-negligible even for near-Euclidean metrics such as those discussed in Sec. IV E 3.
From Eq. (174), we obtain
which acts as the true action-flux density.54 As a reminder, the action density can be written as
where we used . Using the latter and Faraday's law for the magnetic field B, one can also rewrite in another common form,4
where S is the (time- or space-averaged) Poynting vector,
The GO equation (173) can be written as
The corresponding density of the wave energy is , and the density of the energy flux is , as flows from the variational principle.54
C. Stationary waves
For stationary waves with fixed frequency, one has and ω can be treated as a constant parameter. Such waves can be studied on the three-dimensional configuration space, namely, the physical space x. In this case, and . If there is no spatial dispersion, then , where the dependence on ω is assumed but not emphasized. Assuming that the underlying space is flat (which is always the case in laboratory applications), the symbol is then identical to the susceptibility of the homogeneous medium. (Other formulations of the susceptibility have also been proposed in this case, e.g., for light in nondispersive dielectric media27 and for waves in cold plasmas.24,25,67)
These properties can also be extended, approximately, to media with weak spatial dispersion, i.e., such that
For example, in plasma, is proportional to the temperature, and for EM waves, thermal effects often can be considered as small perturbations.4 Although the Weyl symbol is, strictly speaking, different from the corresponding part of the susceptibility in homogeneous medium, the difference is of order , and so it is much smaller than and thus can be neglected. In other words, one can simply replace the small with its GO limit, which is usually well known.4
In principle, our general method is also applicable to waves in media with strong spatial dispersion. However, may be hard to calculate in that case unless a medium is homogeneous, and no extrapolation of a known homogeneous-medium model is guaranteed to yield the true general . A related discussion can also be found in Ref. 43.
D. Waves in flat space
Let us also discuss the special case when the configuration space is flat and Eq. (220) applies, as in typical laboratory applications. Although the ray-based metric h is still non-Euclidean in this case (unless the reference rays are straight), one can use the flatness of the configuration space in the following sense. In the Euclidean laboratory coordinates , the spatial metric has the form and vanishes. (Here and further, the tildes denote that the corresponding quantities are evaluated in the laboratory coordinates. Note that Papers II and III assume a different notation; there, tildes are quantities in the ray-based coordinates, and nontilded are those in the laboratory coordinates.) This simplifies the expression for the dispersion matrix and makes DH a true tensor (modulo the insignificant corrections caused by weak spatial dispersion discussed in Sec. VI C). Then, since U and are frame-invariant (Sec. V E), one can calculate them in the laboratory coordinates using
where both and are omitted for simplicity, since they are unit matrices. This also extends to the quasioptical model. Since calculations in the laboratory coordinates are generally easier than the corresponding calculations in the ray-based coordinates, this simplifies applications of our theory to numerical simulations, as discussed further in Paper II.
Specific applications of this theory, including those to waves in fusion plasmas, will be discussed in future publications. Here, we only note that our formulation relies on the GO ordering and thus is inapplicable to wave transformations near caustics, for example, as in the O–X conversion caused by the density gradient in dense magnetized plasmas.68 In contrast, the O–X conversion caused by the magnetic shear in dilute plasmas29 can be modeled naturally, as also demonstrated explicitly in Paper III. Any adiabatic transformations of waves along continuous dispersion curves can also be modeled naturally, specifically, within the single-mode approximation summarized in Sec. V G 2. (In special cases like the X–B transformation,4 when the group velocity becomes zero on a ray, while k remains large, one may want to replace the coordinate ζ with τ, where , to remove the singularity in the amplitude equation.) In all these cases, our theory is expected to yield results which agree with full-wave modeling up to local errors of order , because this is the accuracy within which the passive-mode contribution is calculated (Sec. V D 4).
In summary, we propose a quasioptical theory of mode-converting wave beams in inhomogeneous media such as plasma. This includes the following. For any given dispersion operator that governs the original wave field Ψ, we explicitly calculate the approximate operator that governs the wave envelope ψ to the second order in the GO parameter ϵ. Then, we further simplify this envelope operator by assuming that the gradient of ψ transverse to the local group velocity is much larger than the corresponding parallel gradient. This leads to a parabolic differential equation for ψ (“quasioptical equation”) on the basis of the GO polarization vectors [Eq. (136)]. Our main results can be found in Sec. V F, which includes the general quasioptical equation (183) and its special case (187). Scalar and mode-converting vector beams are described on the same footing (Sec. V G). We also explain how to apply this model to EM waves considered as a special case (Sec. VI). In the follow-up papers, we report successful quasioptical modeling of radio frequency wave beams in magnetized plasma based on this theory.
The first author (I.Y.D.) acknowledges the support and hospitality of the National Institute of Fusion Science, Japan. The authors are also thankful to M. A. Oancea for stimulating discussions regarding the XGO generalization to non-Euclidean metrics. This work was supported by JSPS KAKENHI Grant No. JP17H03514. This work was also supported by the U.S. DOE through Contract No. DE-AC02-09CH11466 and by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. DOE National Nuclear Security Administration under Contract No. DE-NA-0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in this paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
APPENDIX A: SUMMARY OF THE SELECTED NOTATIONS
Here, we present a summary of the selected notations used in the main text.
1. Basic symbols
• denotes a definition.
• ̂ denotes an operator.
• denotes the matrix transpose.
• denotes the conjugate matrix transpose.
• is used only in and defined in Eqs. (146).
• H and A denote the Hermitian and anti-Hermitian parts of an operator [Eqs. (26)] or those of a matrix (Sec. V B). When applied to a scalar, H and A denote the real and imaginary parts, respectively.
• The underline notation is explained in Sec. V B and is used for matrices with lower indices only.
• and denote the Wigner–Weyl transform and its inverse (Sec. IV A 3), respectively.
2. Index manipulation
On the n-dimensional configuration space Mn, we assume a general metric tensor g with components . For (co)vector fields on the space (co)tangent to Mn, the indices are manipulated as usual, namely, via
(Summation over repeated indices is always assumed unless specified otherwise.) For vector waves, we also introduce an additional vector space that may or may not be the same as the space tangent to Mn (Sec. V A). On that space, an additional metric γ is introduced, and the indices are manipulated as follows:
We also introduce the Euclidean metric δ on the space of amplitude vectors a (Sec. V D 2),
For active modes, which are of our primary interest, the mode indices s and range from 1 to N. The distinction between as and as is made only for esthetic reasons (consistency of notation) and can be neglected otherwise.
For spatial derivatives, we use the notation that is standard, for example, in general relativity.69 In particular, is the divergence, namely,
For functions of the form , we also introduce the following partial derivatives:
Since , one has
4. Inner products
The inner product for scalar waves on Mn is
and for m-dimensional vector waves on Mn,
In particular, in the x representation,
We also use the dot product
The inner product on the transverse space is defined for scalar fields as
and for the N-dimensional “active” parts of the vector fields (Sec. V D 2) as
APPENDIX B: AUXILIARY CALCULATIONS
Here, we present some auxiliary calculations whose results are used in Sec. V.
1. Calculation of and
We start with Eq. (167) for . This equation can be rewritten as follows:
where we used Eq. (52) and introduced
where the subscript A denotes the anti-Hermitian part, as usual.
2. Calculation of
Let us also derive an alternative expression for U0 [Eq. (B4)] in terms of the partial derivatives instead of the full derivatives . Using Eq. (59), one obtains . Then, Eq. (B4) can be rewritten as follows:
By differentiating Eq. (159) with respect to kμ, we obtain
where we used that Λ (but not its partial derivatives) is small. Let us multiply this by γ to obtain
where the latter equality is the complex conjugate of the former. Using these and , we further obtain, to the leading order,
where we neglected terms of the second order in ϵ. We also used the fact that by the definition of active modes, in addition, Λs are changing slowly, and so . (Note that this is only true for but not for .) Using Eqs. (B7), we also obtain similarly that
As a reminder, the subscript A denotes the anti-Hermitian part, and is a unit matrix which only raises the mode index. Equation (B10) is similar to the corresponding result derived in Refs. 24–26. Extra terms are included in those papers in the expressions for U0, but they are of the second order in ϵ and could have been neglected.
3. Calculation of
Here, in the first line, we substituted Eq. (128) for and also Eq. (14) for . Then, we only kept the terms involving derivatives of a along the perpendicular direction of the wave beam. The parentheses in the indices denote symmetrization, namely, for any A and B,
The metric factor has been added to keep the operator self-adjoint under the inner product (176). (Accounting for the inhomogeneity of h in is beyond the accuracy of our theory. However, keeping self-adjoint is convenient and physically meaningful, for it is known that the exact operator is not responsible for dissipation.) Hence,
In order to simplify the expression for , note that
where we used (and thus ) and neglected terms proportional to . Hence,
where we substituted [Eq. (B2)]. The quasioptical approximation implies that is close to a scalar matrix, and so the commutator can be neglected. Hence, .
It is not a purpose of this paper to provide a comprehensive list of existing approaches to reduced modeling of EM waves in plasmas, so the above reference list is not intended as exhaustive.
This does not extend to ray-tracing codes. For example, a ray-tracing code raycon1 captures mode conversion by using an approximate yet somewhat generic analytic solution that applies in the limit of a narrow region of resonant coupling. A related ray-tracing model with mode conversion was also proposed in Refs. 24 and 25.
Mode conversion is understood here as the exchange of action between near-resonant but nonintersecting branches of the dispersion relation.1 Mathematically, this process is similar to quantum tunneling. Adiabatic transformations of a wave mode along continuous dispersion curves, such as the X–B transformation,4 are not mode conversion in the true sense of the term and can also be simulated using already existing reduced models.
This also facilitates the aforementioned applications to related problems in general relativity,23 as will be discussed in future publications.
Eventually (Sec. V E), we shall choose such that be small. But that will characterize rather than the function D itself, which is considered order-one when evaluated on general .
In principle, produces effects qualitatively different from those produced by , so one might argue that the accuracy of its approximation does not necessarily need to equal that of . Besides, is often more abrupt than , so retaining in Eq. (82) may in fact lead to a better model in principle. However, in such cases, the Weyl symbol D of the original dispersion operator is typically known only to the zeroth order (Sec. VI C), and thus so is . Hence, retaining first-order corrections the damping coefficient does not seem warranted.