An adjoint formulation leveraging a physics-informed neural network (PINN) is employed to advance the density moment of a runaway electron (RE) distribution forward in time. A distinguishing feature of this approach is that once the adjoint problem is solved, its solution can be used to project the RE density forward in time for an arbitrary initial momentum space distribution of REs. Furthermore, by employing a PINN, a parametric solution to the adjoint problem can be learned. Thus, once trained, this adjoint-deep learning framework is able to efficiently project the RE density forward in time across various plasma conditions while still including a fully kinetic description of RE dynamics. As an example application, the temporal evolution of the density of primary electrons is studied, with particular emphasis on evaluating the decay of a RE population when below threshold. Predictions from the adjoint-deep learning framework are found to be in good agreement with a traditional relativistic electron Fokker–Planck solver, for several distinct initial conditions, and across an array of physics parameters. Once trained, the PINN thus provides a means of generating RE density time histories with exceptionally low online execution time.

The treatment of runaway electron (RE) dynamics is a critical component in the integrated description of tokamak disruptions1,2 and startup scenarios.3,4 While first-principle kinetic solvers are available to evaluate the transient RE dynamics during these phases of a tokamak discharge,5–9 such approaches are computationally intensive, thus, sharply limiting their application. This has led to the development of several reduced models of RE formation that may be conveniently coupled to magnetohydrodynamic (MHD) codes.10–12 These reduced models, however, are often based on the assumption of a slowly evolving background plasma and are, thus, unable to accurately capture highly transient phases of RE generation or decay. An exception to this trend includes reduced models of hot tail RE seed formation,13–15 where RE formation during a rapid thermal quench is estimated. Such models, however, have to date been developed specifically for the thermal quench, and, thus, are not applicable to other transient phases of a disruption, and do not self-consistently include the presence of large-angle collisions. The amplification of the RE seed by large-angle collisions, the so-called RE avalanche,16 is instead incorporated by a separate module17–19 that does not directly account for the transiently evolving momentum space distribution of REs.

The present work, together with Paper II,20 aims to address these limitations by employing an adjoint-deep learning framework that will allow the transient dynamics of REs to be evolved with an online execution time that is orders of magnitude less than a traditional relativistic Fokker–Planck solver. To achieve this, we will first focus on the solution to an adjoint problem,14,21–23 rather than directly solving the relativistic Fokker–Planck equation. A distinguishing feature of this approach is that a single solution of the adjoint problem can be used to evolve the RE density forward in time for an arbitrary initial momentum space distribution. Thus, once a solution to the adjoint problem has been identified, the impact of distinct initial momentum space distributions on RE density evolution can be rapidly evaluated for a given set of physics parameters. The second component of the framework will be the use of a physics-informed neural network (PINN)24 to identify a parametric solution to the adjoint of the relativistic Fokker–Planck equation. Physics-informed neural networks are a form of deep learning, whereby physical constraints are embedded in the training of a neural network (NN). While this approach has gained prominence due to its ability to integrate physical constraints with data,25 or treat inverse problems,26 for cases where a closed system of equations is available, PINNs can be used to learn the solution to a PDE across a broad range of physics parameters in the absence of data.27,28 Further noting that while the offline training time of a PINN is long, its online inference time is orders of magnitude faster than a traditional solver. The PINN will thus provide a rapid surrogate of the physical system of interest. Combining the above two components, the resulting adjoint-deep learning framework will be able to generate time histories of the RE number density at a fraction of the cost of a traditional solver, for an arbitrary initial momentum space distribution of REs and across a range of plasma conditions. The present paper, along with Paper II,20 will focus on developing the fundamental components of this adjoint-deep learning framework, while future works will focus on employing it for describing tokamak disruptions.

As specific example problems, we consider the decay of a primary RE distribution, while Paper II20 will treat the avalanche amplification of an initial RE seed. While large-angle collisions are essential to providing a description of RE generation during the current quench phase of a disruption, an accurate treatment of RE decay is essential when describing the decay of plasma current29 during the RE plateau. This decay rate will be inferred as a function of electric field strength, effective charge, and the strength of synchrotron radiation. The RE decay rate is shown to have a strong nonlinear dependence on the strength of the electric field, in contrast to commonly employed analytic expressions for RE growth and decay such as Ref. 17, which indicate a linear dependence. Furthermore, we find that characterizing the decay of REs using a simple decay rate, as is often done in current applications, is often inaccurate either due to the long time required for the RE distribution to settle into a well defined exponential decay or due to the RE density exhibiting temporal dynamics that cannot be fit by an exponential, even in the long time limit. While the present paper does not include large-angle collisions, Paper II20 finds that the decay of the overall RE distribution is well approximated by the decay of the primary population neglecting large-angle collisions. The present estimate, thus, provides an efficient means of evaluating the decay of a RE population when below threshold. As a further application, time histories of the RE density evolution are computed for distinct initial momentum space distributions of REs. It is shown that the adjoint-deep learning framework is able to accurately resolve these time histories for RE distributions with sharply different initial pitch distributions. The specific time histories predicted by the adjoint-deep learning framework are compared with those predicted by the particle-based RE solver RAMc,7 with good agreement evident across a range of initial momentum space distributions.

The remainder of this paper is organized as follows. Section II discusses the adjoint formulation of the relativistic Fokker–Planck equation. A brief description of PINNs is provided in Sec. III, along with a discussion of the tailored NN architecture employed in this paper. The temporal evolution of the number density of primary electrons for distinct initial momentum space distributions of REs is given in Sec. IV, along with evaluating RE decay for a range of plasma parameters. A brief discussion along with conclusions are given in Sec. V.

The adjoint to the relativistic Fokker–Planck equation has received substantial treatment throughout the plasma physics literature, both in the context of wave driven currents21,30–32 along with more recent treatments focusing on RE formation.14,19,22,23,33 In this section, we will describe an adjoint problem, where our interest will be evaluating the probability that an electron initially located at a given energy and pitch, runs away at a later time. We begin by introducing the Green's function F(p,t;p0,t0), which is taken to obey the relativistic Fokker–Planck equation
(1)
(1a)
where F evolves due to electric field acceleration, small-angle collisions, and losses from synchrotron radiation, whose form are defined by
(1b)
(1c)
(1d)
Here, (p0,t0) indicate the momentum and time the electron was injected at, we will consider a bounded system with momentum in the range p[pmin,pmax], the electron's pitch is defined by ξp/p[1,1], time is normalized as tt/τc, with τc4πϵ02me2c3/(e4nelnΛ), lnΛ is the Coulomb logarithm, momentum as pp/(mec), the parallel electric field as EE/Ec, where Ecmec/(eτc) is the Connor–Hastie electric field,34 the Lorentz factor is defined as γ=1+p2, ατc/τs describes losses from synchrotron radiation due to the electron's gyromotion, and τs6πϵ0me3c3/(e4B2) is the timescale associated with synchrotron radiation. The coefficients for the collisional drag CF and pitch-angle scattering CB in Eq. (1c) are defined by
(2a)
(2b)
We have neglected energy diffusion since it has a magnitude of order Te/(mec2) at relativistic energies, such that it will be subdominant for post thermal quench plasmas with temperatures of tens of electron volts or less.1 
Turning to the boundary conditions on F(p,t;p0,t0), from Eq. (1), the momentum space flow Up of electrons through the upper and lower momentum boundaries is given by
(3)
When E>1, regions with both Up>0 and Up<0 will be present at p=pmax. To ensure that no inflow of electrons through the upper momentum boundary is present we will require F to vanish at p=pmax when Up<0. With regard to the lower momentum boundary, we will ensure that pmin is chosen to be small enough such that Up<0 at this surface (leading to a flow directed out of the domain), so no boundary condition on F needs to be applied at p=pmin. With these boundary conditions, a particle balance relation can be derived by integrating Eq. (1a) over momentum and time, yielding
(4)
where the momentum integration is for pminppmax, n̂ is a unit vector pointing out of the bounded domain, Up=p̂Up, and we only include regions on the upper momentum boundary with an outward flux of particles such that n̂·Up>0, since F will vanish otherwise. Noting that F describes the electron distribution resulting from a unit source at (p0,t0), Eq. (3) enforces that the probability of the electron remaining inside the momentum space domain bounded by pmin and pmax [first term in Eq. (4)] or exiting through either the low or high momentum boundaries [second term in Eq. (4)], is one.
Considering now the adjoint to the relativistic Fokker–Planck equation,14,21–23 which may be expressed as
(5)
(5a)
with the adjoint operators defined by
(5b)
(5c)
(5d)
By successive integration by parts, an adjoint relation can be written as
(6)
where we have noted that the pitch flux vanishes at ξ=±1, leaving only the fluxes at the upper and lower momentum boundaries [last term in Eq. (6)]. Inserting Eqs. (1a) and (5a) into Eq. (6) yields
(7)
Evaluating the delta functions and combining the time derivative terms yields an expression for P, i.e.,
(8)
The physical meaning of the quantity P is determined by the choice of the terminal condition P(t=tfinal), and the upper and lower momentum boundary conditions on P. While a range of quantities of interest could be evaluated using this adjoint formulation by considering different terminal and momentum space boundary conditions, for the remainder of this paper we will focus on the runaway probability function (RPF).21 This is accomplished by enforcing the terminal and momentum space boundary conditions
(9)
(9a)
(9b)
(9c)
where Θ(x) is a Heaviside function and Up is defined by Eq. (3) above. Here, pRE is taken to be the momentum an electron must exceed to be labeled as a RE. While there is some degree of arbitrariness in its precise choice, it should be chosen to be larger than the momentum of a thermal electron, but not too much larger than the critical energy to run away to avoid undercounting REs. The lower momentum space boundary condition Eq. (9b) is chosen such that electrons that fall through the low momentum boundary, and are, thus, lost to the bulk plasma, are not counted as REs. The upper momentum space boundary condition Eq. (9c) is selected such that electrons with Up(p=pmax)>0 (which will be accelerated through the upper momentum boundary), are counted as REs. However, for regions on the upper momentum boundary where Up(p=pmax)<0, we will not enforce P(p=pmax)=1 since these electrons would not be immediately accelerated through the upper momentum boundary. Furthermore, we also will not set P(p=pmax)=0 for Up(p=pmax)<0, since these electrons still have a finite probability of running away at a later time. This portion of the upper momentum boundary will, thus, be left unconstrained. We note that its contribution to the surface term in Eq. (8) will be zero, since F vanishes at the upper momentum boundary for Up<0.
With the terminal condition and upper and lower momentum boundary conditions described by Eq. (9), Eq. (8) can be written as
(10)
where ξout defines the region in pitch that satisfies
Comparing Eq. (10) with Eq. (4) implies that P(p=p0,t=t0;tfinal) is the probability that an electron either remains in the simulation domain, but with an energy greater than pRE [first term in Eq. (10)] or that the electron has exited through the upper momentum boundary [second term in Eq. (10)] by a time tfinal. The quantity P(p=p0,t=t0), thus, represents the probability of an electron injected at p=p0 and t=t0 obtaining a momentum of pRE or greater at tfinal and is often referred to as a runaway probability function.14,21–23,33

The physical interpretation of the RPF can be made clear by considering the orbits of several electrons initialized at different pitches as shown in Fig. 1. These orbits have been computed using characteristic equations derived from Eq. (1), where pitch-angle scattering was neglected leading to deterministic orbits. The parameters (E=2 and α=0.1) were selected such that the system is slightly above the avalanche threshold, where effects from synchrotron radiation are most pronounced. After integrating forward in time until t=τc [see Fig. 1(a)], all of the electron orbits have slowed down, except an orbit with an initial pitch of ξ=0.99. This last orbit was initialized with a pitch that put it in a region with Up>0 (below the dashed black curve) allowing it to be accelerated. It is useful to note that the rate at which electrons in the Up<0 region decelerate depends sensitively on their pitch, with electrons near ξ0 decelerating the most rapidly due to synchrotron radiation being maximal in this region, where the presence of synchrotron radiation results in the region with Up>0 contracting as p increases. At a later time [ t=2τc Fig. 1(b)], the majority of electron orbits continue to slow down, but are also tilted toward the negative pitch region. This is due to the electric field pushing the electrons toward the ξ=1 axis. Finally, by t=10τc, all of the electrons have either been accelerated through the p=10 surface, or decelerated to the bulk (i.e., p=0 since we have assumed a cold bulk plasma). Those that have been accelerated through the p=10 surface [which could be taken to be the upper momentum boundary] will be treated as REs according to Eq. (10). In contrast, two orbits in Fig. 1(c), the purple and red orbits initialized with ξ>0.8 have fallen back to the bulk, and are, thus, deemed to not run away. Thus, for the case where pitch-angle scattering is neglected, a separatrix will form delineating runaway and non-runaway orbits.

FIG. 1.

Example deterministic electron orbits. Here, electrons are initialized at the solid circle markers located at p=10 and evolved forward in time including electric field acceleration, collisional drag, and synchrotron radiation. Panel (a) indicates the orbits at t=τc, panel (b) for t=2τc, and panel (c) is for t=10τc. The dashed black curve indicates the contour Up=0, where regions below this curve will have Up>0. The parameters were taken to be E=2 and α=0.1, where Zeff does not enter since pitch-angle scattering was neglected.

FIG. 1.

Example deterministic electron orbits. Here, electrons are initialized at the solid circle markers located at p=10 and evolved forward in time including electric field acceleration, collisional drag, and synchrotron radiation. Panel (a) indicates the orbits at t=τc, panel (b) for t=2τc, and panel (c) is for t=10τc. The dashed black curve indicates the contour Up=0, where regions below this curve will have Up>0. The parameters were taken to be E=2 and α=0.1, where Zeff does not enter since pitch-angle scattering was neglected.

Close modal

The primary impact of pitch-angle scattering on the RPF will be to broaden the sharp transition region and shift its location to higher energy. This is illustrated in Fig. 2, where we have computed the RPF from the steady state adjoint equation, but artificially multiplied the pitch-angle scattering term by factors of 0.01 [Fig. 2(a)], 0.1 [Fig. 2(b)], and 1 [Fig. 2(c)] for a plasma with Zeff=1, while keeping all other terms fixed, including the collisional drag. It is evident that for Fig. 2(a) (where the pitch-angle scattering term is multiplied by 0.01), a sharp transition region is present, where the region with P1 closely aligns with the RE orbits shown in Fig. 1(c). However, as the strength of pitch-angle scattering is increased in Figs. 2(b) and 2(c), the transition region broadens, and the location of the P=0.5 contour shifts to higher energy, implying that RE generation will be less efficient as pitch-angle scattering is increased.

FIG. 2.

Steady state RPF with the pitch-angle scattering term multiplied by a factor of 0.01 [panel (a)], 0.1 [panel (b)], and 1 [panel (c)]. The parameters were taken to be E=2, Zeff=1 and α=0.1. The RPF was evaluated by solving Eq. (5) at steady state using a physics informed neural network, whose specific implementation is described in Ref. 19.

FIG. 2.

Steady state RPF with the pitch-angle scattering term multiplied by a factor of 0.01 [panel (a)], 0.1 [panel (b)], and 1 [panel (c)]. The parameters were taken to be E=2, Zeff=1 and α=0.1. The RPF was evaluated by solving Eq. (5) at steady state using a physics informed neural network, whose specific implementation is described in Ref. 19.

Close modal
An immediate application of the RPF is to predict the number of REs at a later time. Specifically, if an electron distribution fe(init)(p,ξ) was present at t=0, the number density of electrons at a later time tfinal is given by
(11)
Here, P(p,ξ,t=0;tfinal) indicates the probability that an electron initially located at (p,ξ) runs away after a period of duration tfinal. Hence, by weighing the density moment of the initial distribution of electrons by P(p,ξ,t=0;tfinal), this yields the number density of REs at t=tfinal.
The number density of REs at times intermediate between t=0 and t=tfinal can be obtained by a generalization of Eq. (11). A conceptually simple generalization would be to solve a series of terminal value problems for end times tend, where tend falls between t=0 and t=tfinal [see Fig. 3(a)]. Each of these problems would take P(t=tend)=Θ(ppRE) as the terminal condition and integrate backward to t=0, thus allowing the RE density to be projected to an end time tend, such that Eq. (11) would then be written as
(12)
where fe(init)(p,ξ) is the electron distribution at t=0. While conceptually straightforward, this approach would involve a substantial increase in computational cost, since several terminal value problems must be solved. A more efficient means of determining the RE density at intermediate times between t=0 and tfinal can be arrived at by noting that solving a terminal value problem ending at t=tend, and integrating backward until t=0, is identical to defining the terminal value at tfinal and integrating backward for a period of duration tfinaltend [see Fig. 3(b)]. In this latter case, the RE density at tend is given by
(13)
FIG. 3.

Panel (a) indicates an adjoint problem with the terminal condition P(t=tend)=Θ(ppRE), and Eq. (5) is integrated backward to t=0. The solution to this problem will be identical to a problem with the terminal condition at P(t=tfinal)=Θ(ppRE) and integrating backward until t=tfinaltend.

FIG. 3.

Panel (a) indicates an adjoint problem with the terminal condition P(t=tend)=Θ(ppRE), and Eq. (5) is integrated backward to t=0. The solution to this problem will be identical to a problem with the terminal condition at P(t=tfinal)=Θ(ppRE) and integrating backward until t=tfinaltend.

Close modal
Hence, by solving a terminal value problem with P(tfinal)=Θ(ppRE) and integrating backward until t=0, the RPF at all intermediate times will be computed. Once this terminal value problem has been solved, the RE density at a time between 0ttfinal is given by
(14)
where τtfinalt. Equation (14) will allow the time evolution of the density moment of an initial seed population of REs with distribution fe(init)(p,ξ) to be projected to any time between 0 and tfinal.

The adjoint problem discussed in Sec. II allows the RE density to be evolved in time beginning with an arbitrary initial momentum space distribution of REs. Our aim in this section will be to extend the generalizability of the adjoint framework by utilizing a PINN to evaluate the parametric solution to Eq. (5). In this way, the resulting adjoint-deep learning framework will be able to evaluate the RE density for both an arbitrary initial electron distribution, and a range of plasma conditions. Once trained, the resulting framework will allow for nearly instantaneous projections of RE density incorporating fully kinetic physics.

Physics-informed neural networks24,25 have emerged as a prominent example of physics-constrained deep learning methods.35–38 The present discussion will only focus on the essential concepts, where the interested reader is referred to Ref. 25 (and references therein) for a more detailed discussion. Physics-informed neural networks utilize physical constraints to regularize the training of a NN. A PINN in its simplest form can be expressed as24 
(15)
where wPDE is a scalar that determines the weight given to the PDE portion of the training, Pi represents data points for the quantity of interest (the RPF in the present paper), pi are the momentum space coordinates, ti is time, and λi represent parameters of the physical system which in the present paper will be taken to be λ=(E,Zeff,α), Ndata is the number of data points used in the training of the NN, which includes boundary and initial conditions for the PDE as well as available experimental or simulation data, NPDE is the number of points where the physical constraint is sampled, and R(pi,t;λi) is the residual of the PDE, which penalizes deviations from the imposed physical constraint of the system. We note that while the amount of experimental or simulation data are often limited, the value of NPDE may be taken to be very large, where increasing its value will only impact the computational cost of the offline training of the model. This property will be exploited to enable the training of the model across a broad range of physical parameters.

For the present study, we will consider the limit where no data are available, such that we will rely exclusively on physical constraints for training the PINN. A key component when developing a robust PINN will be to enforce several properties of the solution as hard constraints in the NN architecture itself, rather than as soft constraints in the loss function. This prevents the PINN from making unphysical predictions while simultaneously, and perhaps more importantly, severely restricting the solutions that the optimizer searches across, leading to more robust convergence of the PINN. These hard constraints will be enforced by a customized “physics layer,” detailed in Sec. III B below.

As a means of improving the accuracy and robustness of the PINN, we will implement several properties of the solution as hard constraints via the addition of a physics layer to the NN. This additional layer will take the output of the hidden layers of the NN and impose a series of transforms that enforce (i) the terminal condition given by Eq. (9a), (ii) lower momentum boundary condition [Eq. (9b)], and (iii) constrains the RPF to have a range between zero and one. While all of these conditions can be enforced by adding terms into the loss function, the use of a physics layer will act to both simplify the form of the loss function, as well as restrict the range of solutions that the optimizer is allowed to search across. The form of the physics layer that we will introduce is given by
(16)
where PNN is the output of the hidden layers of the neural network, λ is a vector containing the physics parameters λ=(E,Zeff,α), and the terminal condition is defined by
(17)
which transitions from a large negative number to a large positive number as p crosses pRE, with Δp determining how sharp the transition is, and ΔP setting the maximum magnitude. The RPF is computed by passing Eq. (16) through
(18)
Here, it may be verified that this additional layer to the NN ensures the RPF has a range between zero and one [achieved via Eq. (18)] approximates the terminal condition of a Heaviside function centered about pRE at t=tfinal (for Δp1 and ΔP1), and obtains a vanishingly small value at the low momentum boundary when Δp/pmax1 and ΔP1. The physics layer contains the hyperparameters ΔP and Δp/pmax. Each of these parameters should be set to a value much less than one to closely adhere to the physics problem. In particular, ΔP must be chosen to be sufficiently small such that P has a negligible value at p=pmin. The parameter Δp must be small to ensure a sharp interface at p=pRE. However, too small a value of Δp has a negative impact on training, thus reducing the accuracy of the PINN's approximation to the RPF. Its value will, thus, be chosen by a compromise between adhering to the physics problem where Δp0, and the accuracy of the PINN, which for too small a value of Δp will struggle to resolve the initial transient evolution of the sharp interface (see Fig. 5). While the above-mentioned physics layer is not unique, it has the advantage of only introducing two hyperparameters with clear tradeoffs in terms of physics fidelity and ease of training. For all the studies shown below, we have chosen ΔP=0.15 and Δp=0.1pmax.
With the physics layer defined by Eqs. (16)–(18), the loss function will contain a term proportional to the mean square-error of the residual of the adjoint to the relativistic Fokker–Planck equation, along with a term penalizing deviations from the upper momentum space boundary condition defined by Eq. (9c). The weighting of the different energy and parameter regions will have a strong impact on the performance of the PINN. In this study, we will consider a loss function of the following form:
(19)
where
(20)
Here, Nbdy is the number of points on the upper momentum boundary that will be sampled, and R is the residual of Eq. (5a). The factor pi2/(1+pi2) in front of the residual is designed to partially cancel the 1/p3 divergence in the pitch-angle scattering operator, with the ppmin factor in Eq. (16) ensuring the loss doesn't diverge for small p. In addition we have multiplied the residual of the PDE by the function G(p). This function vanishes at p=pmax and increases to one for p<pmax over a momentum scale defined by Δpmax, where Δpmaxpmax. The motivation for introducing this factor is that for E slightly greater than one, the high momentum boundary condition is P=1 at ξ=1. However, for E only slightly greater than one, P will quickly drop to value far below one, particularly in the presence of synchrotron radiation and large Zeff. This sharp variation often leads to a large residual localized near ppmax and ξ1. Even when localized to a very small region of momentum space, this large residual has a negative impact on training, particularly when residual based adaptive sampling techniques are used. The factor G(p) will, thus, reduce the weight placed on the high momentum region, and thus, help the PINN obtain relatively uniform accuracy across the majority of momentum space. Noting that the purpose of this parameter is to prevent the optimizer from placing too much weight on points in the immediate vicinity of the upper momentum boundary, an appropriate value for this parameter may be identified independently from the hyperparameters ΔP and Δp defined in Eq. (17) above. We note that while enforcing the upper momentum boundary condition is essential when evaluating the steady state RPF in order to remove the trivial solution P=0,19 for the time dependent case treated here the presence of a terminal condition removes the trivial solution. Hence, we do not anticipate the high energy boundary condition to strongly impact the solution throughout the majority of momentum space. For all cases, in this paper, we will take Δpmax=0.05pmax, such that only the region very close to pmax is impacted. Finally, the factor wPDE determines the relative weight of the PDE vs the boundary condition. For all cases, we will take wPDE=10, to emphasize the PDE term in the loss function. The Python script used for training the PINN is written using the DeepXDE library39 with TensorFlow40 as the backend and will be made available upon acceptance at https://github.com/cmcdevitt2/RunAwayPINNs.

As a specific application of the above-mentioned framework, we will treat the decay of an initial primary RE distribution. Here, an initial population of REs will be initialized with a given momentum space distribution and the adjoint-deep learning framework described in Secs. II and III will be used to project the RE density forward in time. We will begin by describing representative solutions to the adjoint of the relativistic Fokker–Planck equation (Sec. IV A) and subsequently describe the decay of a primary RE population for a range of initial conditions (Sec. IV B). Finally, the predictions of the adjoint-deep learning framework will be compared with a traditional RE solver in Sec. IV C.

Using the NN architecture and loss function described in Sec. III B above, example solutions to the adjoint problem are shown in Figs. 5 and 7 with the test and training loss history shown in Fig. 4. Here, we have trained the PINN for electric fields in the range E(0,3), effective charges Zeff(1,2), and synchrotron radiation α(0,0.1). A much broader range of physics parameters will be considered in Ref. 20. We have also chosen a domain with a minimum energy of 10keV, a maximum energy of 5MeV, and tfinal=20. The energy defining which electrons are considered to be REs is taken to be pRE=pmax/4, which corresponds to an energy of roughly 1 MeV. The loss history of the PINN is shown in Fig. 4. Here, the first 15 000 epochs are performed using the ADAM optimizer, while L-BFGS is used thereafter. After periods of 50 000 epochs of training with L-BFGS, the training point distribution is resampled using the residual based adaptive distribution (RAD) method described in Ref. 41. Such a sampling routine redistributes roughly half of the training points using a uniform random distribution, with the remaining points selected with a probability proportional to the magnitude of the residual. In this way, the entire training domain is sampled, but with regions that have relatively large residuals receiving more training points [the final training point distribution is shown in Fig. 4(b)]. This training point resampling results in spikes in the training loss every 50 000 epochs as evident from Fig. 7(a). Despite these spikes in the training loss of the PDE, the test loss of the PDE monotonically decays to a value of 107, with the boundary loss term dropping to a lower value, suggesting that the PDE is well satisfied across the range of parameters under consideration.

FIG. 4.

(a) Training and test loss history. The solid blue curve indicates the training loss of the PDE, the solid red curve indicates the training loss for the boundary condition, and the “×” markers indicate the test loss (the test and training losses are the same for the boundary term). 1 000 000 training and testing points are used. The training points initially obey a Hammersley sequence, with the test distribution being uniform random. The training points are resampled using the residual based adaptive distribution described in Ref. 41 every 50 000 epochs of L-BFGS. (b) Projection of the final training point distribution onto the energy and pitch domain.

FIG. 4.

(a) Training and test loss history. The solid blue curve indicates the training loss of the PDE, the solid red curve indicates the training loss for the boundary condition, and the “×” markers indicate the test loss (the test and training losses are the same for the boundary term). 1 000 000 training and testing points are used. The training points initially obey a Hammersley sequence, with the test distribution being uniform random. The training points are resampled using the residual based adaptive distribution described in Ref. 41 every 50 000 epochs of L-BFGS. (b) Projection of the final training point distribution onto the energy and pitch domain.

Close modal

Two example solutions are shown in Figs. 5 and 7 for electric fields below and above the avalanche threshold (the avalanche threshold is E1.8 for these parameters). Considering the subthreshold case first (E=1.5, Fig. 5), the P=0.5 contour (black contour in the top row of Fig. 5) shifts to higher momentum and smaller pitch when moving from the t=20 to t=0 time slice. Physically, this implies that an electron requires an energy near 5 MeV and a pitch near ξ=1 at t=0 to have a better than fifty percent chance of remaining in the RE region or being accelerated through the high momentum boundary, by t=tfinal. This is due to the combination of drag and synchrotron losses exceeding electric field acceleration throughout the majority of momentum space, with only a narrow channel where Up>0 (below the dashed black contour in the bottom row of Fig. 5). Even inside this narrow channel, the presence of pitch-angle scattering will result in electrons initially inside the Up>0 region being scattered to a larger pitch-angle where Up<0, such that they are slowed down42–45 resulting in the RPF taking on a small value across nearly all of momentum space. More insight into the accuracy of the solution in regions where P1 can be gained by considering the log10 of both the RPF and the residual (see Fig. 6). Here, it is evident that the magnitude of the residual of the RPF is small throughout the momentum space, with the magnitude of the residual normalized to the RPF [Fig. 6(c)] being small across regions with significant values of the RPF. We note that at low energies this ratio can exceed one, however, the RPF has already decayed to a value of the order of 105 at these energies, such that the contribution to the RE density will nearly always be negligible from this region.

FIG. 5.

Panels (a)–(c) indicate time slices of the RPF evolution where panels (d)–(f) indicate the associated residual for an electric field of E=1.5 and pRE=pmax/4. The bottom row indicates the residual to Eq. (5a) multiplied by G(p)p2/(1+p2), where G(p) is defined by Eq. (20), such that the low energy divergence is removed, along with the potentially large residual near p=pmax. The first column indicates the terminal condition t=20, the second column t=10 and the last column t=0. The other parameters are given by Zeff=2 and α=0.1. The solid black contour in the top row indicates P=0.5, whereas the dashed black curve in the bottom row indicates the location of the Up=0 contour.

FIG. 5.

Panels (a)–(c) indicate time slices of the RPF evolution where panels (d)–(f) indicate the associated residual for an electric field of E=1.5 and pRE=pmax/4. The bottom row indicates the residual to Eq. (5a) multiplied by G(p)p2/(1+p2), where G(p) is defined by Eq. (20), such that the low energy divergence is removed, along with the potentially large residual near p=pmax. The first column indicates the terminal condition t=20, the second column t=10 and the last column t=0. The other parameters are given by Zeff=2 and α=0.1. The solid black contour in the top row indicates P=0.5, whereas the dashed black curve in the bottom row indicates the location of the Up=0 contour.

Close modal
FIG. 6.

(a) The RPF shown in Fig. 5(c) plotted on a log10 scale. (b) The magnitude of the residual shown in Fig. 5(f) on a log10 scale. (c) Magnitude of the ratio of the residual and RPF, with the result plotted on a log10 scale.

FIG. 6.

(a) The RPF shown in Fig. 5(c) plotted on a log10 scale. (b) The magnitude of the residual shown in Fig. 5(f) on a log10 scale. (c) Magnitude of the ratio of the residual and RPF, with the result plotted on a log10 scale.

Close modal

While the magnitude of the residual in Fig. 5 is small for the three time slices shown, suggesting an accurate solution, the final time slice [ t=0, Fig. 7(c)] suggests a limitation of the PINN approach employed. In particular, the high momentum boundary condition employed is P(p=pmax)=1 when Up>0 [see Eq. (9c)]. This is not satisfied at t=0. Specifically, as evident from Fig. 5(f), a narrow region near ξ=1 is present where Up>0. However, the RPF is not unity at the high momentum boundary in this narrow channel. The reason for this inconsistency is due to pitch-angle scattering quickly diffusing electrons out of the Up>0 channel where they are subsequently slowed down. Thus, the magnitude of the RPF will quickly drop below unity for p<pmax even when Up>0. Since the present PINN implements the high momentum boundary condition as a soft constraint, deviation from the prescribed boundary condition in this narrow boundary region only leads to a modest penalty in the boundary loss term. We also note that the region where the RPF deviates from the boundary condition (p=pmax, ξ1 and t=tfinal) represents a corner in the training regime, it is, thus, likely that it is not directly sampled by the random distribution of training points. While this is indicative of a slight inconsistency in the PINN solution, we do not expect it to strongly impact predictions of the RE density since the PINN correctly captures the value of the RPF throughout the majority of the simulation domain, with only a small inconsistency present in the corner of the simulation domain when near marginality. This will be verified in Sec. IV B, where predictions of the PINN are compared with solutions from a traditional RE solver (see Fig. 13).

FIG. 7.

Same as Fig. 5 but with E=2. Panels (a)–(c) indicate time slices of the RPF evolution where panels (d)–(f) indicate the associated residual.

FIG. 7.

Same as Fig. 5 but with E=2. Panels (a)–(c) indicate time slices of the RPF evolution where panels (d)–(f) indicate the associated residual.

Close modal

Considering now a case above threshold (E=2, Fig. 7), the channel with Up>0 is substantially broadened, facilitating the acceleration of electrons. In particular, the location of the P=0.5 contour near ξ=1 shifts to lower momenta as time evolves, indicating more electrons will be accelerated by the electric field. Since the present case is only modestly above marginality, pitch-angle scattering introduces a substantial broadening of the transition region. In addition, the RPF reaches a near steady state condition after ten τc, such that there is only a modest difference between the RPF at t=10 compared to t=20.

1. Time histories of RE density for different initial conditions

In this section, we will utilize the parametric solution to the time dependent RPF described in Sec. IV A to evaluate the time history of the RE density nRE using Eq. (14). Taking the initial RE population to be of the following form:
(21)
the time evolution of the electron density for three different initial electron distributions is shown in Fig. 8. Here, for Fig. 8(a), the initial electron distribution has a pitch centered around ξ=1 (ξ0=1 and Δξ=0.1), which results in the RE number density remaining approximately constant for E>Eav, and decaying otherwise (for these parameters Eav1.8). In contrast, for a nearly isotropic distribution [Fig. 8(b)], the number of REs initially decays for all electric field values, but then levels out later in time for cases with E>Eav. Finally, for electrons initially with ξ1 [Fig. 8(c)], the number of REs decays rapidly for all electric field values, with the cases with strong electric fields decaying the fastest. This initial decay is due to the electric field slowing the electron population down together with drag and radiation. Later in time, cases with E>Eav again show a sharp increase in number and then level off, though the number of REs does not recover to its initial value. This latter property is due to electrons initially slowing to an energy below pRE, but then being turned by a combination of the electric field and pitch-angle scattering, where they are subsequently reaccelerated along ξ1. Some electrons are lost to the low energy boundary, hence the magnitude of the RE population is reduced after the early transient phase.
FIG. 8.

Runaway electron density evolution for a range of electric fields and for the initial distribution defined by Eq. (21) for (Δξ=0.1,ξ0=1) [panel (a)], (Δξ=10,ξ0=0) [panel (b)] and (Δξ=0.1,ξ0=1) [panel (c)]. Panels (d)–(f) indicate the corresponding initial momentum space distribution. Each case took (Δp=0.1pmax,p0=0.75pmax). The electric fields selected correspond to E=1.5 (red curves), E=2 (cyan curves), and E=2.5 (blue curves). The other parameters are given by Zeff=2 and α=0.1. The bottom row indicates the initial electron distributions.

FIG. 8.

Runaway electron density evolution for a range of electric fields and for the initial distribution defined by Eq. (21) for (Δξ=0.1,ξ0=1) [panel (a)], (Δξ=10,ξ0=0) [panel (b)] and (Δξ=0.1,ξ0=1) [panel (c)]. Panels (d)–(f) indicate the corresponding initial momentum space distribution. Each case took (Δp=0.1pmax,p0=0.75pmax). The electric fields selected correspond to E=1.5 (red curves), E=2 (cyan curves), and E=2.5 (blue curves). The other parameters are given by Zeff=2 and α=0.1. The bottom row indicates the initial electron distributions.

Close modal

2. Runaway electron decay rate

The time evolution of the RE density can be used to estimate the decay rate of a primary electron population when below threshold. Using a RE decay rate to characterize RE evolution comes with two significant caveats. The first is that while the RE density exhibits an exponential decay when slightly below threshold in the long time limit,45 its temporal evolution cannot be described by an exponential when the system is well below threshold. As a concrete example, for an electric field E<1, the entire population of electrons will slow down due to collisional drag exceeding electric field acceleration at all energies and pitches. This will result in the primary population of electrons slowing down to the bulk in a finite amount of time, a process that cannot be described by a decaying exponential. This physics is illustrated in Fig. 9(a), where the temporal evolution of the RE density for electric fields in the vicinity of E=1 for Zeff=1 and α=0 is shown. It is evident that for E<1 (green curve) an exponential curve cannot be fit to the RE density, as evidenced by the poor fit between the dashed and solid green curves. Furthermore, when at threshold (E=1, red curves) the exponential fit curve only approximates the latter portion of the time history of the RE density. Hence, a second caveat is that it often takes a substantial amount of time, tens to hundreds of τc,45 for the REs to settle into a distribution with a clear exponential decay rate. Further noting that for a plasma with ne=1020m3 and a Coulomb logarithm of lnΛ=15, the relativistic collision time is given by τc22ms, implying that the RE density often will not achieve an exponential decay for tens to hundreds of milliseconds even for E1. Considering a case with Zeff=2 and α=0.1 [Fig. 9(b)], and, thus, a larger avalanche threshold of Eav1.8, a good exponential fit is not present for E=1 over twenty τc, with only a marginally better fit present for E=1.5.

FIG. 9.

The time evolution of the RE density for different electric fields. Panel (a) is for Zeff=1 and α=0 where as panel (b) is for Zeff=2 and α=0.1. The solid curves are predictions from the adjoint-deep learning framework, whereas the dashed curves are exponential fits. The last five τc were used in the exponential fits. The other parameters are given by p0=0.75pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1.

FIG. 9.

The time evolution of the RE density for different electric fields. Panel (a) is for Zeff=1 and α=0 where as panel (b) is for Zeff=2 and α=0.1. The solid curves are predictions from the adjoint-deep learning framework, whereas the dashed curves are exponential fits. The last five τc were used in the exponential fits. The other parameters are given by p0=0.75pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1.

Close modal

The time required for the RE distribution to settle into a well defined exponential decay depends sensitively on the initial momentum space distribution. Specifically, while the inferred RE decay rate should be independent of the initial momentum space distribution, assuming the initial electrons are at sufficiently high energy, there will inevitably be a transient period while the RE momentum space distribution settles into its saturated form. To assess this uncertainty, a scan of the initial momentum and pitch of the primary electron distributions is shown in Figs. 10(a) and 10(b), respectively. Here, it is evident that the early time evolution of the RE population varies substantially depending on both the initial momentum and pitch of the primary distribution, with resulting fitted exponentially implying a similar, though not identical, decay rate. The modest deviation in inferred decay rates is due to the finite time of the simulations, such that for initial conditions far from those that characterize the decaying primary distribution, twenty τc is not sufficient for the REs to settle into their saturated decay rate. A strength of the present approach is that by predicting the specific time history of the RE density, rather than only the decay rate, the accuracy of the exponential fit can be readily verified, and the specific time history can be used if required.

FIG. 10.

The time evolution of the RE density for initial momentum space distributions centered about different momentum [panel (a)] and pitch [panel (b)]. For panel (a), the blue curve is for p0=0.4pmax, the cyan curve is for p0=0.5pmax, the red curve is for p0=0.6pmax, the green curve is for p0=0.7pmax and the magenta curve is for p0=0.8pmax. The pitch was taken to be ξ0=0.5 for all cases. For panel (b), the blue curve is for ξ0=0.2, the cyan curve is for ξ0=0.4, the red curve is for ξ0=0.6, the green curve is for ξ0=0.8 and the magenta curve is for ξ0=1. The momentum was taken to be p0=0.5pmax for all cases. The other parameters are given by E=1.5, Zeff=2 and α=0.1, with the width of the momentum and pitch distributions given by Δp=0.1pmax and Δξ=0.1, respectively.

FIG. 10.

The time evolution of the RE density for initial momentum space distributions centered about different momentum [panel (a)] and pitch [panel (b)]. For panel (a), the blue curve is for p0=0.4pmax, the cyan curve is for p0=0.5pmax, the red curve is for p0=0.6pmax, the green curve is for p0=0.7pmax and the magenta curve is for p0=0.8pmax. The pitch was taken to be ξ0=0.5 for all cases. For panel (b), the blue curve is for ξ0=0.2, the cyan curve is for ξ0=0.4, the red curve is for ξ0=0.6, the green curve is for ξ0=0.8 and the magenta curve is for ξ0=1. The momentum was taken to be p0=0.5pmax for all cases. The other parameters are given by E=1.5, Zeff=2 and α=0.1, with the width of the momentum and pitch distributions given by Δp=0.1pmax and Δξ=0.1, respectively.

Close modal
While an exponential decay rate is an imperfect measure of the RE density evolution, it does provide a convenient means of estimating the decay rate of a RE beam when below threshold. A plot of the estimated decay rate of the primary electron distribution for a range of electric fields, effective charges and strength of synchrotron radiation is shown in Fig. 11. Here, time histories of the RE density were generated for twenty τc, where the last five τc were used to estimate an effective decay rate. The decay rate nearly vanishes for electric fields greater than E2, due to the system being above the avalanche threshold for those cases. For smaller electric fields, the magnitude of the decay rate increases rapidly, illustrating a strong nonlinear dependence on electric field strength. As a means of identifying the validity of the exponential fits, we have also plotted the Mean Absolute Percentage Error (MAPE) defined by
where nRE,i are the values of RE density predicted by the adjoint-deep learning framework, nfit,i indicates the exponential fit, and N is the number of points used to fit the exponential decay rate. Values of the MAPE are shown in the second row of Fig. 11 as a function of electric field strength. It is evident that for electric fields substantially above the avalanche threshold, the MAPE reaches an exceptionally small value. This is due to the number density of REs being nearly constant in this region which is well fit by the negligibly small decay rates inferred. However, as the electric field is reduced below threshold, the magnitude of the associated MAPE increases rapidly. While the percent error never reaches a large percentage (the maximum MAPE is less than 2%) this is due to the relatively short time increment used to fit data such that deviations between the data and fit are never too large. For reference, a case with an MAPE of 0.75 and parameters E=1,Zeff=2,α=0.1, is shown by the green curve in Fig. 9(b), where a rather poor fit is evident. We, thus, anticipate that the inferred exponential decay rates will be most accurate when only weakly below threshold. For cases with significant MAPE values, large uncertainties are expected due to a decaying exponential providing a poor fit to the RE density over the twenty τc timescale employed.
FIG. 11.

Runaway electron decay rate vs electric field E for different effective charges Zeff [panel (a)] and synchrotron radiation α [panel (b)]. Panel (a) took α=0.1, whereas panel (b) took Zeff=2. The solid curves indicate predictions from the PINN, whereas the blue “×” markers are benchmark values taken from Ref. 45. Panels (c) and (d) indicate the MAPE for the exponential functions used to estimate the RE decay rates in the first row. The initial momentum space distribution assumed p0=0.5pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1, respectively. The black dashed curve in panel (a) is Eq. (22) for lnΛ=10, Zeff=2, and Eav=1.8, which is consistent with a value of α=0.1.

FIG. 11.

Runaway electron decay rate vs electric field E for different effective charges Zeff [panel (a)] and synchrotron radiation α [panel (b)]. Panel (a) took α=0.1, whereas panel (b) took Zeff=2. The solid curves indicate predictions from the PINN, whereas the blue “×” markers are benchmark values taken from Ref. 45. Panels (c) and (d) indicate the MAPE for the exponential functions used to estimate the RE decay rates in the first row. The initial momentum space distribution assumed p0=0.5pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1, respectively. The black dashed curve in panel (a) is Eq. (22) for lnΛ=10, Zeff=2, and Eav=1.8, which is consistent with a value of α=0.1.

Close modal

Insight into the accuracy of the exponential decay rate near threshold can be gained by comparing with RE decay rates inferred in Ref. 45. Here, the RE solver RAMc46 was used to evolve the primary RE distribution for 250τc, leading to well converged exponential decay rates. A comparison between the PINN and results from Ref. 45 are shown by the blue “×” markers in Figs. 11(a) and 12. The solid blue curve in both figures indicates results from a PINN trained with tfinal=20τc, whereas the dashed blue curve in Fig. 12 being an analogous PINN, but trained over a time period of tfinal=40τc. Twice as many training points were used in this latter case (2 000 000) resulting in an increase in offline training time. It is evident that both results are in good agreement with the benchmark results, though the PINN with the larger tfinal exhibits modestly improved agreement.

FIG. 12.

Comparison of decay rates predicted by a model with tfinal=20 (solid blue curve) and tfinal=40 (dashed blue curve) with benchmark results from Ref. 45 (“×” markers). The parameters were taken to be Zeff=1, α=0.1 and p0=0.5pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1.

FIG. 12.

Comparison of decay rates predicted by a model with tfinal=20 (solid blue curve) and tfinal=40 (dashed blue curve) with benchmark results from Ref. 45 (“×” markers). The parameters were taken to be Zeff=1, α=0.1 and p0=0.5pmax, Δp=0.1pmax, ξ0=1, and Δξ=0.1.

Close modal
The inferred RE decay rates differ substantially from commonly employed analytic expressions. Taking the well known approximate analytic expression for RE growth and decay in the completely screened limit defined by Rosenbluth and Putvinski,17 
(22)
as an example, this expression is shown by the dashed black curve in Fig. 11(a). Here, we have modified Eq. (22) to include a non-unity avalanche threshold, to account for effects of synchrotron radiation. The dashed curve was plotted for a Coulomb logarithm of lnΛ=10, Zeff=2, and we chose Eav=1.8, consistent with α=0.1. From the dashed black curve Fig. 11, it is evident that this expression gives a decay rate that drastically underestimates the magnitude of RE decay compared to the prediction of the adjoint-deep learning framework, even when the avalanche threshold is adjusted. We note that while the precise magnitude of the strongly decaying cases is subject to substantial uncertainties, as described above, the weak exponential decay rates predicted by Eq. (22) deviate strongly from the benchmark RE decay rates (“×” markers). Furthermore, while the present adjoint-deep learning framework does not include large-angle collisions, Paper II20 demonstrates that the decay rate including large-angle collisions is well approximated by the decay rate of the primary population. We, thus, anticipate simple analytic expressions such as Eq. (22) to substantially underestimate the decay of REs when below threshold.

As a means of verifying predictions of the adjoint-deep learning framework we will compute several time histories of the RE density using the Monte Carlo code RAMc46 for several different initial momentum space distributions and physical parameters. While a comparison of inferred RE decay rates could readily be performed, due to the ambiguities that often emerge when inferring the RE decay rate from the RE time history (as described in Sec. IV B 2), a more stringent test will be to compare detailed time histories of the RE density. A comparison of several temporal histories of RE density with the Monte Carlo code RAMc46 is illustrated in Fig. 13. Here, we have used the initial momentum space distributions illustrated by the second row of Fig. 13, and the values of α and Zeff are the same as the RPFs shown in Fig. 5 above. The form of these momentum space distributions were chosen to be consistent with the particle initialization scheme used in RAMc. Good agreement is evident for the three electric fields and three initial conditions considered. For the practically important case of an initial RE distribution approximately aligned with the magnetic field (ξ=1) the adjoint-deep learning predictions are in excellent agreement with the RAMc code. For cases with electrons further from ξ=1, progressively larger deviations are present, though good overall agreement is evident with the largest deviation appearing for the lowest electric field with electrons initialized with ξ=1 [see Fig. 13(c) (red curves)]. The origin of this deviation is due to the limited accuracy of the PINN's prediction of the RPF in regions where the RPF is nearly vanishing. In particular, the small number of REs evident in the solid red curve in Fig. 13(c) results from integrating the product of the RPF shown in Fig. 5(c) with the initial momentum space distribution shown in Fig. 13(f). Since the RPF is nearly zero in the region where the initial electron distribution is finite (i.e., at ξ1 and E3.5MeV), this, thus, requires the RPF to be computed very accurately to resolve this near zero value. As evident from Fig. 13(c), the PINN is able to accurately track the initial density decay, but exhibits a quantitative discrepancy once only a trace RE population remains.

FIG. 13.

Panels (a)–(c) show a comparison of the PINN's prediction of the RE density evolution (solid curves) with the Monte Carlo code RAMc (dashed curves). Panels (d)–(f) indicate the corresponding initial primary distribution. The electric fields selected correspond to E=1.5 (red curves), E=2 (cyan curves), and E=2.5 (blue curves). The other parameters are given by Zeff=2 and α=0.1. The bottom row indicates the initial electron distributions.

FIG. 13.

Panels (a)–(c) show a comparison of the PINN's prediction of the RE density evolution (solid curves) with the Monte Carlo code RAMc (dashed curves). Panels (d)–(f) indicate the corresponding initial primary distribution. The electric fields selected correspond to E=1.5 (red curves), E=2 (cyan curves), and E=2.5 (blue curves). The other parameters are given by Zeff=2 and α=0.1. The bottom row indicates the initial electron distributions.

Close modal

An adjoint-deep learning framework for evaluating the time evolution of the density moment of the RE distribution was derived. By solving the adjoint to the relativistic Fokker–Planck equation, this allows the RE density to be evolved from an arbitrary initial momentum space distribution for a given set of parameters. By utilizing a PINN to identify the parametric solution to the adjoint problem, this further generalizes the framework to treat a range of plasma conditions. While this framework only evolves the RE density moment, by solving the adjoint to the relativistic Fokker–Planck equation, this enables a fully kinetic treatment of relativistic electron dynamics. As an initial application, the evolution of a primary electron population was evolved from three distinct initial pitch distributions, for three different values of the electric field, yielding good agreement with a traditional RE solver. While the offline training time was substantially longer than a traditional relativistic Fokker–Planck solver, the online prediction time is substantially less, typically only requiring tens to hundreds of milliseconds. Furthermore, the decay rate of primary electrons was estimated as a function of electric field strength, effective charge, and the strength of synchrotron radiation. The estimated decay rate was found to vary nonlinearly with the electric field strength, in sharp contrast to commonly employed analytic theories17 that yield a linear dependence on electric field strength. This stiff electric field dependence implies that the electric field will need to remain near threshold during the RE plateau,47,48 to sustain the RE population. While substantial ambiguities in the use of a RE decay rate for characterizing RE evolution when below threshold were identified, by predicting specific time histories of the RE density, the present approach allows for such limitations to be quantified. In particular, aside from predicting the RE decay rate, the prediction of the RE time history enables the quality of the exponential fit to be readily evaluated, where the residual of the PDE provides further insight into the accuracy of the solution.

Despite these encouraging initial results, several limitations of the adjoint-deep learning framework are apparent. One limitation in the present approach is the use of collisional coefficients that are only valid in the completely screened limit. Weakly ionized impurities are typically present in the plasma during a tokamak disruption and modify the collisional processes between electrons.49 While accounting for these partial screening effects will require modifying the PINN, we anticipate a straightforward extension of the existing PINN and is ongoing work. A second limitation, evident from Fig. 10(c), is the modest accuracy that PINNs are achievable when solving a PDE in comparison to traditional Fokker–Planck solvers.5,6,8,44 In particular, for the case shown in Fig. 13(c) the initial RE population has decayed by several orders of magnitude, such that the predicted number of REs is determined by a region of momentum space where the RPF is nearly zero. Since such regions where P0 yield small contributions to the loss function defined by Eq. (15), the PINN will not be able to accurately resolve the RPF in these regions without special care. A third limitation is that the range of parameters that the PINN was trained across in this paper is fairly narrow [ E(0,3), Zeff(1,2), α(0,0.1)], and thus, will not be able to describe a broad range of tokamak conditions. This narrow training regime was selected to provide a conceptually clear treatment of the adjoint-deep learning framework, while avoiding the technical complexities that emerge when training across a broad range of parameters. This limitation is addressed in Paper II,20 where an adaptive energy and time domain are introduced that enable a far broader parameter regime to be considered while using a comparable number of training points. In addition, while the present approach allows for a broad range of plasma parameters to be considered, it does not allow for a general time dependent variation of those parameters between t=0 and t=tfinal. As an initial application of the present results we anticipate using the saturated decay rate of the RE population for a given set of parameters as an estimate of RE decay in an integrated description of a disruption. Finally, an accurate description of RE evolution when above threshold requires the inclusion of large-angle collisions in order to describe the exponential growth of the RE population. This limitation is addressed in Paper II,20 where a Rosenbluth–Putvinski secondary source term is used to approximate large-angle collisions.

This work was supported by the Department of Energy, Office of Fusion Energy Sciences at the University of Florida under Award Nos. DE-SC0024649 and DE-SC0024634 and at Los Alamos National Laboratory (LANL) under Contract No. 89233218CNA000001. The authors acknowledge the University of Florida Research Computing for providing computational resources that have contributed to the research results reported in this publication. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a Department of Energy Office of Science User Facility using NERSC Award No. FES-ERCAP0028155.

The authors have no conflicts to disclose.

Christopher J. McDevitt: Conceptualization (lead); Data curation (equal); Formal analysis (lead); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (equal). Jonathan S. Arnaud: Conceptualization (equal); Data curation (lead); Software (equal); Visualization (equal); Writing – review & editing (equal). Xian-Zhu Tang: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1.
T. C.
Hender
,
J. C.
Wesley
,
J.
Bialek
,
A.
Bondeson
,
A. H.
Boozer
,
R. J.
Buttery
,
A.
Garofalo
,
T. P.
Goodman
,
R. S.
Granetz
,
Y.
Gribov
et al,
Nucl. Fusion
47
,
S128
(
2007
).
2.
B. N.
Breizman
,
P.
Aleynikov
,
E. M.
Hollmann
, and
M.
Lehnen
,
Nucl. Fusion
59
,
083001
(
2019
).
3.
P. C.
De Vries
,
Y.
Lee
,
Y.
Gribov
,
A.
Mineev
,
Y.
Na
,
R.
Granetz
,
B.
Stein-Lubrano
,
C.
Reux
,
P.
Moreau
,
V.
Kiptily
et al,
Nucl. Fusion
63
,
086016
(
2023
).
4.
H.
Knoepfel
and
D. A.
Spong
,
Nucl. Fusion
19
,
785
(
1979
).
5.
R. W.
Harvey
,
V. S.
Chan
,
S. C.
Chiu
,
T. E.
Evans
,
M. N.
Rosenbluth
, and
D. G.
Whyte
,
Phys. Plasmas
7
,
4590
(
2000
).
6.
E.
Nilsson
,
J.
Decker
,
Y.
Peysson
,
R. S.
Granetz
,
F.
Saint-Laurent
, and
M.
Vlainic
,
Plasma Phys. Controlled Fusion
57
,
095006
(
2015
).
7.
C. J.
McDevitt
,
Z.
Guo
, and
X.-Z.
Tang
,
Plasma Phys. Controlled Fusion
61
,
054008
(
2019
).
8.
M.
Hoppe
,
O.
Embreus
, and
T.
Fülöp
,
Comput. Phys. Commun.
268
,
108098
(
2021
).
9.
M.
Beidler
,
D.
del Castillo-Negrete
,
D.
Shiraki
,
L. R.
Baylor
,
E. M.
Hollmann
, and
C.
Lasnier
,
Nucl. Fusion
64
,
076038
(
2024
).
10.
V.
Bandaru
,
M.
Hoelzl
,
C.
Reux
,
O.
Ficker
,
S.
Silburn
,
M.
Lehnen
,
N.
Eidietis
,
JOREK Team
, and
JET Contributors
,
Plasma Phys. Controlled Fusion
63
,
035024
(
2021
).
11.
C.
Liu
,
C.
Zhao
,
S. C.
Jardin
,
N. M.
Ferraro
,
C.
Paz-Soldan
,
Y.
Liu
, and
B. C.
Lyons
,
Plasma Phys. Controlled Fusion
63
,
125031
(
2021
).
12.
A. P.
Sainterme
and
C. R.
Sovinec
,
Phys. Plasmas
31
,
010701
(
2024
).
13.
H. M.
Smith
and
E.
Verwichte
,
Phys. Plasmas
15
,
072502
(
2008
).
14.
C. J.
McDevitt
,
Phys. Plasmas
30
,
092501
(
2023
).
15.
M.
Yang
,
P.
Wang
,
D.
del Castillo-Negrete
,
Y.
Cao
, and
G.
Zhang
,
SIAM J. Sci. Comput.
46
,
C508
(
2024
).
16.
I.
Sokolov
,
JETP Lett.
29
,
218
(
1979
).
17.
M. N.
Rosenbluth
and
S. V.
Putvinski
,
Nucl. Fusion
37
,
1355
(
1997
).
18.
L.
Hesslow
,
O.
Embréus
,
O.
Vallhagen
, and
T.
Fülöp
,
Nucl. Fusion
59
,
084004
(
2019
).
19.
J. S.
Arnaud
,
T. B.
Mark
, and
C. J.
McDevitt
,
J. Plasma Phys.
90
,
905900409
(
2024
).
20.
C. J.
McDevitt
,
J. S.
Arnaud
, and
X.-Z.
Tang
, “
An efficient surrogate model of secondary electron formation and evolution
,”
Phys. Plasmas
(submitted) (
2024
).
21.
C. F.
Karney
and
N. J.
Fisch
,
Phys. Fluids
29
,
180
(
1986
).
22.
C.
Liu
,
D. P.
Brennan
,
A. H.
Boozer
, and
A.
Bhattacharjee
,
Plasma Phys. Controlled Fusion
59
,
024003
(
2017
).
23.
G.
Zhang
and
D.
del Castillo-Negrete
,
Phys. Plasmas
24
,
092511
(
2017
).
24.
M.
Raissi
,
P.
Perdikaris
, and
G. E.
Karniadakis
,
J. Comput. Phys.
378
,
686
(
2019
).
25.
G. E.
Karniadakis
,
I. G.
Kevrekidis
,
L.
Lu
,
P.
Perdikaris
,
S.
Wang
, and
L.
Yang
,
Nat. Rev. Phys.
3
,
422
(
2021
).
26.
S.
Cai
,
Z.
Mao
,
Z.
Wang
,
M.
Yin
, and
G. E.
Karniadakis
,
Acta Mech. Sin.
37
,
1727
(
2021
).
27.
L.
Sun
,
H.
Gao
,
S.
Pan
, and
J.-X.
Wang
,
Comput. Methods Appl. Mech. Eng.
361
,
112732
(
2020
).
28.
C. J.
McDevitt
and
X.-Z.
Tang
,
Phys. Plasmas
31
,
062701
(
2024
).
29.
S. C.
Jardin
,
G. L.
Schmidt
,
E. D.
Fredrickson
,
K. W.
Hill
,
J.
Hyun
,
B. J.
Merrill
, and
R.
Sayer
,
Nucl. Fusion
40
,
923
(
2000
).
30.
T. M.
Antonsen
, Jr.
and
K. R.
Chu
,
Phys. Fluids
25
,
1295
(
1982
).
31.
M.
Taguchi
,
J. Phys. Soc. Jpn.
52
,
2035
(
1983
).
32.
N. J.
Fisch
,
Phys. Fluids
29
,
172
(
1986
).
33.
C.
Liu
,
D. P.
Brennan
,
A.
Bhattacharjee
, and
A. H.
Boozer
,
Phys. Plasmas
23
,
010702
(
2016
).
34.
J. W.
Connor
and
R. J.
Hastie
,
Nucl. Fusion
15
,
415
(
1975
).
35.
I. E.
Lagaris
,
A.
Likas
, and
D. I.
Fotiadis
,
IEEE Trans. Neural Networks
9
,
987
(
1998
).
36.
A.
Karpatne
,
G.
Atluri
,
J. H.
Faghmous
,
M.
Steinbach
,
A.
Banerjee
,
A.
Ganguly
,
S.
Shekhar
,
N.
Samatova
, and
V.
Kumar
,
IEEE Trans. Knowl. Data Eng.
29
,
2318
(
2017
).
37.
B.
Lusch
,
J. N.
Kutz
, and
S. L.
Brunton
,
Nat. Commun.
9
,
4950
(
2018
).
38.
R.
Wang
,
K.
Kashinath
,
M.
Mustafa
,
A.
Albert
, and
R.
Yu
, in
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
(ACM,
2020
), pp.
1457
1466
.
39.
L.
Lu
,
X.
Meng
,
Z.
Mao
, and
G. E.
Karniadakis
,
SIAM Rev.
63
,
208
(
2021
).
40.
M.
Abadi
,
P.
Barham
,
J.
Chen
,
Z.
Chen
,
A.
Davis
,
J.
Dean
,
M.
Devin
,
S.
Ghemawat
,
G.
Irving
,
M.
Isard
et al, in
12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16)
(USENIX,
2016
), pp.
265
283
.
41.
C.
Wu
,
M.
Zhu
,
Q.
Tan
,
Y.
Kartha
, and
L.
Lu
,
Comput. Methods Appl. Mech. Eng.
403
,
115671
(
2023
).
42.
F.
Andersson
,
P.
Helander
, and
L.-G.
Eriksson
,
Phys. Plasmas
8
,
5221
(
2001
).
43.
J.
Decker
,
E.
Hirvijoki
,
O.
Embreus
,
Y.
Peysson
,
A.
Stahl
,
I.
Pusztai
, and
T.
Fülöp
,
Plasma Phys. Controlled Fusion
58
,
025016
(
2016
).
44.
Z.
Guo
,
C. J.
McDevitt
, and
X.-Z.
Tang
,
Plasma Phys. Controlled Fusion
59
,
044003
(
2017
).
45.
C. J.
McDevitt
,
Z.
Guo
, and
X.-Z.
Tang
,
Plasma Phys. Controlled Fusion
60
,
024004
(
2018
).
46.
C. J.
McDevitt
and
X.-Z.
Tang
,
Europhys. Lett.
127
,
45001
(
2019
).
47.
48.
C. J.
McDevitt
and
X.-Z.
Tang
,
Phys. Rev. E
108
,
L043201
(
2023
).
49.
L.
Hesslow
,
O.
Embréus
,
A.
Stahl
,
T. C.
DuBois
,
G.
Papp
,
S. L.
Newton
, and
T.
Fülöp
,
Phys. Rev. Lett.
118
,
255001
(
2017
).