We introduce EFIT-Prime, a novel machine learning surrogate model for EFIT (Equilibrium FIT) that integrates probabilistic and physics-informed methodologies to overcome typical limitations associated with deterministic and ad hoc neural network architectures. EFIT-Prime utilizes a neural architecture search-based deep ensemble for robust uncertainty quantification, providing scalable and efficient neural architectures that comprehensively quantify both data and model uncertainties. Physically informed by the Grad–Shafranov equation, EFIT-Prime applies a constraint on the toroidal current density Jϕ and a smoothness constraint on the first derivative of the poloidal flux, ensuring physically plausible solutions. Furthermore, the spatial locations of the diagnostics are explicitly incorporated in the inputs to account for their spatial correlation. Extensive evaluations demonstrate EFIT-Prime's accuracy and robustness across diverse scenarios, most notably showing good generalization on negative-triangularity discharges that were excluded from training. Timing studies indicate an ensemble inference time of 15 ms for predicting a new equilibrium, offering the possibility of real-time plasma control if the model is further optimized for speed.

Reconstruction of magnetohydrodynamic (MHD) equilibria from a series of external and internal diagnostic measurements is a crucial aspect of tokamak research and operations worldwide. It provides essential information on the magnetic geometry, current, and pressure profiles that are necessary for tokamak data analysis and interpretation, plasma stability and control, and code and physics model validation. The EFIT code1,2 has been used extensively in many tokamaks around the world to reconstruct MHD equilibria based on experimental constraints derived from measurements. Owing to its widespread application, there exists a broad base of experimental equilibrium reconstruction data, with examples including DIII-D,3 EAST,4 JET,5 KSTAR,6 and NSTX.7 A more comprehensive list of references for experiments using EFIT can be found in Ref. 8.

EFIT has three general modes of operation corresponding to its applications: real-time operation for use in the plasma control system (PCS), between-shot analysis for experimental planning, and post-processing analysis for experimental interpretation and theory validation. The real-time mode, known as RT-EFIT, which uses either external magnetics data alone or external magnetics with motional Stark effect (MSE) diagnostic data, is a reduced version of EFIT that does not completely solve the equilibrium equation numerically,9 owing to the time spent inverting the Grad–Shafranov operator. For each use, the inputs used to constrain the equilibrium can vary. All modes of operation use magnetic inputs. Between-shot analysis typically adds measurements based on the spectroscopic analysis of neutral beams and ionization; in particular, MSE provides internal measurements of the magnetic field pitch angle for improved accuracy. Measurements of the species densities and temperatures, known as the kinetic profiles, can be used as additional constraints.

Magnetic EFIT solves this inverse problem by carrying out a least squares minimization of the difference between the external magnetic signals, consisting of the diagnostic measurements and poloidal field coil currents, and their synthetic counterparts computed by EFIT. The forward model of this inverse solution, and what EFIT primarily solves, is the Grad–Shafranov equation,10,11 given by
Δ*ψ = −μ0 R Jϕ, (1)
Jϕ = R dP/dψ + F (dF/dψ)/(μ0 R), (2)
where ψ(R,Z) is the poloidal flux function and Jϕ(R,Z) is the toroidal current density of the plasma, both functions of the poloidal coordinates (R,Z), and Δ* is the Shafranov operator. The toroidal current density on the right side of Eq. (1) is specified in terms of the gradients of two stream functions, P (the plasma pressure) and F² (where F = RBϕ is the toroidal field function), both of which are flux functions under the assumptions used to derive the equation. As magnetic EFIT is not internally constrained by measurements, there is no unique solution for these profiles, nor for the resulting flux ψ(R,Z).
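
For concreteness, the action of the Δ* operator on a discretized flux map can be sketched with finite differences. This is a minimal NumPy illustration on a uniform (R, Z) grid, not EFIT's actual discretization:

```python
import numpy as np

def delta_star(psi, R, Z):
    """Finite-difference sketch of the Shafranov operator
    Delta* psi = R d/dR((1/R) dpsi/dR) + d2psi/dZ2
    for psi of shape (nZ, nR) on a uniform (R, Z) grid."""
    dR, dZ = R[1] - R[0], Z[1] - Z[0]
    dpsi_dR = np.gradient(psi, dR, axis=1)
    term_R = R[None, :] * np.gradient(dpsi_dR / R[None, :], dR, axis=1)
    term_Z = np.gradient(np.gradient(psi, dZ, axis=0), dZ, axis=0)
    return term_R + term_Z  # equals -mu0 * R * Jphi by Eq. (1)
```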

The reconstruction problem can be written as yobs = g(d), where yobs comprises the outputs of EFIT, encompassing fundamental quantities like ψ and derived quantities like the internal inductance li and normalized beta βN; d represents the input data; and g(·) is the observed data-generating function of EFIT. The input vector d consists of many different types of internal and external measurements. However, since our aim in this work is to build a surrogate for magnetic EFIT, only the external magnetic data are considered here, thus d = dmag. The goal of this surrogate is then to fit the equation ypred = f(dmag) with a neural network function f(·).

This work presents a reduced-order, or surrogate, model of the magnetics-only EFIT reconstructions using DIII-D data. There has been a multitude of recent work on reduced-order models based on artificial neural networks for equilibrium reconstruction in various experiments around the world; Table I summarizes surrogates for EFIT specifically. Outside EFIT, there have also been parallel efforts to build surrogate models for the variational moments equilibrium code (VMEC).20 In addition, our focus is developing a surrogate model for the EFIT inverse problem of predicting the poloidal flux, while works such as Ref. 21 build surrogates of the forward problem posed by the Grad–Shafranov equation. One of the primary interests in a machine learning surrogate is to replace RT-EFIT, because the PCS requires fast feedback and, given RT-EFIT's limitations, a surrogate can potentially be more accurate. The PCS does not necessarily need a complete representation of EFIT, hence the interest in derived quantities such as the location of the last closed-flux surface (LCFS).

TABLE I.

Prior papers using neural networks to create a surrogate model for EFIT. These papers differ not only in the techniques used to determine the neural net but also in the inputs, outputs, and experimental data used. A drop-in replacement for EFIT would require outputting the poloidal flux at a minimum, but derived quantities, including the last closed-flux surface (LCFS), are often the most useful for a plasma control system (PCS). Our goal is to build upon this prior work to produce a version that is faster, more accurate, and more robust.

Paper       Inputs                            Outputs                                          Experiment
This paper  Magnetics data                    ψ, Jϕ                                            DIII-D
Joung12     Magnetics data                    ψ, Δ*ψ                                           KSTAR
Wai13       Magnetics data                    ψ                                                NSTX-U
Joung14     Magnetics data                    ψ, p, FF′                                        KSTAR
Lu15        Magnetics data                    ψ, LCFS                                          KSTAR
Shousha16   Magnetics, MSE, TS, CER, RT-EFIT  p, q, ne, Te, Ti, Vtor, Jtor                     DIII-D
Wan17,18    Magnetics data                    Ip, βN, βT, βP, li, ne, Vloop, Wmhd, q0, q95, κ  EAST
Wei19       Magnetics data                    q(ψ)                                             DIII-D

Prior surrogate models summarized in Table I often relied on low-resolution data, exhibited normalization inconsistencies that do not align with the physics, utilized deterministic frameworks that are not equipped to capture the underlying uncertainties, and employed ad hoc neural network architectures that are not optimized for the task. To address these issues and develop an accurate and robust model, we introduce EFIT-Prime, a novel machine learning surrogate model for EFIT that integrates probabilistic and physics-informed methodologies. The probabilistic approach allows one to quantify the uncertainty in a model's prediction in addition to the point estimate typically produced by deterministic models. We adopt a deep-ensemble-based uncertainty quantification approach, where a neural architecture search22,23 is used to obtain a diverse ensemble of models in a scalable way by utilizing leadership-class computing systems. Deep ensembles are currently the state of the art for large neural network models, compared to variational inference and other probabilistic modeling approaches, performing close to the gold standard of Markov chain Monte Carlo Bayesian methods for larger models in both in-distribution and out-of-distribution prediction and generalization contexts, as highlighted in previous work.24 Deep ensembles allow for reliable quantification of both aleatory (irreducible) and epistemic (reducible) uncertainties. We also introduce physics constraints by using a multi-model neural architecture search that learns the magnetic flux (ψ) and the toroidal current density (Jϕ) given ψ in tandem. This hybrid architecture-penalty constraint approach, which concurrently learns Jϕ along with ψ, enforces axisymmetry and Ampère's law.

An important property related to the accuracy and robustness of a machine learning (ML) model is generalizability. A model is said to generalize if it gives accurate predictions and uncertainties even in regimes that were not included in its training. As we move into the burning-plasma era, where we will have fewer diagnostics and less control over plasmas, this becomes increasingly important. One of the unique features of this work is a rigorous test of generalizability: assessing the ability to predict an extreme plasma shape, negative-triangularity (NT) plasmas, even when they are not included in the training set.

The rest of this paper is organized as follows. The neural architecture search (NAS) framework and overall methodology are described in Sec. II. The database creation and the aggregation of the NN input and target vectors are covered in Sec. III. The results of the final model EFIT-Prime are presented in Sec. IV. Variations (or ablations) of EFIT-Prime to study the impacts of the physics-informed components and other modeling strategies are presented in Sec. V, culminating in a rigorous test of generalizability using NT discharges from DIII-D in Sec. V D. A summary of the manuscript, additional discussion, and future work are presented in Sec. VI.

Most machine learning-based surrogate models are developed using standard neural architectures or ad hoc network choices. This includes similar work on EFIT-based neural nets;12,14,25 while versatile, such architectures may not be specifically tuned to a problem's unique aspects. They often miss out on quantifying prediction uncertainties, which is crucial for reliable decision-making and risk assessment. Here, we describe the neural architecture search (NAS)26 adopted in this work to optimize both the architecture (e.g., recurrent neural network, convolutional neural network, multi-layer perceptron) and the model parameters and hyperparameters (e.g., weights and number of layers) together.

Neural architecture search is formulated as a bi-level optimization: the outer level for the architecture parameters and the inner level for the model parameters, given the chosen architecture. This approach ensures a thorough exploration of both sets of parameters. As described in Sec. I, our goal is to train over a large dataset with more than 100 000 equilibria. To meet the computational challenge of searching for the optimal architecture and framework, leadership-class facilities are used to provide the needed computational power. Even with these facilities, it is important to choose the correct optimization techniques for an efficient search. We use a framework combining Aging Evolution (AgE), rooted in the paradigm of evolutionary algorithms,27,28 for the outer loop with asynchronous Bayesian optimization (BO)29 for the inner loop, together giving rise to the AgEBO technique.

The outer loop method, Aging Evolution (AgE), creates various neural architectures and enables concurrent training across multiple nodes. AgE follows an iterative mutation process: at each iteration t, it mutates existing architectures, leading to new candidates. This process is described as At = Mutate(At−1, Mt), where At and At−1 are the architecture populations at iterations t and t−1, Mutate is the mutation operation, and Mt are the mutation rules at iteration t. The method parallelizes efficiently.
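
As an illustration, the AgE loop can be sketched as follows; `evaluate`, `random_arch`, and `mutate` are placeholder callables, and the actual framework runs the evaluations asynchronously across compute nodes while the inner Bayesian optimization tunes the training hyperparameters:

```python
import random

def aging_evolution(evaluate, random_arch, mutate,
                    population_size=50, sample_size=10, iterations=500):
    """Minimal Aging Evolution sketch: each iteration mutates the best
    architecture from a random sample and retires the oldest member."""
    population = []  # FIFO list of (architecture, score) pairs
    for _ in range(population_size):
        arch = random_arch()
        population.append((arch, evaluate(arch)))
    for _ in range(iterations):
        sample = random.sample(population, sample_size)
        parent = max(sample, key=lambda pair: pair[1])[0]
        child = mutate(parent)                     # A_t = Mutate(A_{t-1}, M_t)
        population.append((child, evaluate(child)))
        population.pop(0)                          # age out the oldest, not the worst
    return max(population, key=lambda pair: pair[1])
```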

In the context of continuous field prediction (also referred to as regression), which in our case is ψ and/or Jϕ, the likelihood function captures the noise model p(ϵ) of the statistical difference between the observed targets yobs and the predicted response ypred. Under the assumption of Gaussian noise,30 the likelihood distribution takes the form
p(yobs|X, θ) = N(yobs; ypred(X, θ), σ²(X, θ)), (3)
where θ denotes the parameters and hyperparameters of the function f that predicts ypred=f(X,θ) and X is the input vector that is created from preprocessing the dmag introduced earlier in the introduction. Under the assumption of maximum likelihood given by Eq. (3), we arrive at the neural net loss function by taking the negative of the logarithm of the likelihood
Lossi(θi) = −log pi(yobs|X, θi) = (1/2) log σi²(X) + [yobs − ypred,i(X)]²/(2σi²(X)) + const, (4)
where the subscript i indicates the probability and the parameters/hyperparameters of each model being optimized in the search. This loss is a generalization of the typical neural net loss function, which contains only the squared difference between the prediction and the truth, (yobs − ypred)². The probabilistic interpretation given by Eqs. (3) and (4), along with the ensembling approach presented later in this section, is what enables uncertainty quantification for the surrogates built with NAS in this work. We provide further refinement of the loss function in Sec. II B to impose the physics constraints.
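
In code, the per-model loss of Eq. (4) is the familiar heteroscedastic Gaussian negative log-likelihood; a minimal NumPy sketch (dropping the additive constant) is:

```python
import numpy as np

def gaussian_nll(y_obs, y_pred, log_var):
    """Per-model loss of Eq. (4): the network outputs both the mean
    y_pred and the log-variance log_var at each output point."""
    return np.mean(0.5 * log_var + 0.5 * (y_obs - y_pred) ** 2 / np.exp(log_var))
```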

The implementation of our NAS method is done with DeepHyper.23 It employs a modular and flexible framework for defining the space of neural architectures, thus facilitating a comprehensive exploration of various architectural configurations. It encapsulates the architectural space through a high-level abstraction, which allows specifying ranges and types of architectural components, such as layers, activation functions, and other hyperparameters. This formulation is amenable to a variety of search algorithms to navigate the architectural landscape efficiently. Leveraging the capabilities of Ray,31 a distributed computing library, DeepHyper can parallelize the architecture search and training processes across multiple compute nodes or cores. This parallelization is instrumental in substantially accelerating the search, especially when navigating through a vast space of possible architectures.

For each model presented in this manuscript, we run a NAS on a single graphical processing unit (GPU) node, using 8 GPU cores, for 24 h on the Argonne National Laboratory (ANL) cluster Swing, which results in well-converged models as shown in Sec. IV. Depending on the problem size, the NAS runs produce as many as several hundred NN configurations. Once a NAS is complete, we form a deep ensemble32 consisting of N top-performing models. That is, rather than choose a single model that is the “best” in terms of having the best statistical measure of matching the true data during training, multiple models are used to increase robustness. This ensemble facilitates the ability to improve the uncertainty quantification of the overall predictions. Mathematically, it is expressed as
p(yobs|X, D) = (1/N) Σi=1…N p(yobs|X, θi), (5)
where the total probability p(yobs|X,D) denotes the likelihood of observing the true target yobs given the input X and dataset D. This formulation averages the predictions from all models in the ensemble to compute the likelihood of the true target yobs, showcasing a common strategy in deep ensembles to derive a consensus prediction. It can be shown that the mean and variance of the ensemble result can be written as
μens(X) = (1/N) Σi ypred,i(X), (6)
σens²(X) = (1/N) Σi σi²(X) + (1/N) Σi [ypred,i(X) − μens(X)]², (7)
The mean represents the prediction of the ensemble. The two terms of the variance represent the aleatory (irreducible) and epistemic (reducible) uncertainties, and it is useful to view them separately. That is, in the rest of this paper, we will show the mean (prediction) and the two uncertainties associated with it:
σAU²(X) = (1/N) Σi σi²(X), (8)
σEU²(X) = (1/N) Σi [ypred,i(X) − μens(X)]². (9)
The epistemic uncertainty (also known as the knowledge uncertainty) refers to the model form uncertainty, i.e., how well a model with its chosen (hyper) parameters reproduces the truth or the observation, which, in this case, is taken to be ψ and/or Jϕ calculated by magnetic EFIT. Therefore, “epistemic” in this context only refers to the NN models, and not the different modes of EFIT introduced in Sec. I. The aleatory uncertainty refers to measurement noise, missing data, or data uncertainty.33 It is an inherent property of the data distribution, which in the present context can be related to the underlying limitation of the magnetic EFIT framework itself because of a lack of internal constraints.
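
Given the per-member predicted means and variances, the decomposition of Eqs. (6)–(9) amounts to a few lines; a sketch:

```python
import numpy as np

def ensemble_moments(means, variances):
    """Decompose an ensemble per Eqs. (6)-(9). `means` and `variances`
    have shape (N_models, ...), one entry per ensemble member."""
    mu = means.mean(axis=0)                        # Eq. (6): ensemble prediction
    aleatory = variances.mean(axis=0)              # Eq. (8): mean of member variances
    epistemic = ((means - mu) ** 2).mean(axis=0)   # Eq. (9): spread of member means
    return mu, aleatory, epistemic
```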

The comprehensive NAS combined with the AgEBO approach as formulated here then has the following advantages: (1) automatically optimized neural models can be found using leadership class facilities, (2) improved robustness enabled by an ensemble approach, and (3) uncertainty quantification, with the categorization of two types of uncertainty, in those predictions. These are crucial features needed to build a robust and accurate surrogate that may not be attained by an un-optimized multi-layer perceptron (MLP) NN, even those that may perform similarly on an in-distribution test set. Such naive approaches often perform poorly when inferring on out-of-distribution data.

Up to this point, we have described a general method for forming a performant, probabilistic neural network that is independent of the specific physics that we wish to study. Here, we describe two physics-based constraints that we impose and study their effects on prediction accuracy.

First, we impose a smoothness constraint on the first derivative of ψ to ensure the continuity of the magnetic field, i.e., C1 continuity of the flux. To implement this, we calculate the numerical spatial derivative of the true and predicted ψ using a Sobel–Feldman operator,34 then define a smoothing loss on ψ as the sum of absolute differences between these derivative terms. The Sobel operator was chosen because it is computationally inexpensive and introduces a smoothing in the direction perpendicular to the gradient, which helps reduce artifacts associated with central differencing. The loss term for ψ can therefore be written as
Lossψ = NNloss + λS Sloss, (10)
where NNloss is defined by Eq. (4) with y = ψ, Sloss is the smoothing loss, and the amplitude λS scales it down to ensure it does not dominate NNloss. To determine the optimal level of smoothing that does not degrade the accuracy of ψ, we performed a coarse scan of λS from 10⁻⁴ to 0.1; a magnitude of approximately 10⁻³ was found to be optimal.
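
The smoothing term can be illustrated with SciPy's Sobel filter; in training, the same operation would be expressed as a differentiable layer in the NN framework, so this NumPy version only illustrates the definition, with the amplitude λS ≈ 10⁻³ found above:

```python
import numpy as np
from scipy import ndimage

def smoothing_loss(psi_true, psi_pred, amplitude=1e-3):
    """Sloss of Eq. (10): sum of absolute differences between the
    Sobel-Feldman derivatives of the true and predicted flux maps,
    taken along both grid directions and scaled by lambda_S."""
    loss = 0.0
    for axis in (0, 1):  # derivative along Z and along R
        loss += np.abs(ndimage.sobel(psi_true, axis=axis)
                       - ndimage.sobel(psi_pred, axis=axis)).sum()
    return amplitude * loss
```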

The next constraint that we wish to enforce is the Grad–Shafranov equation itself, referred to as the Jϕ constraint henceforth, which follows the design philosophy of physics-informed neural networks.35,36 To do so, we introduce a second model that predicts Jϕ from the right side of the Grad–Shafranov equation, given the ψ predicted by the first model. The right side is valid only inside the separatrix, and our goal is to include only those contributions. That is, while our flux inference covers both the plasma and coil contributions, the Jϕ constraint should only constrain the current inside the separatrix, because calculating Jϕ from Δ*ψ is inaccurate for the coil currents, which EFIT computes using Green's functions. There are two possible methods for including only the current inside the separatrix: (1) compute Δ*ψ from the input vector via automatic differentiation, as was carried out in Ref. 12; an additional step would then be needed to retain only the values inside the separatrix. (2) Use the right side of Eq. (1) directly as a training target. The latter approach is not only less computationally intensive, but, by using the Grad–Shafranov equation, it requires one less differentiation and thus yields a smoother Jϕ. This provides an indirect constraint on ψ, but it works well, as seen in Sec. IV. An explanation of how our approach yields a Jϕ consistent with ψ is given in Appendix A.

The loss function for this additional constraint, LossJ, is given by Eq. (4) with y=Jϕ.

To apply this hybrid-physics constraint, we adopt a dual-model setup where the first model is an MLP that learns ψ as a function of external magnetic data X, and the second model is another MLP that learns Jϕ as a function of ψ. In other words, these two models are linked to enforce the Grad–Shafranov equation.

Under the dual-model approach, the objective function becomes the joint loss comprising the individual losses from the two models Mψ and MJ,
Loss = Lossψ + LossJ. (11)
The probabilistic generalization of Sec. II A then requires the joint distribution of ψ and Jϕ, which can be factorized as p(ψ, Jϕ|X, θψ, θJ) = p(ψ|X, θψ) · p(Jϕ|ψ, θJ). This formulation encapsulates a two-tier modeling paradigm: Mψ predicts p(ψ|X, θψ), and MJ, leveraging the output of Mψ, predicts p(Jϕ|ψ, θJ). The architectures of Mψ and MJ are obtained via NAS to optimally tailor them to their respective tasks. The generalization to ensembles is straightforward from this decomposition and is not shown here. Under the dual-model paradigm, the uncertainties described in Sec. II A are calculated for the prediction of both ψ and Jϕ at every mesh point on the EFIT grid. There is a tacit assumption here that the uncertainties at one mesh point are uncorrelated with those at another mesh point, i.e., we assume a diagonal covariance matrix.
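
Schematically, the factorized two-tier prediction chains the two models; `model_psi` and `model_j` are placeholder callables, each returning a (mean, log-variance) pair as in the Gaussian likelihood above:

```python
def dual_model_predict(model_psi, model_j, X):
    """Two-tier sketch of p(psi, Jphi | X) = p(psi | X) * p(Jphi | psi):
    the flux model consumes the compressed magnetics inputs X, and the
    current model consumes the predicted flux."""
    psi_mean, psi_log_var = model_psi(X)
    j_mean, j_log_var = model_j(psi_mean)
    return (psi_mean, psi_log_var), (j_mean, j_log_var)
```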

The smoothing loss function described above was also necessary to mitigate the noise that arises in ψ due to the Jϕ constraint. Since the two targets are concurrently learned in our setup, the strong coupling between ψ and Jϕ feeds back into ψ. This effect is strongest in the plasma core, where the current density peaks. By adjusting the amplitude of the Sloss term in Eq. (10), we were able to restore the smoothness in ψ. However, finite-difference calculations of Δ*ψNN indicate that C2 continuity, i.e., smoothness of the second spatial derivative of ψ, may be needed.

This summarizes the physics constraints, and how they lead to a dual-model setup with a joint loss function. Another important physics-informed training method is how the magnetic data are treated. Specifically, in Sec. III, we discuss how to explicitly encode the correlations inherent in the magnetic signals.

To train our EFIT-Prime model, a dataset was created using approximately 180 000 magnetically constrained equilibria from approximately 800 discharges from the 2019 DIII-D campaign. This dataset features a diverse array of plasma conditions and includes the ramp-up/down (or pre/post-flat top) stages to ensure comprehensive coverage and model robustness.37 We split these data into 80% for training (145 701 samples), 10% for validation (18 214 samples), and 10% for testing (18 212 samples). Preparing the data in this fashion erases the individual identity of a single discharge, i.e., time slices from a single discharge can easily be in training, validation, and test datasets all at once.

The creation of this dataset was greatly facilitated by the development of workflow tools and data standards. Specifically, OMFIT38 was used to generate the bulk of the data. To ensure the quality of the data, we implemented an equilibrium quality check and discarded any magnetic equilibrium that has li > 2.0 or li < 0.05 or βN > 8.0 or q95 > 40.0 or q0 > 10.0 or |Ip [MA]| < 0.1, which are usually considered "poor" conditions for the plasma to remain in 2D force balance. An additional filter that removes equilibria with ∂|Ip [MA]|/∂t < −2.3 is also applied to remove disrupting plasmas.
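
A sketch of these cuts on a hypothetical pandas table; the column names are assumptions, and grouping by discharge before differencing is omitted for brevity:

```python
import pandas as pd

def quality_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Keep equilibria passing the force-balance sanity cuts and the
    disruption filter; assumes Ip in MA and time in s."""
    dip_dt = df["ip"].abs().diff() / df["time"].diff()  # MA/s; NaN on first row
    good = (
        (df["li"] <= 2.0) & (df["li"] >= 0.05)
        & (df["betan"] <= 8.0) & (df["q95"] <= 40.0)
        & (df["q0"] <= 10.0) & (df["ip"].abs() >= 0.1)
        & (dip_dt >= -2.3)
    )
    return df[good]
```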

A second dataset, which excludes all negative-triangularity (NT) discharges from the above dataset and contains approximately 1.7 × 10⁵ magnetic equilibria, was also created. These data are used to train different variations of EFIT-Prime pursued under the ablation studies described in Sec. V. Withholding NT from the training of these other models gives us a stringent test of generalizability by testing each model on out-of-distribution NT discharges, specifically DIII-D discharges 180526–180528 and 180533, containing 957 equilibria. Note that the NT equilibria contained in the EFIT-Prime dataset have no overlap with the NT equilibria from these four NT discharges. These additional datasets are also summarized in Table II. We next discuss the preparation of the magnetics data, poloidal magnetic flux, and toroidal current density that are used to train the EFIT-Prime model.

TABLE II.

Summary of data sets used in training and later stress-testing the EFIT-Prime model, where the negative-triangularity discharges are withheld from training but used in inference. While the EFIT-Prime dataset contains some NT equilibria, none of the NT data comes from the four NT discharges used to create the NT inference set shown in row 3.

Data set type Total number of equilibria Training Validation Test
EFIT-Prime dataset  182 122  145 701  18 214  18 212 
NT-withheld dataset  178 282  142 624  17 829  17 829 
NT inference  957  ⋯  ⋯  957 

As discussed in the introduction, Sec. I, the magnetic signals form the basis of equilibrium reconstruction in this work. For DIII-D, these magnetic signals consist of the measurements of the poloidal magnetic field by an array of 76 3-axis magnetic probes (MP), the poloidal magnetic flux picked up by 44 flux loops (FL), the absolute value of the plasma current |Ip|, which is a scalar quantity, the electrical currents in the 18 external poloidal field coils (FC) used to shape the plasma, and the currents in the 6 Ohmic coils (EC). These measurements from the magnetic sensors and currents in the coils, which add up to 145 signals in total, are used to construct the input vector X. The 145 magnetic signals are summarized in Table III, and their locations (excluding |Ip|) in the DIII-D poloidal cross section are shown in Fig. 1.

TABLE III.

Summary of the magnetic signals that are used in the neural net input vector.

Input  Definition               Data size
MP     Poloidal magnetic field  76
FL     Flux loops               44
|Ip|   Plasma current           1
EC     Ohmic coils              6
FC     Poloidal field coils     18
Total                           145
FIG. 1.

A cross section of the DIII-D tokamak with all of the external diagnostics and PF coils that enter magnetic EFIT as least squares constraints. The red line segments represent the 76 magnetic probes, the blue circles the 44 flux (ψ) loops, and the orange blocks the 18 poloidal field (PF) coils and 6 Ohmic coils. The gray corresponds to the vacuum vessel wall.


For training of the EFIT-Prime model, all 145 input features are normalized by various combinations of the time-varying (toroidal) vacuum magnetic field B0 and the major and (average) minor radii of DIII-D, R0 = 1.67 m and a = 0.6 m (and the vacuum permeability μ0 for the coil currents), to bring the scale of the inputs approximately within the [−1, 1] range. For example, the magnetic-probe measurements are scaled by (a/R0)B0, which is approximately the poloidal equivalent of the vacuum magnetic field. An example of normalized inputs for DIII-D discharge 180087 is shown in Fig. 2. The arrows correspond to the mean signal at each location over all time slices within the discharge, and the vertical bars represent the 3σ spread of each measurement/current. Not all of the 145 signals are necessarily active during a discharge. Often a few of the magnetic probe measurements are discarded for various reasons, such as poor measurement quality, calibration issues, or other data quality factors determined during the experimental run. The present implementation treats the missing signals by replacing them with their synthetic counterparts reconstructed by EFIT.
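
For instance, the stated probe scaling reads as follows; the analogous combinations for the flux loops and coil currents (involving μ0) are not spelled out in the text, so this snippet covers only the probe normalization:

```python
R0, a = 1.67, 0.6   # DIII-D major and (average) minor radii [m]

def normalize_probe(b_pol, B0):
    """Scale a poloidal-field probe signal by (a/R0)*B0, approximately
    the poloidal equivalent of the time-varying vacuum field B0, so the
    normalized value falls roughly within [-1, 1]."""
    return b_pol / ((a / R0) * B0)
```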

FIG. 2.

Mean (arrows) normalized magnetic diagnostics and coil currents and their 3σ standard deviations (vertical bars) for DIII-D discharge 180087. All 145 input features are normalized by various combinations of the time-varying (toroidal) vacuum magnetic field B0 and the major and (average) minor radii of DIII-D, R0 = 1.67 m and a = 0.6 m (and the vacuum permeability μ0 for the coil currents), to bring the scale of the inputs approximately within the [−1, 1] range.


The ability of ML models to generalize, that is, to give good predictions in regimes not included in the training, is highly dependent on the adequacy of the training data to represent the underlying phase-space features and constraints. One of our goals is to investigate an important factor of the training data: the format of the data itself. Traditional methods of ML model training in plasma physics input the data as a 1D vector, i.e., in a tabulated fashion similar to what is seen in Fig. 2, without attempting to explicitly encode spatial correlations between the different sensors/coils. That is, the spatial locations are excluded from the inputs. To overcome this limitation, we embed the magnetic sensor measurements and coil currents in a 2D coordinate plane.

This spatial embedding procedure is as follows: first, the spatial coordinates of the magnetic sensors and PF/Ohmic coils are mapped to the corresponding indices on a 129 × 129 2D image canvas. If a sensor maps to an index, its measured value is assigned directly to the pixel at that index. In cases where multiple sensors map to the same index, the mean value of the measurements is assigned to the pixel. Finally, we choose to embed |Ip| at the center of the grid, as it is a scalar quantity. Another option could be to distribute it poloidally into many filaments whose total current adds up to Ip.
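
A minimal sketch of this embedding, using nearest-pixel assignment with averaging of collisions:

```python
import numpy as np

def embed_signals(values, rs, zs, r_grid, z_grid, n=129):
    """Place sensor/coil values on an n x n canvas at the pixel nearest
    each (R, Z) location; pixels hit by several sensors receive the
    mean value, and untouched pixels stay zero."""
    canvas = np.zeros((n, n))
    counts = np.zeros((n, n))
    for v, r, z in zip(values, rs, zs):
        i = int(np.argmin(np.abs(z_grid - z)))  # row (Z) index
        j = int(np.argmin(np.abs(r_grid - r)))  # column (R) index
        canvas[i, j] += v
        counts[i, j] += 1
    hit = counts > 0
    canvas[hit] /= counts[hit]
    return canvas  # the scalar |Ip| would then go to canvas[n//2, n//2]
```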

Because the embedding inflates the size of the input vector by two orders of magnitude, from 145 to 129 × 129, a second step entailing the compression of the input data is carried out. This compression uses principal component analysis (PCA), retaining only the first 30 principal components, which contain more than 99% of the information in the embedded magnetic inputs, as indicated by Fig. 3. This compresses the input data by more than a factor of 100, which provides huge savings in the computational cost of training otherwise immense NNs in which both the inputs (because of the embedding) and the outputs would contain approximately 1.6 × 10⁴ features.
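
With scikit-learn, the compression step amounts to the following; `X_embed` is a stand-in for the flattened 129 × 129 embedded-input images:

```python
import numpy as np
from sklearn.decomposition import PCA

X_embed = np.random.rand(1000, 129 * 129)    # stand-in for real embedded inputs
pca_in = PCA(n_components=30)
X_compressed = pca_in.fit_transform(X_embed)  # (n_samples, 30)
# On the real embedded magnetics the retained variance exceeds 0.99 (Fig. 3).
print(pca_in.explained_variance_ratio_.sum())
```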

FIG. 3.

Explained variance showing the amount of compressed information in the first 30 PCA components for a 2D image of embedded magnetic inputs.


Next, we describe the targets of the NN training.

During supervised training, the EFIT-Prime model was supplied with true values of a quantity that it attempted to reproduce in its output. These true values are the target vector of the model. For the present applications, the target vector is the poloidal magnetic flux ψ and the toroidal current density Jϕ on the 129 × 129 uniform EFIT mesh. The flux ψ is first normalized to lie within the range [0, 1] inside the plasma via ψN ≡ (ψ − ψ0)/(ψb − ψ0), where ψ0 is the flux at the magnetic axis and ψb the flux at the plasma boundary.39 Then, it is flattened to form a 1D array, which yields a total of 129² = 16 641 features per sample.

The toroidal current is first de-dimensionalized by the transformation Jϕ → μ0 a Jϕ/(2π B0) and then compressed into 300 principal components to accelerate the NN training. These coefficients are further scaled with a standard scaler to have zero mean and unit variance before the training. The NNs are tasked with learning the (scaled) coefficients of these principal components. A similar dimensional reduction of NN targets was carried out in Refs. 40 and 41. Note that the principal components are extracted from the training set alone, representing 80% of the data. It is assumed that the resulting set of basis vectors forms a complete set for the Jϕ seen in most DIII-D scenarios and discharges. The first four of these basis vectors, i.e., principal components, are shown in Fig. 4. The dominant structure of Jϕ, with its current centroid, appears in the first component; the second and third components correspond to radial and axial shifts of the current centroid (inward and downward, respectively). The fourth component produces a shift of the current centroid similar to the second and third components.
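
The target pipeline for Jϕ can then be sketched as follows, with dummy arrays standing in for the training data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

mu0, a = 4e-7 * np.pi, 0.6
J_phi = np.random.rand(1000, 129 * 129)       # stand-in for training Jphi maps
B0 = np.random.uniform(1.5, 2.2, (1000, 1))   # per-slice vacuum field [T]

J_nd = mu0 * a * J_phi / (2 * np.pi * B0)     # de-dimensionalize
pca_j = PCA(n_components=300).fit(J_nd)       # fit on the training set only
scaler = StandardScaler().fit(pca_j.transform(J_nd))
targets = scaler.transform(pca_j.transform(J_nd))  # zero mean, unit variance
```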

FIG. 4.

The first four PCA components of the toroidal current density Jϕ for the dataset used in training EFIT-Prime.


In this section, the results obtained with the proposed EFIT-Prime model are discussed in detail.

To evaluate the model's performance, we employ two primary metrics across our test dataset of approximately 18 000 magnetic DIII-D equilibria (time slices). The first is the coefficient of determination (R²),
R² = 1 − Σi (ψtrue,i − ψNN,i)²/Σi (ψtrue,i − ψ̄true)², (12)
where the subscript i loops over the N = 129 × 129 mesh points (or pixels) and the overbar corresponds to the mean of the true ψ over the entire mesh for one time slice.
The second is the Structural Similarity Index (SSIM)42 
SSIM(NN, true) = (2μNN μtrue + C1)(2σNN,true + C2)/[(μNN² + μtrue² + C1)(σNN² + σtrue² + C2)], (13)
where μ is the pixel sample mean and σ² is the variance of either the truth (subscript "true") or the NN prediction (unsubscripted), with σNN,true their covariance. C1 = (k1 L)² and C2 = (k2 L)² are regularizers, with k1 = 0.01 and k2 = 0.03 by default, and L is the dynamic range of the pixel values. SSIM considers a weighted combination of three measures (luminance, contrast, and structure) that reduces to Eq. (13) when the weights of all three measures are set to 1. The SSIM for two identical images is 1, while that for two completely dissimilar images is −1. The SSIM calculation is carried out locally over a small window that slides over the entire image, similar to a convolution operation. For the results shown here, we use a 7 × 7 window. The final result is a single scalar that is the mean of all the 122 × 122 local similarity evaluations for the chosen window size.
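
Both metrics are readily computed per slice; a sketch using scikit-image for the 7 × 7-window SSIM:

```python
import numpy as np
from skimage.metrics import structural_similarity

def slice_metrics(psi_true, psi_pred):
    """Per-slice R^2 (Eq. 12) and mean SSIM (Eq. 13) for 129 x 129 maps,
    using a 7 x 7 sliding window for the SSIM."""
    ss_res = np.sum((psi_true - psi_pred) ** 2)
    ss_tot = np.sum((psi_true - psi_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    ssim = structural_similarity(psi_true, psi_pred, win_size=7,
                                 data_range=psi_true.max() - psi_true.min())
    return r2, ssim
```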

The above two metrics, calculated for each ψ and Jϕ slice within the 129×129 mesh, assess accuracy and similarity to observed data, respectively. We then aggregate these individual metrics over the entire test set to form histograms to visualize the overall performance distribution. Furthermore, a visual inspection of the predicted vs observed ψ and Jϕ for the best, median, and worst predictions is provided, giving us a layered understanding of the model's accuracy in predicting the magnetic equilibria.

For this model, an ensemble is formed out of the top five NN configurations, which are then utilized to quantify the overall uncertainty, consisting of both the aleatory and epistemic uncertainties, for the poloidal flux and the toroidal current density across the 2D grid. Examining the relative uncertainty magnitudes in different regions and under various plasma scenarios gives us a measure of confidence, highlighting the model's predictive reliability and how the uncertainty varies across the plasma environment. Hereafter, the ensemble mean prediction is referred to as the default EFIT-Prime prediction.

The overall performance of the EFIT-Prime model can be assessed from the R² and SSIM distributions of the prediction of ψ, shown in Fig. 5(a), and of Jϕ, shown in Fig. 6(a). EFIT-Prime produces good accuracy in its prediction of ψ, as evidenced by the tight clustering of both the R² (blue) and SSIM (orange) distributions toward the right boundary, suggesting nearly perfect agreement between the mean predictions and the truth. The R² distribution is clustered tightly toward the R² = 1 boundary (right), with more than 99.5% of the samples having R² > 0.995, indicating an accurate and robust prediction by the ensemble. Both the mean and median R² equal 1.0 to within three significant digits. A similar conclusion also holds for SSIM, indicating that EFIT-Prime successfully reproduces the flux surfaces.

FIG. 5.

The mean prediction of the poloidal flux ψ by the EFIT-Prime model, given the principal components of the external magnetic measurements and coil currents embedded in a 2D map. Shown are (a) the R² (blue) and SSIM (orange) distributions of the predicted flux ψ for nearly 1.8 × 10⁴ test samples, (b)–(d) the overlay of the true flux surfaces (black) against the NN-predicted flux surfaces (red dashed) for the three samples with the worst, median, and best R², (e)–(g) aleatory, and (h)–(j) epistemic uncertainties in the flux prediction for the same three samples.

FIG. 6.

The mean prediction of the toroidal current density Jϕ by the EFIT-Prime model, given the principal components of the external magnetic measurements and coil currents embedded in a 2D map. Shown are (a) the R² (blue) and SSIM (orange) distributions of Jϕ for nearly 1.8 × 10⁴ test samples, (b)–(d) the overlay of the true Jϕ (black) against the NN-predicted Jϕ (red dashed) for the three samples with the worst, median, and best R², (e)–(g) aleatory, and (h)–(j) epistemic uncertainties in the Jϕ prediction for the same three samples.


To gain further insight, the samples with the worst (left), median (middle), and best (right) R² are identified, for which we carry out a visual comparison of the poloidal flux ψ predicted by EFIT-Prime (dashed red contours) against the true flux (solid black contours), shown in Figs. 5(b)–5(d). We observe that the flux overlay for the best and median samples is nearly perfect, making it virtually impossible to distinguish the predicted flux surfaces from the true ones. The sample with the worst R² appears to be somewhat of an outlier and could be from the early start-up phase or the end of a discharge. However, even in this case, the predicted flux surfaces do not deviate much from the true flux surfaces.

The aleatory (AU) and epistemic (EU) uncertainties are shown in the third [(e)–(g)] and fourth [(h)–(j)] rows of Fig. 5 for the same samples mentioned above. The uncertainties are given on the same scale as the normalized flux and normalized Jϕ; thus, they are dimensionless and stated as variances (σ²), in accordance with Eqs. (8) and (9). For the outlier sample with the lowest R², the two types of uncertainty are relatively large. For the best and median predictions, the uncertainties are small, nearly four to five orders of magnitude smaller than the normalized flux, amounting to an error of less than 10⁻², attesting to the accuracy of the model. For comparison, magnetic EFITs have a convergence error of 10⁻⁴ and a root-mean-square Grad–Shafranov residual of approximately 10⁻³. For the best and median samples, the AU is larger than the EU, whereas, for the worst sample, the EU is larger. This is not an established pattern, however, and in general, our observations of EFIT-Prime and other models' results indicate the AU to be, on average, larger than the EU. The regions that show the largest AU are the center of the grid, the mid-core, and the vicinity of the shaping coils on the outboard side of the plasma. The EU is also largest at the center of the grid. The elevated AU around the outboard-side coils is likely due to the variation in the coil currents associated with controlling the plasma shape, which would further increase to generate and control NT plasmas. As the AU is the mean of the variances produced by each NN configuration within the ensemble, its irreducible nature becomes apparent here: each NN configuration picks up some inherent uncertainty from the shaping coils, thereby producing a non-vanishing mean of those uncertainties. The EU, however, shows no such error around the shaping coils: because all five configurations in EFIT-Prime produce similar predictions there, the spread among their means, and hence the model uncertainty calculated by Eq. (9), nearly vanishes, showcasing the reducible nature of this type of uncertainty.

Next, we assess the quality of the EFIT-Prime prediction of Jϕ on the same test set. The R² and SSIM distributions for Jϕ are shown in Fig. 6(a), displaying a wider spread than the distributions for ψ, suggesting that the accuracy of the model in learning Jϕ drops slightly compared to that for ψ. In fact, in this case, almost half of the test samples have R² < 0.995 (with about 3% of the test samples having R² < 0.95). This slight drop in performance is also indicated by the median R² = 0.998 and mean R² = 0.995. A comparison of the true to predicted Jϕ for the same three samples is shown in Figs. 6(b)–6(d). A similar conclusion follows here as well: the worst sample appears to be an outlier, and there is good qualitative agreement between the truth and the model's predictions for the median and best samples, as expected. The peak AU for the Jϕ predictions appears as large as the predicted Jϕ itself and orders of magnitude larger than the AU for ψ. This is also true for the EU. The two types of uncertainty are heavily concentrated in the plasma core. This, and the fact that the uncertainty is not localized in any of the regions that would be clear indicators of numerical error, are thought to be consequences of the fundamental limitation of magnetic EFITs, which lack internal constraints. This effect is especially pronounced in the plasma core and should affect Jϕ more strongly than ψ, since Jϕ involves second-order spatial derivatives of ψ. This is consistent with the well-known and documented limitations of external-magnetics-only reconstructions.1,43,44 That our framework can quantify this uncertainty is a feature, not a bug, of our modeling approach and thus one of the highlights of the present work.

The learning history, i.e., the evolution of the loss function for the best configuration within EFIT-Prime, is shown in Fig. 7, which tracks the ψ training and validation loss (black and red), given by Eq. (10), and the Jϕ training and validation loss (blue and orange), given by Eq. (4), as functions of the NN epochs. In this case, the training went as far as 140 epochs before being terminated by the early-stopping criterion. As expected, the validation error is slightly larger than the training error and undergoes a noisier evolution for both ψ and Jϕ, because the validation data are never used to adjust the NN weights. The sudden drop that occurs in the loss function is due to the piecewise-constant learning rate schedule, which drops the learning rate by a pre-defined factor after a given number of epochs. Also note that the loss function for ψ undergoes a much more rapid evolution, dropping several orders of magnitude, while the loss for Jϕ evolves more gradually but still undergoes the same scheduled drop in the learning rate before epoch 50.

FIG. 7.

The training and validation loss for the targeted quantities, ψ and the PCA coefficients of Jϕ, as a function of NN epochs for the best-performing NN model from the ensemble forming EFIT-Prime. Under the maximum likelihood assumption, the loss function becomes the negative of the natural logarithm of the likelihood function, which for ψ is given by Eq. (10) (including the smoothing loss described in Sec. II B) and for Jϕ is given by Eq. (4) under the substitution ψ → Jϕ.


The neural architecture corresponding to the top-performing configuration of EFIT-Prime is shown and discussed in Appendix B. NAS produces large configurations with many connections and approximately 10 × 10⁶ model parameters (weights and biases); a standard deterministic MLP would also have nearly as many model parameters to optimize. The total inference time (over the five configurations that make up the EFIT-Prime ensemble) has been clocked at 15 ms, with 7 ms for the ψ and 7.5 ms for the Jϕ prediction, the latter taking as much time because of the operations that undo the scaling and PCA to map the prediction back to the (R, Z) grid. This is without any parallelization or optimization: each configuration's prediction is executed sequentially, with no optimization of the Python model evaluation for inference or of the NN architecture itself. The latter can be rebuilt under NAS with an architectural sparsity constraint to speed up the inference time, which will be the topic of subsequent work.

We have pursued several, possibly advantageous, modeling strategies in the data preparation and formulation of physics constraints for building the EFIT-Prime model. To understand the contribution of each strategy to the final model, we perform ablation studies where a certain strategy, be it a model component or a different way to prepare the input vector, is removed and the model performance is then reevaluated after retraining. We pursue two main approaches of surrogate modeling in this study, starting with the removal of the Jϕ constraint from the models. In the second approach, we undo the spatial embedding of the input vector and instead use tabulated magnetic inputs, again without Jϕ.

For each approach, a NAS is carried out to determine an ensemble of best-performing models. A crucial aspect of this study is the training of the ablated models on a curated dataset that excludes all negative triangularity (NT) discharges (discussed in Sec. III). To establish a baseline for the ablation study, we first retrain the EFIT-Prime model on this dataset devoid of NT. Altogether, we present three modeling approaches in this section: first, a baseline representing EFIT-Prime without NT data followed by the two aforementioned ablative approaches that are also ignorant of NT. These three approaches are summarized as follows:

  1. EFIT-Prime without negative triangularity,

  2. ψ-only with spatially embedded magnetic inputs,

  3. ψ-only with tabular magnetic inputs.

The performance of models built under the three approaches is first gauged on in-distribution, NT-free data in Secs. V A–V C and then on out-of-distribution data consisting of NT discharges, presented in Sec. V D. This "handicap" of withholding NT from the training provides an excellent platform for assessing the role of each component in improving the generalizability of our models when inferring on out-of-distribution NT discharges in Sec. V D.

This is the same approach as that used in the construction of the final model, EFIT-Prime. It still includes the two crucial modeling components, the spatial embedding of the inputs and the Jϕ constraint, that we wish to investigate for improved generalizability, except that the models in this case are trained on a special dataset devoid of NT discharges, shown in the second row of Table II. This is done to establish a baseline for EFIT-Prime against which we can compare the ablated models of Secs. V B and V C.

Similarly to Sec. IV, we use the R² and SSIM distributions of the ensemble mean prediction of ψ and Jϕ and their aleatory and epistemic uncertainties to assess the performance of the model. We find that the R² and SSIM distributional metrics for ψ are similar to those reported for EFIT-Prime in Sec. IV, where the NT data were part of the training, as shown in Fig. 8(a). Both the mean and median R²/SSIM equal 1.0 to within three significant digits, as was observed in the EFIT-Prime results.

FIG. 8.

The mean prediction of ψ by the version of EFIT-Prime that is ignorant of negative triangularity (NT), given the principal components of the external magnetic measurements and coil currents embedded in a 2D map. Shown are (a) the R² (blue) and SSIM (orange) distributions of the predicted flux ψ for nearly 1.7 × 10⁴ test samples, (b)–(d) the overlay of the true flux (black) against the NN-predicted flux (red dashed) for the three samples with the worst, median, and best R², (e)–(g) aleatory, and (h)–(j) epistemic uncertainties in the flux prediction for the same three samples.


The R² and SSIM distributions for Jϕ are shown in Fig. 9(a). These distributions and the mean/median R²/SSIM for Jϕ show a slight improvement over the Jϕ results from EFIT-Prime. However, this is likely a consequence of the reduced variance of the training data used here due to the lack of NT equilibria; EFIT-Prime has to accommodate the NT shape, unlike the model studied here. To put it in other terms, the present model might fit its dataset better, but likely at the expense of generalizing to out-of-distribution data like NT plasmas.

FIG. 9.

The mean prediction of the toroidal current density Jϕ by the version of EFIT-Prime that is ignorant of NT, given the principal components of the external magnetic measurements and coil currents embedded in a 2D map. Shown are (a) the R² (blue) and SSIM (orange) distributions of the predicted Jϕ for nearly 1.7 × 10⁴ test samples, (b)–(d) the overlay of the true Jϕ (black) against the NN-predicted Jϕ (red dashed) for the three samples with the worst, median, and best R², (e)–(g) aleatory, and (h)–(j) epistemic uncertainties in the Jϕ prediction for the same three samples.


The qualitative comparisons for the worst (left), median (middle), and best (right) R² are shown in Figs. 8(b)–8(d) for ψ and Figs. 9(b)–9(d) for Jϕ. Here too, we find that the flux overlay for the best and median samples is nearly perfect, making it virtually impossible to distinguish the predicted flux surfaces from the true ones. However, we note that for ψ there are three samples with R² < 0.9, and the illustrated sample appears to undergo a vertical displacement event, as indicated by the strong axial shift of the flux surfaces. When comparing the aleatory and epistemic uncertainties, we find that the magnitudes and locations of high relative uncertainty for ψ remain similar to those of the EFIT-Prime results discussed in Sec. IV. For Jϕ, we find a lower magnitude for the AU overall, but the locations of high uncertainty remain somewhat consistent. For the EU, which represents the (reducible) model-form uncertainty, the present case without NT also evinces slightly lower values for the displayed samples. This conclusion is further supported by the mean uncertainties taken over all the test samples and over all values on the grid for each sample.

Here, we further remove the Jϕ constraint to study its effect on the ψ predictions. In this approach, we learn only ψ, given the spatially embedded magnetic data, using the same approach to learning the model and obtaining the ensembles for uncertainty quantification, with the exception that it is a single-model setup, as described in Sec. II A. Without the Jϕ constraint, the computational cost goes down somewhat, with the number of model parameters decreasing to approximately 7 × 10⁶ (from 10 × 10⁶) per NN configuration.

In line with the model discussed in Sec. IV, the ensemble mean prediction of ψ shows a highly concentrated R² distribution near 1, with over 99.5% of samples exceeding 0.995 and both the mean and median R² and SSIM values reaching 1.0 to within three significant digits, indicating good agreement with the true flux surfaces (see Fig. 10). The analysis of the aleatory (AU) and epistemic (EU) uncertainties reveals them to be significantly smaller for both the best and median predictions, nearly five to six orders of magnitude less than the normalized flux, with the AU consistently larger than the EU by a factor of 2–3. Again, the maximum AU is localized near the shaping coils on the plasma's outboard side and in the mid-core region, mirroring the observations from Sec. IV, with negligible uncertainty in the vacuum region. The structure of the EU looks somewhat different compared to that for EFIT-Prime and EFIT-Prime without NT.

FIG. 10.

The mean prediction of the poloidal flux ψ by the ensemble of the five best-performing MLP configurations determined by NAS, given the principal components of the external magnetic measurements and coil currents embedded in a 2D map. Shown are (a) the R² and SSIM distributions of the predicted flux ψ aggregated over the entire test set of nearly 1.8 × 10⁴ samples, (b)–(d) the overlay of the true flux (black) against the NN-predicted flux (red dashed) for the three samples with the worst, median, and best R², (e)–(g) aleatory, and (h)–(j) epistemic uncertainties in the flux prediction for the same three samples.


We note that removing the Jϕ constraint has negligible impact on the ψ prediction accuracy for in-distribution data, showcasing the model's robustness within known scenarios. It remains to be seen whether the models without the Jϕ constraint perform as well on out-of-distribution data, of which NT is an example. This is the topic of Sec. V D.

This can be considered the least sophisticated approach and perhaps the baseline for building our surrogates: the magnetic measurements and coil currents are input into the model as flat, structured data, forming a tabular representation of the inputs. Given these tabular inputs, an MLP configuration is again used in NAS to learn the map from the inputs X to ψ.

This strategy does not explicitly encode any spatial correlation and instead relies on the underlying spatial correlations present in the measurements to deduce relevant physical reconstructions. For example, the measurements from the magnetic probes contain correlations based on the underlying plasma generating the magnetic field, which are contained in the measurement values but otherwise not explicitly encoded. While this approach is computationally less demanding, similar to the model of Sec. V B, with only about 7 × 10⁶ model parameters for each NN configuration in the deep ensemble, it might fall short in capturing the spatial dynamics crucial for accurate prediction of plasma scenarios, especially for ψ, which can depend intricately on the spatial configuration of the magnetic measurements.
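For concreteness, the tabular input preparation amounts to nothing more than flattening and concatenating the raw signals. The sketch below (Python; the sensor counts are illustrative assumptions, not the actual DIII-D channel counts) makes the contrast with the spatial embedding explicit:

```python
import numpy as np

def make_tabular_input(probe_signals: np.ndarray,
                       coil_currents: np.ndarray) -> np.ndarray:
    """Form the baseline tabular input: one flat feature vector per time
    slice. Any spatial relationship between sensors is left implicit in
    the (arbitrary) channel ordering."""
    return np.concatenate([probe_signals.ravel(), coil_currents.ravel()])

# e.g., 76 probe signals and 18 coil currents (illustrative counts)
x = make_tabular_input(np.random.rand(76), np.random.rand(18))
```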

We find that the removal of the spatial correlation from the magnetic inputs and of the Jϕ constraint does not significantly impact predictions for in-distribution data encompassing only positive-triangularity equilibria. This observation is supported by the distributions of the R² and SSIM metrics for the ψ predictions over the entire test set, by the mean and median R²/SSIM, which are again 1.0 to within three significant digits, and by the relative magnitude and location of the aleatory (AU) and epistemic (EU) uncertainties. The summary figure for this model is omitted, as it shows results that are very similar to those of Sec. V B.

While the ensemble of models built under this approach performs well on its in-distribution test set, we will see shortly that the exclusion of NT training data and spatial embedding, alongside the absence of the Jϕ constraint, can profoundly affect predictions for out-of-distribution scenarios. This impact is explored thoroughly in Sec. V D.

We have discussed throughout this manuscript the importance of building models with the ability to generalize to cases that may lie out of distribution with respect to their training set. Here, we carry out an inference study on such a case, four DIII-D discharges with negative-triangularity (NT) plasma shape, to rigorously test the modeling strategies (including input preparation), studied under ablation in Sec. V, that have led to the final surrogate model: EFIT-Prime. It was noted in Sec. V that assessing the ablated models' performance on the test set alone did not yield any conclusive evidence for determining the winning and losing modeling strategies. We therefore turn to the NT scenario, which embodies the unique characteristic of a "flipped" plasma shape and provides a stringent test of the models' adaptability, especially for those models that have never seen NT in their training. This method of stress-testing the models aims to unveil any potential biases and to assess robustness against overfitting to specific plasma scenarios. The final model, EFIT-Prime, is also tested on the same NT discharges to provide a baseline of expected performance.

The inference set consists of approximately one thousand magnetic equilibria (samples or time slices) from four NT DIII-D discharges (180526-8, 180533) performed during the 2019 campaign (see bottom row of Table II). EFIT-Prime and its ablated “cousins” are tasked with predicting the flux surfaces for all the time slices contained within this inference set. The average triangularity for each time slice is then calculated from the predicted poloidal flux and compared against the “true” triangularity calculated from the true poloidal flux extracted from magnetic EFIT reconstructions of the four NT discharges.
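The triangularity comparison relies only on the standard geometric definition of δ. A minimal sketch (Python; it assumes the boundary points of the last closed-flux surface have already been traced from ψ; the boundary search itself is described later in this section) is:

```python
import numpy as np

def average_triangularity(Rb: np.ndarray, Zb: np.ndarray) -> float:
    """Average triangularity of a plasma boundary given its (R, Z)
    points, using the standard geometric definition."""
    Rmax, Rmin = Rb.max(), Rb.min()
    R_geo = 0.5 * (Rmax + Rmin)       # geometric major radius
    a = 0.5 * (Rmax - Rmin)           # minor radius
    R_upper = Rb[np.argmax(Zb)]       # R of the topmost boundary point
    R_lower = Rb[np.argmin(Zb)]       # R of the bottommost boundary point
    delta_upper = (R_geo - R_upper) / a
    delta_lower = (R_geo - R_lower) / a
    return 0.5 * (delta_upper + delta_lower)   # negative for NT shapes
```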

Adhering to the analysis presented in Secs. IV and V, we begin with the R² and SSIM distributions of the ψ predictions by each of the ablated models (a)–(c) as well as by EFIT-Prime. The median and mean R² for each approach are also displayed in the upper left corner of each subfigure in Fig. 11. The "naive" approach of Sec. V C, which uses tabulated magnetic inputs to predict only ψ, performs poorly, producing no samples with acceptable accuracy of R² ≥ 0.95. In fact, this case produces many samples with R² < 0, including the peak of its distribution, but setting the minimum R² = −0.1 in Fig. 11 excludes many of them. Upon changing the way the magnetic inputs are fed into the NNs, i.e., switching to the embedded inputs (and their principal components), we see a notable improvement in the prediction accuracy, with the peak of the R² distribution shifting to approximately 0.5 [from the negative peak observed in Fig. 11(a)]. However, this is still far from the kind of accuracy demanded of a reliable surrogate, producing only two predictions with R² > 0.95. The inclusion of the Jϕ constraint improves the predictive capability slightly, with the distribution's peak shifting to R² > 0.6. This conclusion is also supported by the more than 10% improvement in the median and mean R² displayed in each subfigure of Fig. 11. Finally, the EFIT-Prime model shows accurate predictions of the flux, with more than 80% of samples having R² > 0.95 and a median/mean R² of 0.995/0.962. Interestingly, for all four cases, the SSIM paints a more optimistic picture than R², but not so much as to contradict the conclusions based on R² here.
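Both metrics used here are standard and can be evaluated per sample from the true and predicted flux maps. A minimal sketch (Python, using scikit-learn and scikit-image; the data_range choice is an assumption of this sketch):

```python
import numpy as np
from sklearn.metrics import r2_score
from skimage.metrics import structural_similarity

def score_sample(psi_true: np.ndarray, psi_pred: np.ndarray):
    """Per-sample R^2 and SSIM between true and predicted flux maps."""
    r2 = r2_score(psi_true.ravel(), psi_pred.ravel())
    ssim = structural_similarity(
        psi_true, psi_pred,
        data_range=psi_true.max() - psi_true.min())
    return r2, ssim

# Aggregating score_sample over all time slices of the inference set
# yields histograms like those shown in Fig. 11.
```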

FIG. 11.

The R² and SSIM distributions for the magnetic flux surfaces predicted by each ablated model (a)–(c) and by EFIT-Prime (d), aggregated over four NT DIII-D discharges.


Shown in Fig. 12 is the average triangularity δ calculated from the poloidal flux predicted by EFIT-Prime and by the ablated models representing the three different modeling approaches. The results are shown over the same aforementioned four NT discharges, stitched together end to beginning to form the inference set. The true δ is shown as the black trace, with δ ≈ −0.16 except at the beginning and end of the discharges (the abrupt jumps around time slices 220, 600, and 900 correspond to the end of one shot and the beginning of the next). The most naive approach (purple trace) is unable to sense NT, consistently yielding positive triangularities with an average δ ≈ 0.2. The approach that learns only ψ from the spatially embedded magnetic inputs (green trace) shows a remarkable improvement in the prediction, yielding consistently negative, albeit underestimated, triangularity, with an average of δ ≈ −0.075. Therefore, it stands to reason that the spatial embedding of the magnetic inputs provides crucial physics information to the NN surrogates about the plasma shape. This embedding partially restores the principle that external magnetics suffice to determine the plasma shape. The next approach is the version of EFIT-Prime without NT (blue). It yields similar, perhaps slightly degraded, predictions for δ compared with the model with the embedding but without Jϕ. These results suggest that the Jϕ constraint does not add further physics information that can dramatically increase the models' predictive capability on NT. However, this is not surprising, since the PCA basis functions were created out of only positive-triangularity (PT) equilibria. Thus, there is no way to completely represent NT by a linear combination of the 300 Jϕ PCA basis vectors used here, i.e., part of an NT solution must lie in a space that is orthogonal to the PT basis functions. A similar observation was reported in Ref. 41. Finally, the triangularity calculated from EFIT-Prime's poloidal flux predictions is shown as the red trace, which tracks the true δ closely, indicating that the model's predictive power improves dramatically when a relatively small set of NT equilibria is included in its training set (in this case, about 5%–7% of the training data had NT). In other words, EFIT-Prime's performance on NT could be improved further if NT had greater representation in the training data.
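The orthogonality argument above can be made concrete by projecting an out-of-distribution sample onto the PT-only PCA basis and measuring what is left over. A small sketch (Python; the basis and mean are assumed to come from a PCA fit on PT equilibria only):

```python
import numpy as np

def projection_residual(x: np.ndarray, basis: np.ndarray,
                        mean: np.ndarray) -> float:
    """Relative reconstruction error of sample x under a PCA basis.

    basis: (n_components, n_features) orthonormal rows, e.g., the 300
    Jphi PCs fit on PT equilibria; mean: the PCA mean vector.
    A large residual signals content orthogonal to the basis, as
    expected when representing an NT equilibrium with PT-only PCs.
    """
    xc = x - mean
    coeffs = basis @ xc          # project onto the PT basis
    recon = basis.T @ coeffs     # best reconstruction within the basis
    return float(np.linalg.norm(xc - recon) / np.linalg.norm(xc))
```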

FIG. 12.

The average triangularity δ calculated from the poloidal flux predicted by EFIT-Prime (red) and by the models built under the ablation approaches, plotted against the true δ.


We next provide a qualitative comparison of the plasma boundary calculated from the predicted poloidal flux against the true boundary calculated from the true poloidal flux. This comparison is carried out for many samples pulled randomly from the inference set; however, only a single time slice, marked by the vertical dashed line in Fig. 12, is illustrated in Fig. 13. The same coloring scheme as in Fig. 12 is used here to delineate the different approaches. The true boundary, which is calculated from the true ψ (in the same way as the boundaries from the NN models), is shown as the solid black curve. We use the same method as EFIT to find the plasma boundary: a binary search that checks whether the flux surface is closed or open (or hits the wall), iterating until the last closed-flux surface is found. The particular sample displayed in Fig. 13 evinces a lower-diverted plasma with strong NT in the upper half. Of the four cases shown, only EFIT-Prime [Fig. 13(d)] captures the upper (negative) triangularity of this time slice correctly, showing the best agreement with the truth, while the other approaches, shown in Figs. 13(a)–13(c), all struggle with the plasma shape to varying degrees. Of course, any conclusion about model performance should be based on Fig. 12, which shows the overall trend of the models' evaluation on out-of-distribution NT data, whereas Fig. 13 offers a glimpse of a single time slice.
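The boundary search just described is a bisection in normalized flux. The sketch below (Python, using scikit-image contouring; inside_wall is a hypothetical helper that checks a contour against the vessel wall) illustrates the idea:

```python
import numpy as np
from skimage import measure

def is_closed(contour: np.ndarray) -> bool:
    """find_contours returns closed curves with identical endpoints."""
    return np.allclose(contour[0], contour[-1])

def lcfs_level(psi, psi_axis, psi_edge, inside_wall, tol=1e-6):
    """Bisect in normalized flux (0 = magnetic axis, 1 = outermost
    level) for the last closed-flux surface and return its flux value."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        level = psi_axis + mid * (psi_edge - psi_axis)
        contours = measure.find_contours(psi, level)
        closed = any(is_closed(c) and inside_wall(c) for c in contours)
        # Closed and inside the wall: push outward; otherwise pull inward.
        lo, hi = (mid, hi) if closed else (lo, mid)
    return psi_axis + lo * (psi_edge - psi_axis)
```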

FIG. 13.

The plasma boundary, i.e., the last closed-flux surface, is shown for one time slice out of the 957 aggregated from four DIII-D NT discharges for the NT inference test. The boundary is calculated from ψ predicted by (a) ψ-only learning with tabular inputs, (b) ψ-only learning with embedded inputs, (c) EFIT-Prime with no NT in its training (concurrent learning of ψ and Jϕ with embedded inputs), and finally (d) EFIT-Prime.


EFIT-Prime represents a novel approach to creating an accurate and robust neural network surrogate for EFIT, aiming to fully replace traditional EFIT processes. This surrogate model focuses on minimizing errors in calculating the poloidal flux and toroidal current, specifically targeting the reduction of Grad–Shafranov residual errors. The keys to EFIT-Prime are its probabilistic nature and its incorporation of physics constraints, which collectively enhance its accuracy and robustness.

At the heart of EFIT-Prime's probabilistic approach is the integration of Bayesian optimization (BO) with neural architecture search (NAS) using an aging-evolution (AgE) algorithm. This method, termed AgEBO, systematically explores and optimizes multiple models to iteratively converge to high-performing architectures. The highest-ranked models from this search are subsequently amalgamated into a deep ensemble, enhancing the robustness of the predictions and facilitating a nuanced quantification of uncertainties. Specifically, this ensemble approach allows for the separation of uncertainties into aleatory (irreducible) and epistemic (reducible) types. Moreover, EFIT-Prime innovatively applies a multi-model neural architecture search to enforce physics constraints directly within the neural network architecture, focusing on the relationship between the magnetic flux (ψ) and the toroidal current density (Jϕ). This strategy not only leverages the strengths of deep learning but also ensures that the model adheres closely to physical laws, significantly boosting its accuracy, generalizability, and robustness.
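The evolutionary core of AgEBO follows the generic aging- (regularized-) evolution pattern. The loop below is a minimal sketch of that pattern, not the authors' exact implementation; mutate and evaluate stand in for architecture mutation and validation scoring, and the BO layer that tunes training hyperparameters in parallel is omitted:

```python
import random
from collections import deque

def aging_evolution(init_pop, mutate, evaluate, iters, sample_size=10):
    """Minimal aging-evolution loop: the oldest member is always
    retired, and a mutant of the best of a random sample is admitted."""
    population = deque((a, evaluate(a)) for a in init_pop)  # left = oldest
    history = list(population)
    for _ in range(iters):
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=lambda p: p[1])     # best of the sample
        child = mutate(parent[0])
        population.append((child, evaluate(child)))  # admit the mutant
        history.append(population[-1])
        population.popleft()                         # age out the oldest
    return sorted(history, key=lambda p: p[1], reverse=True)
```

The top few architectures of the returned ranking are what would be amalgamated into the deep ensemble.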

An additional physics-informed modeling strategy is embedding the magnetic sensors and coils in a 2D coordinate plane to explicitly retain spatial correlations between different magnetic sensors. This contrasts with the traditional way of inputting the data as a 1D vector, i.e., in a tabulated fashion. The information in the resulting 2D maps is compressed into 30 principal components (PCs), and we feed into EFIT-Prime the coefficients of these 30 PCs as the input vector for each time slice. This embedding of the magnetic inputs has proven to be a winning strategy as far as model generalizability is concerned, as explained further below.
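A sketch of this input pipeline (Python with scikit-learn; the grid shape and the mapping of sensors to grid indices are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def embed_and_compress(values, rz_indices, grid_shape=(65, 65), n_pcs=30):
    """Scatter sensor readings onto a 2D (R, Z) map at their physical
    locations, then compress the maps to n_pcs PCA coefficients per
    time slice, which form the network input vector.

    values: (n_slices, n_sensors); rz_indices: (n_sensors, 2) grid indices.
    """
    n_slices = values.shape[0]
    maps = np.zeros((n_slices, *grid_shape))
    rows, cols = rz_indices[:, 0], rz_indices[:, 1]
    maps[:, rows, cols] = values      # each signal sits at its sensor site
    flat = maps.reshape(n_slices, -1)
    pca = PCA(n_components=n_pcs).fit(flat)
    return pca.transform(flat), pca   # (n_slices, 30) coefficients
```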

Approximately 180 000 magnetically constrained equilibria from the DIII-D 2019 campaign are used to train, validate, and test (with an 80-10-10 split) the many different NN configurations built under NAS. The top five performing models are then combined to form the deep ensemble that constitutes the EFIT-Prime model. The performance metrics, namely, the distributions of the coefficient of determination R² and the structural similarity index SSIM aggregated over the 18 000 test samples, indicate high reconstruction accuracy of the poloidal flux, with low epistemic and aleatory uncertainties for more than 99.5% of the samples in the test set. The key result is shown in Fig. 5, where high accuracy is seen for all of the test cases. The worst case, which has R² = 0.93, has much larger uncertainties than all of the other cases. That is, our model not only has high predictive value but also gives us a measure of confidence associated with each of its predictions.

To understand the contribution of each strategy to EFIT-Prime, we performed ablation studies by removing a particular strategy, be it a model component or a nonstandard way of formulating the NN input vector, and then reevaluating the ablated models' performance after training. We deliberately withheld negative-triangularity (NT) discharges from the training of these models and then tested them on a special inference set consisting of four NT DIII-D discharges. Withholding a particular scenario from a model during its training and then testing the model on that previously withheld scenario makes an ideal platform for gauging the generalizability of our models and determining the winning modeling strategies. Our results indicate that the spatial embedding of the magnetic inputs significantly improves the generalizability of the models. The contribution of the Jϕ constraint to improving generalizability remains inconclusive insofar as the results of our NT inference test go. It will be interesting to compare with alternative approaches such as that of Ref. 41, where the contributions to the poloidal flux from the plasma current and the coil currents are separated in the same way that EFIT separates them, with the NNs trained on the plasma contributions.

As part of future work, we plan to extend the training to other DIII-D campaigns from 2018 to 2022 and to perform inference across different years to further assess the generalizability of the models. We also plan to add the magnetic pitch angle measurements from the DIII-D motional Stark effect diagnostic (MSEd) to the input vector used for EFIT-Prime. Training new models on externally (with the magnetics) and internally (with MSEd) constrained EFIT equilibria with NAS could yield improved predictions of Jϕ with smaller uncertainties in the plasma core. We will also expand the number of quantities to be learned under the NAS approach to include the plasma boundary, certain profiles, and derived discharge parameters such as the internal inductance, the normalized beta, and the plasma volume, thereby building on the work of Ref. 1 and laying the foundations for a drop-in surrogate for EFIT. In addition, we aim to speed up inference from the current level of 15 ms per time slice down to roughly a millisecond per time slice for real-time deployment of EFIT-Prime in parallel with real-time EFIT. We will also consider other ways to reconstruct Jϕ, as well as models that exploit the temporal correlations in the training data.

Furthermore, the comprehensive probabilistic approach under the NAS framework offers the possibility of moving to dynamic and real-time full kinetic equilibrium reconstructions, which are currently limited to offline analysis because they are computationally expensive. To this end, we will pursue the application of the NAS framework to kinetic surrogate modeling as a key follow-up work. In fact, we have performed a scoping study of kinetic surrogate modeling and its sensitivity to different sets of diagnostic data,45 which will form a basis for extending EFIT-Prime to incorporate kinetic data for flux prediction.

This work is supported by the U.S. Department of Energy, Office of Fusion Energy Science (Award Nos. DE-AC02-06CH11357, DE-SC0021203, DE-FG02-95ER54309, and DE-SC0021380). The authors acknowledge the computational resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under the contract (No. DE-AC02-06CH11357), and Laboratory Computing Resource Center (LCRC) at the Argonne National Laboratory. The data used in this work are based upon work supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility (Award(s) No. DE-FC02-04ER54698).

The authors thank Erik Olofsson for his insightful comments to improve the manuscript.

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

The authors have no conflicts to disclose.

S. Madireddy: Conceptualization (lead); Formal analysis (lead); Funding acquisition (lead); Investigation (equal); Methodology (lead); Project administration (lead); Resources (lead); Software (lead); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). C. Akcay: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (supporting); Resources (supporting); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). S. E. Kruger: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (lead); Investigation (supporting); Project administration (lead); Resources (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). T. Bechtel Amara: Conceptualization (equal); Data curation (lead); Formal analysis (supporting); Investigation (supporting); Resources (lead); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (supporting). X. Sun: Data curation (supporting); Investigation (supporting); Methodology (supporting); Writing – original draft (supporting); Writing – review & editing (supporting). J. McClenaghan: Data curation (lead); Investigation (supporting); Resources (equal); Writing – original draft (supporting); Writing – review & editing (supporting). J. Koo: Conceptualization (equal); Formal analysis (supporting); Investigation (supporting); Software (supporting). A. Samaddar: Methodology (supporting); Software (supporting). Y. Liu: Conceptualization (supporting); Funding acquisition (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal). P. Balaprakash: Conceptualization (supporting); Project administration (lead); Supervision (lead). L. L. Lao: Conceptualization (lead); Formal analysis (supporting); Funding acquisition (lead); Investigation (equal); Project administration (lead); Resources (supporting); Software (equal); Supervision (lead); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

The inclusion of Jϕ in the learning satisfies the GS equation up to EFIT's convergence error. The proof is as follows. Let the objective be a minimum of the mixed loss function LM = ‖R JϕNN + Δ*ψ‖², where the true offline EFIT solution ψ has been inserted in the left side of Eq. (1) and the NN toroidal current density JϕNN in the right side of Eq. (1), neglecting μ0. Using the fact that the converged EFIT solution satisfies Δ*ψ = −R Jϕ + ϵ, we next carry out the following:

LM = ‖R JϕNN + Δ*ψ‖² = ‖R JϕNN − R Jϕ + ϵ‖² = ‖R (JϕNN − Jϕ)‖² + O(ϵ) ∝ LJ + O(ϵ), (A1)

where ϵ ≪ 1 is the EFIT convergence error described at the end of Sec. I and the MSE loss function for Jϕ, LJ = ‖JϕNN − Jϕ‖², has been inserted into the last expression (to within the R weighting). Solving Eq. (A1) for the loss function of the current density and keeping only terms to leading order in ϵ, we obtain

LJ ∝ LM + O(ϵ). (A2)

Thus, minimizing the loss function of the current density amounts to minimizing LM. A similar argument can be made for minimizing the loss for ψ by constructing another mixed loss function: LM = ‖R Jϕ + Δ*ψNN‖².

The neural architecture corresponding to the best configuration of EFIT-Prime is shown in Figs. 14 and 15. The model Mψ that predicts ψ from the external magnetic data X is shown in Fig. 14, and the model MJ that predicts Jϕ from ψ is shown in Fig. 15. The neural architecture displays several counter-intuitive feed-forward and skip connections, with some nodes having concurrent connections to multiple nodes. The final layer outputs the mean and standard deviation prediction for each ensemble member, while the left branch in Fig. 14 shows a sampling layer that draws from the ψ distribution defined by the predicted mean and standard deviation; this sample is the input to the Jϕ-predicting architecture shown in Fig. 15. The depicted NN configuration contains approximately 10 × 10⁶ model parameters, most of which (approximately 7 × 10⁶) belong to Mψ, since ψ is learned on the entire EFIT mesh, whereas Jϕ enters the NN as 300 PC coefficients. Note that any garden-variety MLP would have approximately the same number of parameters because of the immense size of the output vector, which is dominated by the high-resolution ψ.
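A toy stand-in for this two-stage, mean-and-standard-deviation architecture (PyTorch; the layer sizes are illustrative, and the NAS-derived skip connections are omitted) is sketched below:

```python
import torch
import torch.nn as nn

class PsiModel(nn.Module):
    """Toy stand-in for M_psi: maps the input PCs to a mean and a
    standard deviation of psi on the flattened grid (sizes illustrative)."""
    def __init__(self, n_in=30, n_grid=65 * 65, hidden=512):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_in, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, n_grid)
        self.std_head = nn.Linear(hidden, n_grid)

    def forward(self, x):
        h = self.trunk(x)
        mu = self.mean_head(h)
        sigma = nn.functional.softplus(self.std_head(h)) + 1e-6  # std > 0
        return mu, sigma

def sample_psi(mu, sigma):
    """Sampling layer: draw psi from N(mu, sigma) with the
    reparameterization trick so gradients flow into M_J's input."""
    return mu + sigma * torch.randn_like(sigma)
```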

FIG. 14.

The architecture of the best-performing configuration from the deep ensemble used in constructing EFIT-Prime, described in Sec. II. Only the architecture of the model learning ψ (Mψ), containing approximately 7 × 10⁶ parameters, is shown. A box can represent a dense layer, an activation, or an addition (for residual layers).

FIG. 15.

The architecture of the best-performing configuration from the deep ensemble used in constructing EFIT-Prime, described in Sec. II. Only the architecture of the second model learning Jϕ (MJ), containing approximately 2 × 10⁶ parameters, is shown. A box can represent a dense layer, an activation, or an addition (for residual layers).

1. L. Lao, H. S. John, R. Stambaugh, A. Kellman, and W. Pfeiffer, "Reconstruction of current profile parameters and plasma shapes in tokamaks," Nucl. Fusion 25, 1611–1622 (1985).
2. L. Lao, J. Ferron, R. Groebner, W. Howl, H. S. John, E. Strait, and T. Taylor, "Equilibrium analysis of current profiles in tokamaks," Nucl. Fusion 30, 1035–1049 (1990).
3. L. L. Lao, H. E. S. John, Q. Peng, J. R. Ferron, E. J. Strait, T. S. Taylor, W. H. Meyer, C. Zhang, and K. I. You, "MHD equilibrium reconstruction in the DIII-D tokamak," Fusion Sci. Technol. 48, 968–977 (2005).
4. Q. Jinping, W. Baonian, L. L. Lao, S. Biao, S. A. Sabbagh, S. Youwen, L. Dongmei, X. Bingjia, R. Qilong, G. Xianzu, and L. Jiangang, "Equilibrium reconstruction in EAST tokamak," Plasma Sci. Technol. 11, 142–145 (2009).
5. D. O'Brien, L. Lao, E. Solano, M. Garribba, T. Taylor, J. Cordey, and J. Ellis, "Equilibrium analysis of iron core tokamaks using a full domain method," Nucl. Fusion 32, 1351–1360 (1992).
6. Y. Park, S. Sabbagh, J. Berkery, J. Bialek, Y. Jeon, S. Hahn, N. Eidietis, T. Evans, S. Yoon, J.-W. Ahn, J. Kim, H. Yang, K.-I. You, Y. Bae, J. Chung, M. Kwon, Y. Oh, W.-C. Kim, J. Kim, S. Lee, H. Park, H. Reimerdes, J. Leuer, and M. Walker, "KSTAR equilibrium operating space and projected stabilization at high normalized beta," Nucl. Fusion 51, 053001 (2011).
7. S. Sabbagh, S. Kaye, J. Menard, F. Paoletti, M. Bell, R. Bell, J. Bialek, M. Bitter, E. Fredrickson, D. Gates, A. Glasser, H. Kugel, L. Lao, B. LeBlanc, R. Maingi, R. Maqueda, E. Mazzucato, D. Mueller, M. Ono, S. Paul, M. Peng, C. Skinner, D. Stutman, G. Wurden, W. Zhu, and N. R. Team, "Equilibrium properties of spherical torus plasmas in NSTX," Nucl. Fusion 41, 1601–1611 (2001).
8. L. L. Lao, S. Kruger, C. Akçay, P. Balaprakash, T. Bechtel, E. Howell, J. Koo, J. Leddy, M. Leinhauser, Y. Liu, S. Madireddy, J. McClenaghan, D. Orozco, A. Pankin, D. P. Schissel, S. P. Smith, X. Sun, and S. Williams, "Application of machine learning and artificial intelligence to extend EFIT equilibrium reconstruction," Plasma Phys. Controlled Fusion 64(7), 074001 (2022).
9. J. Ferron, M. Walker, L. Lao, H. S. John, D. Humphreys, and J. Leuer, "Real time equilibrium reconstruction for tokamak discharge control," Nucl. Fusion 38, 1055–1066 (1998).
10. H. Grad and H. Rubin, "Hydromagnetic equilibria and force-free fields," J. Nucl. Energy 7, 284–285 (1958).
11. V. D. Shafranov, "On magnetohydrodynamical equilibrium configurations," Sov. Phys. JETP 6, 1013 (1958).
12. S. Joung, J. Kim, S. Kwak, J. Bak, S. Lee, H. Han, H. Kim, G. Lee, D. Kwon, and Y.-C. Ghim, "Deep neural network Grad–Shafranov solver constrained with measured magnetic signals," Nucl. Fusion 60, 016034 (2020).
13. J. Wai, M. Boyer, and E. Kolemen, "Neural net modeling of equilibria in NSTX-U," Nucl. Fusion 62, 086042 (2022).
14. S. Joung, Y.-C. Ghim, J. Kim, S. Kwak, D. Kwon, C. Sung, D. Kim, H. Kim, J. G. Bak, and S. W. Yoon, "GS-DeepNet: Mastering tokamak plasma equilibria with deep neural networks and the Grad–Shafranov equation," Sci. Rep. 13, 15799 (2023).
15. J. Lu, Y. Hu, N. Xiang, and Y. Sun, "Fast equilibrium reconstruction by deep learning on EAST tokamak," AIP Adv. 13, 075007 (2023).
16. R. Shousha, J. Seo, K. Erickson, Z. Xing, S. Kim, J. Abbate, and E. Kolemen, "Machine learning-based real-time kinetic profile reconstruction in DIII-D," Nucl. Fusion 64, 026006 (2024).
17. C. Wan, Z. Yu, A. Pau, X. Liu, and J. Li, "EAST discharge prediction without integrating simulation results," Nucl. Fusion 62, 126060 (2022).
18. C. Wan, Z. Yu, A. Pau, O. Sauter, X. Liu, Q. Yuan, and J. Li, "A machine-learning-based tool for last closed-flux surface reconstruction on tokamaks," Nucl. Fusion 63, 056019 (2023).
19. X. Wei, S. Sun, W. Tang, Z. Lin, H. Du, and G. Dong, "Reconstruction of tokamak plasma safety factor profile using deep learning," Nucl. Fusion 63, 086020 (2023).
20. A. Merlo, D. Böckenhoff, J. Schilling, U. Höfel, S. Kwak, J. Svensson, A. Pavone, S. A. Lazerson, and T. S. Pedersen, "Proof of concept of a fast surrogate model of the VMEC code via neural networks in Wendelstein 7-X scenarios," Nucl. Fusion 61, 096039 (2021).
21. D. Kaltsas and G. Throumoulopoulos, "Neural network tokamak equilibria with incompressible flows," Phys. Plasmas 29, 022506 (2022).
22. R. Egele, R. Maulik, K. Raghavan, B. Lusch, I. Guyon, and P. Balaprakash, "AutoDEUQ: Automated deep ensemble with uncertainty quantification," in 2022 26th International Conference on Pattern Recognition (ICPR) (IEEE, 2022), pp. 1908–1914.
23. P. Balaprakash, R. Egele, M. Salim, S. Wild, V. Vishwanath, F. Xia, T. Brettin, and R. Stevens, "Scalable reinforcement-learning-based neural architecture search for cancer deep learning research," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM, 2019), pp. 1–33.
24. Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek, "Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift," in 33rd Conference on Neural Information Processing Systems, arXiv:1906.02530 (NeurIPS, Vancouver, 2019).
25. B. P. van Milligen, V. Tribaldos, and J. A. Jiménez, "Neural network differential equation and plasma equilibrium solver," Phys. Rev. Lett. 75, 3594 (1995).
26. T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: A survey," arXiv:1808.05377 (2019).
27. D. Ashlock, Evolutionary Computation for Modeling and Optimization (Springer, 2006), Vol. 571.
28. P. A. Vikhar, "Evolutionary algorithms: A critical review and its future prospects," in 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC) (IEEE, 2016), pp. 261–265.
29. P. I. Frazier, "A tutorial on Bayesian optimization," arXiv:1807.02811 (2018).
30. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
31. P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan et al., "Ray: A distributed framework for emerging AI applications," in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), arXiv:1712.05889 (2017), pp. 561–577.
32. B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and scalable predictive uncertainty estimation using deep ensembles," arXiv:1612.01474 (2017).
33. M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, V. Makarenkov, and S. Nahavandi, "A review of uncertainty quantification in deep learning: Techniques, applications and challenges," Inf. Fusion 76, 243–297 (2021).
34. I. Sobel, "An isotropic 3 × 3 image gradient operator," presentation at Stanford A.I. Project 1968 (2014).
35. M. Raissi, P. Perdikaris, and G. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys. 378, 686–707 (2019).
36. G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, "Physics-informed machine learning," Nat. Rev. Phys. 3, 422–440 (2021).
37. "The present dataset covers 40% of 2019. Follow-up work will extend the training to the entire 2019 campaign as well as other years."
38. O. Meneghini, S. Smith, L. Lao, O. Izacard, Q. Ren, J. Park, J. Candy, Z. Wang, C. Luna, V. Izzo, B. Grierson, P. Snyder, C. Holland, J. Penna, G. Lu, P. Raum, A. McCubbin, D. Orlov, E. Belli, N. Ferraro, R. Prater, T. Osborne, A. Turnbull, and G. Staebler, "Integrated modeling applications for tokamak experiments with OMFIT," Nucl. Fusion 55, 083008 (2015).
39. "These two quantities are technically unknowns too; however, they could easily be learned with another MLP as part of a set of salient discharge parameters, as was done in Ref. 8."
40. Y. Liu, C. Akçay, L. L. Lao, and X. Sun, "Surrogate models for plasma displacement and current in 3D perturbed magnetohydrodynamic equilibria in tokamaks," Nucl. Fusion 62, 126067 (2022).
41. J. McClenaghan, C. Akçay, L. L. Lao, and X. Sun, "Augmenting machine learning of Grad–Shafranov equilibrium reconstruction with Green's functions," Phys. Plasmas 31, 082507 (2024).
42. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process. 13, 600–612 (2004).
43. J. L. Luxon and B. B. Brown, "Magnetic analysis of non-circular cross-section tokamaks," Nucl. Fusion 22, 813 (1982).
44. F. Alladio and F. Crisanti, "Analysis of MHD equilibria by toroidal multipolar expansions," Nucl. Fusion 26, 1143 (1986).
45. X. Sun, C. Akçay, T. B. Amara, S. E. Kruger, L. L. Lao, Y. Liu, S. Madireddy, and J. McClenaghan, "Impact of various DIII-D diagnostics on the accuracy of neural network surrogates for kinetic EFIT reconstructions," Nucl. Fusion 64, 086065 (2024).