This work presents a method for predicting plasma equilibria in tokamak fusion experiments and reactors. The approach represents the plasma current as a linear combination of basis functions obtained from a principal component analysis (PCA) of plasma toroidal current densities ($J_t$) in the EFIT-AI equilibrium database. Utilizing EFIT's Green's function tables, corresponding basis functions are then created for the poloidal flux ($\psi$) and for the diagnostics generated from the toroidal current ($J_t$). Similar in spirit to a physics-informed neural network (NN), this physically enforces consistency between $J_t$, $\psi$, and the synthetic diagnostics. First, a least squares technique that minimizes the error on the synthetic diagnostics is employed for prediction. The results show that the method achieves high accuracy in predicting $\psi$ and moderate accuracy in predicting $J_t$, with median $R^2 = 0.9993$ and $R^2 = 0.978$, respectively. A comprehensive NN built with a neural architecture search is also employed to predict the coefficients of the basis functions. The NN demonstrates significantly better performance than the least squares method, with median $R^2 = 0.9997$ and $0.9916$ for $\psi$ and $J_t$, respectively. The robustness of the method to missing or incorrect data is evaluated through least squares filling of missing inputs, which shows that the NN prediction remains strong even with a reduced number of diagnostics. Additionally, the method is tested on plasmas outside of the training range, showing reasonable results.
I. INTRODUCTION
Reconstructing magnetohydrodynamic equilibria from a series of diagnostic measurements is crucial in tokamak research and operations. This process yields invaluable insights into magnetic geometry, current, and pressure profiles, all of which are essential for tokamak data analysis, plasma stability, and control, as well as for validating codes and physics models. The EFIT code, widely utilized across various tokamaks, serves as a powerful tool for reconstructing equilibria. Consequently, a wealth of experimental equilibrium reconstruction data exists due to its extensive usage, with notable examples including DIII-D,1 EAST,2 JET,3 KSTAR,4 and NSTX.5 However, this method of equilibrium reconstruction is computationally intensive, making it infeasible for real-time operation. Recently, the application of machine learning techniques in equilibrium reconstruction has yielded promising results. By training machine learning models on a subset of data, it is now possible to accurately predict equilibria in a given system, thereby significantly reducing the computational cost associated with the reconstruction process. These techniques enable fast and precise equilibrium reconstruction, even for complex and high-dimensional systems.
Initial efforts to solve the equilibrium Grad–Shafranov (GS) equation (1) with machine learning utilized neural networks (NNs), as seen in the work by van Milligen et al.6 Early attempts to reconstruct equilibrium shape parameters with neural networks were performed on DIII-D7 and ASDEX.8 More recent endeavors have involved employing deep neural networks to solve the equilibrium GS equation using measured external magnetic signals in the KSTAR tokamak, building upon the offline EFIT9 equilibrium reconstruction results.10,11 Neural networks have also been trained for the DIII-D tokamak, showcasing promising predictive capabilities.12 Notably, there have been substantial refinements and improvements made to the equilibrium surrogates13 using a comprehensive neural architecture search (NAS).14 However, thus far, these studies have not leveraged the separation of the known external contributions and unknown internal contributions to the Grad–Shafranov equation, with the exception of recent work examining equilibria in the NSTX tokamak.15 Considering the variability of external coils across different tokamaks, this step holds significant potential for training a robust, general neural network for the Grad–Shafranov equation.
In this paper, we explore leveraging machine learning to speed up the solution of the Grad–Shafranov equation, taking advantage of the separation of the toroidal plasma current and the externally generated coil currents. Section II formulates the representation of the toroidal current as a linear combination of basis functions. Utilizing EFIT's Green's function tables, corresponding basis functions are generated for the poloidal flux and plasma diagnostics, ensuring that these quantities are consistent with each other. A linear least squares minimization of the synthetic diagnostics is used to predict $J_t$ in Sec. III. Section IV shows that neural networks trained with NAS to learn the equilibrium basis functions give a more accurate equilibrium prediction than the least squares minimization when the equilibrium is in a well-trained parameter space. However, the trained neural network's predictive capability relies on there being no missing input data, which can be addressed by filling missing data with synthetic diagnostic data predicted from least squares. Neural network predictions outside the training set are examined in Sec. V, showing reasonable results that are nonetheless slightly worse than the least squares strategy's predictions. Conclusions are given in Sec. VI.
II. FORMULATION
The external contribution to $\psi$ is easily calculated with a simple matrix multiplication. The computation of the plasma portion of $\psi$ requires an integral over the plasma volume and thus takes, by far, the longest amount of time to compute. Thus, if $J_t$ could be pre-computed as a set of basis functions, then the corresponding basis functions of $\psi$, along with the magnetic probe and flux loop signals from the plasma that generates $\psi$, could also be pre-computed. Using EFIT's Green's function tables in this way, the corresponding $J_t$ and $\psi$ share the same basis representation. This ensures that $J_t$, $\psi$, and the synthetic diagnostics are all consistent with each other.
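Because all of these maps are linear, the consistency argument can be captured in a few lines of code. Below is a minimal sketch, assuming hypothetical Green's function matrices `G_psi` and `G_diag` (the actual EFIT table formats and names differ) that map a flattened $J_t$ grid to poloidal flux and to diagnostic signals:

```python
import numpy as np

def precompute_bases(jt_basis, G_psi, G_diag):
    """Map each J_t basis function to its psi and diagnostic bases.

    jt_basis : (n_basis, n_grid) PCA components of the toroidal current
    G_psi    : (n_grid, n_grid) Green's function table, current -> flux
    G_diag   : (n_diag, n_grid) Green's function table, current -> signals
    """
    psi_basis = jt_basis @ G_psi.T    # (n_basis, n_grid)
    diag_basis = jt_basis @ G_diag.T  # (n_basis, n_diag)
    return psi_basis, diag_basis

# Any coefficient vector c then yields mutually consistent quantities:
#   J_t = c @ jt_basis, psi_pl = c @ psi_basis, signals = c @ diag_basis
```

Since the same coefficients multiply every basis set, any prediction of the coefficients automatically produces a current, flux, and synthetic diagnostic set that satisfy the same linear relations.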
By removing the external contributions to $\psi$, the prediction of $J_t$ becomes simpler and could potentially enable better generalization in solving the Grad–Shafranov equation. To highlight this, Fig. 1 shows example contours of the plasma-generated flux and $J_t$ for a positive triangularity and a negative triangularity plasma. Despite vastly different plasma shapes in $J_t$, the plasma-generated flux patterns are nearly indistinguishable by eye. However, $\psi$ is still an integral quantity of $J_t$, defined by the second term of Eq. (3). Thus, there is still some dependence on shape, especially in more complicated equilibria with transport barriers not in the DIII-D magnetics database.
To obtain the $J_t$ basis functions, we utilize the toroidal currents from 54 755 equilibria in the EFIT-AI magnetics database.12 A matrix with one row per equilibrium and one column per spatial grid point is constructed and subjected to a principal component analysis (PCA) using the sklearn.decomposition.PCA library, retaining the first n components. Additionally, there is a test database consisting of 14 093 equilibria, which are withheld from the PCA and NN training. As an example of what the PCA components look like, the first four components are plotted in Fig. 2.
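As a sketch of this step, the decomposition and a truncated reconstruction can be written as follows; `jt_matrix` is a hypothetical array of flattened $J_t$ grids, one row per equilibrium:

```python
import numpy as np
from sklearn.decomposition import PCA

n_components = 32
pca = PCA(n_components=n_components)
coeffs = pca.fit_transform(jt_matrix)  # (n_equilibria, n_components)
jt_basis = pca.components_             # (n_components, n_grid)

# Reconstruction of J_t retaining only the first n components;
# fit_transform centers the data, so the PCA mean is added back.
n = 16
jt_approx = coeffs[:, :n] @ jt_basis[:n] + pca.mean_
```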
FIG. 2. The first four PCA components are plotted. Solid lines denote positive values and dashed lines denote negative values.
FIG. 3. Test database median $R^2$ between the true and PCA-reconstructed $\psi$ (blue) and $J_t$ (green), plotted vs the number of PCA components retained (i.e., the maximum possible $R^2$ for that number of components).
To gain an intuitive understanding of the accuracy of a solution for a given $R^2$, we present an example of a $J_t$ from DIII-D discharge 179 425 at 5640 ms plotted along with reconstructed equilibria retaining 1, 2, 8, and 16 components, as shown in Fig. 4. The first PCA component alone yields the lowest $R^2$, and while the overall shape is roughly matched, the contours are visibly incorrect in both the core and near the edge of the plasma. With two PCA components, the core is much more accurately represented, but the edge contour, which has a significantly lower current density, remains quite different from the equilibrium. With eight PCA components, both the core and edge are better predicted, while with 16 PCA components, the core is nearly indistinguishable from the original equilibrium and the edge contour is only slightly off near the separatrix. The jaggedness of the reconstructed equilibrium could potentially limit the utility of a fast neural prediction with this method. To assess this, the $\beta_N$-limit of a PCA-decomposed equilibrium is explored in Appendix A.
FIG. 4. Plotted in red is the reconstructed $J_t$ retaining 1 (top left), 2 (top right), 8 (bottom left), and 16 (bottom right) PCA components for DIII-D discharge 179 425 at 5640 ms. Plotted in black is the original $J_t$.
III. PREDICTING EQUILIBRIA FROM LEAST SQUARES
The predictive capability of this least squares method is found to be limited in the number of PCA components that can be kept. The median $R^2$ of $\psi$ and $J_t$ predicted by the least squares method vs the number of PCA components retained is plotted in Fig. 5. The $R^2$ values improve up to ten PCA components. Using additional components leads to a plateau in the predictive capability of the median $R^2$ for both $\psi$ and $J_t$ by 32 components.
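The least squares step itself reduces to a weighted linear solve. Below is a minimal sketch, assuming plasma-only measured signals and the pre-computed diagnostic basis from Sec. II; the weight convention is an assumption patterned on EFIT's fitting weights:

```python
import numpy as np

def fit_coefficients(signals, diag_basis, weights):
    """Weighted linear least squares for the basis coefficients.

    signals    : (n_diag,) measured plasma-only diagnostic signals
    diag_basis : (n_basis, n_diag) synthetic response of each basis function
    weights    : (n_diag,) fitting weights; zero marks missing channels
    """
    A = (diag_basis * weights).T  # (n_diag, n_basis) weighted design matrix
    b = signals * weights
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

# psi_pl = c @ psi_basis and J_t = c @ jt_basis then follow directly.
```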
FIG. 5. Least squares predicted test database median $R^2$ for $\psi$ (blue) and $J_t$ (green) vs the number of PCA components retained.
Utilizing this least squares prediction retaining six PCA components of $J_t$ and $\psi$ on the test database of 14 093 equilibria gives excellent predictions of $\psi$ and moderate predictions of $J_t$. Figure 6 shows histograms of $R^2$ for the individual equilibrium $\psi$ and $J_t$. Using the experimental diagnostic signals, we see a median $R^2 = 0.9993$ for $\psi$ and a median $R^2 = 0.978$ for $J_t$. The predictive capability of this approach improves moderately when the EFIT-reconstructed signals are used as inputs.
FIG. 6. Histograms depicting the $R^2$ values for the predicted $\psi$ (upper panel) and $J_t$ (lower panel) obtained from the least squares method, using 6 PCA components to represent $J_t$, on a dataset of 14 093 equilibria extracted from the DIII-D 2019 campaign. Blue bars depict predictions derived from experimental signal inputs and green bars depict predictions based on inputs reconstructed using EFIT.
IV. PREDICTING EQUILIBRIA WITH NAS
To build fast, accurate, and robust NN surrogates, we use a comprehensive NAS to carry out a bi-level optimization of the architecture at the outer level and the hyperparameters at the inner level, based on the chosen general structure, to predict the 32 basis coefficients of $J_t$, with the loss being the mean squared error of the coefficients. The search space includes the maximum number of layers, the type of operation each layer executes, and the hyperparameters associated with each operation, such as the initial learning rate and optimizer. Training of each architecture is performed with TensorFlow. This approach ensures a thorough exploration of both the architecture and the hyperparameters. For large datasets, NAS makes use of AgEBO, a framework combining Aging Evolution (AgE), rooted in the paradigm of evolutionary algorithms, and asynchronous Bayesian Optimization (BO). Using NAS, we train a multitude of neural network configurations for 24 h on 4 GPU nodes on the Argonne National Laboratory cluster Swing to find the best models tailored to the reconstruction problem outlined above. A deep ensemble is then formed from the top five models to make predictions. The final predictions are the mean of the predictions made by each of the five models in the deep ensemble.
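The ensemble-averaging step is straightforward once the top models are saved. A minimal sketch, assuming the five NAS-selected models were exported to hypothetical Keras files:

```python
import numpy as np
import tensorflow as tf

# Hypothetical file names for the top five NAS models.
paths = [f"nas_top_models/model_{i}.keras" for i in range(5)]
ensemble = [tf.keras.models.load_model(p) for p in paths]

def predict_coefficients(signals):
    """Deep-ensemble prediction of the 32 basis coefficients.

    signals : (n_samples, n_diag) plasma-only diagnostic inputs
    """
    preds = np.stack([m.predict(signals, verbose=0) for m in ensemble])
    return preds.mean(axis=0)  # average over the five ensemble members
```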
In this section, we use NAS to predict the coefficients of the PCA components for $J_t$ as an alternative to the least squares prediction. The results show a significant performance improvement compared to the least squares method. Scanning the number of PCA components kept in the NN training, it is found that the NN prediction does not degrade with increasing components as the least squares method did. Figure 7 shows the $R^2$ of $\psi$ and $J_t$ on the test database plotted vs the number of PCA components. To avoid the complication of missing or poor quality diagnostic data, any data with the EFIT weight set to zero are replaced with the EFIT-reconstructed values for this first test. The best performance at 32 components has a median $R^2 = 0.9997$ for $\psi$ and a median predicted $R^2 = 0.9916$ for $J_t$. The predicted $R^2$ increases with each additional component that is retained. However, the improvement in predictive capability beyond eight components is weak and well below the maximum possible $R^2$ of Fig. 3. Details of the top performing model are shown in Appendix B. Other NAS-predicted models have architectures similar to the top performing model.
FIG. 7. The median $R^2$ for the NN-predicted $\psi$ (blue) and $J_t$ (green) vs the number of PCA components retained, aggregated over a test set containing approximately 14 000 equilibria.
The histograms of Fig. 8 illustrate the predicted $\psi$ and $J_t$ $R^2$ values for the test database. Training a NAS NN solely on the reconstructed signals shows a strong improvement in prediction compared to the experimental signals on the test database. To make the neural network robust against missing data, one possibility is to drop out signals during training, as was done in Ref. 15. In this work, the least squares prediction of the Grad–Shafranov solution is robust to missing data, as Eqs. (9) and (10) are multiplied by the fitting weights, which can be set to zero where data are missing. This allows for another way to handle missing data, which is to utilize the synthetic diagnostic signals generated from the least squares method as inputs to the NN. The orange bars in Fig. 8 show the binned $R^2$ predicted from the neural net with these inputs. Doing this gives median predicted $R^2$ values for $\psi$ and $J_t$ only slightly lower than those from the neural net trained with experimental signals in which missing data were filled with the EFIT-reconstructed signals.
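A sketch of this fill strategy, reusing the hypothetical `fit_coefficients` helper from the least squares sketch above:

```python
import numpy as np

def fill_missing_signals(signals, weights, diag_basis):
    """Replace zero-weight (missing or bad) channels with least squares
    synthetic signals before passing the vector to the NN.

    signals    : (n_diag,) experimental plasma-only signals
    weights    : (n_diag,) fitting weights; zero marks bad channels
    diag_basis : (n_basis, n_diag) diagnostic basis functions
    """
    c = fit_coefficients(signals, diag_basis, weights)
    synthetic = c @ diag_basis
    return np.where(weights > 0, signals, synthetic)
```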
FIG. 8. Histograms depicting the $R^2$ values obtained from the 30 PCA component single layer neural net for $\psi$ (upper panel) and $J_t$ (lower panel) in the DIII-D test database, encompassing a dataset of 14 093 equilibria. Blue bars depict predictions derived from experimental signal inputs, which include zeroed channels that were unavailable during prediction. Green bars depict predictions based on inputs reconstructed using EFIT, and orange bars depict predictions using experimental signals with zeroed channels filled by the least squares predicted inputs.
Next, we evaluate the accuracy of the models by comparing the synthetic signals predicted by the models with those from the magnetic diagnostics. Using Green's function tables to compute synthetic signals, we then calculate the total $\chi^2$ of the diagnostic signals, a measure utilized in EFIT for assessing the accuracy of an equilibrium reconstruction.9 Figure 9 shows the median $\chi^2$ of the test database plotted against the number of retained PCA components. The NAS NN predicts an average $\chi^2$ of a similar magnitude to that obtained from the PCA of the solution. In contrast, the least squares method predicts a lower $\chi^2$ than the PCA decomposition, confirming overfitting to the experimental diagnostics beyond six PCA components.
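A minimal sketch of this figure of merit, assuming per-channel measurement uncertainties are available (the exact weighting EFIT applies is not reproduced here):

```python
import numpy as np

def chi_squared(measured, synthetic, sigma):
    """Sum of squared, uncertainty-normalized residuals between measured
    and synthetic diagnostic signals, in the spirit of EFIT's total chi^2.

    measured, synthetic : (n_diag,) signal vectors
    sigma               : (n_diag,) measurement uncertainties
    """
    residuals = (measured - synthetic) / sigma
    return np.sum(residuals**2)
```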
FIG. 9. Median predicted $\chi^2$ plotted against the number of PCA components. The PCA decomposition calculation is in black, the least squares method in blue, and the NAS NN in green.
The predictive capability of the least-squares-filled NN remains strong even when the number of magnetic probes and flux loops is reduced. To assess this, we randomly replace the diagnostic signals with the least squares predicted synthetic diagnostic signals to determine the point at which the neural net prediction starts to degrade. Figure 10 illustrates the median predicted $R^2$ as a function of the number of diagnostics removed and replaced with the least squares predicted synthetic diagnostics. When half of the diagnostics are removed, the median predicted $\psi$ exhibits an $R^2$ value of 0.999, while the median predicted $J_t$ shows an $R^2$ value of 0.98. It is only when the number of diagnostics is reduced to 10 out of the original 120 that the predictive capability significantly deteriorates for both quantities.
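This robustness scan can be expressed compactly; the sketch below reuses the hypothetical `fill_missing_signals` helper and randomly selects which channels to drop:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def degrade_inputs(signals, weights, diag_basis, n_keep):
    """Randomly keep n_keep diagnostics and replace the rest with the
    least squares synthetic signals, mimicking the robustness scan.
    """
    n_diag = signals.shape[0]
    dropped = rng.choice(n_diag, size=n_diag - n_keep, replace=False)
    w = weights.copy()
    w[dropped] = 0.0  # mark dropped channels as missing
    return fill_missing_signals(signals, w, diag_basis)
```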
FIG. 10. NAS NN prediction of $\psi$ and $J_t$, characterized by their median $R^2$ aggregated over the test set, as a function of the number of diagnostics.
V. EXAMINING THIS NN PREDICTION OUTSIDE OF THE TRAINING SET DISTRIBUTION
Going outside of the training set distribution, the trained NN in this paper can still predict reasonable equilibria. However, the least squares prediction performs slightly better. To highlight this, negative triangularity data are withheld from the training set. The PCA components are re-computed, and the NAS NN is retrained on a curated training set that holds out all shots with negative triangularity. Then, we carry out inference on a test set comprising 7011 negative triangularity equilibria to predict $\psi$ and $J_t$ for these negative triangularity plasmas. The NN gives a reasonable prediction for both quantities. Figure 11 shows the average plasma triangularity calculated from the predicted $\psi$ vs the EFIT-reconstructed last closed flux surface. The NN can predict the triangularity well at weakly negative $\delta$, but as the triangularity is decreased further, the NN begins predicting stronger negative $\delta$ than the experiment. Since most of the structure of the poloidal flux in negative triangularity is due to the poloidal field coils, as noted in Fig. 1, even with moderate predictive capability of $J_t$, the NN can predict the triangularity reasonably well. However, the least squares prediction using the negative-triangularity-withheld PCA components matches the EFIT last closed flux surface better and gives higher median $R^2$ values for $\psi$ and $J_t$. Interestingly, on average, both methods tend to under-predict the triangularity at weakly negative $\delta$ and over-predict it at strongly negative $\delta$. One might naively expect the predictions to worsen as the triangularity becomes more negative. A potential reason for this discrepancy is that plasmas with more negative $\delta$ are not necessarily much closer to typical positive triangularity plasmas. Many of the negative triangularity discharges from 2019 have nearly zero lower triangularity, as they were diverted onto the DIII-D shelf.17
FIG. 11. The average triangularity of the last closed flux surface calculated from the least squares (top) and NN (bottom) predictions vs that calculated from the EFIT-reconstructed (true) $\psi$, aggregated over 7011 negative triangularity equilibria.
Another advantage of separating out the external coils is that it allows the PCA basis functions to be used to train a neural network to predict $J_t$ for another tokamak. For example, a neural network can be trained to predict $J_t$ for ITER using only the PCA components of $J_t$ from DIII-D. To do this, care must be taken to construct a set of Green's function tables with a grid size such that the ITER plasma shape is well aligned with the database plasma shapes. A new training database is created using ITER Green's function tables to generate the synthetic ITER magnetic probe, flux loop, and plasma current signal data that come from the $J_t$. Additionally, new basis functions for $\psi$ and the synthetic diagnostics are computed based on ITER's Green's function tables. As ITER is not yet online, the EFIT-reconstructed external coil currents are used for isolating the plasma contributions to each diagnostic signal. A neural network can then be trained to predict the strength of each component based on a given set of probe, loop, and plasma current signals from the plasma. Figure 12 shows the least squares and NN predicted contours of $\psi$ and $J_t$ for an ITER 15 MA L-mode equilibrium, along with the original equilibrium quantities. The prediction is reasonable but still requires further improvement. However, as in the negative-triangularity-withheld case, the least squares strategy yields better predictions, with the last closed flux surface better matched and higher median $R^2$ values for $\psi$ and $J_t$. While the NN performs significantly better when within the training set, the least squares method is more resilient when outside it, as it is directly constrained by the diagnostic signals. Additionally, we note that while the PCA components are primary to $J_t$, they are not necessarily primary to $\psi$ and the synthetic diagnostics, which could potentially be more sensitive to the higher order components. This suggests that a possible way to improve the robustness of the NN may be to add the error of the synthetic signals to the loss function.
FIG. 12. Predicted (red) and original (black) equilibrium contours of $\psi$ (left) and $J_t$ (right) for an ITER 15 MA L-mode equilibrium. The prediction from least squares is on top, and the NN prediction is on the bottom.
VI. CONCLUSIONS
This paper presents a method for predicting the plasma current and poloidal magnetic flux in a tokamak fusion device using a combination of least squares and neural network techniques. The approach presented here exploits the separation of the total poloidal flux into a known external contribution due to the poloidal field coils and an unknown internal contribution due to the plasma current. The external contribution can be computed using the pre-computed Green's function tables. This leaves the remaining task of determining the response due to the unknown plasma current, which here is represented as a linear combination of basis functions, allowing for rapid computation of the poloidal flux.
The results show that the least squares method provides excellent predictions of the poloidal magnetic flux and moderate predictions of the plasma current. The method is robust to missing or incorrect data and can handle uncertainties in experimental measurements. However, its predictive capability is limited. The neural network approach using NAS, on the other hand, shows a significant performance improvement compared to the least squares method, particularly when trained on reconstructed signals. The neural network demonstrates high predictive capability for $\psi$ and moderate predictive capability for $J_t$. However, the neural network has not been trained to handle missing or incorrect data.
By combining the least squares method and the neural network, it is possible to fill in missing data and enhance the robustness of the prediction in a real-time application. The least squares prediction can be used to generate synthetic diagnostic signals, which can then be used as inputs to the neural network. This approach provides strong predictive capabilities even with a reduced number of magnetic probes and flux loops.
Furthermore, the paper examines the extrapolation of the neural network prediction outside the training set and compares it to the least squares prediction. The results show that the least squares prediction performs better when extrapolating to new plasma configurations, such as from positive triangularity to negative triangularity and from DIII-D to ITER.
ACKNOWLEDGMENTS
The authors would like to thank C. K. Lau for his insights and stimulating discussions. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under Awards Nos. DE-SC0021203, DE-FC02-04ER54698, and DE-FG02-95ER54309. Part of the data analysis was performed using the OMFIT integrated modeling framework.18
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
J. McClenaghan: Conceptualization (lead); Data curation (equal); Investigation (lead); Methodology (lead); Writing – original draft (lead). C. Akcay: Investigation (supporting); Methodology (supporting); Writing – review & editing (supporting). T. B. Amara: Data curation (lead); Investigation (supporting); Methodology (supporting); Writing – review & editing (supporting). X. Sun: Investigation (supporting); Methodology (supporting). S. Madireddy: Investigation (supporting); Methodology (supporting). L. L. Lao: Investigation (supporting); Methodology (supporting); Project administration (lead); Writing – review & editing (supporting). S. E. Kruger: Investigation (supporting); Methodology (supporting). O. M. Meneghini: Investigation (supporting); Methodology (supporting); Supervision (supporting).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: IDEAL MHD STABILITY OF PCA DECOMPOSED EQUILIBRIUM
To explore the utility of a predicted plasma retaining only a finite number of PCA components of $J_t$, an initial equilibrium is taken from example discharge 180 631 at 4000 ms. A scan in $\beta_N$ is performed holding the flux surface average of the current density fixed using the OMFIT18 EFIT module. Next, a linear ordinary least squares fit of $J_t$ to the PCA components is performed. The poloidal flux is then reconstructed retaining a given set of PCA components. Using the OMFIT fluxSurfaces class, quantities derived from $\psi$, such as the safety factor and normalized toroidal flux, are computed, and an equilibrium gEQDSK file is generated. Ideal MHD stability is computed using DCON19 at the no-wall limit. A summary of the simulations is shown in Fig. 13, with the original equilibrium's predicted stability limit serving as the reference. With fewer than eight components, the $\beta_N$-limit is significantly under-predicted. From 8 to 24 components, the stability limit is slightly over-predicted. At 24 PCA components and beyond, the predicted $\beta_N$-limit agrees with the full equilibrium solution.
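A sketch of the projection step, under the same assumptions as the earlier snippets (orthonormal PCA components of the centered $J_t$; the external and mean-field flux contributions must still be added separately):

```python
import numpy as np

def project_equilibrium(jt, jt_basis, psi_basis, jt_mean, n_keep):
    """Ordinary least squares fit of a J_t profile onto the first n_keep
    PCA components, plus reconstruction of the plasma poloidal flux.
    """
    A = jt_basis[:n_keep].T  # (n_grid, n_keep) design matrix
    c, *_ = np.linalg.lstsq(A, jt - jt_mean, rcond=None)
    jt_recon = c @ jt_basis[:n_keep] + jt_mean
    psi_pl = c @ psi_basis[:n_keep]  # plasma contribution to psi
    return jt_recon, psi_pl, c
```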
FIG. 13. DCON-predicted no-wall MHD stability metric $\delta W$ for toroidal mode number $n = 1$. The solid black line is $\delta W = 0$ and represents the transition from stability to instability.
APPENDIX B: TOP MODEL ARCHITECTURE FOR EXPERIMENTAL NAS NN
This appendix details the top-performing multilayer perceptron model identified through NAS. NAS generated a total of 450 configurations for the experimental diagnostic dataset. The optimizers tested in the search included sgd, rmsprop, adagrad, adam, adadelta, adamax, and nadam. Initial learning rates were sampled on a logarithmic scale up to 0.1. The best model utilizes the adam optimizer and contains 26 896 weights and biases. The configuration of this model is illustrated in Fig. 14. The diagram shows multiple sequential dense layers with one branching point. Training this model took approximately 700 s. It employs an adaptive learning rate, starting at 0.0015 and decreasing over the course of training. The model predicts the 32 basis coefficients of the $J_t$. Each node in the graph represents a layer in the neural network. The architecture includes a skip connection following the first dense layer. The final layer provides the mean and standard deviation predictions for each ensemble member.
FIG. 14. The architecture of the best-performing configuration from the deep ensemble, used to predict the 32 basis coefficients.