The optimization of scenarios and design of real-time-control in tokamaks, especially for machines still in design phase, requires a comprehensive exploration of solutions to the Grad–Shafranov (GS) equation over a high-dimensional space of plasma and coil parameters. Emulators can bypass the numerical issues in the GS equation, if a large enough library of equilibria is available. We train an ensemble of neural networks to emulate the typical shape-control targets (separatrix at midplane, X-points, divertor strike point, flux expansion, and poloidal beta) as a function of plasma parameters and active coil currents for the range of plasma configurations relevant to spherical tokamaks with a super-X divertor, with percent-level accuracy. This allows a quick calculation of the classical-control shape matrices, potentially allowing real-time calculation at any point in a shot with submillisecond latency. We devise a hyperparameter sampler to select the optimal network architectures and quantify uncertainties on the model predictions. To generate the relevant training set, we devise a Markov-chain Monte Carlo algorithm to produce large libraries of forward Grad–Shafranov solutions without the need for user intervention. The algorithm promotes equilibria with desirable properties, while avoiding parameter combinations resulting in problematic profiles or numerical issues in the integration of the GS equation.
NOMENCLATURE
- fvac
-
in vacuum
-
Array of poloidal-field coil currents (A)
- Ip
-
Plasma current (MA)
- Isol
-
Solenoid current (A)
- pa
-
Plasma pressure at magnetic axis (kPa)
- R
-
Major radius (from tokamak axis) (m)
- Z
-
Vertical coordinate (from tokamak midplane) (m)
-
Array of plasma- and coil-parameters of a given equilibrium
-
Array of target quantities derived from a GS equation solution
I. INTRODUCTION
The Grad–Shafranov (GS) equation is ubiquitous in the description of magneto-hydrodynamical (MHD) equilibria, from tokamaks to astrophysical plasmas.1,2 It is a nonlinear, partial-differential equation with nontrivial boundary conditions, and its accurate solution is necessary in order to describe the current-density and pressure profiles of a plasma in a magnetic field. In the rest of this paper, we concentrate on issues inherent the use of the Grad–Shafranov equation in tokamak plasmas, and how they can be alleviated by the use of suitably constructed equilibrium libraries and emulator architectures. Several tokamak designs are being developed in order to bring magnetic-confinement fusion to net power generation. In preparation for energy production from tokamaks, such as STEP3 and SPARC,4,5 precursor machines have been in operation to probe fundamental physics, different operational regimes, and control strategies (e.g., JET,6 ASDEX-U,7 NSTX-U,8,9 MAST-U,10–12 DIII-D,13 TCV,14 Globus-M,15,16 and ST4017,18). In a tokamak, equilibria are established on s Alfvén timescales that are much faster than those of the typical current drives and dissipation, so the plasma can indeed be modeled as a sequence of MHD equilibrium solutions.1,19
Tokamak fusion experiments require real-time control of the burning plasma and of its exhaust. A range of physics-based algorithms have been designed to control the plasma's shape targets, its profile, and its exhausts, ultimately to operate the machines reliably and attain longer confinement times. On machines with a large number of campaigns, the experiment data themselves have been used to build emulators of the control targets and predict the state corresponding to a set of actuator requests,20–23 bypassing the use of expensive and sometimes numerically unstable PDE solvers. This route, however, requires large data volumes and is not viable for newer and upcoming machines. There, shot design typically proceeds by constructing sequences of desired equilibria and designing classical (linear) control for selected regimes. Here, we propose to complement this with physics-based emulation. Based on a set of synthetic equilibria covering the whole operational space of a tokamak, we present strategies to build emulators of the plasma and magnetic field configurations, with the aim of facilitating and accelerating scenario design and control. The same approach is also applied to the exploration of different magnetic configurations that can be used in studies of divertor detachment. For the reader's convenience, Nomenclature section summarizes the symbols and notations chosen. Throughout this paper, we use underlined symbols to denote multidimensional arrays of parameters.
A. Classical control
The objective of real-time control is to find changes in the array of requested shaping-coil currents, on top of a pre-programmed coil sweep, that correspond to a desired change in the array of targets of interest. This is done by considering the SVD pseudo-inverse of the sensitivity matrix yielding virtual circuits with the desired linear combinations of coil-current changes.26–29 Once the desired virtual circuits are obtained, they are used as “requested” coil current inputs in classical closed-loop control.19 The details of the chosen targets are specific to each tokamak, but are all broadly related to geometric or poloidal magnetic flux constraints on the shape of the separatrix and its extension into the divertor region. The sensitivity matrix is an inherently local quantity, as it is computed around one given equilibrium. In practice, computing a sensitivity matrix around any foreseen equilibrium configuration in a shot is computationally prohibitive. Therefore, at experiment design, the coil “wave-forms” are constructed by prescribing the plasma shape and divertor strike points, and the virtual circuits are computed for a finite set of expected configurations along these trajectories, using sensitivity matrices calculated by finite differences. The desired shape and strike point may be computed at different values of the Ohmic circuit currents, if they have significant stray field (as e.g., on MAST-U), otherwise different sets of virtual circuits can be designed to control the main shape of the plasma and the divertor.
B. Divertor studies
Modern research tokamaks are endowed with a divertor, to remove impurities and manage the high heat flux.7,12,30–33 The transport of heat along a flux tube can be generally captured, e.g., by the so-called “two-point model,” which explicitly demonstrates the importance of flux expansion and connection length.30,34,35 Refined analyses can be performed with more complete and multi-species descriptions of transport along field lines, or in the full meridional plane.36–39 The design and control of divertors is an active field of research, but one overall objective is the attainment of high flux expansion, i.e., the widening of a flux tube from the scrape-off layer toward the machine tiles.30,40 Finding magnetic configurations that can provide long connections and high flux expansions, compatibly with the operational limits and plasma shape constraints, is an integral part of scenario design.
C. This paper
From the above examples, two general issues emerge: (i) is it possible to quickly obtain magnetic configurations and targets of interest at any point over a wide range of control parameters? and (ii) is it possible to traverse the operational space to search for desirable configurations in the presence of non-trivial boundaries (as e.g., resulting from the need for a diverted plasma)?
In order to minimize manual intervention and also enable uncertainty quantification in follow-up work, we develop smooth emulations of targets of interest over the parameter range, that in principle can be deployed with submillisecond latency. This in turn requires a library of synthetic equilibria, for which we develop an efficient sampling algorithm. The synthetic equilibria, numerical challenges, and the resulting sampling algorithm are discussed in Sec. II. The emulation is discussed in Sec. III, including a sampling algorithm to search hyperparameter space. All results are presented in Sec. IV, and implications are discussed in Sec. V.
We remark that all equilibria in this work are synthetic, i.e., they are computed assuming a perfectly known state of tokamak and plasma, and none are from a reconstruction of experiment shots. This also allows us to control the sources of uncertainty in the emulation, which only come from the numeric of the equilibrium construction and emulation procedures, without any systematics from the profile reconstruction. The equilibria used in this work are computed on a tokamak that resembles the MAST-U machine configuration, although with different coil positions and windings (displayed in Fig. 1). The full machine description is given in the public FreeGS code repository.41 While we are interested in solutions that can be applied to the real MAST-U, here we deemed appropriate to use a synthetic machine as it is fully specified (no measurement uncertainties), while retaining the general complexity of a real-life tokamak. All of the profiles in this work are diverted, i.e., their last-closed flux surface within the main chamber has at least one X-point. This is because, at least on MAST-U, the plasma can be brought from a limited to a diverted configuration shortly after the breakdown phase.
Tokamak configuration for this study, roughly based on MAST-U, and first synthetic equilibrium. The separatrix ( ) is marked in red. Left: names of main shape targets computed and considered for the subsequent emulation. Right: divertor coils (D1–D7, Dp) and shape coils (P4–P6). All coils are up-down symmetric except P6, whose winding is up-down antisymmetric and is used for vertical position control. The divertor tiles are marked by thick black lines, and the divertor “nose” and “baffle” are indicated. The tokamak coil positions and windings are given by the MASTU tokamak class in the public FreeGS repository.
Tokamak configuration for this study, roughly based on MAST-U, and first synthetic equilibrium. The separatrix ( ) is marked in red. Left: names of main shape targets computed and considered for the subsequent emulation. Right: divertor coils (D1–D7, Dp) and shape coils (P4–P6). All coils are up-down symmetric except P6, whose winding is up-down antisymmetric and is used for vertical position control. The divertor tiles are marked by thick black lines, and the divertor “nose” and “baffle” are indicated. The tokamak coil positions and windings are given by the MASTU tokamak class in the public FreeGS repository.
II. SYNTHETIC DATASET
The plasma equilibrium configurations are described by the Grad–Shafranov (GS) equation, constrained by the choice of global parameters for the plasma and coil currents, as discussed in Sec. II A. The library of synthetic equilibria is built through a random walk in parameter space as discussed in Sec. II B, starting from one equilibrium with prescribed shape. To solve for the flux-function of the synthetic equilibria, we use the publicly available FreeGS codebase42 (Dudson et al. in prep.) and the static GS solver presented by Amorisco et al.43 Wherever needed, we build additional functionality as described in the rest of the paper.
The procedure to build the equilibrium library is devised in order to address two specific issues. First, due to the nonlinearities of the GS equation, its integration converges more often if starting from a “reasonable” guess of the solution, which is available if a new trial equilibrium parameters are sampled close to those of a previously computed equilibrium. Second, a good emulation of the control targets is needed primarily in regimes with more desirable shape properties, and as the dimensionality of the parameter space increases, more naïve Monte Carlo methods (e.g., acceptance-rejection, Latin hypercube, pseudo-random) will often sample equilibria far away from those with more desirable properties.
Several conventions are used in the literature for the flux-function ψ, here we define it such that the poloidal magnetic field satisfies . Additionally, fvac denotes the amplitude in vacuum, ψa and ψb are ψ on the magnetic axis and at the boundary, and . We will work with free-boundary problems, where ψ is to be determined together with a toroidal current density , and the plasma boundary is defined as the surface where ψ equals the one at the closest X-point to the magnetic axis.
A. Grad–Shafranov solutions
Given the physics of the problem, the shape targets should only depend on the ratio of coil currents to plasma current, and on with the only caveat that the coil currents (rather than their ratio to Ip) must be kept within some operational limits. The first equilibrium is described by the constraints in Table I and shown in Fig. 1, which also introduces some of the nomenclature. The procedure to build the library, by varying the coil currents, plasma current and pressure on axis, uses the Newton-Krylov solver for the forward GS problem and is described below.
Parameters and constraints used to obtain the first synthetic equilibrium (top, shown in Fig. 1), and the resulting PF coil currents from the inverse GS solution (bottom). All equilibria except the first one are obtained by integrating the forward GS problem under small changes of the coil + solenoid currents and plasma parameters. All coil currents are meant as single-turn currents; the number of turns of each coil for this machine description is publicly available through the “MASTU” tokamak class in the FreeGS code repository.
Constraints/parameters . | Initial values . | Limits . |
---|---|---|
X-point (R,Z) (m) | (0.58,1.189) (0.58,1.185) | |
isoflux (R,Z) (m) | (0.327,0) (1.4475,0), | |
(0.9725, 1.0017) (0.9892, −0.9993), | ||
(0.5035, 0.9997) (0.5081, −1.0018) | ||
Ip (MA) | 0.748 | [0.1, 1.125] |
fvac | −0.475 | ±0.75 |
pa (kPa) | 9.59 | [1.0, 100.0] |
Isol (A) | −1400 | ±55 |
PF currents (single-turn) | Initial values from inverse GS | Limits |
IPX (A) | 2102.4 | ±3.5 |
(A) | 7130.0 | ±11 |
(A) | 1816.1 | ±6.4 |
(A) | 371.3 | ±6.4 |
IDp (A) | −2029.3 | ±6.4 |
(A) | −1022.7 | ±4.0 |
(A) | −320.3 | ±3.2 |
(A) | −957.4 | ±4.6 |
(A) | −2391.3 | ±11 |
(A) | −4377.8 | ±11 |
(A) | 38.2 | ±3.5 |
Constraints/parameters . | Initial values . | Limits . |
---|---|---|
X-point (R,Z) (m) | (0.58,1.189) (0.58,1.185) | |
isoflux (R,Z) (m) | (0.327,0) (1.4475,0), | |
(0.9725, 1.0017) (0.9892, −0.9993), | ||
(0.5035, 0.9997) (0.5081, −1.0018) | ||
Ip (MA) | 0.748 | [0.1, 1.125] |
fvac | −0.475 | ±0.75 |
pa (kPa) | 9.59 | [1.0, 100.0] |
Isol (A) | −1400 | ±55 |
PF currents (single-turn) | Initial values from inverse GS | Limits |
IPX (A) | 2102.4 | ±3.5 |
(A) | 7130.0 | ±11 |
(A) | 1816.1 | ±6.4 |
(A) | 371.3 | ±6.4 |
IDp (A) | −2029.3 | ±6.4 |
(A) | −1022.7 | ±4.0 |
(A) | −320.3 | ±3.2 |
(A) | −957.4 | ±4.6 |
(A) | −2391.3 | ±11 |
(A) | −4377.8 | ±11 |
(A) | 38.2 | ±3.5 |
B. Sequential sampling of synthetic equilibria
This walker update rule, which makes the step size adaptive (i.e., dependent on the local steepness of the score ), is used only after 100 steps of pure MH sampling. A uniform distribution is preferred to a normal distribution for the steps, because it samples more efficiently step sizes above the variance while also truncating steps that would be too large and therefore challenging for the convergence of the Grad–Shafranov integration. If the Grad–Shafranov integration fails to converge, the walker is restarted from one of the equilibria in the chain, chosen at random.
Strictly speaking, stepGW is not the same as in the Goodman and Weare sampling algorithm, but it has the advantage that the chain is parameterized by a single walker, which eases the task of indexing the output configurations . Even though this MCMC may not necessarily satisfy the detailed balance conditions, it results in an efficient exploration.
III. EMULATION
We examine two solutions to obtain fast fitting functions to the targets computed on synthetic equilibria, that can also enable extrapolation and uncertainty quantification: scaling relations and neural networks. All of the neural networks examined in this work are built with the TensorFlow python APIs. We devote a separate subsection to the design of an annealed hyperparameter sampler, which is devised on purpose for this work to produce an ensemble of emulators.
A. Scaling relations
B. Neural networks
Neural networks (hereafter NNs) are amenable to the task of emulation because: (1) Universal Approximation theorems make them appealing candidates in general,; (2) they can also be used to extrapolate in regions of parameter space that are less densely sampled; (3) off-the-shelf packages can compute all derivatives analytically, overcoming issues of numerical noise; (4) one can build simple enough architectures that can be deployed with low latency on a real-time controller; (5) uncertainties can be quantified whenever an ensemble of neural-network emulators has been trained, on different random subsamples of a dataset or with different hyperparameters.
While universal approximation theorems are known, three remarks will be useful for this work. First, a neural network with sigmoid activations is a universal approximator of continuous functions over a compact domain, so any target that may diverge close to the domain boundary or with a discontinuity would need a suitable reparameterization of the inputs and outputs. This is true also for networks with a rectified linear unit activation also as approximators to Lp functions in the Lp norm, which provides only a global goodness-of-fit measure whenever the functions have discontinuities. This is the case, e.g., for the connection length, which can change abruptly as soon as the scrape-off layer grazes the divertor nose region. Another example is the strike-point location, which can be quite sensitive to small changes in Rout or λSOL and can exhibit a sudden discontinuity (primarily in Z) whenever the divertor coils create another X-point in the divertor region. Third, while there are known lower bounds to the network width that is required for universal approximation, there is no known lower bound on the depth (nor a known upper bound on depth and width) for finite datasets or prescribed fitting tolerance, so an extensive exploration of the network architecture is needed.
Figure 2 summarizes the nomenclature and the choices of architecture used in this work.
Nomenclature and general architecture of the NNs considered in this work. All have one input layer with identity activation one dense layer, then d − 1 layers (all dense or all redisual, with the same activation), one dense and one affine layer with the same shape as the outputs. The user can choose the activation function among a sigmoid, a ReLU, a leaky ReLU, an eLU, and a SReLU [see Eq. (17)]. The depth, width, learning rate, L2 regularization, and gradient batch size are all hyperparameters to be explored.
Nomenclature and general architecture of the NNs considered in this work. All have one input layer with identity activation one dense layer, then d − 1 layers (all dense or all redisual, with the same activation), one dense and one affine layer with the same shape as the outputs. The user can choose the activation function among a sigmoid, a ReLU, a leaky ReLU, an eLU, and a SReLU [see Eq. (17)]. The depth, width, learning rate, L2 regularization, and gradient batch size are all hyperparameters to be explored.
With a (large but) finite dataset, wider or deeper networks are more prone to overfitting, so regularization terms must also be introduced. A quadratic regularization is added to the loss function, which in turn can be a mean-squared error or a mean-absolute error. The hyperparameter exploration procedure is described below, together with the choices made to split the data into training, evaluation, and test sets.
C. Hyperparameter tuning
The chosen NN architecture has several hyperparameters to be tuned: depth (how many hidden layers), width (how many nodes per hidden layer), batch size, learning rate and regularization . A grid search is expensive and prone to arbitrary choices in e.g., how one would set the grid mesh in lr and . A “random search” with uniform draws is less expensive, but still rather noisy and hence inefficient. Ideally, we want a sampler that already knows about islands of hyperparameter space that perform better, but with, e.g., a MCMC we would discard an arbitrary number of computed models. Off-the-shelf packages for a “Bayesian” hyperparameter search exist; however, they are not necessarily suited to this particular problem. Population-based training47 replaces the worst performing models with mutations of the best-performing ones, and so it is mostly useful for the scheduling of lr and while in principle it cannot resample models with varying width and depth. Hyperband48 starts from a random draw of hyperparameters and progressively prunes the worst performing models in favor of training the best performers for more epochs. So-called Bayesian optimizers, like Optuna,49 use a density estimator to approximate the loss vs hyperparameters and therefore inform the next hyperparameter draw, however this neglects the inherent scatter in losses (even on the same hyperparameter tuple) from different initial NN weights and different random splits in training and validation sets.
From the synthetic dataset, a random 20% has been set aside as a test set, over which the performance of the sampled models is evaluated at the end of the hyperparameter exploration. Of the remaining 80%, whenever a new model is instantiated, a random 20% (i.e., 16% of the original data) is taken as a validation set for early stopping. For each hyperparameter trial, the training is run until itm such that and . The final loss Lj for hj is then . During training, the neural-network gradients are evaluated on mini-batches, whose size is also one of the hyperparameters. The choice of a random validation subset for every new model instantiation is made to render the hyperparameter search robust against specializing on a particular training-validation split.
IV. RESULTS
Approximately thirty thousand synthetic profiles are computed, together with a number of targets and other integrated properties for each profile. The NN emulation and hyperparameter sampling is done on a subset of the computed targets, while others are provided with the synthetic library as they can be useful for diagnostic purpose, e.g., to select particular equilibria for divertor detachment simulations. The distribution of parameter values explored by the MCMC sampler is shown in Fig. 6 in the Appendix.
A. Synthetic libraries
Figure 3 summarizes the distribution of the shape targets computed on the synthetic equilibria. Solutions with a strike point on the divertor “nose” or “baffle” are highly penalized in the library generation but are not rejected, in order for the subsequent emulators to learn which parameter combinations can lead to unfavorable divertor conditions. Histograms of the separatrix midplane inboard and outboard radii, as well as the magnetic axis radius, are also shown. The sharp cutoff at m is where the limiter of the inner column is touched. The distribution of (lower) X-point locations is merely reflective of the score function privileging configurations with the strike point hitting the bottom divertor tiles. The connection length can exhibit a sharp drop whenever the SOL grazes the divertor nose, mostly due to equilibrium configurations with large outboard separatrix radius, and long tails due to configurations where an additional X-point is created in the divertor region.
Distribution of the shape targets computed on the library of synthetic profiles. In the lower panel, the black symbols indicate the strike-point positions (a majority is on the lower tiles), which result in the distribution of allowed X-point positions (red swarm). The green and points around Z = 0 mark the range of inner/outer separatrix radii and magnetic axis, respectively. Their distribution is shown in the histograms in the top panel.
Distribution of the shape targets computed on the library of synthetic profiles. In the lower panel, the black symbols indicate the strike-point positions (a majority is on the lower tiles), which result in the distribution of allowed X-point positions (red swarm). The green and points around Z = 0 mark the range of inner/outer separatrix radii and magnetic axis, respectively. Their distribution is shown in the histograms in the top panel.
B. Scaling relations
Other targets have very uncertain fits, in particular Rin and . The behaviors with are the most uncertain and they change at high solenoid currents, reflecting the role of different coil combinations in keeping the separatrix within the vessel for different strengths of the solenoid stray field. These scaling relations can be used to aid the real-time reconstruction of pa (or equivalently βp and qa) with a good first guess from the magnetic reconstruction of Rout and Ra. However, they are not necessarily accurate enough to be used directly in control design, especially because the dependence on solenoid current and in particular, because the inboard midplane radius Rin is not well fit.
C. Neural networks
Figure 4 illustrates the hyperparameter exploration, quantified by the validation loss. Shallow but wide network architectures are preferred. A leaky ReLU provides the best results, as it does not have vanishing gradients. In principle, a standard ReLU activation would perform equally well, but that requires wider hidden layers, in order to reproduce enough ReLU combinations to resemble a leaky ReLU and prune nodes with vanishing gradients.
Hyperparameter exploration. The models tend to privilege shallow networks with nodes per hidden layer.
Hyperparameter exploration. The models tend to privilege shallow networks with nodes per hidden layer.
Figure 5 shows test loss metrics and calibration plots for each target, for a subset of the best-performing models. The calibration plots are displayed in terms of standardized targets. The metrics are computed on a separate subset, which was not used during training and early stopping. The test metrics are defined as
- rms relative error:
- mean-absolute-deviation over std.dev:
-
relative bias:
Results of training and hyperparameter exploration when fitting on geometric targets and integrated quantities. Top: Test metrics on a sample of best-fitting models and on their aggregate prediction. Bottom: standardized outputs vs those predicted by the NN emulation, averaged over the best models. The label BX_over_Btar denotes the magnetic flux expansion between the lower X-point and the strike point. The selected models in the legend are labeled by their index in the hyperparameter exploration followed by the total . Many of the sampled models have a small lossval (Fig. 4), but a very strict cut on the six very-best models (indicated in the top-right legend) is already enough to yield sub-percent bias while keeping the overall ensemble model small enough for real-time applications.
Results of training and hyperparameter exploration when fitting on geometric targets and integrated quantities. Top: Test metrics on a sample of best-fitting models and on their aggregate prediction. Bottom: standardized outputs vs those predicted by the NN emulation, averaged over the best models. The label BX_over_Btar denotes the magnetic flux expansion between the lower X-point and the strike point. The selected models in the legend are labeled by their index in the hyperparameter exploration followed by the total . Many of the sampled models have a small lossval (Fig. 4), but a very strict cut on the six very-best models (indicated in the top-right legend) is already enough to yield sub-percent bias while keeping the overall ensemble model small enough for real-time applications.
These values are preferred over the R2 metric (which has been used in other emulation studies21) as it would be > 0.995 in all cases. The combined prediction is obtained for each target as a simple average of its predicted values from the best-fitting neural networks.
The general percent-level performance on the geometric targets and integrated quantities is expected, given the dataset. Given the intrinsic standard deviations of the shape coordinates over the synthetic library (midplane radii and lower X-point position), these metrics translate into mm accuracy, i.e., comparable to the accuracy of the direct calculation on the numeric Grad-Shafranov solutions with a 2–4 cm resolution grid. The flux expansion (BX_over_Btar) is generally the most uncertain as it is a ratio of magnetic fields (computed as derivatives of the flux-function), and because the strike-point location can depend appreciably on the upstream location and on whether a secondary X-point is created in the divertor region. Whenever this happens, a small difference in upstream location or in divertor coil currents can result in a large difference in the strike-point Z-location, hitting the divertor baffle instead of the bottom tiles. In general, the radial strike-point location is fit better than the Z-location.
V. DISCUSSION
We have presented a stable procedure to sample synthetic equilibria from a high-dimensional parameter space with operational constraints, where the “score” function can also be chosen to privilege configurations with more desirable properties. This sequential MCMC approach is also convenient because it curbs some of the numerical instability inherent in solving the non-linear GS equation. Passive-structure currents may be considered as additional parameters, if they can be reduced to a small enough number of modes. While we have shown a case study with a machine resembling MAST-U in this work, this framework is agnostic to the chosen machine configuration. In principle, it is also agnostic to the equilibrium solver, provided it can perform a numerically stable (and accurate) forward GS solution. The advantage of FreeGS over other codes is that, being fully in python, it can be seamlessly integrated with the MCMC sampling. The same MCMC sampler can also be used for design optimization purpose, e.g., tokamak configurations that minimize inter-coil forces while maintaining the plasma shape within desirable bounds. It can also be used for the Bayesian fitting of models to data (to enable uncertainty quantification) whenever the models involve nonlinear differential equations with numerical instabilities.
A. Profile libraries and divertor detachment studies
Flux-function profiles are also available, for divertor studies, as well as the flux expansion along the connection length from the midplane. This synthetic library can aid in refining some working hypotheses, such as a flux-expansion that is constant or proportional to radial position, that are commonly adopted in divertor detachment simulations.40 The choice of an appropriate score function is particularly relevant if super-X configurations are to be adequately sampled. For example, while we have considered the flux expansion from X-point to strike point, the exploration of scenarios for divertor detachment may consider the flux-expansion at some prescribed distance from the divertor tiles along the connection.
B. Emulating the physics
Scaling relations can be used to understand the general behavior of some targets, and may also be used to aid the real-time inference of pressure on axis from a magnetic reconstruction of the separatrix. However, they are not as useful for targets that are more sensitive to the wall, such as strike point position and inboard midplane radius, and in general, they are too uncertain to be used directly in control design. Power-law relations on other quantities have been determined experimentally, on historic ASDEX-U discharges, with comparable R2 values by Mc Carthy et al.,50 using least squares linear regression in log –log space, with a focus on inferring and controlling transport through qa and . In the presence of noisy regressors, Verdoolaege et al.51 argued for more robust Bayesian fitting techniques, which can affect the quality of control from power-law scalings on experimental data—rather than purely synthetic data as in our case.
Neural-network emulation works generally to within percent-level accuracy, but some caution is necessary for targets that can exhibit abrupt changes (strike point Z-location) or that are particularly sensitive to high solenoid currents (Rin). The annealed hyperparameter sampler developed for this work provides an ensemble of emulators, which can also be used to quantify uncertainties in the emulation.
During experiments, a noisy hint of the strike point position is given by real-time magnetic reconstruction. The emulators presented here may be re-trained on subsets corresponding to different wall segments, or adding a “tile segment” feature in the training set. Splitting the wall into segments would not affect the calculation of virtual circuits, since their derivative with respect to other inputs is going to be almost-always zero.
C. Choice of internal profiles
D. From emulation to control
Networks with ReLU activation functions perform best at approximating the control targets. However, a perfect ReLU NN approximant would have discontinuous derivatives on a dense subset of parameter space, which can be a problem, e.g., for virtual circuit emulation. The SReLU activation introduced in this paper alleviates this issue, but in principle there is no upper bound to the norm of the gradients from this choice either. Two solutions can be: (i) estimating as averages of finite-difference derivatives over an ensemble of approximators; (ii) introducing penalties that privilege smoother solutions, either through additional terms in the loss or through a variational intermediate layer.52,53 While the latter is left for further development, the former is already possible with the NNs ensembles from this work.
In the interest of keeping the analysis of physics emulation from the issues of real-time control algorithms, we chose not to present emulations of the virtual circuits themselves, as they involve derivatives of quantities that include effective inductances, for which different control-design choices have been made in the literature,1,19,24,54–56 as summarized in the Appendix.
We remark that this issue is already present in-state-of-the-art calculations of virtual circuits, which are done via finite-element differences that can be affected by numerical noise and departures from linearity. Fully differentiable GS solvers would deliver exact derivatives at each equilibrium configuration, which can then be emulated directly; however, they would already need to be emulated for fast deployment in real-time control, also with safety limits from estimated model uncertainties. The very same techniques presented in this paper can be used directly for these two purpose.
ACKNOWLEDGMENTS
This work was carried out within a Joint Endeavour between the Hartree Centre (Science and Technology Facilities Council, UKRI) and the UK Atomic Energy Authority.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Adriano Agnello: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Project administration (equal); Software (lead); Supervision (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Nicola C. Amorisco: Conceptualization (supporting); Software (equal); Writing – review & editing (supporting). Abigail Keats: Data curation (supporting); Software (supporting); Visualization (supporting). George Holt: Conceptualization (supporting); Investigation (supporting). James Christopher Buchanan: Data curation (supporting); Resources (equal); Writing – review & editing (equal). Stanislas J. P. Pamela: Data curation (supporting); Project administration (equal); Resources (equal); Writing – review & editing (supporting). Charles Howard Vincent: Conceptualization (supporting); Writing – review & editing (supporting). Graham McArdle: Conceptualization (equal); Writing – review & editing (supporting).
DATA AVAILABILITY
The synthetic libraries, code to generate them, and the code for the neural-network training and hyperparameter exploration are available upon paper publication at https://github.com/farscape-project/freegs-emu (Ref. 58).
APPENDIX: FURTHER MATHEMATICAL DETAILS
1. Choices of circuit equations for the plasma control design
The linearized-response model (A8) delegates to the designed coil sweep and the MIMO controller the task of minimizing the shielding response of passive structures, and is the most commonly used on tokamak control design.19 One may be tempted to account directly for part of the passive structures that are strongly coupled to the coils, e.g., the coil-cases or the MAST-U divertor nose close to the Dp coil. In fact, the solution of the Grad–Shafranov equation only needs knowledge about the flux-function ψtok from the tokamak in a domain that includes the plasma, and if the domain is chosen as a torus enclosing the plasma and sufficiently far inside the vessel, the contribution of Dp to ψtok is almost indistinguishable from the combined contribution of Dp and divertor nose. If we consider the tokamak flux-function in the domain as an array, we can then use the SVD decomposition where V has dimension and can be projected onto the first nc principal components, that are the first nc rows of . In the coupling between plasma and metals, the vessel can be solved for, which yields slightly different definitions for the effective inductances above. A change in active coil currents then brings a change to and the sensitivity matrix has similar entries as before, but is suitably rotated in the new basis of currents. One reason why this alternative approach is not common in the literature is that, ultimately, the controller deals with requested voltages to the power supplies, which are subject to safety limits and output limits: aiming to directly cancel the shielding of tighly coupled passive structures at all times is not necessarily feasible within those limits, and can also result in unstable open-loop control. The classical approach of designing the sensitivity matrix and, separately, a MIMO controller for the voltages provides a wealth of established techniques to minimize the asymptotic error, and examine the stability and safety of closed-loop control. We also note that the alternative approach still requires a calculation of the derivatives and ultimately the emulation of the targets as a function of plasma and coil currents, which is the content of this paper.
2. Example form of the score function
3. MCMC parameter distributions
The distributions of parameters from the sequential MCMC sampler are shown in Fig. 6. Some of the PF coil currents (D1, P4, P5) and of plasma pressure and current are obliged by the need to keep the plasma separatrix confined, while other coil currents are more free to vary to produce finer changes in the resulting geometry of the separatrix and divertor legs. The distribution of fvac values is mostly reflective of the choice to privilege and longer connection in the score .
Distribution of the plasma parameters, fvac and PF coil currents explored by the MCMC sampler. Most chains are sampeld within the limits in Table I, except for one run with very high solenoid currents, whose high stray-field requires higher D1 currents.
Distribution of the plasma parameters, fvac and PF coil currents explored by the MCMC sampler. Most chains are sampeld within the limits in Table I, except for one run with very high solenoid currents, whose high stray-field requires higher D1 currents.