Emulation Techniques for Scenario and Classical Control Design of Tokamak Plasmas

The optimisation of scenarios and design of real-time-control in tokamaks, especially for machines still in design phase, requires a comprehensive exploration of solutions to the Grad-Shafranov (GS) equation over a high-dimensional space of plasma and coil parameters. Emulators can bypass the numerical issues in the GS equation, if a large enough library of equilibria is available. We train an ensemble of neural networks to emulate the typical shape-control targets (separatrix at midplane, X-points, divertor strike point, flux expansion, poloidal beta) as a function of plasma parameters and active coil currents for the range of plasma configurations relevant to spherical tokamaks with a super-X divertor, with percent-level accuracy. This allows a quick calculation of the classical-control shape matrices, potentially allowing real-time calculation at any point in a shot with sub-ms latency. We devise a hyperparameter sampler to select the optimal network architectures and quantify uncertainties on the model predictions. To generate the relevant training set, we devise a Markov-Chain Monte Carlo algorithm to produce large libraries of forward Grad-Shafranov solutions without the need for user intervention. The algorithm promotes equilibria with desirable properties, while avoiding parameter combinations resulting in problematic profiles or numerical issues in the integration of the GS equation.


I. INTRODUCTION
The Grad-Shafranov equation is ubiquitous in the description of magneto-hydrodynamical (MHD) equilibria, from tokamaks to astrophysical plasmas 1,2 .It is a nonlinear, partialdifferential equation with nontrivial boundary conditions, and its accurate solution is necessary in order to describe the current-density and pressure profiles of a plasma in a magnetic field.In the rest of this paper, we concentrate on issues inherent the use of the Grad-Shafranov equation in tokamak plasmas, and how they can be alleviated by the use of suitably constructed equilibrium libraries and emulator architectures.Several tokamak designs are being developed in order to bring magnetic-confinement fusion to net power generation.In preparation for energy production from tokamaks, such as STEP (Wilson et al. 3 ) and SPARC (Creely et al. 4 , Rodriguez-Fernandez et al. 5 ), precursor machines have been in operation to probe fundamental physics, different operational regimes, and control strategies (e.g.JET 6 , ASDEX-U 7 , NSTX-U 8,9 , MAST-U [10][11][12] , DIII-D 13 , TCV 14 , Globus-M 15,16 , ST40 17,18 ).In a tokamak, equilibria are established on ≈ µs Alfvén timescales that are much faster than those of the typical current drives and dissipation, so the plasma can indeed be modelled as a sequence of MHD equilibrium solutions 1,19 .
Tokamak fusion experiments require real-time control of the burning plasma and of its exhaust.A range of physicsbased algorithms have been designed to control the plasma's shape targets, its profile, and its exhausts, ultimately to operate the machines reliably and attain longer confinement times.On machines with a large number of campaigns, the experiment data themselves have been used to build emulators of the control targets and predict the state corresponding to a set of actuator requests [20][21][22][23] , bypassing the use of expensive and sometimes numerically unstable PDE solvers.This route, however, requires large data volumes and is not viable for newer and upcoming machines.There, shot design typically proceeds by constructing sequences of desired equilibria and designing classical (linear) control for selected regimes.Here, we propose to complement this with physics-based emulation.Based on a set of synthetic equilibria covering the whole operational space of a tokamak, we present strategies to build emulators of the plasma and magnetic field configurations, with the aim of facilitating and accelerating scenario design and control.The same approach is also applied to the exploration of different magnetic configurations that can be used in studies of divertor detachment.For the reader's convenience, Table 1 summarises the symbols and notations chosen.Throughout this paper we use underlined symbols to denote multidimensional arrays of parameters.

A. Classical Control
Coil-current control is traditionally designed by examining how a set of desired targets change under a small change in active-coil currents.Usually, this is done under a set of ap-arXiv:2403.18912v1[physics.plasm-ph]27 Mar 2024 proximations, in particular that the passive-structure transients decay fast enough and that the plasma Ohmic dissipation is negligible over the millisecond timescales of real-time control.These hypotheses are generally satisfied by the choice of solenoid and coil "sweeps", and by the closed-loop control to keep the coil currents close to the desired ones and hence minimise the shielding effects of passive structures 19 .The control design equations assume a lumped-parameter circuit-equation for the coupling of plasma current I p and desired shaping coil currents I c , so that under a small change δ I c in coil currents, a given target y changes as with suitably defined inductances L. Different authors adopt different definitions for the inductances 1,19,24,25 , which we discuss in the Appendix 1 for the sake of completeness.The objective of real-time control is to find changes δ I c in the array of requested shaping-coil currents, on top of a pre-programmed coil sweep, that correspond to a desired change δ y in the array of targets of interest.This is done by considering the SVD pseudo-inverse of the sensitivity matrix S = ∂ y/∂ I c , yielding virtual circuits with the desired linear combinations of coil-current changes [26][27][28][29] .Once the desiderd virtual circuits are obtained, they are used as "requested" coil current inputs in classical closed-loop control 19 .The details of the chosen targets are specific to each tokamak, but are all broadly related to geometric or poloidal magnetic flux constraints on the shape of the separatrix and its extension into the divertor region.The sensitivity matrix is an inherently local quantity, as it is computed around one given equilibrium.In practice, computing a sensitivity matrix around any foreseen equilibrium configuration in a shot is computationally prohibitive.Therefore, at experiment design, the coil "waveforms" are constructed by prescribing the plasma shape and divertor strike points, and the virtual circuits are computed for a finite set of expected configurations along these trajectories, using sensitivity matrices calculated by finite differences.The desired shape and strike point may be computed at different values of the Ohmic circuit currents, if they have significant stray field (as e.g. on MAST-U), otherwise different sets of virtual circuits can be designed to control the main shape of the plasma and the divertor.

B. Divertor Studies
Modern research tokamaks are endowed with a divertor, to remove impurities and manage the high heat flux 7,12,[30][31][32][33] .The transport of heat along a flux tube can be generally captured e.g. by the so called "two-point model", which explicitly demonstrates the importance of flux expansion and connection length 30,34,35 .Refined analyses can be performed with more complete and multi-species descriptions of transport along field lines, or in the full meridional plane [36][37][38][39] .The design and control of divertors is an active field of research, but one overall objective is the attainment of high flux expansion, i.e. the widening of a flux tube from the scrape-off layer towards the machine tiles 30,40 .Finding magnetic configurations that can provide long connections and high flux expansions, compatibly with the operational limits and plasma shape constraints, is an integral part of scenario design.

C. This paper
From the above examples, two general issues emerge: (I) is it possible to quickly obtain magnetic configurations and targets of interest at any point over a wide range of control parameters? and (II) is it possible to traverse the operational space to search for desirable configurations in the presence of non-trivial boundaries (as e.g.resulting from the need for a diverted plasma)?
In order to minimise manual intervention and also enable uncertainty quantification in follow-up work, we develop smooth emulations of targets of interest over the parameter range, that in principle can be deployed with sub-ms latency.This in turn requires a library of synthetic equilibria, for which we develop an efficient sampling algorithm.The synthetic equilibria, numerical challenges and the resulting sampling algorithm are discussed in Section II.The emulation is discussed in Section III, including a sampling algorithm to search hyperparameter space.All results are presented in section IV, and implications are discussed in Section V.
We remark that all equilibria in this work are synthetic, i.e. they are computed assuming a perfectly known state of tokamak and plasma, and none are from a reconstruction of experiment shots.This also allows us to control the sources of uncertainty in the emulation, which only come from the numerics of the equilibrium construction and emulation procedures, without any systematics from the profile reconstruction.The equilibria used in this work are computed on a tokamak that resembles the MAST-U machine configuration, although with different coil positions and windings (displayed in Fig. 1).The full machine description is given in the public FreeGS code repository 41 .While we are interested in solutions that can be applied to the real MAST-U, here we deemed appropriate to use a synthetic machine as it is fully specified (no measurement uncertainties), while retaining the general complexity of a real-life tokamak.All of the profiles in this work are diverted, i.e. their last-closed flux surface within the main chamber has at least one X-point.This is because, at least on MAST-U, the plasma can be brought from a limited to a diverted configuration shortly after the breakdown phase.

II. SYNTHETIC DATASET
The plasma equilibrium configurations are described by the Grad-Shafranov (GS) equation, constrained by the choice of global parameters for the plasma and coil currents, as discussed in Section II A below.The library of synthetic equilibria is built through a random walk in parameter space as discused in Section II B, starting from one equilibrium with prescribed shape.To solve for the flux-function of the synthetic ) and the static GS solver presented by Amorisco et al. 43 .Wherever needed, we build additional functionality as described in the rest of the paper.
The procedure to build the equilibrium library is devised in order to address two specific issues.First, due to the nonlinearities of the GS equation, its integration converges more often if starting from a "reasonable" guess of the solution, which is available if a new trial equilibrium parameters are sampled close to those of a previously computed equilibrium.Second, a good emulation of the control targets is needed primarily in regimes with more desirable shape properties, and as the dimensionality of the parameter space increases, more naïve Monte Carlo methods (e.g.acceptance-rejection, Latin hypercube, pseudo-random) will often sample equilibria far away from those with more desirable properties.
Several conventions are used in the literature for the fluxfunction ψ, here we define it such that the poloidal magnetic field satisfies B R = −∂ Z ψ/R.In what follows, f vac denotes the amplitude B φ R in vacuum, ψ a and ψ b are ψ on the magnetic axis and at the boundary, and ψ n = (ψ − ψ a )/(ψ b − ψ a ).We will work with free-boundary problems, where ψ is to be determined together with a toroidal current density j φ and the plasma boundary is defined as the surface where ψ equals the one at the closest X-point to the magnetic axis.

A. Grad-Shafranov solutions
The flux-function ψ and toroidal current density j φ (R, Z) must statisfy the Grad-Shafranov equation where ∆ * g = Rdiv(R −1 ∇g), the toroidal plasma current is and p and f are the pressure and B φ R along a surface of constant ψ respectively.By j coils we denote any current density in conductors that can be modelled through a set of azimuthally symmetric coils.Given a current density j φ , the corresponding ψ can be obtained by a (suitably defined) Green-function integration over the plasma and coil domain D, so if a dependence ĵφ (ψ, R) on ψ is prescribed through p(ψ) and f (ψ), a mapping can be defined as and solving the GS equation ( 2) amounts to finding with additional constraints.In a forward free-boundary GS problem, the currents in the coils are prescribed, as are the plasma current, f vac and plasma parameters.In an inverse (or interpretative) GS problem, the coil currents or the plasma parameters are not provided and are to be found based on a suitable set of constraints on ψ or on the magnetic field, e.g.X-point and isoflux constraints.During a shot, these in turn result from a real-time inference of the flux-function in vacuum from magnetic measurements (flux loops, pickup coils).Approximate solutions to the inverse GS problem can be found by alternating a Picard iteration and an adjustment of the coil currents towards the constraints.
In our work, adopting the parameterization choices of Lackner 44 and Jeon 45 , the plasma parameters can be found explicitly once current and pressure on axis are prescribed.In the forward GS problem, the Picard iterations are often unstable and a more robust root-finder is needed.Here, we use the static GS solver which is part of the FreeGSNKE package 43 , which uses the Newton-Krylov method and was developed on purpose to extend the FreeGS framework.We choose p(ψ) n for the whole library of equilibria.In particular, with the notation of Jeon 45 : where R 0 is an arbitrary scale-length, and the scalings λ , β 0 can be solved for explicitly at given p a and I p .
Given the physics of the problem, the shape targets should only depend on the ratio of coil currents to plasma current, and on β p ∝ p a /I 2 p , with the only caveat that the coil currents (rather than their ratio to I p ) must be kept within some operational limits.The first equilibrium is described by the constraints in Table II and shown in Figure 1, which also introduces some of the nomenclature.The procedure to build the library, by varying the coil currents, f vac , plasma current and pressure on axis, uses the Newton-Krylov solver for the forward GS problem and is described below.

B. Sequential Sampling of Synthetic Equilibria
Any solution of the forward Grad-Shafranov equation can be identified with its input parameters, so in what follows we will use interchangeably Each synthetic equilibrium is given a score, L (eq.), that privileges better-behaved configurations.In general, the score is given by The details of L (eq.) are not important here, and the particular functional forms of its factors can be varied across different sampling runs to ensure that interesting regions of parameter space are adequately sampled.An example is given in the Appendix.In general, the score L (eq.) increases with the connection length, and in general it privileges equilibria with strike points on the divertor tiles, inboard mid-plane point away from the central column, X-point between the divertor "nose" and the column tile, small displacement of the magnetic axis from the midplane, and safety factor on axis q a > 1.
A penalty term in the coil currents is included in order to ensure that the current in PX is ≈ 0 A when the current in the solenoid is > 2000 A or < −2000 A, to demonstrate the inclusion of physical constraints necessary when running a real machine.The connection length is computed from one scrapeoff-layer width of 3 mm from the midplane outboard radius.
In order to sample L (eq.), a Monte Carlo Markov Chain (MCMC) is built, such that a new proposed equilibrium "prop.eq.( n+1)" is accepted with probability min(1, L (eq.(n + 1))/L (eq.(n))).At each step, the Grad-Shafranov equation is integrated using the previous equilibrium as a first guess.In the formulae below, X j,n denotes the j−th coordinate in parameter space (i.e.plasma parameters or coil currents) of the n-th equilibrium.In a Metropolis-Hastings (MH) sampler, a walker draws a step in parameter space from a chosen distribution, while in the affine-invariant samplers of Goodman and Weare 46 each of 2W + 2 walkers draws the trial step based on the relative positions of W of the other walkers.Here, a combination of the two is adapted into a sequential sampler as: with P = 2dim in + 2 = 32.The prefactor 3/P is such that ⟨step GW,i step GW, j ⟩ is the same as the sample covariance of the last P equilibria.This walker update rule, which makes the step size adaptive (i.e.dependent on the local steepness of the score L eq ), is used only after 100 steps of pure MH sampling.A uniform distribution U is preferred to a normal distribution for the steps, because it samples more efficiently step sizes above the variance while also truncating steps that would be too large and therefore challenging for the convergence of the Grad-Shafranov integration.If the Grad-Shafranov integration fails to converge, the walker is restarted from one of the equilibria in the chain, chosen at random.
Strictly speaking, step GW is not the same as in the Goodman & Weare sampling algorithm, but it has the advantage that the chain is parameterised by a single walker, which eases the task of indexing the output configurations ψ n (R, Z).Even though this MCMC may not necessarily satisfy the detailed balance conditions, it results in an efficient exploration.

III. EMULATION
We examine two solutions to obtain fast fitting functions to the targets computed on synthetic equilibria, that can also enable extrapolation and uncertainty quantification: scaling relations and neural networks.All of the neural networks examined in this work are built with the TensorFlow python APIs.We devote a separate subsection to the design of an annealed hyperparameter sampler, which is devised on purpose for this work to produce an ensemble of emulators.
A. Scaling Relations Some of the targets show a simple behaviour with the inputs, which may be described by simple power-laws.To investigate these trends in the targets, we perform power-law fits via a least-squares linear regression in their logarithms.The only regressor that is fit with an exponential, instead of a power-law, is the solenoid current rescaled by the plasma current.The overall scaling relation to be fit is then, for each target θ : While a power-law scaling can be an adequate hypothesis for most parameters, the solenoid current spans a wide range of positive and negative values and the behaviour of some of the parameters is clearly asymmetric in the solenoid current, so an exponential was the preferred choice.This combination still ensures that the fits of log-targets are all linear combinations of the regressors (α p , α bp , α f , α sol , α c ).We remark that, in this work, all quantities are from synthetic profiles, so the uncertainties are arbitrarily low (except for numerical roundoff) and the best-fit regressors are the same whether the fit is performed in target space or log-target space.

B. Neural Networks
Neural networks (hereafter NNs) are amenable to the task of emulation because: (1) Universal Approximation theorems make them appealing candidates in general; (2) they can also be used to extrapolate in regions of parameter space that are less densely sampled; (3) off-the-shelf packages can compute with the same activation), one dense and one affine layer with the same shape as the outputs.The user can choose the activation function among a sigmoid, a ReLU, a leaky ReLU, an eLU, and a SReLU (see eq. 17).The depth, width, learning rate, L 2 regularisation and gradient batch size are all hyperparameters to be explored.all derivatives analytically, overcoming issues of numerical noise; (4) one can build simple enough architectures that can be deployed with low latency on a real-time controller; (5) uncertainties can be quantified whenever an ensemble of neuralnetwork emulators has been trained, on different random subsamples of a dataset or with different hyperparameters.
While Universal Approximation theorems are known, three remarks will be useful for this work.First, a neural network with sigmoid activations is a universal approximator of continuous functions over a compact domain, so any target that may diverge close to the domain boundary or with a discontinuity would need a suitable reparameterisation of the inputs and outputs.This is true also for networks with a Rectified Linear Unit activation ReLU(x) = (x + |x|)/2, also as approximators to L p functions in the L p norm, which provides only a global goodness-of-fit measure whenever the functions have discontinuities.This is the case e.g. for the connection length, which can change abruptly as soon as the scrape-off layer grazes the divertor "nose" region.Another example is the strike-point location, which can be quite sensitive to small changes in R out or λ SOL and can exhibit a sudden discontinuity (primarily in Z) whenever the divertor coils create another X-point in the divertor region.Third, while there are known lower bounds to the network width that is required for universal approximation, there is no known lower bound on the depth (nor a known upper bound on depth and width) for finite datasets or prescribed fitting tolerance, so an extensive exploration of the network architecture is needed.
The neural networks used here are fully-connected and feed-forward, and are trained to fit all chosen targets simultaneously for each parameter input.All inputs and targets are standardised before training.They can be trained to fit the targets or their logarithm, using the inputs as-is or rescaled by powers of the plasma current as above; the inputs can be further passed as they are, or under logarithm or arcsinh.The simplest architecture (a combination of ReLU activations on log-inputs and log-outputs) is also the simplest extension of the scaling relations considered above.We examine different possible activation functions: the ReLU defined above; a leaky ReLU (i.e.ReLU(x) + 0.3ReLU(−x)) ; an exponential LU (eLU(x) = ReLU(x) + (e x − 1)1 x<0 ); a sigmoid − log(1 + e −x ); and a smoothed ReLU which behaves asymptotically like a leaky ReLU but with smooth gradients everywhere and is a simple combination of TensorFlow built-in primitives.Two kinds of architecture are available: the first has d hidden layers all with the same activation function; the second has one first hidden layer as above, and the remaining d − 1 as residual blocks With a (large but) finite data-set, wider or deeper networks are more prone to overfitting, so regularization terms must also be introduced.A quadratic regularization is added to the loss function, which in turn can be a mean-squared error or a mean-absolute error.The hyperparameter exploration procedure is described below, together with the choices made to split the data into training, evaluation, and test sets.

C. Hyperparameter Tuning
The chosen NN architecture has several hyperparameters to be tuned: depth (how many hidden layers), width (how many nodes per hidden layer), batch size, learning rate l r , regularization l L2 .A grid search is expensive and prone to arbitrary choices in e.g.how one would set the grid mesh in l r and l L2 .A 'random search' with uniform draws is less expensive, but still rather noisy and hence inefficient.Ideally, we want a sampler that already knows about islands of hyperparameter space that perform better, but with e.g. a MCMC we would discard an arbitrary number of computed models.Off-theshelf packages for a "Bayesian" hyperparameter search exist, however they are not necessarily suited to this particular problem.Population-Based Training 47 replaces the worst performing models with mutations of the best-performing ones, and so it is mostly useful for the scheduling of l r and l L2 , while in principle it cannot resample models with varying width and depth.Hyperband 48 starts from a random draw of hyperparameters and progressively prunes the worst performing models in favour of training the best performers for more epochs.So-called "Bayesian" optimizers, like Optuna 49 , use a density estimator to approximate the loss vs hyperparameters and therefore inform the next hyperparameter draw, however this neglects the inherent scatter in losses (even on the same hyperparameter tuple) from different initial NN weights and different random splits in training and validation sets.
We build a different implementation for this work, where in principle all trained models can contribute to inform a random draw of the next hyperparameter trial.Below, h n denotes the n-th hyperparameter combination being examined.The first P hp choices h 1 , ..., h Php are drawn from a prior distribution within some chosen ranges.Then, given the n choices h and losses examined so far, with their validation losses L ( j) ( j = 1, ...n), the choice (n+1) for n = P hp , ..., N mc uses an explore-exploit strategy.In particular, with probability P n the hyperparameter tuple h n+1 is drawn like the first P hp , and with probability 1 − P n it is drawn as follows: Here x ∈ {h|p} means that a hyperparameter h j in h is drawn with probability p j and its value is assigned to x.In other words, in the "exploit" case, P hp models are chosen among the n available (privileging models with lower validation losses) and are used to build a random vector for the next trial model.The steepness α n allows for a more demanding selection of the P hp vertices z l .The "explore" probability threshold is parameterised as For this work, P hp = 10 and ε 0 = 0.1 have been chosen.The loss is the mean-absolute deviation evaluated on the validation set, described below.Similarly to the sequential MCMC above, the prefactors 3/P hp are such that the hyperparameter steps are combinations with standardised random coefficients.We choose this approach over the MCMC just because it retains all of the sampled hyperparameters, and every explored hyperparameter can (in principle) contribute to the choice of a new hyperparameter combination.
From the synthetic dataset, a random 20% has been set aside as a test set, over which the performance of the sampled models is evaluated at the end of the hyperparameter exploration.Of the remaining 80%, whenever a new model is instantiated, a random 20% (i.e.16% of the original data) is taken as a validation set for early-stopping.For each hyperparameter trial, the training is run until it m such that loss val (it m ) ≥ loss val (it m −20) and it m ≤ 500.The final loss L j for h j is then loss val (mod j , it m ).During training, the neuralnetwork gradients are evaluated on mini-batches, whose size is also one of the hyperparameters.The choice of a random validation subset for every new model instantiation is made to render the hyperparameter search robust against specialising on a particular training-validation split.The green and points around Z = 0 mark the range of inner/outer separatrix radii and magnetic axis, respectively.Their distribution is shown in the histograms in the top panel.

IV. RESULTS
Approximately thirty thousand synthetic profiles are computed, together with a number of targets and other integrated properties for each profile.The NN emulation and hyperparameter sampling is done on a subset of the computed targets, while others are provided with the synthetic library as they can be useful for diagnostic purposes, e.g. to select particular equilibria for divertor detachment simulations.The distribu-tion of parameter values explored by the MCMC sampler is shown in Figure 6 in the Appendix.

A. Synthetic libraries
Figure 3 summarizes the distribution of the shape targets computed on the synthetic equilibria.Solutions with a strike point on the divertor "nose" or "baffle" are highly penalised in the library generation but are not rejected, in order for the subsequent emulators to learn which parameter combinations can lead to unfavourable divertor conditions.Histograms of the separatrix midplane inboard and outboard radii, as well as the magnetic axis radius, are also shown.The sharp cutoff at R ≈ 0.26 m is where the limiter of the inner column is touched.The distribution of (lower) X-point locations is merely reflective of the score function L privileging configurations with the strike point hitting the bottom divertor tiles.The connection length can exhibit a sharp drop whenever the SOL grazes the divertor "nose", mostly due to equilibrium configurations with large outboard separatrix radius, and long tails due to configurations where an additional X-point is created in the divertor region.

B. Scaling Relations
Some of the targets behaviour is captured by the scaling relations of equation ( 16), based on the R 2 metric of the leastsquares fit of the logarithm of the parameters, defined as log scores of 0.7, 0.96, 0.8, 0.7, 0.9, 0.74, 0.88 respectively.Some of the exponents are trivially expected (e.g.Z a ∝ I P6 /I p or ψ a ∝ I p ), while others reflect the chosen tokamak configuration (e.g. the impact of the poloidalfield coils P4, P5 and divertor coil D1).The scaling of q a is roughly consistent with what would be expected from the small-radius approximation , although with slightly different exponents, including the small diamagnetic effects in the plasma (∂ log q a /∂ log( f Vac /I p ) = 0.82 < 1).Some of the exponents are trivially expected (e.g.Z a ∝ I P6 /I p or ψ a ∝ I p ), while others reflect the chosen tokamak configuration (e.g. the impact of the poloidal-field coils P4, P5 and divertor coil D1).The scaling of q a is roughly consistent with what would be expected from the small-radius approximation , although with slightly different exponents, including the small diamagnetic effects in the plasma (∂ log q a /∂ log( f Vac /I p ) = 0.82 < 1).
Other targets have very uncertain fits, in particular R in and R X .The behaviours with I sol /I p are the most uncertain and they change at high solenoid currents, reflecting the role of different coil combinations in keeping the separatrix within the vessel for different strengths of the solenoid stray field.These scaling relations can be used to aid the real-time reconstruction of p a (or equivalently β p and q a ) with a good first guess from the magnetic reconstruction of R out and R a .However, they are not necessarily accurate enough to be used directly in control design, especially because the dependence on solenoid current and in particular because the inboard midplane radius R in is not well fit.

C. Neural Networks
Figure 4 illustrates the hyperparameter exploration, quantified by the validation loss.Shallow but wide network architectures are preferred.A leaky ReLU provides the best results, as it does not not have vanishing gradients.In principle, a standard ReLU activation would perform equally well, but that requires wider hidden layers, in order to reproduce enough ReLU combinations to resemble a leaky ReLU and prune nodes with vanishing gradients.
Figure 5 shows test loss metrics and calibration plots for each target, for a subset of the best-performing models.The calibration plots are displayed in terms of standardised targets.The metrics are computed on a separate subset, which was not used during training and early-stopping.The test metrics are defined as • mean-absolute-deviation over std.dev: These values are preferred over the R 2 metric (which has been used in other emulation studies 21 ), as it would be > 0.995 in all cases.The combined prediction is obtained for each target as a simple average of its predicted values from the best-fitting neural networks.
The general percent-level performance on the geometric targets and integrated quantities is expected, given the dataset.Given the intrinsic standard deviations of the shape coordinates over the synthetic library (midplane radii and lower Xpoint position), these metrics translate into ≈ 2 − 3mm accuracy, i.e. comparable to the accuracy of the direct calculation on the numeric Grad-Shafranov solutions with a 2-4cm resolution grid.The flux expansion (BX_over_Btar) is generally the most uncertain as it is a ratio of magnetic fields (computed as derivatives of the flux-function), and because the strikepoint location can depend appreciably on the upstream location and on whether a secondary X-point is created in the divertor region.Whenever this happens, a small difference in upstream location or in divertor coil currents can result in a large difference in the strike-point Z-location, hitting the divertor baffle instead of the bottom tiles.In general, the radial strike-point location is fit better than the Z-location.

V. DISCUSSION
We have presented a stable procedure to sample synthetic equilibria from a high-dimensional parameter space with operational constraints, where the "score" function L can also be chosen to privilege configurations with more desirable properties.This sequential MCMC approach is also convenient because it curbs some of the numerical instability inherent in solving the non-linear GS equation.Passive-structure currents may be considered as additional parameters, if they can be reduced to a small enough number of modes.While we have shown a case study with a machine resembling MAST-U in this work, this framework is agnostic to the chosen machine configuration.In principle, it is also agnostic to the equilibrium solver, provided it can perform a numerically stable (and accurate) forward GS solution.The advantage of FreeGS over other codes is that, being fully in python, it can be seamlessly integrated with the MCMC sampling.The same MCMC sampler can also be used for design optimisation purposes, e.g.tokamak configurations that minimise inter-coil forces while maintaining the plasma shape within desirable bounds.It can also be used for the Bayesian fitting of models to data (to enable uncertainty quantification) whenever the models involve nonlinear differential equations with numerical instabilities.The selected models in the legend are labeled by their index in the hyperparameter exploration followed by the total loss val .Many of the sampled models have a small loss val (Fig. 4), but a very strict cut on the six very-best models (indicated in the top-right legend) is already enough to yield sub-percent bias while keeping the overall ensemble model small enough for real-time applications.

A. Profile libraries and divertor detachment studies
Flux-function profiles are also available, for divertor studies, as well as the flux expansion along the connection length from the midplane.This synthetic library can aid in refining some working hypotheses, such as a flux-expansion that is constant or proportional to radial position, that are commonly adopted in divertor detachment simulations 40 .The choice of an appropriate score function L is particularly relevant if super-X configurations are to be adequately sampled.For example, while we have considered the flux expansion from X-point to strike point, the exploration of scenarios for divertor detachment may consider the flux-expansion at some prescribed distance from the divertor tiles along the connection.

B. Emulating the physics
Scaling relations can be used to understand the general behaviour of some targets, and may also be used to aid the realtime inference of pressure on axis from a magnetic reconstruction of the separatrix.However, they are not as useful for targets that are more sensitive to the wall, such as strike point position and inboard midplane radius, and in general they are too uncertain to be used directly in control design.Power-law relations on other quantities have been determined experimentally, on historic ASDEX-U discharges, with comparable R 2 values by Mc Carthy et al. 50, using least-squares linear regression in log-log space, with a focus on inferring and controlling transport through I p , q a and B φ .In the presence of noisy regressors, Verdoolaege et al. 51 argue for more robust Bayesian fitting techniques, which can affect the quality of control from power-law scalings on experimental datarather than purely synthetic data as in our case.
Neural-network emulation works generally to within percent-level accuracy, but some caution is necessary for targets that can exhibit abrupt changes (strike point Z-location) or that are particularly sensitive to high solenoid currents (R in ).The annealed hyperparameter sampler developed for this work provides an ensemble of emulators, which can also be used to quantify uncertainties in the emulation.
During experiments, a noisy hint of the strike point position is given by real-time magnetic reconstruction.The emulators presented here may be re-trained on subsets corresponding to different wall segments, or adding a "tile segment" feature in the training set.Splitting the wall into segments would not affect the calculation of virtual circuits, since their derivative with respect to other inputs is going to be almost-always zero.

C. Choice of internal profiles
Ultimately, the quality of the emulation can only be as good as the quality of the synthetic inputs, at best.For this reason, in this work we chose to decouple the accuracy of the physics from the aspects more strictly related to the dataset construction and emulation.The parameterisation of current density yields internal inductance values l i (2) in the range 0.55 ± 0.15, more typical of L-mode plasma in purely Ohmic discharges.The sheared profiles of non-inductive scenarios may be better described by 1 with ζ ≈ 3.This functional form only adds one parameter, so it is not inherently difficult to implement within the same MCMC sampling described in this work.

D. From emulation to control
Networks with ReLU activation functions perform best at approximating the control targets.However, a perfect ReLU NN approximant would have discontinuous derivatives on a dense subset of parameter space, which can be a problem e.g. for virtual circuit emulation.The SReLU activation introduced in this paper alleviates this issue, but in principle there is no upper bound to the norm of the gradients from this choice either.Two solutions can be: (I) estimating as averages of finite-difference derivatives over an ensemble of approximators; (II) introducing penalties that privilege smoother solutions, either through additional terms in the loss or through a variational intermediate layer 52,53 .While the latter is left for further development, the former is already possible with the NNs ensembles from this work.
In the interest of keeping the analysis of physics emulation from the issues of real-time control algorithms, we chose not to present emulations of the virtual circuits themselves, as they involve derivatives of quantities that include effective inductances, for which different control-design choices have been made in the literature 1,19,24,[54][55][56] , as summarised in the Appendix.
We remark that this issue is already present in state-of-theart calculations of virtual circuits, which are done via finiteelement differences that can be affected by numerical noise and departures from linearity.Fully-differentiable GS solvers would deliver exact derivatives at each equilibrium configuration, which can then be emulated directly, however they would already need to be emulated for fast deployment in real-time control, also with safety limits from estimated model uncertainties.The very same techniques presented in this paper can be used directly for these two purposes.
Data Curation.The data-sets and model-import scripts have been assembled by A.Agnello ad A.Keats.Examples of equilibrium configurations and oeprational limits relevant to MAST-U have been provided by S.Pamela, J.Buchanan, C.Vincent.
Writing and Reviewing.The manuscript has been written by A.Agnello, with reviews by N.C.Amorisco, J.Buchanan and S.Pamela.
If we adopt a stateful representation, where the plasma current is described by total plasma current, metal currents and other parameters "α" (e.g.p a in this paper), then we have where we allow for external disturbances dε W in the thermal evolution, and where The control design equation is then obtained by decomposing the plasma and coil currents into a designed and perturbed currents.The zeroth-order I (0) (t) set of designed currents is from a sweep of poloidal-field coils (including the Ohmic circuits) that are necessary to drive the plasma currents and mantain a desired shape.These cancel the right-hand side of equation (A.5), while the left-hand side involves only variations in the perturbed currents I (1) (t).These are the current changes (still coupled between plasma and coils) that are required in order to correct a deviation in the shape targets from the one predicted by the programmed zeroth-order coil sweeps..Under a small change δ I c in shaping-coil currents, any target that depends on plasma-and coil-currents changes as and the perturbed plasma current changes as δ I p ≈ −L T cp δ I c /L p , which yields the control-design equation (1).The RZIp model of rigid displacement 24 uses a similar notation for the inductances, although it uses a radial and vertical displacement to parameterise the plasma instead of the coil currents, getting slightly different expressions for the R and L terms.Other lumped-parameter models 1,25 neglect the W T derivatives, and either prescribe the behaviour of other plasma parameters 56,57 or derive it from a 0D transport model or from the current-diffusion equation 55 .Other derivations 19 instead replace I T y with an integration over the plasma domain, writing Faraday's law while also neglecting changes in the plasma current distribution, and with the definitions L p = 1 T y M yy I y /I p ; L mp = 1 T y M mp ; R p = 1 T y R yy I y /I p .(A.9) The linearised-response model (A.8) delegates to the designed coil sweep and the MIMO controller the task of minimising the shielding response of passive structures, and is the most commonly used on tokamak control design 19 .One may be tempted to account directly for part of the passive structures that are strongly coupled to the coils, e.g. the coil-cases or the MAST-U divertor "nose" close to the Dp coil.In fact, the solution of the Grad-Shafranov equation only needs knowledge about the flux-function ψ tok from the tokamak in a domain that includes the plasma, and if the domain D is chosen as a torus enclosing the plasma and sufficiently far inside the vessel, the contribution of Dp to ψ tok is almost indistinguishable from the combined contribution of Dp and divertor "nose".If we consider the tokamak flux-function in the domain D as an array, ψ tok = A tok I c , we can then use the SVD decomposition A tok = UΣV T , where V has dimension n c × n c and ψ tok can be projected onto the first n c principal components, that are the first n c rows of U. In the coupling between plasma and metals, the vessel can be solved for, which yields slightly different definitions for the effective inductances above.A change in active coil currents then brings a change to ψ tok , and the sensitivity matrix has similar entries as before, but is suitably rotated in the new basis of currents.One reason why this alternative approach is not common in the literature is that, ultimately, the controller deals with requested voltages to the power supplies, which are subject to safety limits and output limits: aiming to directly cancel the shielding of tighlycoupled passive structures at all times is not necessarily feasible within those limits, and can also result in unstable openloop control.The classical approach of designing the sensitivity matrix and, separately, a MIMO controller for the voltages provides a wealth of established techniques to minimise the asymptotic error, and examine the stability and safety of closed-loop control.We also note that the alternative approach still requires a calculation of the derivatives ∂ y/∂ I, and ultimately the emulation of the targets as a function of plasma and coil currents, which is the content of this paper.The last term encourages larger flux-expansion, evaluated between the X-point and the strike point, and the min(ψ n /0.1|"nose") is evaluated on the divertor "nose and baffle" region with 0.75 < |Z| < 1.6 and 0.75 < R < 1.6.

MCMC Parameter distributions
The distributions of parameters from the sequential MCMC sampler are shown in Figure 6.Some of the PF coil currents (D1, P4, P5) and of plasma pressure and current are obliged by the need to keep the plasma separatrix confined, while other coil currents are more free to vary to produce finer changes in the resulting geometry of the separatrix and divertor legs.The

5 FIG. 1 .
FIG.1.Tokamak configuration for this study, roughly based on MAST-U, and first synthetic equilibrium.The separatrix (ψ = ψ b = ψ X ) is marked in red.Left: names of main shape targets computed and considered for the subsequent emulation.Right: divertor coils (D1-D7, Dp) and shape coils (P4-P6).All coils are up-down symmetric except P6, whose winding is up-down antisymmetric and is used for vertical position control.The divertor tiles are marked by thick black lines, and the divertor "nose" and "baffle" are indicated.The tokamak coil positions and windings are given by the "MASTU" tokamak class in the public FreeGS repository.

FIG. 2 .
FIG.2.Nomenclature and general architecture of the NNs considered in this work.All have one input layer with identity activation σ (x) = x, one dense layer, then d − 1 layers (all dense or all redisual, with the same activation), one dense and one affine layer with the same shape as the outputs.The user can choose the activation function among a sigmoid, a ReLU, a leaky ReLU, an eLU, and a SReLU (see eq. 17).The depth, width, learning rate, L 2 regularisation and gradient batch size are all hyperparameters to be explored.

FIG. 3 .
FIG.3.Distribution of the shape targets computed on the library of synthetic profiles.In the lower panel, the black symbols indicate the strike-point positions (a majority is on the lower tiles), which result in the distribution of allowed X-point positions (red swarm).The green and points around Z = 0 mark the range of inner/outer separatrix radii and magnetic axis, respectively.Their distribution is shown in the histograms in the top panel.

TABLE I .
Symbols and notations used in this paper.