Aligning pedestal models and the associated magnetohydrodynamic codes with experimental data is an important challenge in order to generate predictions for future devices, e.g., ITER. Previous efforts to calibrate unknown model parameters have largely been manual. In this paper, we construct a framework for the automatic calibration of JOREK. More formally, we reformulate the calibration problem as a black-box optimization task by defining a measure of the discrepancy between a simulation output and a reference quantity. As this discrepancy relies on JOREK simulations, the objective is computationally intensive and, hence, we resort to batch Bayesian optimization methodology to allow for efficient, gradient-free optimization. We apply this methodology to two different test cases with different discrepancies and show that the calibration is achievable.

Scientific research to demonstrate the viability of fusion energy as a source of electricity at industrial scales is progressing with the construction of the international device ITER1 as well as domestic programs such as STEP in the UK.2 In order to maximize the performance of fusion devices, the loss of energy from the core plasma must be minimized. Edge localized modes (ELMs) are a type of instability that occurs at the edge transport barrier of the plasma in tokamak devices.3 ELMs are a cyclic instability that releases energy from the edge in the form of plasma filaments aligned to the magnetic field.4 This energy travels along magnetic field lines, outside the confined plasma, toward the wall components of the device. Experimental evidence and theoretical work support the idea that these events can be characterized using magnetohydrodynamic (MHD) equations and, more specifically, viewed as a peeling–ballooning instability.5 These equations serve as a basis for modeling tokamak plasma instabilities. In recent years, various nonlinear codes6–11 for solving these equations have been developed; among these, we focus on JOREK.7 Such codes are able to replicate complex physical dynamics, such as two-fluid diamagnetic rotation at low resistivity and viscosity levels, at high poloidal–toroidal resolutions.12,13

Efforts have been devoted to calibrating JOREK simulations to be as close as possible to reality, with the aim of generating outputs that are quantitatively consistent with experimental observations.12,14 Energy losses, ELM intensity and duration, pressure in the pedestal region, divertor heat fluxes, and kinetic energy are some of the quantities under scrutiny when comparing simulations with real experiments. As expected, calibrating simulations to be experimentally relevant depends strongly on the judicious selection of simulation parameters, which are not always directly observed and, therefore, have to be inferred from experiments. Prior to calibration, parameter ranges must first be identified. These are typically selected to reflect physical feasibility based on theory and on prior knowledge elicited from previous experiments.14 However, parameter ranges might also be restricted because some parameter values lead to numerical instabilities when performing the simulation. This often gives rise to a challenging trade-off, as the most experimentally relevant values of certain parameters, such as the resistivity or the viscosity, lie close to or within the region of numerical instability. This trade-off can be mitigated by allocating more computational budget, allowing simulations to be run at a higher fidelity.

In this work, we propose a framework that defines a strategy to automatically calibrate JOREK, in order to retrieve the combination of parameters that replicates as closely as possible a target experiment or target dataset, depending on the diagnostic measurements available. In other words, the aim is to minimize a loss function, dependent on the fixed input parameters, which quantifies the mismatch between observations and the output of the JOREK simulation. More specifically, let x_θ denote a vector of quantities of interest (QoIs) obtained from the output of a JOREK simulation, given a specific set of input parameters θ ∈ Θ. Given measurements of the same QoIs, x_o, arising either from an experiment or generated from a simulation, we define a loss function d(x_o, x_θ), which measures the discrepancy between the simulated and measured QoIs. The process of calibration can then be expressed in terms of the global optimization problem as follows:
\theta^* \in \arg\min_{\theta \in \Theta} d(x_o, x_\theta). \qquad (1)
It is straightforward to generalize this to the situation of having M independent QoI measurements. It is worth highlighting the two main challenges of this task.
  1. Running a JOREK simulation is an extremely computationally intensive process. In addition to computational cost, simulations might fail and need to be restarted,14 and gradients with respect to input parameters are not available. These characteristics are, therefore, inherited by the discrepancy, as its computation directly involves the evaluation of JOREK.

  2. Due to the complexity of the simulation underpinning JOREK, gradients of the discrepancy are not readily computed. As the direct evaluation of gradients is not possible, we would need to numerically estimate derivatives. This operation would involve running multiple JOREK simulations, which would make any standard gradient-based minimization strategy infeasible for calibration.

It is, therefore, necessary to rely on gradient-free methods, such as Bayesian optimization (BO). For this calibration problem, a novel batch BO (BBO) method15 is deployed, which generalizes the acquisition function through a probabilistic extension and makes use of particle gradient flows (PGFs). Sections II and III present more details on the JOREK simulations and on the BO methodology used, respectively. We then report how the calibration was tested on different JOREK models, describing the construction of the discrepancy functions to which the BBO framework was applied.

JOREK is a nonlinear MHD code developed as a versatile tool to study instabilities of tokamak plasmas and their control.6,7,16 More specifically, it was created to gain more knowledge on ELMs, and in recent years qualitative and quantitative comparisons between JOREK simulations and experiments have been explored.6,12,13

The model used in the work presented here17 focuses on five variables: the poloidal magnetic flux ψ, the electric potential Φ, the parallel velocity v_∥, the mass density ρ, and the temperature T. It is a so-called reduced MHD model, as it assumes that the perpendicular velocity lies in the poloidal plane and that the toroidal magnetic field is constant in time. The code allows for a grid that realistically represents the tokamak, with a geometry that includes an X-point and the divertor, and with a particular focus on the scrape-off layer and other areas of interest for ELMs. For a more complete description of the reduced MHD equations and the numerical schemes of the JOREK code, the reader may refer to the main JOREK reference.6

In the use-cases presented here, two reduced models are considered, in which the parallel velocity v_∥ is either included or excluded. The first use-case focuses on examples of tearing mode instabilities without parallel velocity, while the second addresses simulations of ELMs in X-point geometry, using a representative equilibrium and including the parallel velocity. Section III describes the BBO framework used to calibrate JOREK simulations.

Bayesian optimization (BO) methods18 are a class of general purpose optimization algorithms aimed at optimizing black-box functions, i.e., functions which are computationally expensive to evaluate and for which ancillary information, such as gradients, is not readily available.

More formally, BO seeks to solve the global optimization problem as follows:
x^* \in \arg\min_{x \in D} f(x), \qquad (2)

where f : D → ℝ is a function over a domain D ⊂ ℝ^d and point evaluations of f are expensive and possibly noisy.
Suppose we have already made n evaluations of the objective function f, collected in D_n = {(x_1, y_1), ..., (x_n, y_n)}, where y_i = f(x_i). To cope with this partial knowledge of the function f, a probabilistic surrogate model of the objective is constructed from the previous function evaluations using Bayesian inference. This yields an estimate of the objective function that allows us to assess the uncertainty associated with the unobserved regions of the domain over which we are optimizing, while also retaining the information at the points where the objective has been evaluated. Gaussian processes (GPs)19 are a common choice of surrogate in this setting. More formally, we impose a Gaussian process prior f ~ GP(μ, k) to model the intractable objective. This means that, for any collection of points x_1, ..., x_n, the joint distribution of the vector [f(x_1), ..., f(x_n)] is multivariate Gaussian, and the process is completely identified by its mean function μ(x_i) = E[f(x_i)] and by a kernel function of choice k(x_i, x_j) = Cov(f(x_i), f(x_j)), which determines the covariance structure between any two points. In order to incorporate the knowledge of the observed data D_n, we condition the process on the information available, and the resulting updated surrogate model is a random function f_n which satisfies
f_n(x) \sim \mathcal{N}(\mu_n, \Sigma_n), \quad x \in D,

where the mean μ_n and variance Σ_n are given by

\mu_n = k(x, X)\, k(X, X)^{-1} y, \qquad \Sigma_n = k(x, x) - k(x, X)\, k(X, X)^{-1} k(X, x), \qquad (3)
where k(x, X) = (k(x, x_1), ..., k(x, x_n)) and the matrix (k(X, X))_{ij} = k(x_i, x_j). In other words, we have imposed a stochastic structure, given by a normal distribution, so that we can easily quantify our incomplete knowledge of the expensive objective.
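As an illustration of Eq. (3), the following minimal sketch computes the Gaussian process posterior mean and covariance with numpy. It is not the production implementation (which uses the jax library, see Sec. IV C); the squared-exponential kernel and the toy data are placeholders chosen only for readability.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated for all pairs of rows of A and B."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X, y, X_star, kernel=sq_exp_kernel, jitter=1e-8):
    """Posterior mean and covariance of Eq. (3) at the test points X_star."""
    K = kernel(X, X) + jitter * np.eye(len(X))   # k(X, X), with a small jitter for stability
    K_s = kernel(X_star, X)                      # k(x, X)
    K_ss = kernel(X_star, X_star)                # k(x, x)
    mean = K_s @ np.linalg.solve(K, y)           # mu_n
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T) # Sigma_n
    return mean, cov

# Toy usage with synthetic observations of a one-dimensional function.
X = np.array([[0.1], [0.4], [0.9]])
y = np.sin(2 * np.pi * X).ravel()
mu, Sigma = gp_posterior(X, y, np.linspace(0, 1, 5)[:, None])
```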
The BO process selects new evaluation points x_{n+1} by taking into account both the available information on the objective function and the uncertainty provided by the surrogate model. BO methods generally construct an acquisition function α, which quantifies the utility of evaluating any given point for the purpose of retrieving a global minimum of the true objective f. This utility is defined with the goal of exploiting what is currently known about f, e.g., prior function evaluations, while also exploring the areas of the space D in which the function has not yet been evaluated. The maximization of the acquisition function is referred to as the inner optimization loop; it does not require additional evaluations of f and can be formalized as follows:
x_{n+1} \in \arg\max_{x \in D} \alpha(x \mid D_n), \qquad (4)
where we highlight the direct dependency on the available data D_n. This informs the outer optimization problem, i.e., the optimization of the original objective f. Once the function evaluation f(x_{n+1}) is made available, the algorithm restarts with the updated dataset D_{n+1}, by reconditioning the GP on the new information and selecting a new point for the purpose of global optimization. For a comprehensive overview of the BO framework and more details on acquisition functions, we refer the reader to Garnett's book on Bayesian optimization.20
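To make the outer/inner loop structure concrete, here is a minimal single-point BO sketch using the classical expected-improvement acquisition, with the inner loop in Eq. (4) solved by simple random search. It reuses the gp_posterior and sq_exp_kernel helpers from the sketch above, and the quadratic objective is only a stand-in for an expensive black box.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(X_cand, X, y):
    """EI(x | D_n) for minimization: (f* - mu) Phi(z) + sigma phi(z), z = (f* - mu)/sigma."""
    mu, cov = gp_posterior(X, y, X_cand)              # posterior of Eq. (3), sketched above
    sigma = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    z = (y.min() - mu) / sigma
    return (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):                                     # stand-in for the expensive black box
    return np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 2))                    # initial design D_n
y = np.array([objective(x) for x in X])
for _ in range(10):                                   # outer loop over BO iterations
    cand = rng.uniform(0, 1, size=(500, 2))           # inner loop: maximize alpha by random search
    x_next = cand[np.argmax(expected_improvement(cand, X, y))]
    X, y = np.vstack([X, x_next]), np.append(y, objective(x_next))
print(X[np.argmin(y)])                                # best point found so far
```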

Bayesian methods have previously been applied to fusion plasma physics for the purpose of model calibration. For example, this framework has been used in conjunction with approximate Bayesian computation (ABC) for the calibration of the fluid-kinetic model DREAM,21 with a specific focus on runaway electrons,22 another leading cause of concern for the integrity and duty cycle of future fusion reactors. In addition, we refer the reader to another approach making use of the novel Validation via Iterative Training of Active Learning Surrogates (VITALS), which the authors used to study L-mode plasmas in the Alcator C-Mod tokamak and to validate the quasi-linear transport models TGLF-SAT1 and TGLF-SAT0.23

Batch Bayesian optimization (BBO) methods propose a batch of multiple evaluation points at each step. This is advantageous in settings where the evaluations can be run in parallel across several processors. While this enables the acquisition of a larger amount of information per step, it also increases the complexity of the inner optimization loop, as the acquisition function needs to be adjusted for this setting. At each step of a batch BO method, we seek a batch of N points at which to evaluate the optimization target; ideally these function evaluations are run in parallel in order to be as efficient as possible. More formally, assume we have already made n evaluations of the objective function f, collected in D_n. In contrast to the standard BO approach, the batch BO framework selects the next batch of evaluation points {x_{n+1}, ..., x_{n+N}}.

In this setting, acquisition functions need to be adapted to allow a multi-point selection strategy. The multi-point expected improvement, known as q-EI, is a commonly used acquisition function for batch BO methods,24–26 which is defined as follows:
q\text{-EI}(x_1, \ldots, x_q) = \mathbb{E}\left[\max\left(0,\; f_n^* - \min_{i=1,\ldots,q} f(x_i)\right) \,\middle|\, D_n\right], \qquad (5)

where f_n^* = \min_{x \in D_n} f(x).
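The conditional expectation in Eq. (5) has no convenient closed form for general q, but it can be estimated by Monte Carlo from the joint GP posterior at the batch locations. The following minimal sketch assumes the posterior mean vector and covariance matrix at the q candidate points are already available, e.g., from Eq. (3); the posterior values in the usage example are made up for illustration.

```python
import numpy as np

def qei_monte_carlo(mu, Sigma, f_best, n_samples=10_000, rng=None):
    """Monte Carlo estimate of the q-EI in Eq. (5).

    mu, Sigma: GP posterior mean (q,) and covariance (q, q) at the batch points.
    f_best:    best (lowest) objective value observed so far, f_n^*.
    """
    rng = rng or np.random.default_rng(0)
    samples = rng.multivariate_normal(mu, Sigma, size=n_samples)  # joint draws of f at the batch
    improvement = np.maximum(0.0, f_best - samples.min(axis=1))   # max(0, f* - min_i f(x_i))
    return improvement.mean()

# Example with a made-up posterior over a batch of q = 3 points.
mu = np.array([0.2, 0.1, 0.3])
Sigma = 0.05 * np.eye(3) + 0.01
print(qei_monte_carlo(mu, Sigma, f_best=0.15))
```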
Acquisition functions, especially in the batch setting, are typically non-convex, with many local maxima, and intractable,27 due to the complex, high-dimensional structure imposed by the batch. In recent work,15 an alternative probabilistic reformulation of BBO was investigated, where the inner optimization loop can be viewed as the optimization of an acquisition functional based on q-EI, defined over the space of probability measures. More formally, one can define
F[\mu] = \mathbb{E}_{x_1, \ldots, x_q \sim \mu}\left[\, \mathbb{E}_f\left[\max\left(0,\; f_n^* - \min_{i=1,\ldots,q} f(x_i)\right) \,\middle|\, D_n\right]\right] \qquad (6)
as the probabilistic q-EI. The focus is, therefore, shifted to the joint distribution of the batch locations rather than to the points themselves: by maximizing this quantity, the aim is to retrieve a distribution which achieves the q-EI optimum in expectation. This formulation has various advantages, most notably that F[μ] is provably concave in μ. In order to ensure diversity in the batch and guarantee strict concavity, regularization is applied by introducing an entropy term of the following form:
H[\mu] := \mathrm{Reg}[\mu] = -q \int \log \mu(x)\, \mu(x)\, \mathrm{d}x, \qquad (7)
so that the inner optimization problem becomes
\text{find } \mu_\alpha^* \in \arg\max_{\mu} \left\{ \mathcal{L}_\alpha[\mu] := F[\mu] + \alpha\, \mathrm{Reg}[\mu] \right\}, \qquad (8)
where α > 0.

In this framework, the batch consists of a collection of non-independent samples from the mono-point distribution μ_α^*, which is obtained by using particle gradient flows, a class of methods that define strategies to propagate an initial density so that it converges asymptotically to a distribution of interest. Two specific gradient flows for Eq. (8) are derived in the original paper,15 one based on the Stein geometry28 and another using the Wasserstein geometry.29

In practice, the batch is selected by fixing an initial set x^0 ∈ ℝ^{N×d}, and then the points are propagated through a loop over T steps, where at each step we perform

x^{t+1} \leftarrow x^{t} + \varepsilon\, \Phi_\alpha(x^{t}), \qquad t = 0, \ldots, T-1. \qquad (9)

Here, ε represents a step size parameter for the optimization and α is the regularization parameter introduced in Eq. (8). The quantity Φ: ℝ^{N×d} → ℝ^{N×d} is an update step which can be defined according to either the Stein geometry or the Wasserstein geometry; for the full derivation we refer to the original paper.15 The set of particles x^T ∈ ℝ^{N×d}, where N indicates the size of the ensemble used to approximate the optimal distribution μ_α^* in Eq. (8), is the batch that can be evaluated in parallel.
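A schematic sketch of the particle loop in Eq. (9) is given below. The update map Φ_α is left abstract (its Stein and Wasserstein forms are derived in the original paper15), and the stopping rule mirrors the one described in Sec. IV C, where the loop is run until the particles barely move. The toy_phi used in the usage example is purely illustrative and is not the map used in the calibration.

```python
import numpy as np

def propagate_particles(x0, phi_alpha, eps=0.01, max_steps=10_000, tol=1e-6):
    """Iterate x^{t+1} = x^t + eps * Phi_alpha(x^t) until the update becomes negligible.

    x0:        initial particle locations, array of shape (N, d).
    phi_alpha: callable returning the (N, d) update direction (e.g., Stein or Wasserstein form).
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_steps):
        step = phi_alpha(x)
        x += eps * step
        if np.linalg.norm(step) < tol:   # particles have (numerically) stopped moving
            break
    return x

# Toy usage: a placeholder update that pulls particles toward the origin while a
# pairwise repulsion keeps the batch diverse (illustrative only).
def toy_phi(x):
    attraction = -x
    diffs = x[:, None, :] - x[None, :, :]
    repulsion = np.sum(diffs / (1e-3 + np.sum(diffs**2, axis=-1, keepdims=True)), axis=1)
    return attraction + 0.01 * repulsion

batch = propagate_particles(np.random.default_rng(0).normal(size=(10, 2)), toy_phi)
```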

All the components needed to perform the calibration of JOREK are now available. The steps of the calibration are listed in Algorithm 1. The iterative procedure is straightforward: given some information on the discrepancy d_{x_o}, the BO algorithm selects a set of parameters θ_1, ..., θ_N, which are used as inputs for JOREK simulations. Once a simulation is finished and the related data are produced, our knowledge of the discrepancy function is updated. This cycle is repeated until a satisfactory match between the reference and the simulation is obtained. In general, different stopping criteria for the outer loop can be implemented; in this case, a fixed integer I is set to represent the number of iterations of the BO selection process. A schematic code sketch of this loop is given after the algorithm listing.

Algorithm 1.

Automatic JOREK calibration.

Input: Dataset D_n = {θ_i, d_{x_o}(J(θ_i))}_{i=1}^{n}, batch size N, and budget of BO iterations I
Output: Updated dataset D_{n+I·N}
 1: for i = 1, ..., I do
 2:   Fit a GP to D_n
 3:   Run BBO via PGF conditioned on D_n using Eq. (9), and obtain a set {θ_1, ..., θ_N} of locations to minimize d_{x_o}
 4:   Evaluate JOREK in parallel (J(θ_j)) for j = 1, ..., N
 5:   D_n ← D_n ∪ {θ_j, d_{x_o}(J(θ_j))}_{j=1}^{N}
 6: end for
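The sketch below mirrors Algorithm 1 as a Python driver. Here run_jorek, discrepancy, and select_batch_via_pgf are hypothetical placeholders for the JOREK launcher, the loss d_{x_o}, and the GP fit plus PGF selection step of Eq. (9); parallel evaluation of the batch uses a local process pool, whereas in practice the simulations were distributed over an HPC cluster.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def calibrate(theta_init, d_init, select_batch_via_pgf, run_jorek, discrepancy,
              n_iterations=5, batch_size=10):
    """Algorithm 1: GP fit -> BBO batch selection -> parallel JOREK runs -> update D_n."""
    thetas, losses = list(theta_init), list(d_init)           # dataset D_n
    for _ in range(n_iterations):                             # budget I of BO iterations
        batch = select_batch_via_pgf(np.array(thetas), np.array(losses),
                                     batch_size=batch_size)   # fit GP + run PGF, Eq. (9)
        with ProcessPoolExecutor() as pool:                   # evaluate JOREK in parallel
            outputs = list(pool.map(run_jorek, batch))
        thetas.extend(batch)
        losses.extend(discrepancy(out) for out in outputs)    # d_{x_o}(J(theta_j))
    return np.array(thetas), np.array(losses)
```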

This section presents the two JOREK use-cases considered with the reduced MHD model, namely, the tearing mode instability without parallel velocity and the ELM simulation with parallel velocity. Both examples have been presented in previous JOREK studies.6 For the optimization, a discrepancy is defined which represents the difference between the reference data and the simulated data and which needs to be reduced as much as possible to make the two outputs match. For all examples, a reference experiment is generated from JOREK itself, and the BO algorithm is then used to recover the exact parameters that generated it. For these examples, the focus was on the following input parameters:

  • η: Resistivity at plasma center.

  • μ: Viscosity at plasma center.

For the tearing mode use-case, we consider the problem of calibrating the above two parameters. We additionally demonstrate a more complex case where we calibrate two additional input parameters: D (the perpendicular particle diffusion) and κ (the perpendicular heat diffusion).

For a more detailed overview of the BBO method we use to calibrate JOREK, and the initial setup of each of the examples, we refer the reader to Sec. IV C. More specifically, we specify the initial exploratory simulations performed in Table II and provide more details on the parameters used for the BBO routine in Table III.

As a first test case, the automatic tuning of a tearing mode in a large aspect-ratio circular plasma is considered. More information on this tokamak core plasma instability and an example can be found in Hoelzl et al.6 (e.g., Fig. 18 of that paper). To test the approach, reference data from a JOREK simulation are used for a specific set of “ground-truth” parameters. The objective is to recover these parameters through the calibration process. The JOREK simulations are run for a total of 150 000 normalized time units, which corresponds to approximately 75 ms. Each simulation takes around 4 min to complete. At each BO step, a batch of ten JOREK simulations is evaluated in parallel over a distributed HPC cluster. Two calibration problems are performed: a two-variable problem for θ = (η, μ) (example 1) and a more complex one over four parameters θ = (η, μ, D, κ) (example 2).

The QoIs for the calibration are chosen to be the time evolution of the kinetic and magnetic energies of the tearing mode instability. More formally, we choose x_θ = (K(θ), M(θ)), where K(θ) and M(θ) are the time traces of the kinetic and magnetic energies evaluated at time units 0, ..., 150 000. The calibration of the model is done against measurements x_o = (K(θ_true), M(θ_true)) obtained from JOREK for a ground-truth parameter θ_true. Then, for the two-variable problem, the optimization aims to solve
\theta^* \in \arg\min_{\theta}\, \left\| K_{\mathrm{sim}}(\theta) - K_{\mathrm{true}}(\hat{\theta}) \right\|, \qquad (10)
where θ represents the input parameters η and μ. For the four-variable example, we observed that the kinetic energy alone does not yield a strong enough discrepancy, so we also added the magnetic energy. More formally, the updated discrepancy becomes
\theta^* \in \arg\min_{\theta}\, \left\| K_{\mathrm{sim}}(\theta) - K_{\mathrm{true}}(\hat{\theta}) \right\| + \left\| M_{\mathrm{sim}}(\theta) - M_{\mathrm{true}}(\hat{\theta}) \right\|, \qquad (11)
where θ represents the input parameters η, μ, D, and κ. Results of both calibration processes are shown in Figs. 1 and 2 for the two- and four-variable cases, respectively. Further details on the configuration of each example are available in Table I. The results suggest that the QoIs are highly sensitive to the choice of η, which strongly informs the calibration process, while they are significantly less sensitive to the other parameters. Overall, the results suggest that the proposed calibration approach is achievable.
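The discrepancies in Eqs. (10) and (11) reduce to norms of differences between energy time traces. The minimal sketch below assumes the traces are sampled on a common time grid and, for illustration, uses the Euclidean norm of the sampled trace (the specific norm is not restated in the text); the function and variable names are illustrative.

```python
import numpy as np

def tearing_discrepancy(K_sim, K_ref, M_sim=None, M_ref=None):
    """||K_sim - K_ref|| as in Eq. (10), optionally plus ||M_sim - M_ref|| as in Eq. (11)."""
    d = np.linalg.norm(np.asarray(K_sim) - np.asarray(K_ref))
    if M_sim is not None and M_ref is not None:
        d += np.linalg.norm(np.asarray(M_sim) - np.asarray(M_ref))
    return d

# Toy usage with synthetic exponential-growth time traces on a common grid.
t = np.linspace(0.0, 1.0, 200)
print(tearing_discrepancy(np.exp(2.0 * t), np.exp(2.1 * t)))
```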
FIG. 1.

Calibration of the tearing mode use-case, example 1. Top figure: best initial explorative simulation. Lower figure: simulation from JOREK with calibrated parameters. In very few iterations of BO, the “true” parameters θ true are recovered. The dashed lines are the true reference simulation traces, and the solid lines are the ones achieved by the BO calibration. The orange and red traces are the kinetic energies of the tearing mode itself, while the blue and green traces are the kinetic energies of the background equilibrium.

FIG. 2.

Calibration of the tearing mode use-case, example 2. Top figure: best initial explorative JOREK simulation. Lower figure: calibrated JOREK simulation.

TABLE I.

Details on the experiments and final results. Examples 1 and 2 refer to the tearing mode, whereas examples 3 and 4 refer to the ELM crash. For each example, we specify the number of BO iterations I performed. In total, for each example, the number of JOREK simulations performed equals 10 × I.

            Reference                                    Calibrated parameters                              I
            η         μ         D         κ              η             μ             D            κ
Example 1   1 × 10−6  1 × 10−7  ⋯         ⋯              10.01 × 10−7  9.59 × 10−8   ⋯            ⋯             5
Example 2   1 × 10−6  1 × 10−7  1 × 10−7  1 × 10−7       9.90 × 10−7   8.86 × 10−8   6.93 × 10−8  17.51 × 10−8  30
Example 3   1 × 10−6  1 × 10−6  ⋯         ⋯              10.08 × 10−7  10.16 × 10−7  ⋯            ⋯             11
Example 4   1 × 10−6  1 × 10−6  ⋯         ⋯              10.64 × 10−7  10.29 × 10−7  ⋯            ⋯             10

To demonstrate the efficacy of the approach, the problem of calibrating an ELM simulation is addressed. For more information regarding this example, please refer to Hoelzl et al.6 (e.g., Sec. 5 of the paper). The reduced MHD model with parallel velocity used here is able to capture the peeling–ballooning dynamics of ELMs with finite toroidal mode numbers, and provide a measure of the heat fluxes incident on the divertor targets [at the bottom of the domain as shown in Fig. 3(d)] as a result of the ELM crash.

FIG. 3.

(a)–(d) Time evolution of the density ρ as simulated by JOREK. (e) Heat flux emitted onto the rightmost divertor, located at the bottom right of panels (a)–(d). As the experiment progresses, some filaments appear to be emitted onto that region, which is circled in (d).


As before, a reference experiment with a fixed set of parameters η and μ is run and compared with simulations. These ELM simulations first require the evolution of the toroidally axisymmetric equilibrium before continuing with a finite toroidal mode number. A total of 325 steps with varying time lengths are run, with each simulation taking between 1 and 2 h to complete. As above, at each step of the BO procedure, a batch of ten θ values is selected and used in new simulations. The input parameters η and μ are calibrated using the time series of the kinetic energy of the ballooning modes. The results of this calibration are shown in Fig. 4, with more details on the optimization available in Table I. The figures clearly demonstrate that the batch BO algorithm successfully calibrated the parameters.

FIG. 4.

Calibration of the ELM use-case, example 3. Top figure: best initial explorative JOREK simulation. Lower figure: calibrated JOREK simulation. In very few iterations of BO, we are able to identify the generating parameter θ true.

Figure 4 highlights how the reference simulation can be divided into three parts: an initialization phase, followed by a linear ramp up, followed by an oscillating or nonlinear phase. This last nonlinear phase is of particular interest, as it corresponds to the period in which ELMs become visible by releasing large amounts of energy from the confined plasma onto the divertor targets of the tokamak, as shown in Fig. 3(d). Hence, while the first experiment, using the standard discrepancy based on the kinetic and magnetic energies defined in Eq. (11), predominantly focuses on the linear phase, an alternative objective is designed by focusing on the release of energy onto the plasma facing components, using the divertor heat fluxes taken radially along the divertor at a single toroidal location. The divertor of the tokamak can be visualized in Fig. 3, where the time evolution of the plasma density ρ is presented together with the divertor heat flux on the outer target. More formally, Figs. 3(a)–3(d) show the evolution of the density in the poloidal cross section of the tokamak, whereas Fig. 3(e) shows a matrix D_o(s, t), for s = 1, ..., S and t = 1, ..., T, in which the color represents the heat flux on the right outer divertor as a function of time. Here, s indicates the location on the divertor and t the time. Let D_θ(s, t) be the corresponding output of a simulation with a fixed θ; then the following loss function is used:
d(x_\theta, x_o) = \left( \frac{1}{S} \frac{1}{T} \sum_{s=1}^{S} \sum_{t=1}^{T} \left| D_\theta(s, t) - D_o(s, t) \right| \right)^{0.5}, \qquad (12)

where x_θ = (D_θ(s, t) : s = 1, ..., S, t = 1, ..., T) and x_o = (D_o(s, t) : s = 1, ..., S, t = 1, ..., T).

In this way, using the heat fluxes makes the discrepancy more sensitive to the nonlinear phase and to the bursts of energy on the divertor themselves. This is clear from Fig. 3(e), where the onset of the peeling–ballooning instability has a larger magnitude than the linear phase, which was not as distinguishable when using the kinetic and magnetic energies. In addition, this is directly relevant to routine research on ELM simulations with the JOREK code, since quantitative comparison against infrared measurements of the divertor heat fluxes is often used as a means to validate simulations and the choice of unknown input parameters, such as μ.12 Results of the calibration are available in Fig. 5, and more details are presented in Table I (example 4).
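A minimal sketch of the heat-flux discrepancy in Eq. (12) is given below; both the simulated and reference heat-flux maps are assumed to be stored as S × T arrays on the same divertor/time grid, and the random maps in the usage example are purely illustrative.

```python
import numpy as np

def heat_flux_discrepancy(D_theta, D_obs):
    """Eq. (12): square root of the mean absolute difference of the divertor heat-flux maps."""
    D_theta, D_obs = np.asarray(D_theta), np.asarray(D_obs)
    assert D_theta.shape == D_obs.shape, "maps must share the same (S, T) grid"
    return np.sqrt(np.mean(np.abs(D_theta - D_obs)))

# Toy usage with random S x T maps.
rng = np.random.default_rng(0)
print(heat_flux_discrepancy(rng.random((50, 325)), rng.random((50, 325))))
```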

FIG. 5.

Calibration of the ELM use-case, using an energy profile emitted on the divertor, example 4. Top figure: initial explorative JOREK simulation. Middle figure: calibrated JOREK simulation. Lower figure: reference simulation.


In this section, we present more details on the BO algorithm deployed. Before starting the BBO procedure, we need to collect some information on the discrepancy; more formally, we need to run initial explorative JOREK simulations. In Table II, we list the input parameter values of the initial simulations for each example. Using these starting simulations, we can proceed with the BO procedure. First, we impose a Gaussian process prior using a Matérn-5/2 kernel, and the code is implemented using the jax library.30 As explained in Sec. III A, the pivotal part of BO is the inner optimization loop. The initial particle locations, namely x^0, are sampled randomly from a given domain, whose bounds are specified for each experiment in Table IV. Finally, we choose T by controlling the norm of Φ: we let the chain run until the particles no longer move appreciably. For each of the four examples presented, we specify which gradient flow we use, the number of steps T of the inner optimization loop, and the values of ε and α; these quantities are reported in Table III.
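For reference, a minimal sketch of the Matérn-5/2 kernel used for the GP surrogate, written with jax.numpy to match the library mentioned above; the lengthscale and variance hyperparameters are illustrative defaults, not the values used in the study.

```python
import jax
import jax.numpy as jnp

def matern52(x1, x2, lengthscale=1.0, variance=1.0):
    """Matérn-5/2 kernel: sigma^2 (1 + s + s^2/3) exp(-s), with s = sqrt(5) ||x1 - x2|| / l."""
    r = jnp.sqrt(jnp.sum((x1 - x2) ** 2) + 1e-12) / lengthscale
    s = jnp.sqrt(5.0) * r
    return variance * (1.0 + s + s**2 / 3.0) * jnp.exp(-s)

def gram_matrix(X, lengthscale=1.0, variance=1.0):
    """Kernel matrix k(X, X) built by vmapping the kernel over all pairs of rows."""
    pairwise = jax.vmap(lambda a: jax.vmap(lambda b: matern52(a, b, lengthscale, variance))(X))
    return pairwise(X)

# Toy usage on five random 2D inputs.
X = jax.random.uniform(jax.random.PRNGKey(0), (5, 2))
K = gram_matrix(X, lengthscale=0.5)
```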

TABLE II.

Initial JOREK simulations before starting the BBO procedure for each example. Examples 1 and 2 refer to the tearing mode, whereas examples 3 and 4 refer to the ELM crash.

            η              μ              D              κ
Example 1   1 × 10−7       1 × 10−8       ⋯              ⋯
            1 × 10−7       20 × 10−8      ⋯              ⋯
Example 2   8.352 × 10−7   4.765 × 10−8   4.838 × 10−8   12.160 × 10−8
            14.116 × 10−7  8.269 × 10−8   16.058 × 10−8  8.447 × 10−8
            2.336 × 10−7   13.990 × 10−8  6.185 × 10−8   11.314 × 10−8
            10.835 × 10−7  11.247 × 10−8  13.615 × 10−8  18.687 × 10−8
            13.748 × 10−7  15.517 × 10−8  0.371 × 10−8   17.731 × 10−8
Example 3   10 × 10−7      60 × 10−7      ⋯              ⋯
Example 4   10 × 10−7      60 × 10−7      ⋯              ⋯
TABLE III.

Details on the parameters used for each experiment. We specify the parameters used at each iteration. Examples 1 and 2 refer to the tearing mode, whereas examples 3 and 4 refer to the ELM crash.

            Iterations  ε     α         T       PGF
Example 1   1           0.01  0.001     3000    Stein
            2–5                         5000
Example 2   1–10        0.01  5 × 10−4  10 000  Wasserstein
            10–30             1 × 10−4
Example 3   1           0.01  0.001     10 000  Wasserstein
            2–11                        15 000
Example 4   1, 2        0.01  0.01      12 500  Wasserstein
            3                 0.001
            4–10              5 × 10−4

The most influential parameter controlling the precision of the BBO algorithm is the regularization α. This term controls how spread out the batch returned by the inner-loop maximization is, i.e., the actual parameters used for the JOREK simulations: the lower it is, the less diverse the particles. Generally, we fix it to a relatively high value at the beginning and let it decrease as the calibration progresses. We would like the BBO method to explore more at the beginning of the experiment, in order to retrieve a general approximation of the discrepancy. Once we have a general landscape for the underlying discrepancy, we can proceed to exploit the collected information more aggressively, by letting points get closer to each other. This strategy was applied in the more difficult examples, i.e., examples 2 and 4, whereas for the simpler cases it was not necessary. Another way to enforce exploitation is to restrict the bounds over which we sample the starting points x^0 for the chain. The bounds over which we perform the optimization can be found in Table IV. This strategy of restricting the bounds to enforce exploitation was used for example 4, where at iteration 8 we restricted the domain to (5, 40) × (0, 70) × 10−7, as we observed the local minima to lie only in that area. Similarly, for example 3, we changed the domain to (5, 20) × (5, 20) × 10−7 at iteration 9. Conversely, in order to promote exploration of the space, we add a white-noise kernel to the GP kernel. In this way, especially when the function to optimize presents a plateau, the particles are allowed to move more freely in an area of low variation thanks to the added noise. We use this strategy in example 3: the discrepancy (11) in this case presents a large area that produces simulations similar to the reference one.
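Two of the ingredients described above admit very compact sketches: the white-noise component simply adds a diagonal contribution to the kernel matrix (here built with jax.numpy, reusing the Matérn-5/2 Gram matrix sketched in Sec. IV C), and the decreasing regularization can be expressed as a piecewise-constant schedule, with illustrative values in the spirit of Table III for example 4.

```python
import jax.numpy as jnp

def kernel_with_white_noise(gram, noise_variance=1e-4):
    """White-noise term added to a kernel matrix: k'(x_i, x_j) = k(x_i, x_j) + sigma_n^2 delta_ij."""
    return gram + noise_variance * jnp.eye(gram.shape[0])

def alpha_schedule(iteration, stages=((2, 1e-2), (3, 1e-3), (10, 5e-4))):
    """Piecewise-constant decay of the regularization alpha over BO iterations
    (illustrative values, mirroring Table III for example 4)."""
    for last_iteration, alpha in stages:
        if iteration <= last_iteration:
            return alpha
    return stages[-1][1]
```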

TABLE IV.

For each example, we specify the bounds of the domain over which the calibration is defined. For example 4, at iteration 8, we restricted the domain to (5, 40) × (0, 70) × 10−7, as we observed the local minima to lie only in that area. Similarly, for example 3, at iteration 9, we restricted the domain to (5, 20) × (5, 20) × 10−7.

            Bounds
            η               μ               D               κ
Example 1   (0, 20) × 10−7  (0, 20) × 10−8  ⋯               ⋯
Example 2   (0, 20) × 10−7  (0, 20) × 10−8  (0, 20) × 10−8  (0, 20) × 10−8
Example 3   (0, 40) × 10−7  (0, 40) × 10−7  ⋯               ⋯
Example 4   (0, 80) × 10−7  (0, 80) × 10−7  ⋯               ⋯

The test cases presented, while relying on JOREK simulations rather than real data, offer interesting insights into the feasibility of automatically calibrating JOREK to experimental data. While the tearing mode use-case is straightforward in the two-variable setting, the example over four variables required more JOREK evaluations to reach satisfactory results. As shown in Table I, the higher-dimensional example required 30 BO iterations to reach a satisfactory result, whereas only 5 BO iterations were sufficient in the two-variable setting to replicate the reference data. As the full parameter space of JOREK is substantially larger than the one explored in this demonstrative study, a full parametric calibration of ELM simulations is expected to require substantial computational resources. As demonstrated in the last example, this can be mitigated by carefully constructing loss functions tailored to the QoIs under consideration. Additionally, leveraging alternative surrogate models which scale better to high-dimensional problems would also help address this challenge. The examples presented here also showed that the parameters do not all affect the solution in the same way and that the loss function is highly non-convex, with many regions that can generate JOREK solutions with characteristics similar to the reference. This issue of identifiability and uniqueness of the optima is exacerbated in the ELM use-case, especially when considering the new discrepancy based on the energy profiles defined in Eq. (12). This is particularly evident in Fig. 6, where the contour of the discrepancy is plotted by interpolating the points selected by the calibration framework. The algorithm not only correctly identifies the area where the minimum lies but also explores a larger region of local minima, which provides sub-optimal solutions that are nevertheless close to the reference experiment. Once again, the construction of more sophisticated loss functions can play a big role in addressing this issue.

FIG. 6.

Contour plot of the discrepancy defined for the calibration of the ELM crash, as given by Eq. (12), example 4. The black star represents the reference parameters θ_true, while the isolines give the estimated shape of the discrepancy based on the points at which the discrepancy was evaluated. While we are able to correctly identify the area where the true minimum lies, there is a significant area of local minima, corresponding to (20, 40) × (20, 80), which makes the optimization of the discrepancy a hard task.


In conclusion, in this paper we have defined a framework for the automatic tuning of the expensive nonlinear MHD code JOREK. We have highlighted the difficulties and challenges related to this task through four test cases, in which simulations from JOREK itself were used as reference experiments. BO approaches are a natural way of dealing with expensive models for the purpose of calibration, as they replace the unobserved and possibly noisy evaluations with a surrogate model that is easy to work with. We leave the tuning with respect to real experimental data for future work, as we expect the difficulties highlighted here to be exacerbated in this extension. Experimental errors and the quality of the data could significantly affect the outcome of the proposed optimization procedure, and the quantification of these terms in relation to our methodology is left as a subject of further studies. Despite this, the surrogate model approach underpinning the BO framework provides a flexible way to incorporate any possible experimental error in the data. Gaussian processes can take uncertainties into account in a quantitative manner: by modifying the kernel function of choice, one can encode various assumptions on the available data. In addition, more realistic cases will require the deployment of more complex and expensive simulation models, while focusing on a much larger parameter space. We believe that extending our BO approach to more sophisticated surrogate models and carefully designing more informed discrepancy functions can help address all these concerns, which will require a separate and extensive study. In addition, while the BO method finds a point estimate for the calibration problem, the investigation of alternative methods that allow a faithful reconstruction of the whole discrepancy could help address the problem of the uniqueness of the solution. As an example, reframing the calibration of JOREK as a Bayesian inverse problem targeting a full posterior distribution over the parameters, which quantifies the probability that a given parameter set can replicate the observed output, would circumvent some of the problems related to sub-optimal solutions while allowing for better interpretability of the role of each input parameter.

This work has been carried out within the framework of the EUROfusion Consortium, funded by the European Union via the Euratom Research and Training Programme (Grant Agreement No. 101052200—EUROfusion). The views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. This work was partially carried out using EUROfusion High Performance Computer Marconi-Fusion through EUROfusion funding. A.D. was supported by Wave 1 of The UKRI Strategic Priorities Fund under EPSRC Grant No. EP/T001569/1 and EPSRC Grant No. EP/W006022/1, particularly the “Ecosystems of Digital Twins” theme within those grants and The Alan Turing Institute.

The authors have no conflicts to disclose.

E. Crovini: Conceptualization (lead); Writing – original draft (lead). S. J. P. Pamela: Conceptualization (supporting); Supervision (equal); Writing – review & editing (equal). A. B. Duncan: Conceptualization (supporting); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.10391738, Ref. 31.

1. N. Holtkamp and I. P. Team, “An overview of the ITER project,” Fusion Eng. Des. 82, 427–434 (2007).
2. H. Anand, O. Bardsley, D. Humphreys, M. Lennholm, A. Welander, Z. Xing, J. Barr, M. Walker, J. Mitchell, and H. Meyer, “Modelling, design and simulation of plasma magnetic control for the Spherical Tokamak for Energy Production (STEP),” Fusion Eng. Des. 194, 113724 (2023).
3. M. Keilhacker, G. Becker, K. Bernhardi, A. Eberhagen, M. ElShaer, G. Fußmann, O. Gehre, J. Gernhardt, G. v. Gierke, E. Glock et al., “Confinement studies in L and H-type Asdex discharges,” Plasma Phys. Controlled Fusion 26, 49 (1984).
4. A. Kirk, N. B. Ayed, G. Counsell, B. Dudson, T. Eich, A. Herrmann, B. Koch, R. Martin, A. Meakins, S. Saarelma, R. Scannell, S. Tallents, M. Walsh, H. R. Wilson, and the MAST team, “Filament structures at the plasma edge on MAST,” Plasma Phys. Controlled Fusion 48, B433 (2006).
5. J. Connor, R. Hastie, H. Wilson, and R. Miller, “Magnetohydrodynamic stability of tokamak edge plasmas,” Phys. Plasmas 5, 2687–2700 (1998).
6. M. Hoelzl, G. T. Huijsmans, S. Pamela, M. Bécoulet, E. Nardon, F. Artola, B. Nkonga, C. Atanasiu, V. Bandaru, A. Bhole et al., “The JOREK non-linear extended MHD code and applications to large-scale instabilities and their control in magnetically confined fusion plasmas,” Nucl. Fusion 61, 065001 (2021).
7. G. T. A. Huysmans and O. Czarny, “MHD stability in X-point geometry: Simulation of ELMs,” Nucl. Fusion 47, 659 (2007).
8. A. H. Nielsen, G. Xu, J. Madsen, V. Naulin, J. J. Rasmussen, and B. Wan, “Simulation of transition dynamics to high confinement in fusion plasmas,” Phys. Lett. A 379, 3097–3101 (2015).
9. N. M. Ferraro and S. C. Jardin, “Calculations of two-fluid magnetohydrodynamic axisymmetric steady-states,” J. Comput. Phys. 228, 7742–7770 (2009).
10. C. Sovinec, A. Glasser, T. Gianakon, D. Barnes, R. Nebel, S. Kruger, S. Plimpton, A. Tarditi, M. Chu, and the NIMROD team, “Nonlinear magnetohydrodynamics with high-order finite elements,” J. Comput. Phys. 195, 355 (2004).
11. B. Dudson, M. Umansky, X. Xu, P. Snyder, and H. Wilson, “BOUT++: A framework for parallel plasma fluid simulations,” Comput. Phys. Commun. 180, 1467–1480 (2009).
12. S. Pamela, G. Huijsmans, T. Eich, S. Saarelma, I. Lupelli, C. Maggi, C. Giroud, I. Chapman, S. Smith, L. Frassinetti et al., “Recent progress in the quantitative validation of JOREK simulations of ELMs in JET,” Nucl. Fusion 57, 076006 (2017).
13. A. Cathey, M. Hoelzl, K. Lackner, G. T. A. Huijsmans, M. G. Dunne, E. Wolfrum, S. J. P. Pamela, F. Orain, S. Günter, the JOREK team, the ASDEX Upgrade team, and the EUROfusion MST1 team, “Non-linear extended MHD simulations of type-I edge localised mode cycles in ASDEX Upgrade and their underlying triggering mechanism,” Nucl. Fusion 60, 124007 (2020).
14. S. Pamela, T. Eich, L. Frassinetti, B. Sieglin, S. Saarelma, G. Huijsmans, M. Hoelzl, M. Becoulet, F. Orain, S. Devaux et al., “Non-linear MHD simulations of ELMs in JET and quantitative comparisons to experiments,” Plasma Phys. Controlled Fusion 58, 014026 (2016).
15. E. Crovini, S. L. Cotter, K. Zygalakis, and A. B. Duncan, “Batch Bayesian optimization via particle gradient flows,” arXiv:2209.04722 (2022).
16. O. Czarny and G. Huysmans, “Bézier surfaces and finite elements for MHD simulations,” J. Comput. Phys. 227, 7423–7445 (2008).
17. H. R. Strauss, “Nonlinear, three-dimensional magnetohydrodynamics of noncircular tokamaks,” Phys. Fluids 19, 134–140 (1976).
18. J. Mockus, V. Tiesis, and A. Zilinskas, “The application of Bayesian methods for seeking the extremum,” in Towards Global Optimization (Springer, 1978), Vol. 2, pp. 117–129.
19. C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer School on Machine Learning (Springer, 2003), pp. 63–71.
20. R. Garnett, Bayesian Optimization (Cambridge University Press, 2023).
21. M. Hoppe, O. Embreus, and T. Fülöp, “DREAM: A fluid-kinetic framework for tokamak disruption runaway electron simulations,” Comput. Phys. Commun. 268, 108098 (2021).
22. A. E. Järvinen, T. Fülöp, E. Hirvijoki, M. Hoppe, A. Kit, J. Åström, and JET contributors, “Bayesian approach for validation of runaway electron simulations,” J. Plasma Phys. 88, 905880612 (2022).
23. P. Rodriguez-Fernandez, A. White, A. Creely, M. Greenwald, N. Howard, F. Sciortino, and J. Wright, “VITALS: A surrogate-based optimization framework for the accelerated validation of plasma transport codes,” Fusion Sci. Technol. 74, 65–76 (2018).
24. D. Ginsbourger, R. Le Riche, and L. Carraro, “A multi-points criterion for deterministic parallel global optimization based on Gaussian processes” (2008); available at https://hal-emse.ccsd.cnrs.fr/hal-00260579.
25. D. Ginsbourger, “Métamodèles multiples pour l'approximation et l'optimisation de fonctions numériques multivariables,” Ph.D. thesis (École Nationale Supérieure des Mines de Saint-Etienne, 2009); available at https://theses.hal.science/tel-00772384/.
26. D. Ginsbourger, R. L. Riche, and L. Carraro, “Kriging is well-suited to parallelize optimization,” in Computational Intelligence in Expensive Optimization Problems (Springer, 2010), pp. 131–162.
27. J. Wilson, F. Hutter, and M. Deisenroth, “Maximizing acquisition functions for Bayesian optimization,” in Advances in Neural Information Processing Systems (NeurIPS, 2018), Vol. 31.
28. Q. Liu, “Stein variational gradient descent as gradient flow,” in Advances in Neural Information Processing Systems (NeurIPS, 2017), Vol. 30.
29. L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows: In Metric Spaces and in the Space of Probability Measures (Springer Science & Business Media, 2005).
30. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, “JAX: Composable transformations of Python+NumPy programs,” GitHub (2018), http://github.com/google/jax.
31. E. Crovini, “Automatic JOREK calibration via batch Bayesian optimization data,” Zenodo (2023), https://doi.org/10.5281/zenodo.10391738.