Spherical tokamaks have many desirable features that make them an attractive choice for a future fusion power plant. Power-plant viability is intrinsically related to plasma heat and particle confinement, and this is often determined by the level of micro-instability-driven turbulence. Accurate calculation of the properties of turbulent microinstabilities is therefore critical for tokamak design; however, the evaluation of these properties is computationally expensive. The considerable number of geometric and thermodynamic parameters and the high resolutions required to accurately resolve these instabilities make repeated use of direct numerical simulations in integrated modeling workflows extremely computationally challenging and create the need for fast, accurate, reduced-order models. This paper outlines the development of a data-driven reduced-order model, often termed a surrogate model for the properties of micro-tearing modes (MTMs) across a spherical tokamak reactor-relevant parameter space utilizing Gaussian process regression and classification, techniques from machine learning. These two components are used in an active learning loop to maximize the efficiency of data acquisition, thus minimizing computational cost. The high-fidelity gyrokinetic code GS2 is used to calculate the linear properties of the MTMs: the mode growth rate, frequency, and normalized electron heat flux, and core components of a quasi-linear transport model. Data cross-validation and direct validation on unseen data are used to ascertain the performance of the resulting surrogate models.

Spherical tokamaks possess a number of potential advantages when compared to conventional aspect ratio devices. These include improved vertical stability properties as well as the ability to access high β regimes of operation, allowing a more compact and ultimately cheaper reactor design. However, in these scenarios, electromagnetic instabilities, such as kinetic ballooning modes (KBM)1 and micro-tearing modes (MTM),2 are typically excited, both of which can drive radial heat propagation away from the plasma core and potentially degrade plasma performance.

In the case of MTMs, the turbulence distorts and reconnects the equilibrium magnetic field, resulting in microscopic magnetic island structures, which can overlap and cause field stochasticity, which is believed to be responsible for large radial electron heat transport.3–6 Consequently, being able to reliably predict the critical gradients at which MTMs are expected to be driven unstable has considerable value for the design of performant reactor scenarios.

The most appropriate framework for describing turbulent modes in the core of a tokamak is the gyrokinetic framework.7,8 Running nonlinear gyrokinetic codes routinely within integrated models for design space exploration and turbulence modeling is very costly, particularly for electromagnetic modes like MTMs, which exhibit extremely fine radial structure and consequently require very high numerical resolution. Even local linear gyrokinetic simulations are currently prohibitively expensive in the context of integrated modeling over confinement timescales, being many orders of magnitude slower than currently used reduced turbulence models such as TGLF.

An alternative to repeated direct use of high-fidelity models in integrated workflows is to use a training dataset to build a data-driven reduced-order model of the simulation outputs using machine learning-based techniques. This is an especially attractive approach in the case of MTMs due to the lack of sufficiently accurate physics-based reduced-order models valid for the highly shaped plasmas found in spherical tokamaks (STs); however, progress is being made in the development of valid reduced models.9,10

Machine learning (ML) and uncertainty quantification (UQ) techniques have already been extensively used in transport and turbulence modeling,11–16 including with fully nonlinear plasma turbulence simulations in a limited way,17,18 to create predictive surrogate models. Furthermore, neural networks trained on experimental databases19 and on reduced micro-tearing mode models20,21 have been developed showing speed-ups of almost two orders of magnitude over conventional models, and active learning pipelines have been shown to be able to create new training data efficiently.22,23 However, the physical simplifications and restricted parameter spaces used in these works, which help reduce the number of simulations required and the cost of those simulations, make them unsuitable for use in the predictive design of a spherical tokamak power plant. Consequently, new reduced-order models are required that are suitable for STs, both in terms of the parameter space explored and the fidelity of model used.

Local linear gyrokinetic codes24–26 permit evaluation of the frequency, growth rates, normalized radial fluxes, and the eigenfunctions of unstable modes for a given set of input parameters characterizing the geometry and profiles at a particular flux surface. These are the building blocks of a reduced transport model.27 In this paper, we aim to emulate these quantities using a database of gyrokinetic simulations of spherical tokamak power-plant relevant micro-tearing modes. To do this, Gaussian process regression (GPR) is performed on an initially unexplored seven-dimensional parameter space. A Gaussian process-based classifier is constructed, which assists in the expansion of the training set by learning the stability manifold of the parameter space, allowing active learning techniques22,28–30 to be employed, which focus sample points on regions with high probability of instability to MTMs. The combined model is designed to investigate the parameter space as efficiently as possible while building an accurate regression-based model harnessing the advantage of GPs that a prediction comes with an easily understandable confidence interval (CI) derived from the process variance.

The remainder of the paper is outlined as follows: Sec. II briefly introduces the gyrokinetic model and the simulations used to generate training data; Sec. III defines the parameter space over which models will be built and Sec. IV describes the modes of interest and the selection criteria used to identify them. An overview of Gaussian processes is given in Sec. V prior to a description of how they are used to build efficient models for MTMs in Sec. VI. Validation of the resulting models is performed in Sec. VII, and finally, conclusions and ideas for future development are provided in Sec. VIII.

The gyrokinetic Vlasov–Maxwell system of equations is widely used to describe the turbulent component of the electromagnetic fields and particle distribution functions of the species present in a plasma. Here, we will provide a short summary for understanding, but for a more comprehensive description of the model, see Refs. 7, 31, and 32. To reduce the number of dimensions and the resolutions required to solve the equations, the following ordering assumptions are imposed, which have been observed to be valid for turbulence in in the core of tokamak plasmas:

  • The Larmor radius ρ a = v / Ω a = v m a / q a B for species a is small compared to the length scale over which the background kinetic profiles and magnetic equilibrium vary. The Larmor radius normalized to the system size (characterized by the minor radius a) ρ * = ρ a a 1 is thus considered a small parameter.

  • The turbulent fluctuations in a confined plasma are highly anisotropic due to the presence of a strong magnetic field, i.e., the perpendicular wavelengths of interest are significantly smaller than those parallel to the magnetic field line, k k | |.

  • The fluctuations of the field perturbations against the background profiles are small. The electrostatic potential ϕ T q a ρ * and the electromagnetic potential A | | T q a v t h ρ *.

  • The frequencies of interest of the turbulence and transport (ω) are small compared to the ion cyclotron frequency, ω Ω a = q a B / m a.

The resulting five-dimensional gyrokinetic Vlasov equation has the form
F a t + X t . F a + v | | t F a v | | = 0 ,
(1)
where F a = F a ( x , v | | , μ ) is the gyrocenter distribution function for species a represented in full 5D phase space, x is the three-dimensional vector of coordinates of the guiding center of a particle, v | | is the velocity coordinate along the magnetic field line, and μ = v 2 / 2 B is the magnetic moment, a conserved quantity, where v is the velocity in the perpendicular direction. A particle in the gyrokinetic framework is advected through configuration space by the plasma drifts caused by variations in the electric and magnetic fields along its trajectory, and accelerated along the magnetic field by the parallel electric field.
The particle velocity can be written as
X t = v = v | | b 0 ̂ + v B + v χ ,
(2)
where the terms on the right-hand side represent the parallel streaming, the drifts due to the inhomogeneous magnetic field, and the drifts due to the perturbed fields, respectively. This equation is coupled to the gyrokinetic Poisson and Ampère equations for closure.

Solution of the gyrokinetic Vlasov–Maxwell system enables an understanding of the stability of tokamak plasmas, and in the nonlinear case, the saturated turbulent state and the resulting heat, particle, and momentum fluxes. Nonlinear gyrokinetics represent one of the highest fidelity plasma turbulence modeling tools accessible on current supercomputers. It has been extensively validated against experiment33–36 in a variety of regimes. Its large computational expense, however, makes it impractical for use in integrated modeling simulations over confinement timescales, and as such, quasi-linear models such as TGLF37,38 and QuaLiKiz39 among others are used for this purpose. These models try to infer the nonlinear regime from the properties of the underlying linear modes. These are then used, along with a set of estimates of the saturation amplitudes of the mode based off of the underlying saturation mechanism, to estimate the nonlinear fluxes. However, the parameter regime of interest in this paper is outside the range of validity of such models and as such does not encapsulate the MTM-driven transport, which is thought to be of primary importance in high-β spherical tokamaks.

GS2 is an implicit gyrokinetic code, which enables solution of the gyrokinetic Vlasov–Maxwell system of equations using HPC resources and is used in the generation of training data for this study.26 It was specifically developed to study low-frequency turbulence in magnetized plasmas. The code incorporates many physics effects in the local limit including kinetic electrons, electromagnetic effects (both A | | and B | | effects), collisions,40,41 full general geometry, and rotation.

It is typically used to assess the micro-stability of plasmas produced in the laboratory and to calculate key properties of the turbulence, which results from instabilities. Linear micro-instability growth rates, frequencies, and normalized fluxes can be calculated on a wavenumber-by-wavenumber basis using a flux tube geometry in the ballooning limit.42 

GS2 has the option of being run as an initial value solver where the fastest growing mode for a given parameter set is found, or as an eigenvalue calculation where all the eigenvalue pairs for a parameter set can be found. The latter method has the advantage that all unstable modes can be found, including those that are subdominant. This is particularly useful for MTMs as they are in many cases subdominant to other microinstabilities, while still having a significant effect on heat transport. However, for ease of computation, the former option is chosen, and usage of the eigenvalue solver is left for a future expansion of the model. The usage of an eigenvalue solver adds significant complexity to the classifier component of the model as will be described further in Sec. V B.

In this study, a seven-dimensional parameter space is modeled comprising the variables considered to be those to which the MTMs are most sensitive. These parameters are as follows:

  • The safety factor q: The ratio of toroidal to poloidal turns a field line of interest will wind around a flux surface.

  • The magnetic shear s ̂ = r q d q d r: Parameterizes the radial variation of the safety factor, where r is the radial coordinate.

  • The normalized radial gradient of the electron density a L n e = a n e n e r, where a is the minor radius of the last closed flux surface.

  • The radial gradient of the logarithmic electron temperature a L T e = a T e T e r.

  • The ratio of the plasma pressure to the magnetic pressure β = 2 μ 0 n k B T B 2. Here, the magnetic field is the toroidal magnetic field at the center of the flux surface at the height of the magnetic axis.

  • The electron–ion collision frequency νei: The characteristic time scale at which electrons will scatter off ions.

  • The normalized bi-normal wavevector ky (units of ρs).

The ranges of the seven parameters are described in Table I and represent plausible limits within which a future spherical tokamak power plant may be expected to operate. Other parameters, for instance those defining the flux surface shaping, are fixed at representative values based on STEP scenario SPR-00843 except for the ion temperature gradient, which is set to 0 to suppress ion temperature gradient (ITG) and trapped electron (TEM) modes, which are not in the scope of this study, and it is known to have a negligible effect on the MTM.44 The values of the fixed variables are given in Table II. The bi-normal mode wavevector is restricted to the range 0 k y ρ s 1 for this initial model as this is the range of modes, which dominate the heat flux spectra. An analysis of the mode stability for some of the flux surfaces that are considered here can be found in Refs. 45 and 46.

TABLE I.

The input parameters that are varied in this model with their maximum and minimum values.

Variable Name Min. Max.
q  Safety factor 
s ̂  Magnetic shear 
a / L n e  Electron density gradient  10 
a / L T e  Electron temperature gradient  0.5 
β  Ratio plasma/magnetic pressures  0.3 
νei  Electron–ion collision frequency ( c s / a 0.1 
ky  Bi-normal mode wavelength (units of 1 / ρ s
Variable Name Min. Max.
q  Safety factor 
s ̂  Magnetic shear 
a / L n e  Electron density gradient  10 
a / L T e  Electron temperature gradient  0.5 
β  Ratio plasma/magnetic pressures  0.3 
νei  Electron–ion collision frequency ( c s / a 0.1 
ky  Bi-normal mode wavelength (units of 1 / ρ s
TABLE II.

The parameters that are fixed in the model. Variations in these parameters will be considered in a future development.

Variable Name Value
Normalized major radius (m)  1.84 
Minor radius (m)  1.47 
κ  Elongation  2.66 
d κ d ρ  Radial elongation gradient  −0.25 
τ  Triangularity  0.34 
d τ d ρ  Radial triangularity gradient  0.25 
Δ  Shafranov shift  −0.44 
T e T i  Electron/ion temperature ratio  0.944 
β   Radial gradient of β  −0.64 
θ0  Ballooning angle  0.0 
Variable Name Value
Normalized major radius (m)  1.84 
Minor radius (m)  1.47 
κ  Elongation  2.66 
d κ d ρ  Radial elongation gradient  −0.25 
τ  Triangularity  0.34 
d τ d ρ  Radial triangularity gradient  0.25 
Δ  Shafranov shift  −0.44 
T e T i  Electron/ion temperature ratio  0.944 
β   Radial gradient of β  −0.64 
θ0  Ballooning angle  0.0 

One of the characteristics of MTMs is their parity with respect to the coordinate along the magnetic field line. Specifically, the parallel magnetic vector potential ( A | |) has an even parity around the outboard midplane (θ = 0), while the electrostatic potential ( ϕ) has an odd parity. Examples of this are seen in Fig. 1.

FIG. 1.

Example eigenfunctions of the electrostatic potential (left) and parallel magnetic potential (right) of an example of an MTM plotted along the parallel coordinate. Real (blue) and imaginary (orange) parts are plotted. The parameters data point are, ky = 0.384, β = 0.204 , a / L n = 1.072 , a / L T e = 4.421 , ν e i = 0.075, q = 2.92, and s ̂ = 2.610 for the lower, ky = 0.866, β = 0.052 , a / L n = 1.803 , a / L T e = 4.473 , ν e i = 0.085, q = 8.561, and s ̂ = 1703.

FIG. 1.

Example eigenfunctions of the electrostatic potential (left) and parallel magnetic potential (right) of an example of an MTM plotted along the parallel coordinate. Real (blue) and imaginary (orange) parts are plotted. The parameters data point are, ky = 0.384, β = 0.204 , a / L n = 1.072 , a / L T e = 4.421 , ν e i = 0.075, q = 2.92, and s ̂ = 2.610 for the lower, ky = 0.866, β = 0.052 , a / L n = 1.803 , a / L T e = 4.473 , ν e i = 0.085, q = 8.561, and s ̂ = 1703.

Close modal

In order to ensure that GS2 predominantly finds micro-tearing modes, which in many cases will be subdominant to ITGs, TEMs, and KBMs, the desired parity can be enforced on the solution during the simulation. This is done by setting the solution at negative θ to be minus the solution at positive θ. Even parity solutions can be forced in a similar way by setting the solution at negative θ to be identical to that at positive θ. This filters out modes with the parity seen in the lower panel of Fig. 2 and allows subdominant micro-tearing modes to be found even when running the code as an initial value solver. In general, for electromagnetic modes such as these, the magnitude of the magnetic perturbation A | | is the same amplitude as or larger than its electrostatic counterpart, ϕ ( | A | | | | ϕ |), as seen in Fig. 1.

FIG. 2.

Example eigenfunctions of the electrostatic potential (left) and parallel magnetic potential (right) and of the kind of KBMs found plotted along the parallel coordinate. Real (blue) and imaginary (orange) parts are plotted. Top is an odd parity KBM (oKBM) (data point: ky = 0.704, β = 0.26 , a / L n = 0.29 , a / L T e = 4.23 , ν e i = 0.054, q = 2.42, s ̂ = 2.43), bottom is a KBM (datapoint: ky = 0.053, β = 0.166 , a / L n = 4.468 , a / L T e = 5.25 , ν e i = 0.07, q = 8.56, s ̂ = 1.518). Setting a / L T i = 0 filters out the upper modes, while the GS2 boundary parity filter, filters out the lower.

FIG. 2.

Example eigenfunctions of the electrostatic potential (left) and parallel magnetic potential (right) and of the kind of KBMs found plotted along the parallel coordinate. Real (blue) and imaginary (orange) parts are plotted. Top is an odd parity KBM (oKBM) (data point: ky = 0.704, β = 0.26 , a / L n = 0.29 , a / L T e = 4.23 , ν e i = 0.054, q = 2.42, s ̂ = 2.43), bottom is a KBM (datapoint: ky = 0.053, β = 0.166 , a / L n = 4.468 , a / L T e = 5.25 , ν e i = 0.07, q = 8.56, s ̂ = 1.518). Setting a / L T i = 0 filters out the upper modes, while the GS2 boundary parity filter, filters out the lower.

Close modal
These constraints do not guarantee that converged simulations will be MTMs, with large regions of parameter space having no unstable odd parity modes and odd parity kinetic ballooning type modes (KBMs) also being present, such as those seen in the upper panel of Fig. 2. To filter these out of our training data, additional selection criteria are set which are developed from knowledge of the physics of the expected modes. These are applied to the set of converged runs to produce a training set identified as MTMs. MTMs perturb field lines so that they do not return to the equilibrium flux surface. This effect can be quantified by the tearing parameter:
C tear = | A | | d θ | | A | | | d θ ,
(3)
where d θ corresponds to the integral along the field line, which is parameterized by the field following coordinate, θ. This parameter is calculated for each simulation, and those with values less than 0.15 are rejected.
Micro-tearing modes have frequencies that are in the electron diamagnetic direction.2 By definition, these have a negative sign in GS2. This allows a second selection rule; the calculated frequency of the mode must be within a window whose bounds are 50% above and below the value predicted by the analytical lowest order expression as derived by Catto et al.47 
ω 0 = ( k y ρ s ) ( a L n e + 0.5 a L T e ) .
(4)

Figure 3 shows the distribution of a subset of the converged simulations in normalized mode frequency and tearing parameter space. The horizontal lines represent the lower and upper bounds of the normalized frequency within which an MTM is expected to sit, and the vertical dashed line represents the lower cutoff to the tearing parameter. A final visual inspection of the structure of the eigenfunctions removes points, which are unconverged or lack sufficient length along the field line to fully resolve the mode, and this rejects the non-MTM (blue) points within the acceptance limits in Fig. 3. These physics-based selection rules are used to set a logical parameter indicating whether a converged instability is considered to be a MTM or not. It is upon this parameter that the classifier component of the model is trained.

FIG. 3.

Distribution of points in tearing parameter and frequency [normalized to the analytical expression, ω0, Eq. (4)] space for a subset of the MTM database. Blue points are accepted into the training database, while the green crosses and red points are not for being outside the acceptance criteria as denoted by the horizontal (frequency) and vertical (tearing parameter) dark lines.

FIG. 3.

Distribution of points in tearing parameter and frequency [normalized to the analytical expression, ω0, Eq. (4)] space for a subset of the MTM database. Blue points are accepted into the training database, while the green crosses and red points are not for being outside the acceptance criteria as denoted by the horizontal (frequency) and vertical (tearing parameter) dark lines.

Close modal

The MTMs will be examined using local linear gyrokinetics on a single core flux surface with ρ = r / a = 0.67, where a is the radius of the last closed flux surface. A Miller parameterization48 was used to model the equilibrium, with the parameters outlined in Table II. Three kinetic species are modeled: deuterium, tritium, and electrons ( n e = n D + n T, deuterium and tritium in a 50/50 mix). The simulations are run for a single bi-normal mode, ky. We neglect compressional magnetic field ( B | |) effects in order to reduce the computational complexity of the model and suppress some types of kinetic ballooning mode, which are destabilized by compressional magnetic field perturbations. Micro-tearing modes are essentially unaffected by their inclusion.

MTMs require significant numerical resolution due to being particularly elongated along the magnetic field line in ballooning space, equivalent to having a very fine radial structure. A single set of numerical inputs is used in this work for tool-chain simplicity. These are summarized in Table III. nperiod defines the domain of the simulation by setting the number of times the flux tube winds around the device poloidally. The number of 2 π segments is given by 2  n period 1. GS2 uses the pitch angle and particle energy as velocity space coordinates. n | | determines the number of un-trapped pitch angle grid points. The number of trapped pitch angle points is defined by the resolution of the θ grid as n θ / 2 + 1.

TABLE III.

The resolutions used the parallel direction and velocity grids when generating the training data.

Variable Description Value
n ϵ  Number of energy grid points.  16 
n | |  Number of un-trapped pitch angle grid points in each direction along the field line.  16 
ntrapped  Number of grid points in the trapped region of velocity space.  65 
n θ  Sets the number of parallel grid points along the magnetic field line per poloidal turn.  128 
nperiod  Defines the domain of the simulation by setting the number of times the flux tube winds around the device poloidally. 
Variable Description Value
n ϵ  Number of energy grid points.  16 
n | |  Number of un-trapped pitch angle grid points in each direction along the field line.  16 
ntrapped  Number of grid points in the trapped region of velocity space.  65 
n θ  Sets the number of parallel grid points along the magnetic field line per poloidal turn.  128 
nperiod  Defines the domain of the simulation by setting the number of times the flux tube winds around the device poloidally. 

The run time of the simulation is approximately linear in the value of nperiod, so a value of 9 was chosen for this work as a compromise between accuracy and required computational resources; however, there will be a systematic uncertainty resulting from the choice of this parameter, especially in parts of parameter space where the mode is particularly elongated in the ballooning angle. The simulations where this effect is particularly acute are removed from the training set to prevent such systematic errors from entering the regression model. The resolutions used here are influenced by, and consistent with, the resolutions used in previously published studies of the linear structure of unstable modes in spherical tokamaks.49 It is noted here that this resolution is approximately an order of magnitude higher than for the KBMs or trapped electron modes (TEM) that would also be found in this parameter space where 2/3 poloidal turns is usually sufficient, and the modes tend to be more strongly growing.

The outputs of the model are the properties of the micro-tearing modes that are important for building a quasi-linear heat transport model. Namely, the growth rates, frequencies, and mode contribution to the radial electron heat flux caused by the interaction of the turbulent field and the turbulent pressure fluctuations. These are outlined in Table IV along with their units.

TABLE IV.

Model outputs. a is the minor radius of the tokamak, while c s = T i , ref / m e is the ion sound speed. The electron heat flux is related to the heat diffusivity and temperature gradient by Q e = n e χ e T e.

Variable Name Units
γ  Mode growth rate  c s / a 
ω  Mode frequency  c s / a 
Q e m , e  Radial electromagnetic electron heat flux  ρ * 2 n ref c s T e 
Variable Name Units
γ  Mode growth rate  c s / a 
ω  Mode frequency  c s / a 
Q e m , e  Radial electromagnetic electron heat flux  ρ * 2 n ref c s T e 

The heat flux is defined as Q e m , e = d 3 v m v 2 2 v δ B ̃ · r f , normalized to | A | | | 2 , where v ̃ δ A | | = b ̂ × ( v | | A | | ̃ ) / B is the flutter velocity, the parallel velocity along the perturbed field lines, and the curly brackets denote a flux surface average (where for quantity F, F = F d l). b ̂ denotes a unit vector along the equilibrium magnetic field and B is the magnetic field strength. This is the predominant radial heat transport driven by MTMs.

In addition to GS2, our tool-chain incorporates Pyrokinetics,50 which is a python library that aims to standardize gyrokinetic analysis by providing a single interface for reading and writing input and output files from different gyrokinetic codes, normalizing to a common standard. This enables future interoperability with other gyrokinetic codes. All variables in this paper utilize the standard normalizations used by Pyrokinetics.50 

Our data-driven surrogate model of the linear properties of MTMs is built on Gaussian process (GP) regression. An introduction can be found here.51 GP regression is a Bayesian approach that has received significant interest in recent years and has been used in the modeling of a diverse range of physical systems. A Gaussian process defines a prior over a function space, which can be conditioned on observations of the function to be emulated. Once trained, it provides a probability distribution over the function value at any unobserved location, allowing prediction with a quantified uncertainty.

Gaussian process regression is used as opposed to neural networks for the model as they are readily interpretable and, importantly when doing uncertainty quantification, they handle uncertainty and noise in a rigorous fashion, two properties that are not always applicable to neural networks. Due to the expense of running a gyrokinetic code, even linearly, the dataset is expected to be small, and as such, Gaussian processes have an advantage due to their data efficiency.

A Gaussian process is a stochastic process (a collection of random variables) such that every finite subset has a joint probability distribution, which is Gaussian. In the context of function emulation, one can consider the value of the function f at each location in the input space to be given by a random variable, such that the function as a whole is comprised of an infinite collection of random variables which obey the above property. This effectively yields a probability distribution for the function, which can then be conditioned on observations.

The joint probability distribution for the function at a finite set of input locations is given by a multivariate Gaussian as follows:
f N ( μ ( x ) , Σ ( x , x ) )
(5)
for any two points x, x in a set X, N denotes a Gaussian distribution, where μ is the mean function and Σ is the covariance function, which is a positive semi-definite function that maps pairs of locations in parameter space to the covariance between their function values.

If there is prior knowledge of the overall structure of the function to be emulated, then this can be expressed via the choice of parameterization of the mean function, but in many cases, this is simply set to zero. In this work, we use a constant mean function, unless otherwise stated.

The posterior mean and variance are given by
μ ( x ) = κ ( x , x ) T ( κ ( x , x ) + σ 2 I ) 1 f ( x ) ,
(6)
Σ ( x , x ) = κ ( x , x ) κ ( x , x ) T ( κ ( x , x ) + σ 2 I ) 1 κ ( x , x ) ,
(7)
where x denotes a vector of parameter space locations to be predicted, σ 2 is the process variance, and κ is a kernel function modeling the correlation between data points. κ ( x , x ) denotes a n × n matrix of the covariances between points in a training set of size n. Similarly, κ ( x , x ) is a matrix of n × n size containing the covariances between all pairs of training and test points of number n .

In relation to our model, the mean μ corresponds to the growth rate, mode frequency, and fluxes for a specific input parameter set x (the seven input parameters as outlined in Table I). The variance Σ represents the uncertainty in those predicted mean values.

1. Kernels

The covariance function is expressed as the product of σ 2 and κ,
Σ ( x , x ) = σ 2 κ ( x , x ) .
(8)
Many choices are available for the kernel function, but two common choices used when one has no prior knowledge of the covariance structure are the radial basis function (RBF) kernel
κ ( x , x ) = Π i = 1 p exp ( ( x i x i ) 2 2 l i 2 ) ,
(9)
and the Matérn family of kernels parameterized by ν,
κ ( x , x ) = Π i = 1 p 1 2 ν 1 Γ ν ( 2 ν 1 / 2 | x i x i | l i ) ν J ν ( 2 ν 1 / 2 | x i x i | l i ) ,
(10)
where Γ ν is the Gamma function for ν, J ν is a modified Bessel function of order ν > 0, and p is the number of input dimensions, as outlined in Table I. In both cases, each dimension i has an associated correlation scale length given by li. Points in parameter space much closer than these characteristic distances can be expected to be strongly correlated.

The squared exponential in the RBF kernel is infinitely differentiable, which can lead to unrealistically smooth function realizations. The Matérn kernel is only ν 1 times differentiable and thus allows more flexible modeling of the degree of smoothness of the underlying function. Note that the Matérn kernel can be simplified for ν = p + 1 2 ; p 0, yielding particularly “nice” solutions for ν = [ 1 / 2 , 3 / 2 , 5 / 2 ]. When ν 7 / 2, the kernel is so smooth that it is hardly distinguishable from the RBF kernel.51 The Matérn 5/2 kernel is a common choice for emulation and the default kernel used in the experiments below.

2. Fitting to the data

Before seeing any data, the structure outlined above constitutes the GP prior, and it specifies a certain set of functions according to the mean function and the kernel. This prior over functions f for given hyperparameters θ can be written compactly as
p ( f | Θ ) ,
(11)
where the model hyperparameters Θ are the covariance length scales of the kernel (denoted li above) and the noise level. The aim is to identify the posterior distribution after conditioning the prior on inputs X and outputs y using Bayes' theorem
p ( f | y , X , Θ ) = p ( f | Θ ) p ( y | X , f ) p ( y | X , Θ ) ,
(12)
where the marginal likelihood is
p ( y | X , Θ ) = p ( f | Θ ) p ( y | X , f ) d f .
(13)
This is relatively straightforward for fixed hyperparameters, but these are rarely known a priori, and hence, a similar approach must be taken at the next higher level of this hierarchical model to identify the posterior distribution over model hyperparameters
p ( Θ | y , X ) = p ( Θ ) p ( y | X , Θ ) p ( y | X )
(14)
with marginal likelihood
p ( y | X ) = p ( Θ ) p ( y | X , Θ ) d Θ
(15)
inferring the hyperparameters from the data by marginalizing over model hyperparameters Θ is challenging since this integral rarely has an analytical solution. This can be alleviated through approximations, such as variational inference (VI), or by sampling using Markov chain Monte Carlo (MCMC). In practice, type-II maximum likelihood estimation (MLE) is often employed. Instead of evaluating or approximating the full posterior of hyperparameters [Eq. (14)], the marginal likelihood [Eq. (13)] is maximized with respect to the hyperparameters Θ. Thus, the hyperparameters will be fixed at the maximum likelihood values. Refer to Ref. 51 for more details. Due to the relatively large datasets used in this study, type-II MLE will be employed for regression throughout. Irrespective of the method employed to approximate the model hyperparameters, fitting a GP requires inverting an n × n dense matrix for n data points. Hence, GP regression scales extremely poorly with increasing numbers of datapoints with a compute cost of O ( n 3 ). In this work, blackbox matrix–matrix multiplication (BBMM) is leveraged to dramatically reduce this cost, reducing it to O ( n 2 ), as described in Ref. 52.
The classification problem consists of determining whether or not a certain input would result in an unstable MTM. This is a binary classification problem, which can be handled using a GP with a Bernoulli distribution as the likelihood and a probit link function, i.e., the standard normal cumulative density function (CDF) Φ ( · ). This probit Gaussian process classification model can be written compactly as
p ( MTM = 1 | x ) = Φ ( f ( x ) ) ,
(16)
where f(x) is a sample drawn from a Gaussian process, commonly known as a latent function or nuisance function. This results in a stochastic classification model with a smoothly varying decision boundary. The procedure for fitting the latent function is similar to the approach outlined above for GP regression; however, the posterior for this model is analytically intractable and must be approximated either through sampling methods, such as MCMC, or VI. While the posterior recovered by MCMC is asymptotically exact, it can be prohibitively computationally expensive to perform this sampling, particularly for large datasets. Hence, VI is often employed for approximate inference. Here, the posterior is approximated by minimizing the Kullback–Leibler (KL) divergence between the posterior and a computationally tractable distribution, usually referred to as the variational distribution.53 Refer to Ref. 51 for more details.

After conditioning on training data, the Φ-transformed posterior mean of the latent function can be interpreted as the probability of a test point x being an MTM, i.e., p ( MTM = 1 | x ) = Φ ( μ ( x ) ), and conversely, p ( MTM = 0 | x ) = 1 Φ ( μ ( x ) ). This classifier will enable efficient expansion of the training set by predicting the regions of parameter space where an MTM may be found and thus allowing a targeted use of compute resources. The classifiers were evaluated using standard metrics, namely, accuracy, precision, recall, and F-score (see, e.g., Ref. 54).

The micro-tearing mode is stabilized by some of the parameters that are investigated here, and as such, there are large volumes of parameter space where GS2 may not return a positive result as is evident in the locations of the simulation points seen in Fig. 4. To demonstrate this, Fig. 5 shows a contour plot of the calculated growth rate across a two-dimensional parameter space of magnetic shear s ̂ and electron temperature gradient a / L T e calculated using a Latin hypercube with 100 samples. Stability boundaries where the growth rate of the MTMs tends to zero are evident in the bottom right-hand corner and additionally at low electron temperature gradient. This submanifold characterizing marginal stability is of great importance for determining the parameter values beyond which turbulence can form.

FIG. 4.

The distribution of simulations over the seven-dimensional domain. The orange points represent simulations where an unstable micro-tearing mode is found while blue points are those found to be stable. It is evident that the electron density gradient is highly stabilizing, as there are a few unstable points seen when a / L n > 4; however, the stability region is otherwise quite complex, with significant structure over the parameter space. Plots on the diagonal show the marginal distributions of stable and unstable points in each parameter.

FIG. 4.

The distribution of simulations over the seven-dimensional domain. The orange points represent simulations where an unstable micro-tearing mode is found while blue points are those found to be stable. It is evident that the electron density gradient is highly stabilizing, as there are a few unstable points seen when a / L n > 4; however, the stability region is otherwise quite complex, with significant structure over the parameter space. Plots on the diagonal show the marginal distributions of stable and unstable points in each parameter.

Close modal
FIG. 5.

The MTM growth rate calculated as a function of the electron temperature gradient ( a / L T e) and the magnetic shear ( s ̂) using a 100-point Latin hypercube sample with all other parameters kept constant. Black points are simulations where an MTM is found, whereas red points are observed non-MTMs and some MTMs with a negative growth rate were found. The dark black lines are the contours of zero growth rate and the marginal stability manifold. Distinct areas of stability and instability and evident.

FIG. 5.

The MTM growth rate calculated as a function of the electron temperature gradient ( a / L T e) and the magnetic shear ( s ̂) using a 100-point Latin hypercube sample with all other parameters kept constant. Black points are simulations where an MTM is found, whereas red points are observed non-MTMs and some MTMs with a negative growth rate were found. The dark black lines are the contours of zero growth rate and the marginal stability manifold. Distinct areas of stability and instability and evident.

Close modal

As GS2 is being used as an initial value code, in cases where the simulation is run in areas of stability, it will run to a pre-determined maximum time without finding an instability, thus using up significant computational resources without producing useful data upon which to train the regression model. To mitigate this, a classifier is trained to predict the probability that a given point in parameter space will be unstable to MTMs, effectively learning the marginal stability manifold and allowing targeted exploration of the parameter space. To build such a classifier an initial dataset is required.

1. Initial dataset generation

An initial dataset was produced by generating a 300 point maxi-min Latin hypercube design over the parameter space. A Latin hypercube is a space-filling quasi-Monte Carlo sampling scheme whereby one partitions each parameter range into N divisions and samples N points such that each such partition only contains one sampled point for every parameter. A maxi-min design attempts to do this in a way that maximizes the minimum separation between points in order to generate a point set with good coverage and minimal clustering.

Out of the 300 points sampled, 293 converged, and of these, 99 passed the selection criteria for MTMs discussed in Sec. IV and in total cost approximately 9000 CPU hours. The classifier was trained on both the samples identified as MTMs and those identified as stable (no MTM), while the regression models were trained solely on the MTMs. The training of the classifier takes approximately 1 h on a single CPU. The low fraction ( 30 %) of MTMs produced by the Latin hypercube, a uniform sampling approach, and the high associated computational overhead, highlights the potential benefit of using an active learning approach.

To improve the predictions of the GP regression, more samples were drawn iteratively from areas of the parameter space with high posterior probability of finding an MTM as determined by the classifier. The algorithm is summarized as follows:

  • Draw n = 2 k low discrepancy candidate data points from the input space using Sobol sequences55 to yield an initial sample X = { x i } i = 1 n.

  • Evaluate the posterior predictive probability of being an MTM for each candidate data point using the Bernoulli GP classifier p ( MTM = 1 | x i ) = Φ ( μ ( x i ) ), where Φ ( · ) is the standard normal CDF, and μ ( x i ) is the predictive mean of the latent function for data point xi.

  • Accept/reject each candidate data point using a rejection sampler on the posterior predictive probability of being an MTM, i.e.,

    • Draw a random number from a uniform distribution ρ i U ( 0 , 1 ).

    • If ρ i < p ( MTM = 1 | x i ), accept the candidate data point xi.

    • Otherwise, reject the candidate data point xi.

  • This algorithm yields a final sample X of accepted samples, where the samples are now distributed according to the posterior predictive distribution of MTMs, p ( MTM = 1 | x i ).

  • Retrain the classifier and regressors using the expanded data set.

  • Iterate until the desired performance is achieved or the available compute resources are expended.

Two batches of new data were generated using this procedure. The first batch consisted of 1362 accepted samples, of which 971 were subsequently identified as MTMs, corresponding to a hit rate of 71%. The second batch consisted of 1317 accepted samples, of which 983 were subsequently identified as MTMs, corresponding to a hit rate of 75%. This is a significant increase compared to the initial parameter space sampling, representing a substantial saving in computational time. A summary of the batch statistics can be found in Table V, and the locations of the simulation points in parameter space are showing in Fig. 6.

TABLE V.

Outlining the number of simulations, the number of simulations found with an unstable MTM, and the percentage of the batch that gives an unstable MTM for the training set. Evident is the increase in hit rate as more data are added to the database and the classifier fidelity improves.

No Name Description Total num. simulations Number of MTMs Hit rate (%)
Initial LHD  Original LHD distributed parameter scan  293  99  33 
Original  First expansion of dataset using rudimentary classification  1000  513  51.3 
Targeted1  General classifier directed database expansion  1361  971  71.2 
Targeted2  General classifier directed database expansion  1316  983  74.6 
HighFlux1  Dataset targeted to high-flux region ( Q e m , e > 0.2 300  237  79 
HighFlux2  Dataset targeted to high-flux region ( Q e m , e > 0.5 400  349  87 
No Name Description Total num. simulations Number of MTMs Hit rate (%)
Initial LHD  Original LHD distributed parameter scan  293  99  33 
Original  First expansion of dataset using rudimentary classification  1000  513  51.3 
Targeted1  General classifier directed database expansion  1361  971  71.2 
Targeted2  General classifier directed database expansion  1316  983  74.6 
HighFlux1  Dataset targeted to high-flux region ( Q e m , e > 0.2 300  237  79 
HighFlux2  Dataset targeted to high-flux region ( Q e m , e > 0.5 400  349  87 
FIG. 6.

The complete dataset with each data point colored by its respective batch. The original dataset (blue) is the data produced using a 300-point Latin hypercube design as outlined in Sec. VI A 1. The targeted dataset (orange) encompasses both batches of samples directed toward finding an MTM. The high-flux datasets (green and red) were from areas of high predicted flux, Q e m , e > 0.2 and Q e m , e > 0.5, respectively, and are combined here denoted as high-flux.

FIG. 6.

The complete dataset with each data point colored by its respective batch. The original dataset (blue) is the data produced using a 300-point Latin hypercube design as outlined in Sec. VI A 1. The targeted dataset (orange) encompasses both batches of samples directed toward finding an MTM. The high-flux datasets (green and red) were from areas of high predicted flux, Q e m , e > 0.2 and Q e m , e > 0.5, respectively, and are combined here denoted as high-flux.

Close modal

The model trained on this expanded dataset was observed to lack accuracy at high heat fluxes, and so, further iterations targeting high-flux points were performed. To do this, the existing data were labeled according to whether they were below or above a specified heat flux threshold and a Bernoulli GP classifier was trained on these labels. Sample points with a high predicted heat flux value were then proposed using the rejection sampling algorithm outlined above, highlighting the flexibility of the classifier-driven sampling approach. Using this approach, two further iterations targeting Q e m , e > 0.2 and Q e m , e > 0.5 were performed.

The predictive power of each model was evaluated using the mean standardized log loss (MSLL) metric.51 This metric computes the average negative log-probability of the testing data with respect to their predictive posterior densities, standardized by the loss of the trivial model y ( μ D , σ D 2 ):
log p ( y | D , x ) = 1 2 log ( 2 π σ 2 ) + ( y f ¯ ( x ) ) 2 2 σ 2 ,
(17)
MSLL ( y ) = log p ( y | D , x ) + log ( y ; μ D , σ D 2 ) ,
(18)
where ( x , y ) is a test point, D = { ( x i , y i ) } i = 1 n is the training dataset, σ 2 is the predictive variance, and μ D and σ D 2 are the mean and variance of the training data, respectively.
Additionally, the calibration error of each model was evaluated using the root mean squared calibration error (CE),56–58 which is a frequentist method that measures the difference between the expected and the observed cumulative distributions of model predictions:
ν ( q ) = 1 N i = 1 N I ( y i Q x i ( q ) ) q [ 0 , 1 ] ,
(19)
CE = 1 M i = 1 M ( q i ν i ) 2 ,
(20)
where I ( · ) is the indicator function, which returns 1 if the argument is true and 0 otherwise, Q x i ( q ) is the quantile function of the posterior prediction of xi evaluated at the quantile q [ 0 , 1 ], so that ν ( q ) is the average number of datapoints within a given quantile q. For a perfectly calibrated model, ν ( q ) q q [ 0 , 1 ] as N . Then, the CE can be computed from M samples of q and v, { ( q i , ν i ) } i = 1 M. The remaining regression metrics are explained in A.

The Bernoulli GP (BGP) classifier was benchmarked against standard reference classifiers, namely, logistic regression (LR) (see, e.g., Refs. 59 and 60) and a gradient-boosting classifier (GB).60–62 The BGP was tested using three different covariance kernels, namely, Matérn 1/2, 3/2, and 5/2 (M12, M23, and M52). Table VI shows the mean and standard deviation of the accuracy, precision, recall, and F1 score for each classifier. For definitions of these terms, the reader is directed to A. Here, the mean and standard deviation were computed from a fivefold cross-validation. Note that the Bernoulli GP classifiers have slightly lower accuracy and precision, and slightly higher recall than the reference classifiers.

TABLE VI.

Mean and standard deviation of the accuracy, precision, recall, and F1 score for the logistic regression (LR), gradient boosting classifier (GB), and Bernoulli GP classifier (BGP).

Classifier Accuracy Precision Recall F1
LR  0.827 ± 0.027  0.813 ± 0.019  0.907 ± 0.036  0.857 ± 0.024 
GB  0.879 ± 0.036  0.868 ± 0.046  0.935 ± 0.024  0.900 ± 0.026 
BGP (M12)  0.875 ± 0.016  0.847 ± 0.024  0.954 ± 0.008  0.898 ± 0.014 
BGP (M32)  0.888 ± 0.026  0.865 ± 0.029  0.955 ± 0.018  0.907 ± 0.020 
BGP (M52)  0.890 ± 0.017  0.864 ± 0.032  0.961 ± 0.013  0.909 ± 0.015 
Classifier Accuracy Precision Recall F1
LR  0.827 ± 0.027  0.813 ± 0.019  0.907 ± 0.036  0.857 ± 0.024 
GB  0.879 ± 0.036  0.868 ± 0.046  0.935 ± 0.024  0.900 ± 0.026 
BGP (M12)  0.875 ± 0.016  0.847 ± 0.024  0.954 ± 0.008  0.898 ± 0.014 
BGP (M32)  0.888 ± 0.026  0.865 ± 0.029  0.955 ± 0.018  0.907 ± 0.020 
BGP (M52)  0.890 ± 0.017  0.864 ± 0.032  0.961 ± 0.013  0.909 ± 0.015 

Figure 7 shows the predictions of the Bernoulli GP classifier with the Matérn 5/2 covariance kernel, which was the best-performing BGP classifier, across each of the two-dimensional subspaces of the seven-dimensional space. For each marginal parameter scan, all the remaining parameters were fixed at the center point of their respective domain. These indicate where in the parameter space MTMs are likely to be unstable.

FIG. 7.

The probability of finding an MTM p ( MTM = 1 | x ) along planes in the input parameter space, according to the Bernoulli GP classifier with the best-performing kernel function, namely, Matérn 5/2. There is a complex structure across the parameter space showing broad areas of instability. The stabilizing effect of high-density gradients is also apparent. For each marginal parameter scan, all remaining parameters were fixed at the center point of their respective domain.

FIG. 7.

The probability of finding an MTM p ( MTM = 1 | x ) along planes in the input parameter space, according to the Bernoulli GP classifier with the best-performing kernel function, namely, Matérn 5/2. There is a complex structure across the parameter space showing broad areas of instability. The stabilizing effect of high-density gradients is also apparent. For each marginal parameter scan, all remaining parameters were fixed at the center point of their respective domain.

Close modal

GP regression was employed to predict the mode frequencies, growth rates, and normalized electron heat fluxes across the unstable regions of the parameter space using the simulations, which were identified as MTMs as training data. Independent GPs are trained for each output. Initial experiments in this included multi-task GP models (where correlations between outputs can yield more accuracy) with an intrinsic coregionalization model (ICM) kernel on the outputs. However, it was found that they did not yield any significant improvement for our data compared to the independent predictors and they are more expensive to train.

Four different covariance functions [Matérn 1/2 (M12), Matérn 3/2 (M32), Matérn 5/2 (M52), and radial basis function (RBF)] were investigated for each output variable. Each model used a constant mean function. Initial experiments additionally included corresponding models with a linear mean function, but this provided no predictive improvement. The results of a fivefold cross-validation across the entire dataset of MTM-positive datapoints are shown in Table VII. The MSLL metrics show that every GP model performs better than the trivial model y ( μ D , σ D 2 ). The coverage shows that there is good agreement with the theoretical 95% credible interval for every model. There is no appreciable difference between the kernels with respect to the chosen metrics. However, there is a trade-off between the predictive variance and the inferred observation noise with respect to the kernel smoothness (not shown). As the kernels become smoother, the uncertainty is pushed from the predictive variance to the observation noise. This is reflected in the calibration error, which decreases slightly but monotonically with the smoothness of the kernel (RBF being the smoothest and M12 being the roughest).

TABLE VII.

Mean squared error (MSE), standardized MSE (SMSE), mean squared log loss (MSLL), calibration error (CE), and the coverage of the 95% credible interval for each investigated model. Mean and standard deviation were computed from a fivefold cross-validation. It should be noted that the MSE is not unitless, and thus, the values are significantly larger for the frequency than those for the growth rate and flux. Unitless metrics such as the MSLL show comparable values between the three outputs.

Model MSE SMSE MSLL CE C 0.95
Growth rate           
M12  0.069 ± 0.008  0.068 ± 0.016  −1.325 ± 0.106  0.092 ± 0.011  0.952 ± 0.015 
M32  0.070 ± 0.008  0.062 ± 0.014  −1.369 ± 0.116  0.083 ± 0.011  0.943 ± 0.017 
M52  0.070 ± 0.008  0.064 ± 0.015  −1.363 ± 0.117  0.080 ± 0.011  0.939 ± 0.019 
RBF  0.071 ± 0.008  0.066 ± 0.014  −1.350 ± 0.107  0.081 ± 0.011  0.934 ± 0.020 
Mode frequency           
M12  3.338 ± 0.518  0.030 ± 0.011  −1.692 ± 0.185  0.148 ± 0.011  0.957 ± 0.015 
M32  3.344 ± 0.517  0.029 ± 0.011  −1.742 ± 0.187  0.138 ± 0.012  0.953 ± 0.019 
M52  3.336 ± 0.516  0.031 ± 0.012  −1.722 ± 0.176  0.133 ± 0.012  0.948 ± 0.019 
RBF  3.320 ± 0.504  0.038 ± 0.012  −1.651 ± 0.154  0.122 ± 0.010  0.940 ± 0.024 
Quasi-linear flux           
M12  0.088 ± 0.014  0.039 ± 0.013  −1.485 ± 0.130  0.165 ± 0.008  0.963 ± 0.018 
M32  0.088 ± 0.014  0.041 ± 0.013  −1.493 ± 0.140  0.159 ± 0.009  0.960 ± 0.022 
M52  0.088 ± 0.014  0.043 ± 0.013  −1.482 ± 0.138  0.156 ± 0.009  0.956 ± 0.023 
RBF  0.088 ± 0.014  0.047 ± 0.013  −1.451 ± 0.128  0.147 ± 0.008  0.952 ± 0.025 
Model MSE SMSE MSLL CE C 0.95
Growth rate           
M12  0.069 ± 0.008  0.068 ± 0.016  −1.325 ± 0.106  0.092 ± 0.011  0.952 ± 0.015 
M32  0.070 ± 0.008  0.062 ± 0.014  −1.369 ± 0.116  0.083 ± 0.011  0.943 ± 0.017 
M52  0.070 ± 0.008  0.064 ± 0.015  −1.363 ± 0.117  0.080 ± 0.011  0.939 ± 0.019 
RBF  0.071 ± 0.008  0.066 ± 0.014  −1.350 ± 0.107  0.081 ± 0.011  0.934 ± 0.020 
Mode frequency           
M12  3.338 ± 0.518  0.030 ± 0.011  −1.692 ± 0.185  0.148 ± 0.011  0.957 ± 0.015 
M32  3.344 ± 0.517  0.029 ± 0.011  −1.742 ± 0.187  0.138 ± 0.012  0.953 ± 0.019 
M52  3.336 ± 0.516  0.031 ± 0.012  −1.722 ± 0.176  0.133 ± 0.012  0.948 ± 0.019 
RBF  3.320 ± 0.504  0.038 ± 0.012  −1.651 ± 0.154  0.122 ± 0.010  0.940 ± 0.024 
Quasi-linear flux           
M12  0.088 ± 0.014  0.039 ± 0.013  −1.485 ± 0.130  0.165 ± 0.008  0.963 ± 0.018 
M32  0.088 ± 0.014  0.041 ± 0.013  −1.493 ± 0.140  0.159 ± 0.009  0.960 ± 0.022 
M52  0.088 ± 0.014  0.043 ± 0.013  −1.482 ± 0.138  0.156 ± 0.009  0.956 ± 0.023 
RBF  0.088 ± 0.014  0.047 ± 0.013  −1.451 ± 0.128  0.147 ± 0.008  0.952 ± 0.025 

Figure 8 shows the MSLL and SMSE of the GP model using the Matérn 1/2 covariance kernel, as more data points from the additional batches described in Sec. VI B are cumulatively added. The classifier and regression are retrained after each iteration. Here, k-fold cross-validation with k = 10 is performed, with the testing data for each fold kept fixed with Ntest = 200 data points from the first batch. Broadly, each new batch improves the performance of the growth rate and quasi-linear heat flux predictors with respect to both metrics. However, for the mode frequency, the variance of the MSLL across the k = 10 folds first increases and then decreases again. Notably, as the variance increases, some folds are better than any of the previous and some are worse. This shows there are some testing data that are highly sensitive to the new data points from MTM-targeted batches (batches 2 and 3, see Table V). As the high-flux targeted batches (batches 4 and 5, see Table V) are added, the variance of the MSLL contracts again, indicating that the sensitive test points are resolved. There is some correlation between the mode frequency and electron flux, which may explain why the high-flux batches assist in resolving the sensitive mode frequency data points. It is also worth noting that the MSLL of the mode frequency is consistently low.

FIG. 8.

The evolution of the standardized mean squared error (SMSE) and the mean standardized log loss (MSLL) as more batches are added to the dataset, showing that for the growth rate and flux models, the predictive loss decreases as the data volume is increased. For the mode frequency models, the variance of the MSLL across batches first increases with batches 2 and 3 (for batch details see Table V), and then decreases with batches 4 and 5. Note that batches 2 and 3 were targeted toward areas with a high probability of finding an MTM, while batches 4 and 5 were targeted toward areas with high flux, as explained in Table V. For each boxplot, the box spans the first quartile (Q1) and the third quartile (Q3), while the centerline displays the second quartile (Q2), i.e., the median value. The whiskers show 1.5 IQR, where IQR is the interquartile range. The points show outliers. The batch number corresponds to those described in Table V.

FIG. 8.

The evolution of the standardized mean squared error (SMSE) and the mean standardized log loss (MSLL) as more batches are added to the dataset, showing that for the growth rate and flux models, the predictive loss decreases as the data volume is increased. For the mode frequency models, the variance of the MSLL across batches first increases with batches 2 and 3 (for batch details see Table V), and then decreases with batches 4 and 5. Note that batches 2 and 3 were targeted toward areas with a high probability of finding an MTM, while batches 4 and 5 were targeted toward areas with high flux, as explained in Table V. For each boxplot, the box spans the first quartile (Q1) and the third quartile (Q3), while the centerline displays the second quartile (Q2), i.e., the median value. The whiskers show 1.5 IQR, where IQR is the interquartile range. The points show outliers. The batch number corresponds to those described in Table V.

Close modal

The validation plots shown in Fig. 9 showing the results of the k-fold validation of the whole dataset give an indication of the global performance of the model but do not show how well it can reproduce specific local parametric dependencies, which are important in integrated modeling. Figures 10 and 11 depict scans in the ky parameter comparing the predictions of the model and the values given by GS2. Two scans are shown, one far away from marginal stability (Fig. 10) characterized by large growth rates, and one close to marginal stability (Fig. 11), with lower growth rates and a characteristic peak in the growth rate spectrum. In both cases, the model outputs are in good agreement with the GS2 predictions within the predictive uncertainties of the model.

FIG. 9.

Target vs predicted values (top row, error bars display the predictive standard deviation) and calibration curve ν ( q ) (bottom row) for the growth rate (left), mode frequency (center), and quasi-linear flux (right) for the models using the Matérn 1/2 covariance kernel. An ideally calibrated model is shown as an orange line. Here, a k-fold validation is performed (k = 5, where five independent 80–20 splits of the data are performed to cover all the data. (a) Growth rate. (b) Mode frequency. (c) Electron heat flux.

FIG. 9.

Target vs predicted values (top row, error bars display the predictive standard deviation) and calibration curve ν ( q ) (bottom row) for the growth rate (left), mode frequency (center), and quasi-linear flux (right) for the models using the Matérn 1/2 covariance kernel. An ideally calibrated model is shown as an orange line. Here, a k-fold validation is performed (k = 5, where five independent 80–20 splits of the data are performed to cover all the data. (a) Growth rate. (b) Mode frequency. (c) Electron heat flux.

Close modal
FIG. 10.

A parameter scan over ky away from marginal stability comparing the model output values (orange) with those predicted by GS2 (blue) for the growth rate (left), the mode frequency (middle), and the quasi-linear electron flux (right). The error bars represent the 95% confidence interval as predicted by the GP regression. It should be noted that the GS2 simulations were not included in the training set and that the values all fall within the confidence intervals of the model. The other model parameters are set to: q = 2.43, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.29 , a / L T e = 5.91 , ν e i = 0.054.

FIG. 10.

A parameter scan over ky away from marginal stability comparing the model output values (orange) with those predicted by GS2 (blue) for the growth rate (left), the mode frequency (middle), and the quasi-linear electron flux (right). The error bars represent the 95% confidence interval as predicted by the GP regression. It should be noted that the GS2 simulations were not included in the training set and that the values all fall within the confidence intervals of the model. The other model parameters are set to: q = 2.43, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.29 , a / L T e = 5.91 , ν e i = 0.054.

Close modal
FIG. 11.

A parameter scan over ky close to marginal stability comparing the model output values (orange) with those predicted by GS2 (blue) for the growth rate (left), the mode frequency (middle), and the quasi-linear electron flux (right). The error bars represent the 95% confidence interval as predicted by the GP regression. It should be noted that the GS2 simulations were not included in the training set and that the values all fall within the confidence intervals of the model. The other model parameters are set to: q = 4.45, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094.

FIG. 11.

A parameter scan over ky close to marginal stability comparing the model output values (orange) with those predicted by GS2 (blue) for the growth rate (left), the mode frequency (middle), and the quasi-linear electron flux (right). The error bars represent the 95% confidence interval as predicted by the GP regression. It should be noted that the GS2 simulations were not included in the training set and that the values all fall within the confidence intervals of the model. The other model parameters are set to: q = 4.45, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094.

Close modal

The growth rate predictions (left, Fig. 11) show two points of interest: First, the growth rate, and correspondingly the heat flux prediction, diverges from the actual values at the upper boundary of our ky range, indicating that there is a lack of data in this region requiring model improvement; second, the confidence intervals are significantly larger than the deviation of the model from the GS2 values, and more generally larger than the mean across this ky range indicating the complexity of capturing the zero growth rate manifold. The error bars here represent the predictive standard deviation, i.e., V ( f ( x ) ) + σ ϵ 2, where V ( f ( x ) ) is the variance of f and σ ϵ 2 is the noise variance. The latter is practically zero, and so, the variance of f dominates.

Figure 12 shows a comparison of the model with GS2 output values over a scan in the safety factor, q. Here, it is evident that there are two unstable branches in this part of parameter space, at low and high q. These two unstable branches are captured by the model, with the growth rate predicted to be negative at the values of q, where GS2 does not predict an unstable MTM ( 3 < q < 6). This highlights the complex nature of the outputs, particularly the growth rate. The model does not reproduce the frequency in between the two unstable branches well because this is a stable area where there are no data on which the GP model can train; thus, the model reverts toward its mean. Both branches maintain a dominant electromagnetic component ( | A | | | | ϕ |) and a large value of the tearing parameter, thereby confirming that both branches are micro-tearing modes; however, the mode structures in the higher branch are generally narrower in ballooning space, indicating a difference in which type of MTM the two branches contain. An example of a lower q and a higher q can be seen in Fig. 1.

FIG. 12.

A scan over the safety factor q comparing model output values (orange) with those predicted by GS2 (blue) for (top left) the growth rate, (top right) the mode frequency, and (bottom right) the quasi-linear electron flux. The error bars represent the 95% confidence interval as predicted by the GP regression. The other model parameters are set to: q = 4.45, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094. Black dashed lines indicate the zero growth rate line. (Bottom left) (black) the classifier predicted MTM probability as a function of β. (blue) logical parameter denoting a GS2 run with an MTM with positive growth rate.

FIG. 12.

A scan over the safety factor q comparing model output values (orange) with those predicted by GS2 (blue) for (top left) the growth rate, (top right) the mode frequency, and (bottom right) the quasi-linear electron flux. The error bars represent the 95% confidence interval as predicted by the GP regression. The other model parameters are set to: q = 4.45, s ̂ = 3.0 , β = 0.26 , a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094. Black dashed lines indicate the zero growth rate line. (Bottom left) (black) the classifier predicted MTM probability as a function of β. (blue) logical parameter denoting a GS2 run with an MTM with positive growth rate.

Close modal

A further crucial parameter that shows criticality is β. Figure 13 shows that the model captures the stability boundary with high precision, again showing an agreement that is significantly better than the error bars suggest. In both Figs. 12 and 13, the classification of the data points in the scan and a comparison with the MTM probability as predicted by the GP-based classifier show the expected dip in probability, below the 0.5 threshold at low β and a corresponding dip between the branches within the confidence interval in the q-scan, which in both cases corresponds to the drop in the growth rate below 0 indicating a stable area.

FIG. 13.

A scan over the plasma β comparing model output values (orange) with those predicted by GS2 (blue) for (top left) the growth rate, (top right) the mode frequency, and (bottom right) the quasi-linear electron flux. The error bars represent the 95% confidence interval as predicted by the GP regression. The other model parameters are set to: q = 4.45, s ̂ = 3.0, ky = 0.26, a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094. Black dashed lines indicate the zero growth rate line. (Bottom left) (black) the classifier predicted MTM probability as a function of β. (blue) logical parameter denoting a GS2 run with an MTM with positive growth rate.

FIG. 13.

A scan over the plasma β comparing model output values (orange) with those predicted by GS2 (blue) for (top left) the growth rate, (top right) the mode frequency, and (bottom right) the quasi-linear electron flux. The error bars represent the 95% confidence interval as predicted by the GP regression. The other model parameters are set to: q = 4.45, s ̂ = 3.0, ky = 0.26, a / L n e = 0.18 , a / L T e = 3.86 , ν e i = 0.094. Black dashed lines indicate the zero growth rate line. (Bottom left) (black) the classifier predicted MTM probability as a function of β. (blue) logical parameter denoting a GS2 run with an MTM with positive growth rate.

Close modal

This paper described the development of Gaussian process-based models of the linear parameters of micro-tearing modes trained on data generated by a high-fidelity gyrokinetic code. Gaussian processes have advantages over neural network-based models for surrogate modeling via their ability to be more accurate at low data volume and their transparent and understandable confidence intervals.

The model consists of two components: a classifier, which predicts the probability of a given point in parameter space being unstable to MTMs, and a regressor, which provides the quantity of interest in the unstable regions predicted by the classifier. Combining these in an active learning loop, it was possible to significantly increase the generation efficiency of training data on a previously unexplored parameter space. An accurate model was created using approximately 5000 data points in a seven-dimensional parameter space requiring approximately 1M CPU hours.

A wider expansion of the model is planned, including:

  • Expansion of the parameter set to include all the geometric parameters listed in Table II. Increasing the dimensionality of the problem will require significantly expanding the data set.

  • Harnessing the confidence intervals intrinsic in Gaussian processes to inform the targeting of new simulations in parts of parameter space where the most improvement can be made.

  • Generalization of the model to account for other types of mode, which may be dominant, e.g., KBM, hybrid KBM, trapped electron modes (TEM), etc. This would benefit from the use of a gyrokinetic eigenvalue solver as this would allow the tracking of subdominant stable and unstable modes.

  • Expanding the current parameters to cover larger ranges. For example, it was apparent from Fig. 10 that the current ky range is insufficient to resolve the turbulent spectrum peak in all cases. It is thought that an expansion of this parameter to k y ρ s 7 would be sufficient to resolve the peak.

  • Furthermore, expanding the number of outputs to include the flux surface-averaged k 2 . A further component required for quasi-linear models. This could be further refined to predict the full eigenfunctions of the perturbed fields.

  • Integration of the model into the JINTRAC63 integrated modeling toolchain. This will include the development of bespoke saturation rules required to predict the total electromagnetic heat flux via full nonlinear turbulence simulations.

The authors would like to thank David Dickinson and the GS2 team for many discussions about the intricacies of the GS2 code.

This work has been (part-) funded by the EPSRC Energy Programme (Grant No. EP/W006839/1). To obtain further information on the data and models underlying this paper, please contact PublicationsManager@ukaea.uk.

The authors have no conflicts to disclose.

William Anestis Hornsby: Conceptualization (equal); Data curation (lead); Investigation (lead); Methodology (lead); Software (equal); Validation (equal); Writing – original draft (lead). Nikolaos Papadimas: Investigation (supporting); Methodology (supporting); Software (supporting). Ben Fourcin: Investigation (supporting); Methodology (supporting); Software (supporting). Jordan Hart: Investigation (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Visualization (supporting). Ander Gray: Investigation (equal); Methodology (equal); Software (equal); Validation (equal). James Christopher Buchanan: Conceptualization (lead); Investigation (equal); Methodology (equal); Project administration (lead); Software (equal); Validation (equal); Writing – review & editing (equal). Bhavin S. Patel: Conceptualization (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Writing – review & editing (equal). Daniel Kennedy: Conceptualization (supporting); Investigation (equal); Methodology (equal). Francis Casson: Conceptualization (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Colin M. Roach: Conceptualization (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). Mikkel B. Lykkegaard: Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (lead); Writing – original draft (equal). Huy Nguyen: Investigation (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Visualization (supporting).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

When determining the accuracy of the GP classifier the following terms are used, they are listed here for the reader:

  • TP—True positive—The fraction of points correctly classified as growing MTMs.

  • FP—False positive—The fraction of points incorrectly classified as a growing MTM.

  • TN—True negative—The fraction of points correctly classified as a stable point.

  • FN—False negative—The fraction of points incorrectly classified as a stable point

  • Accuracy—The fraction of true positive and true negative points— ( TP + TN ) / ( TP + TN + FP + FN ).

  • Precision—The fraction of true positive over true positive and false positive— TP / ( TP + FP ).

  • Recall—The fraction of true positive over true positive and false negatives— TP / ( TP + FN ).

  • F1 score—A metric that accounts for both the precision and the recall, defined as the harmonic mean of the two, F 1 = 2 × Precision × Recall / ( Precision + Recall )

An ideal classifier would classify points as true positive or true negative only. Multiple metrics are used to analyze a classifier in unbalanced systems (when number of MTM points are significantly more or less than non-MTM points). An ideal classifier would give values of one for its accuracy, precision, recall, and F1.

  • MSE—Mean squared error: MSE = 1 N i = 1 N ( y i f ( x i ) ) 2

  • SMSE—Standardized MSE: SMSE = MSE σ y 2. The MSE standardized by the variance of the test data.

  • Coverage—The coverage C measures the coverage of the nth predictive credible interval with respect to the test data, with n [ 0 , 1 ], i.e.
    C n = 1 N i = 1 N I ( y i CI n ( x i ) ) ,
    (A1)

    where I ( · ) is the indicator function and CI n ( x ) is the n credible interval of testing datapoint x. For example, for the 0.95 credible interval, the coverage should be 0.95 for a well-calibrated model. Conversely, if CI n > n, the model is underconfident with respect to that particular credible interval and vice versa.

1.
W. M.
Tang
,
J. W.
Connor
, and
R. J.
Hastie
, “
Kinetic-ballooning-mode theory in general geometry
,”
Nucl. Fusion
20
(
11
),
1439
(
1980
).
2.
R. D.
Hazeltine
,
D.
Dobrott
, and
T. S.
Wang
, “
Kinetic theory of tearing instability
,”
Phys. Fluids
18
(
12
),
1778
1786
(
1975
).
3.
A. B.
Rechester
and
M. N.
Rosenbluth
, “
Electron heat transport in a tokamak with destroyed magnetic surfaces
,”
Phys. Rev. Lett.
40
,
38
41
(
1978
).
4.
W.
Guttenfelder
,
J.
Candy
,
S. M.
Kaye
,
W. M.
Nevins
,
E.
Wang
,
R. E.
Bell
,
G. W.
Hammett
,
B. P.
LeBlanc
,
D. R.
Mikkelsen
, and
H.
Yuh
, “
Electromagnetic transport from microtearing mode turbulence
,”
Phys. Rev. Lett.
106
(
15
),
155004
(
2011
).
5.
W.
Guttenfelder
,
J.
Candy
,
S. M.
Kaye
,
W. M.
Nevins
,
R. E.
Bell
,
G. W.
Hammett
,
B. P.
LeBlanc
, and
H.
Yuh
, “
Scaling of linear microtearing stability for a high collisionality National Spherical Torus Experiment discharge
,”
Phys. Plasmas
19
(
2
),
022506
(
2012
).
6.
M.
Giacomin
,
D.
Dickinson
,
D.
Kennedy
,
B. S.
Patel
, and
C. M.
Roach
, “
Nonlinear microtearing modes in mast and their stochastic layer formation
,”
Plasma Phys. Controlled Fusion
65
(
9
),
095019
(
2023
).
7.
H.
Sugama
, “
Gyrokinetic field theory
,”
Phys. Plasmas
7
(
2
),
466
480
(
2000
).
8.
P. J.
Catto
,
W. M.
Tang
, and
D. E.
Baldwin
, “
Generalized gyrokinetics
,”
Plasma Phys.
23
(
7
),
639
(
1981
).
9.
G. M.
Staebler
,
E. A.
Belli
, and
J.
Candy
, “
A flexible gyro-fluid system of equations
,”
Phys. Plasmas
30
(
10
),
102501
(
2023
).
10.
G.
Avdeeva
,
K. E.
Thome
,
S. P.
Smith
,
D. J.
Battaglia
,
C. F.
Clauser
,
W.
Guttenfelder
,
S. M.
Kaye
,
J.
McClenaghan
,
O.
Meneghini
,
T.
Odstrcil
, and
G.
Staebler
, “
Energy transport analysis of NSTX plasmas with the TGLF turbulent and NEO neoclassical transport models
,”
Nucl. Fusion
63
(
12
),
126020
(
2023
).
11.
J.
Citrin
,
S.
Breton
,
F.
Felici
,
F.
Imbeaux
,
T.
Aniel
,
J. F.
Artaud
,
B.
Baiocchi
,
C.
Bourdelle
,
Y.
Camenen
, and
J.
Garcia
, “
Real-time capable first principle based modelling of tokamak turbulent transport
,”
Nucl. Fusion
55
(
9
),
092001
(
2015
).
12.
A.
Pavone
,
A.
Merlo
,
S.
Kwak
, and
J.
Svensson
, “
Machine learning and Bayesian inference in nuclear fusion research: An overview
,”
Plasma Phys. Controlled Fusion
65
(
5
),
053001
(
2023
).
13.
I.-G.
Farcaş
,
G.
Merlo
, and
F.
Jenko
, “
A general framework for quantifying uncertainty at scale
,”
Commun. Eng.
1
(
1
),
43
(
2022
).
14.
J.
Citrin
,
P.
Trochim
,
T.
Goerler
,
D.
Pfau
,
K. L.
van de Plassche
, and
F.
Jenko
, “
Fast transport simulations with higher-fidelity surrogate models for ITER
,”
Phys. Plasmas
30
(
6
),
062501
(
2023
).
15.
S.
Dasbach
and
S.
Wiesen
, “
Towards fast surrogate models for interpolation of tokamak edge plasmas
,”
Nucl. Mater. Energy
34
,
101396
(
2023
).
16.
K. L.
van de Plassche
,
J.
Citrin
,
C.
Bourdelle
,
Y.
Camenen
,
F. J.
Casson
,
V. I.
Dagnelie
,
F.
Felici
,
A.
Ho
,
S.
Van Mulders
, and
JET Contributors
, “
Fast modeling of turbulent transport in fusion plasmas using neural networks
,”
Phys. Plasmas
27
(
2
),
022310
(
2020
).
17.
A. J.
Creely
,
M. J.
Greenwald
,
N. T.
Howard
,
F.
Sciortino
,
P.
Rodriguez-Fernandez
,
A. E.
White
, and
J. C.
Wright
, “
Vitals: A surrogate-based optimization framework for the accelerated validation of plasma transport codes
,”
Fusion Sci. Technol.
74
(
1–2
),
65
76
(
2018
).
18.
A.
Di Siena
,
A.
Bañón Navarro
,
T.
Luda
,
G.
Merlo
,
M.
Bergmann
,
L.
Leppin
,
T.
Görler
,
J. B.
Parker
,
L.
LoDestro
,
T.
Dannert
,
K.
Germaschewski
,
B.
Allen
,
J.
Hittinger
,
B. W.
Dorland
,
G.
Hammett
,
F.
Jenko
,
ASDEX Upgrade Team
, and
EUROfusion MST1 Team
, “
Global gyrokinetic simulations of ASDEX upgrade up to the transport timescale with GENE–Tango
,”
Nucl. Fusion
62
(
10
),
106025
(
2022
).
19.
A.
Ho
,
J.
Citrin
,
C.
Bourdelle
,
Y.
Camenen
,
F. J.
Casson
,
K. L.
van de Plassche
,
H.
Weisen
, and
JET Contributors
, “
Neural network surrogate of QuaLiKiz using JET experimental data to populate training space
,”
Phys. Plasmas
28
(
3
),
032305
(
2021
).
20.
M. T.
Curie
,
J. L.
Larakers
,
D. R.
Hatch
,
A. O.
Nelson
,
A.
Diallo
,
E.
Hassan
,
W.
Guttenfelder
,
M.
Halfmoon
,
M.
Kotschenreuther
,
R. D.
Hazeltine
,
S. M.
Mahajan
,
R. J.
Groebner
,
J.
Chen
,
C.
Perez von Thun
,
L.
Frassinetti
,
S.
Saarelma
,
C.
Giroud
,
J. E. T.
Contributors
,
M. M.
Tennery
, and
DIII-D Team
, “
A survey of pedestal magnetic fluctuations using gyrokinetics and a global reduced model for microtearing stability
,”
Phys. Plasmas
29
(
4
),
042503
(
2022
).
21.
M. T.
Curie
,
J.
Larakers
,
J.
Parisi
,
G.
Staebler
,
S.
Munaretto
,
W.
Guttenfelder
,
E.
Belli
,
D. R.
Hatch
,
M.
Lampert
,
G.
Avdeeva
,
T.
Neiser
,
S.
Smith
,
A.
Diallo
,
O.
Nelson
,
S.
Kaye
,
E.
Fredrickson
,
J. M.
Manela
,
S.
Lei
,
M.
Halfmoon
,
M. M.
Tennery
, and
E.
Hassan
, “
Microtearding mode study in NSTX using machine learning enhanced reduced model
,” arXiv:2304.08982.
22.
L.
Zanisi
,
A.
Ho
,
T.
Madula
,
J.
Barr
,
J.
Citrin
,
S.
Pamela
,
J.
Buchanan
,
F.
Casson
,
V.
Gopakumar
, and
JET Contributors
, “
Efficient training sets for surrogate models of tokamak turbulence with active deep ensembles
,” arXiv:2310.09024.
23.
J.
Barr
,
T.
Madula
,
L.
Zanisi
,
A.
Ho
,
J.
Citrin
,
V.
Gopakumar
, and
JET Contributors
, “
An active learning pipeline for surrogate models of gyrokinetic turbulence
,” in
48th EPS Conference on Plasma Physics 27 June–1 July
(
2022
).
24.
A. G.
Peeters
,
Y.
Camenen
,
F. J.
Casson
,
W. A.
Hornsby
,
A. P.
Snodin
,
D.
Strintzi
, and
G.
Szepesi
, “
The nonlinear gyro-kinetic flux tube code GKW
,”
Comput. Phys. Commun.
180
(
12
),
2650
2672
(
2009
).
25.
F.
Jenko
,
W.
Dorland
,
M.
Kotschenreuther
, and
B. N.
Rogers
, “
Electron temperature gradient driven turbulence
,”
Phys. Plasmas
7
(
5
),
1904
1910
(
2000
).
26.
M.
Barnes
,
D.
Dickinson
,
W.
Dorland
,
P. A.
Hill
,
C. M.
Parker
,
J. T.
Roach
,
S.
Biggs-Fox
,
N.
Christen
,
R.
Numata
et al (
2022
). “GS2 v8.1.2,”
Zenodo
. .
27.
H. G.
Dudding
,
F. J.
Casson
,
D.
Dickinson
,
B. S.
Patel
,
C. M.
Roach
,
E. A.
Belli
, and
G. M.
Staebler
, “
A new quasilinear saturation rule for tokamak turbulence with application to the isotope scaling of transport
,”
Nucl. Fusion
62
(
9
),
096005
(
2022
).
28.
X.
Chen
and
Q.
Zhou
, “
Sequential experimental designs for stochastic kriging
,” in
Proceedings of the Winter Simulation Conference
(IEEE, Piscataway, NJ,
2014
), pp.
3821
3832
.
29.
M.
Binois
,
J.
Huang
,
R. B.
Gramacy
, and
M.
Ludkovski
, “
Replication or exploration? Sequential design for stochastic simulation experiments
,”
Technometrics
61
(
1
),
7
23
(
2018
).
30.
S.
Seo
,
M.
Wallat
,
T.
Graepel
, and
K.
Obermayer
, “
Gaussian process regression: Active data selection and test point rejection
,” in
Mustererkennung 2000
, edited by
G.
Sommer
,
N.
Krüger
, and
C.
Perwass
(
Springer
,
Berlin, Heidelberg
,
2000
), pp.
27
34
.
31.
A. J.
Brizard
and
T. S.
Hahm
, “
Foundations of nonlinear gyrokinetic theory
,”
Rev. Mod. Phys.
79
,
421
468
(
2007
).
32.
J. A.
Krommes
, “
Nonlinear gyrokinetics: A powerful tool for the description of microturbulence in magnetized plasmas
,”
Phys. Scr.
2010
(
T142
),
014035
.
33.
A. E.
White
, “
Validation of nonlinear gyrokinetic transport models using turbulence measurements
,”
J. Plasma Phys.
85
(
1
),
925850102
(
2019
).
34.
T. F.
Neiser
,
F.
Jenko
,
T. A.
Carter
,
L.
Schmitz
,
D.
Told
,
G.
Merlo
,
A.
Bañón Navarro
,
P. C.
Crandall
,
G. R.
McKee
, and
Z.
Yan
, “
Gyrokinetic GENE simulations of DIII-D near-edge L-mode plasmas
,”
Phys. Plasmas
26
(
9
),
092510
(
2019
).
35.
W. A.
Hornsby
,
C.
Angioni
,
E.
Fable
,
P.
Manas
,
R.
McDermott
,
A. G.
Peeters
,
M.
Barnes
,
F.
Parra
, and
ASDEX Upgrade Team
, “
On the effect of neoclassical flows on intrinsic momentum in ASDEX upgrade Ohmic L-mode plasmas
,”
Nucl. Fusion
57
(
4
),
046008
(
2017
).
36.
W. A.
Hornsby
,
C.
Angioni
,
Z. X.
Lu
,
E.
Fable
,
I.
Erofeev
,
R.
McDermott
,
A.
Medvedeva
,
A.
Lebschy
,
A. G.
Peeters
, and
ASDEX Upgrade Team
, “
Global gyrokinetic simulations of intrinsic rotation in ASDEX upgrade Ohmic L-mode plasmas
,”
Nucl. Fusion
58
(
5
),
056008
(
2018
).
37.
G. M.
Staebler
,
J. E.
Kinsey
, and
R. E.
Waltz
, “
Gyro-Landau fluid equations for trapped and passing particles
,”
Phys. Plasmas
12
(
10
),
102508
(
2005
).
38.
G. M.
Staebler
,
J. E.
Kinsey
, and
R. E.
Waltz
, “
A theory-based transport model with comprehensive physics
,”
Phys. Plasmas
14
(
5
),
055909
(
2007
).
39.
C.
Bourdelle
,
J.
Citrin
,
B.
Baiocchi
,
A.
Casati
,
P.
Cottier
,
X.
Garbet
,
F.
Imbeaux
, and
JET Contributors
, “
Core turbulent transport in tokamak plasmas: Bridging theory and experiment with QuaLiKiz
,”
Plasma Phys. Controlled Fusion
58
(
1
),
014036
(
2015
).
40.
I. G.
Abel
,
M.
Barnes
,
S. C.
Cowley
,
W.
Dorland
, and
A. A.
Schekochihin
, “
Linearized model Fokker–Planck collision operators for gyrokinetic simulations. I. Theory
,”
Phys. Plasmas
15
(
12
),
122509
(
2008
).
41.
M.
Barnes
,
I. G.
Abel
,
W.
Dorland
,
D. R.
Ernst
,
G. W.
Hammett
,
P.
Ricci
,
B. N.
Rogers
,
A. A.
Schekochihin
, and
T.
Tatsuno
, “
Linearized model Fokker–Planck collision operators for gyrokinetic simulations. II. Numerical implementation and tests
,”
Phys. Plasmas
16
(
7
),
072107
(
2009
).
42.
M. A.
Beer
,
S. C.
Cowley
, and
G. W.
Hammett
, “
Field–aligned coordinates for nonlinear simulations of tokamak turbulence
,”
Phys. Plasmas
2
(
7
),
2687
2700
(
1995
).
43.
H.
Wilson
,
I.
Chapman
,
T.
Denton
,
W.
Morris
,
B.
Patel
,
G.
Voss
,
C.
Waldon
, and
STEP Team,
STEP–On the pathway to fusion commercialization
,” in
Commercialising Fusion Energy
(
2020
), Vol.
12
.
44.
B. S.
Patel
,
M. R.
Hardman
,
D.
Kennedy
,
M.
Giacomin
,
D.
Dickinson
, and
C. M.
Roach
, “
Relevance of E × B shear suppression of microtering based transport in spherical tokamaks
” (unpublished).
45.
D.
Kennedy
,
M.
Giacomin
,
F. J.
Casson
,
D.
Dickinson
,
W.
Hornsby
,
B. S.
Patel
, and
C. M.
Roach
, “
Electromagnetic gyrokinetic instabilities in step
,”
Nucl. Fusion
63
,
126061
(
2023
).
46.
M.
Giacomin
,
D.
Kennedy
,
F. J.
Casson
,
A.
C. J
,
D.
Dickinson
,
B. S.
Patel
, and
C. M.
Roach
, “
Electromagnetic gyrokinetic instabilities in the spherical tokamak for energy production (step) Part II: Transport and turbulence
,” arXiv:2307.01669.
47.
P. J.
Catto
and
M. N.
Rosenbluth
, “
Trapped electron modifications to tearing modes in the low collision frequency limit
,”
Phys. Fluids
24
(
2
),
243
255
(
1981
).
48.
R. L.
Miller
,
M. S.
Chu
,
J. M.
Greene
,
Y. R.
Lin-Liu
, and
R. E.
Waltz
, “
Noncircular, finite aspect ratio, local equilibrium model
,”
Phys. Plasmas
5
(
4
),
973
978
(
1998
).
49.
B. S.
Patel
,
D.
Dickinson
,
C. M.
Roach
, and
H.
Wilson
, “
Linear gyrokinetic stability of a high β non-inductive spherical tokamak
,”
Nucl. Fusion
62
(
1
),
016009
(
2021
).
50.
B. S.
Patel
,
L.
Pattinson
,
P.
Hill
,
M.
Giacomin
,
D.
Kennedy
,
D.
Dickinson
,
H. G.
Dudding
,
F. J.
Casson
, and
A. C.
Jayalekshmi
, “Pyrokinetics,” Github. https://github.com/pyro-kinetics/pyrokinetics.
51.
C. E.
Rasmussen
and
C. K. I.
Williams
,
Gaussian Processes for Machine Learning
, Adaptive Computation and Machine Learning (
MIT Press
,
Cambridge, MA
,
2006
).
52.
J.
Gardner
,
G.
Pleiss
,
K. Q.
Weinberger
,
D.
Bindel
, and
A. G.
Wilson
, “
GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration
,” in
Advances in Neural Information Processing Systems
(
Curran Associates, Inc
.,
2018
), Vol.
31
.
53.
J.
Hensman
,
A.
Matthews
, and
Z.
Ghahramani
, “
Scalable variational Gaussian process classification
,” in
Artificial Intelligence and Statistics
(
PMLR
,
2015
), pp.
351
360
.
54.
A. A.
Taha
and
A.
Hanbury
, “
Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool
,”
BMC Med. Imaging
15
(
1
),
29
(
2015
).
55.
I. M.
Sobol'
, “
On the distribution of points in a cube and the approximate evaluation of integrals
,”
USSR Comput. Math. Math. Phys.
7
(
4
),
86
112
(
1967
).
56.
V.
Kuleshov
,
N.
Fenner
, and
S.
Ermon
, “
Accurate uncertainties for deep learning using calibrated regression
,” in
Proceedings of the 35th International Conference on Machine Learning
(
PMLR
,
2018
), pp.
2796
2804
.
57.
Y.
Chung
,
I.
Char
,
H.
Guo
,
J.
Schneider
, and
W.
Neiswanger
, “
Uncertainty toolbox: An open-source library for assessing, visualizing, and improving uncertainty quantification
,” arXiv:2109.10254 (
2021
).
58.
K.
Tran
,
W.
Neiswanger
,
J.
Yoon
,
Q.
Zhang
,
E.
Xing
, and
Z. W.
Ulissi
, “
Methods for comparing uncertainty quantifications for material property predictions
,”
Mach. Learn.
1
(
2
),
025006
(
2020
).
59.
C. M.
Bishop
,
Pattern Recognition and Machine Learning. Information Science and Statistics
(
Springer
,
New York
,
2006
).
60.
T.
Hastie
,
R.
Tibshirani
, and
J. H.
Friedman
, “
The elements of statistical learning: Data mining, inference, and prediction
,” in
Springer Series in Statistics
, 2nd ed. (
Springer
,
New York
,
2009
).
61.
J. H.
Friedman
, “
Greedy function approximation: A gradient boosting machine
,”
Ann. Stat.
29
(
5
),
1189
1232
(
2001
).
62.
J. H.
Friedman
, “
Stochastic gradient boosting
,”
Comput. Stat. Data Anal.
38
(
4
),
367
378
(
2002
).
63.
M.
Romanelli
,
G.
Corrigan
,
V.
Parail
,
S.
Wiesen
,
R.
Ambrosino
,
P.
Da Silva Aresta Belo
,
L.
Garzotti
,
D.
Harting
,
F.
Köchl
,
T.
Koskela
,
L.
Lauro-Taroni
,
C.
Marchetto
,
M.
Mattei
,
E.
Militello-ASP
,
M. F. F.
Nave
,
S.
Pamela
,
A.
Salmi
,
P.
Strand
,
G.
Szepesi
, and
EFDA-JET Contributors
, “
JINTRAC: A system of codes for integrated simulation of tokamak scenarios
,”
Plasma Fusion Res.
9
,
3403023
(
2014
).