Many scientific problems can be formulated as sparse regression, i.e., regression onto a set of parameters when there is a desire or expectation that some of the parameters are exactly zero or do not substantially contribute. This includes many problems in signal and image processing, system identification, optimization, and parameter estimation methods such as Gaussian process regression. Sparsity facilitates exploring high-dimensional spaces while finding parsimonious and interpretable solutions. In the present work, we illustrate some of the important ways in which sparse regression appears in plasma physics and point out recent contributions and remaining challenges to solving these problems in this field. A brief review is provided for the optimization problem and the state-of-the-art solvers, especially for constrained and high-dimensional sparse regression.
I. INTRODUCTION
Despite spanning an enormous range of scales and dynamical regimes, plasmas exhibit a number of common challenges to their diagnosis and analysis. Almost all plasmas are numerically intensive to simulate, challenging to diagnose, and present significant barriers to open design or closed-loop control strategies in experiments. Many of these challenges are shared with fluid dynamics, but understandably the solutions may be quite different for plasmas.
For instance, tokamak fusion reactors will require many active and robust control schemes to ensure safe and reliable operation. This includes feedback control for stabilizing the vertical position, shape control, current control, position control, neutral beam control, controlling core parameters such as the line-averaged density and plasma pressure, active control of the divertor particle and energy fluxes, control for energetic particles, and, perhaps most importantly, prediction and avoidance or mitigation of disruptions.1–3 In the stellarator, the significant challenges of real-time control in tokamaks are largely traded for a high-dimensional and sensitive engineering design space. Stellarators require solving a very complex two-stage design problem subject to both physics and engineering constraints.4–6 Similarly, inertial confinement fusion poses a very challenging design problem. The recent extraordinary successes at the National Ignition Facility come in part from improved symmetry control and laser pulse design.7
In principle, researchers would like to perform full-fidelity simulations of realistic and dynamical plasmas with multi-species kinetic codes, but these simulations can require months of computational run-time despite modern advances in supercomputing. Partially reduced-order models such as gyrokinetics or magnetohydrodynamics (MHD) are also computationally intensive and apply only in parameter regimes where their simplifying assumptions are valid.
Sensing is also very challenging, and consequently sparse measurements are a ubiquitous feature in the literature. Space plasmas are primarily sensed with expensive spacecraft that take point measurements along a flight path. By their very nature, bulk heliophysics measurements are limited to the plasma near the surface of stars. Future fusion devices must have core plasmas with temperatures on the order of 100 × 10⁶ K and high neutron fluxes that necessitate a large neutron-absorbing blanket, other shielding, and consequently very limited diagnostic access. Diagnostics in these devices are often limited to a set of spatial line-of-sight measurements in a single poloidal cross section, from which practitioners often attempt to accurately infer quantities that may in principle vary in two or even all three dimensions. Variants of this tomographic inversion problem8 have been studied in this field for decades and have some interesting parallels with coil optimization for stellarators, e.g., both are ill-posed inverse problems.
A. Review of modern sparse regression
Common to all of these challenges is some form of parameter estimation with a number of free parameters. This problem is not unique to the field of plasma physics. Regression models are ubiquitous in science and engineering, and it is often very useful to distinguish a relatively small number of variables in a regression that dominantly contribute to the identified model. This utility increases as the data and number of regression variables become large, i.e., in a high-dimensional optimization space; using a large number of regression variables can easily lead to overfitting the data, but using a small number of variables limits the expressivity of the model. Sparse regression facilitates a compromise by using many variables in the optimization, but returning a sparse solution representing the subset of the most important parameters.
The balance between sparse and parsimonious models and models with high prediction accuracy maps out a well-known curve called the Pareto-front.9 In order to produce a priori parsimonious models from regression, optimization problems must be solved that include a loss term that promotes some sense of sparsity in the solution. One of the most important sparsity priors is the l0 quasi-norm $\|\mathbf{x}\|_0$, a function that counts the number of nonzero terms in the vector $\mathbf{x}$. When added to a linear least squares regression, we have in total the following optimization problem:

$$\min_{\mathbf{x}} \; \|A\mathbf{x} - \mathbf{b}\|_2^2 + \alpha \|\mathbf{x}\|_0. \qquad (1)$$
The linear combination $A\mathbf{x}$ of the parameters in $\mathbf{x}$ is chosen to best fit the data in $\mathbf{b}$, subject to a requirement of sparsity. The α hyperparameter controls this balance between accuracy and sparsity, so that scanning through values of α will generate the Pareto-front. The optimal value of α in the Pareto-front, corresponding to the model that best satisfies this trade-off between accuracy and sparsity, should be determined by an information-theoretic metric such as the Akaike information criterion (AIC).10 The optimization problem in (1) can easily be extended to regression onto more general convex functions, rather than simple least squares. However, it follows from the nonsmoothness of the l0 term that typical gradient- or Hessian-based optimizers such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm will not usually work. In the present review, we will use the term sparse regression to indicate an optimization problem consisting of a convex loss term and a sparsity prior, but will occasionally point out more general opportunities for sparsity-promotion.
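To make (1) and the α scan concrete, the following minimal Python sketch (our illustration; the data, α grid, and AIC form are invented for this example) solves the l0 problem exactly by enumerating supports and then selects the Pareto-optimal model via the AIC. The exponential cost of the enumeration is precisely why the relaxations and greedy methods discussed below are needed:

```python
import itertools
import numpy as np

def l0_regression_exact(A, b, alpha):
    """Solve min_x ||A x - b||_2^2 + alpha * ||x||_0 by enumerating every
    support set. Exponential in the number of columns, so only viable for
    small problems; it shows what relaxed/greedy solvers approximate."""
    n = A.shape[1]
    best_x, best_obj = np.zeros(n), np.sum(b**2)  # empty-support baseline
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            obj = np.sum((A[:, cols] @ coef - b) ** 2) + alpha * k
            if obj < best_obj:
                best_obj, best_x = obj, np.zeros(n)
                best_x[cols] = coef
    return best_x

# Scan alpha to trace the Pareto front, then select the model minimizing
# AIC = m * log(RSS / m) + 2k, with m samples and k nonzero terms.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))
b = A @ np.array([1.5, 0, 0, -2.0, 0, 0, 0, 0.7]) + 0.05 * rng.standard_normal(100)
models = [l0_regression_exact(A, b, a) for a in np.logspace(-3, 1, 20)]
aic = lambda x: 100 * np.log(np.sum((A @ x - b) ** 2) / 100) + 2 * np.count_nonzero(x)
x_best = min(models, key=aic)
```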
To address the issue of nonsmoothness, a large literature exists on using different sparsity priors in place of the l0, including the l1 norm,11 the lp norm,12 and variations that allow for sparsity within pre-defined groups, such as the group l1 norm.13 The l1-regularized regression problem is typically called the Lasso or basis pursuit. These "relaxations" of the l0 term are often used because they have advantageous properties with respect to optimization, notably convexity, improved smoothness, and/or theoretical bounds on convergence given some constraints on the data properties. However, unlike the l0 norm, these relaxations do not typically produce a solution with components that are exactly zero, which is often an important requirement of the regression problem. For instance, the theoretical l1 recovery of the l0-optimal sparse solution is guaranteed only under quite strong assumptions on the data.14 The Lasso often struggles to select the most important parameters without also including many weak and relatively superfluous parameters.15 Tikhonov regularization,16 which penalizes the l2 norm, is also common for inverse and other ill-posed problems but lacks any substantial relationship to the l0 problem beyond the fact that both terms regularize the primary objective. Tikhonov regularization is a useful tool for ill-posed problems such as the tomographic inversion relevant to many plasma diagnostics,17 but the solution variables will generally all be nonzero.
Consequently, the sparse regression literature addressing the l0 norm optimization in (1) is prolific.18–24 Despite the fact that the problem is NP-hard, recent work in mixed-integer (discrete) optimization has shown that l0-based sparse regression can actually be solved to optimality with reasonable computational efficiency for low-dimensional problems24 and even some high-dimensional problems.25 Two caveats apply to exact methods for high-dimensional l0-based sparse regression. First, it remains unclear whether these methods can easily incorporate additional optimization constraints. For instance, the MIOSR algorithm for system identification allows for convex constraints and scales well with the problem dimension, but scales poorly with the difficulty of the optimization problem,24 so it is not clear that this algorithm is suitable for difficult problems in high-dimensional spaces. Second, there is limited open-source code availability for general or sparse-regression-specific high-dimensional mixed-integer optimization. Nonetheless, this is an exciting development for future progress in high-dimensional sparse regression.
In most previous research using high-dimensional and constrained sparse regression, greedy algorithms26 are used that iteratively select the nonzero features in $\mathbf{x}$. Unfortunately, greedy algorithms can fail spectacularly, so there has been substantial research into deriving performance guarantees for greedy algorithms, given some reasonable properties of the underlying optimization problem. The most substantial theoretical performance guarantees for greedy algorithms come from formulating an objective function that is submodular. A review of submodularity can be found in Bilmes,27 and recent work has also found theoretical performance bounds for greedy optimization of some non-submodular functions,28 including the mean-squared error.29 This literature provides a strong motivation for researchers to choose submodular or similar optimization objective functions.
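A minimal sketch of one such greedy scheme, in the style of orthogonal matching pursuit with least-squares refitting, is given below (our illustration, not a specific algorithm from the cited works):

```python
import numpy as np

def greedy_forward_selection(A, b, k_max):
    """Greedy sparse regression: grow the support one column at a time,
    choosing the column of A most correlated with the current residual,
    then refit by least squares. No global optimality guarantee in
    general, but performance bounds exist when the objective is
    submodular or approximately so."""
    n = A.shape[1]
    support, residual = [], b.astype(float).copy()
    for _ in range(k_max):
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf       # never re-select a chosen column
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x
```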
Sparsity-promotion is used for nonconvex problems too. Unfortunately, these problems are very challenging to solve because there are now multiple nonconvex terms and the l0 norm is nonsmooth. Once again, the problem is typically addressed by convex relaxations of the l0 or by greedy algorithms. Two examples in the literature are sparsifying deep neural networks to avoid overfitting, reduce memory, and enforce group structure30 (although the l1 is used), and system identification using the multi-time step dynamical prediction error as a loss term31 (although greedy techniques are used). Despite the challenges, high-dimensional sparse regression results for neural network compression indicate that greedy magnitude pruning often performs quite well.32 Finally, note that most of the preceding discussion generalizes to the Bayesian inference setting,33,34 where it is roughly the case that the l1 maps to the Laplace prior, the l0 maps to the spike-and-slab prior, and relaxations of the spike-and-slab are common.35
The primary point for the present review is that a wide array of algorithms is now available to solve (1) in high-dimensional spaces and with Bayesian inference, convex constraints, variations of the sparsity prior, and some theoretical performance guarantees. We therefore move on to a discussion of the plasma physics problems that take an equivalent or similar form to (1). Further details regarding optimization considerations can be found in the review by Bertsimas et al.36
B. Contributions of this work
This work is intended to be a concise and useful manual for plasma physics practitioners to quickly understand the opportunities, optimization problems, and algorithms appearing in sparse regression, which typically appears only briefly (and sometimes disjointly because of the diverse applications) in larger reviews on data-driven methods for plasma physics.37,38 We highlight new work, by ourselves and others, in this field which demonstrates that an understanding of new innovations and remaining challenges in sparse regression can lead to novel solutions for a broad array of useful plasma physics problems.
II. SPARSE REGRESSION SUBFIELDS
Notions of sparsity occur very naturally in a number of plasma physics applications ranging from system identification to signal processing. To some extent, the plasma physics community has addressed the ubiquity of sparse and limited plasma measurements by numerous innovations in diagnostic techniques, installation of many diagnostics per device, the operation of such diagnostics over years, and intensive research into high-fidelity numerical simulations. Thus, despite the limited spatial resolution of plasma measurements in experiments, the modern era has actually seen the proliferation of enormous volumes of experimental and simulated plasma physics data, which can be used to discover new data-driven models across the large parameter regimes spanned by different subfields of plasma physics. Toward that goal, we focus this review first on the task of system identification.
A. System identification
System identification, or model discovery, is the subfield of identifying a physical or reduced-order model from data, and it is playing an increasing role in plasma physics. Identifying or approximating the true underlying dynamics of a plasma system has obvious importance to scientists, including for understanding dynamical behavior, estimating physical parameters such as the viscosity or resistivity, improving simulations, forecasting, and much more. In particular, reduced-order modeling is the process of finding low-dimensional models that approximate the true high-dimensional system, which is often restricted to a low-dimensional manifold. Identifying reduced-order models can be useful for a number of tasks, since they can be computed more efficiently than the high-dimensional models, coupled with control strategies, and investigated for insights into the dynamical behavior.
Traditional system identification typically assumes that the underlying data can be fit reasonably well with a linear model; this assumption is often not well-justified physically, but it can be suitable for highly resolved data and is particularly useful for real-time control. However, nonlinear models coupled with constrained model predictive control (MPC) appear feasible for a number of scenarios.39 These nonlinear schemes may be required for complex modeling scenarios, such as the dynamics of detachment in the tokamak divertor, as we highlight in Sec. III A.
Since system identification refers to any method that builds a model from data, it encompasses an enormous swath of literature across scientific disciplines. A large number of varied approaches have been developed in recent years, such as the dynamic mode decomposition,40,41 Koopman theory,42 nonlinear autoregressive algorithms,43 neural networks,44–46 methods based on Gaussian process regression,47 operator inference and reduced-order modeling,48–50 genetic programming,51,52 divide-and-conquer strategies,53 and sparse regression. We focus on system identification based on sparse regression because it produces an interpretable set of equations that can be scientifically analyzed. It can also be computed very efficiently, and consequently such methods have seen prolific use across scientific fields. One of the most common such methods is the sparse identification of nonlinear dynamics (SINDy).18
Within the field of plasma physics, scientists have recently attempted to discover nonlinear SINDy models from data, including models for anode-glow oscillations,54 simplified models for the L–H transition in tokamaks,55 reduced-order models for magnetohydrodynamic (MHD) simulations,56 hierarchical plasma models,57 more accurate plasma closures,58,59 turbulence models,60 and models for the tokamak divertor.61
It is a reasonable expectation that explicit nonlinear models can represent nonlinear systems more faithfully than linear models, but there has also been substantial progress in building useful linear models by embedding nonlinear systems in much higher-dimensional spaces. For a review of this Koopman theory perspective, see, e.g., Brunton et al.42 In addition, interpretable nonlinear models obtained from system identification can come with some caveats, which we highlight now in order to warn practitioners and point to potential solutions.
For instance, even if the equations extracted from the data have all of the correct dynamical terms, the coefficients on these terms will inevitably have errors from finite sampling rates, numerical precision limits, or noise in the dataset. This is problematic for the purpose of forecasting (if the practitioner is interested in the equations for other reasons, the following discussion is irrelevant), because very small deviations from the true dynamical equations can result in models that are unstable for some subset of initial conditions. This can be resolved straightforwardly for linear data-driven systems of arbitrary state dimension: during the optimization, require the matrix $A$ in the fit $\dot{\mathbf{x}} \approx A\mathbf{x}$ to be negative semidefinite, as in, e.g., Pan and Duraisamy.62 Stability analysis is much more challenging for nonlinear systems, but there are some methods for guaranteeing local or global stability for quadratically nonlinear models, which have relevance for many fluid and magnetohydrodynamic models.63–65 To illustrate, Fig. 1 shows the correct simulation of new trajectories from a provably stable, nonlinear, and data-driven model trained on a noisy trajectory of the Lorenz (1963) system.66 Finally, recent work has utilized the multi-time step dynamical prediction error as a loss term,31,67 sacrificing convexity for much improved (but not guaranteed) model stability. This is a worthwhile trade for a number of applications.
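As a sketch of the linear case, the following convex program (our illustration, using the cvxpy package and assuming it is installed; the cited works use related but more elaborate formulations) fits $\dot{\mathbf{x}} \approx A\mathbf{x}$ while constraining the symmetric part of $A$ to be negative semidefinite, which is sufficient for $\|\mathbf{x}(t)\|$ to be non-increasing:

```python
import cvxpy as cp
import numpy as np

def fit_stable_linear_model(X, X_dot):
    """Least-squares fit of x_dot ~ A x with the symmetric part of A
    constrained to be negative semidefinite. Writing A = S + K with S
    symmetric and K skew-symmetric lets the semidefinite constraint
    act on a symmetric variable. Rows of X and X_dot are snapshots."""
    n = X.shape[1]
    S = cp.Variable((n, n), symmetric=True)  # symmetric part of A
    K = cp.Variable((n, n))                  # skew-symmetric part of A
    A = S + K
    problem = cp.Problem(
        cp.Minimize(cp.sum_squares(X_dot - X @ A.T)),
        [S << 0, K == -K.T],
    )
    problem.solve()
    return A.value
```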
Moreover, there have been recent and substantial improvements in the robustness of sparse regression for system identification. SINDy has been extended to handle more complex modeling scenarios, such as partial differential equations (PDEs),19,68 delay equations,69 stochastic differential equations,70–73 Bayesian approaches,35 systems with inputs or control,39,74,75 systems with implicit dynamics,76,77 and hybrid systems;78,79 to enforce physical constraints;21,56,80 to incorporate information theory,81 group sparsity,82 or global stability;65 to identify models from corrupt or limited data;83–86 to identify models with partial measurements of the state space;67,87–89 to identify models with clever subsampling strategies90 or ensembles of initial conditions;91 to perform cross-validation with ensemble methods;92 and to extend the formulation to include weak or integral terms,57,93–98 tensor representations,99,100 and stochastic forcing.101 In particular, the weak formulation drastically improves performance when noise is present in the data, and we recommend it for essentially all applications.
The enormous number of modifications to this method can be overwhelming for practitioners, and which advanced techniques to utilize will depend on the purpose of the system identification. For general use, many of the quoted sparse-regression-based system identification advances have been implemented in the open-source PySINDy code.102,103 The technicalities are mostly hidden from the user interface, and a large set of examples and YouTube tutorials are available. Moreover, large-scale datasets have recently been incorporated into PySINDy for standardized benchmarks of the code.104
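As a flavor of the interface, a minimal PySINDy sketch on the Lorenz (1963) system follows (the optimizer, library, and threshold choices here are illustrative only, not a recommendation for any particular plasma dataset):

```python
import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp

def lorenz(t, x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz (1963) system, a standard system identification benchmark."""
    return [sigma * (x[1] - x[0]),
            x[0] * (rho - x[2]) - x[1],
            x[0] * x[1] - beta * x[2]]

t = np.linspace(0, 10, 5000)
X = solve_ivp(lorenz, (t[0], t[-1]), [-8.0, 8.0, 27.0], t_eval=t).y.T

# Sequentially thresholded least squares on a quadratic polynomial library.
model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.1),
    feature_library=ps.PolynomialLibrary(degree=2),
)
model.fit(X, t=t)
model.print()  # should recover the Lorenz equations term by term
```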
B. Sparse sensing
Sparse sensing (also called optimal sensor placement)105 is the subfield of maximizing the information gleaned from a sparse set of measurements. Sparse sensing usually refers to choosing the most informative points for sampling data and applying actuation.106 Since plasmas are inherently hard to diagnose, we expect that research in sparse sensing could produce significant results, especially for extremely sparse diagnostic environments, such as nuclear fusion devices, spacecraft, and space weather.107 These methods can be extended to incorporate cost constraints,108,109 representing anything from literal financial costs to electricity requirements to varying levels of diagnostic accessibility.
Sparse sensing and compressed sensing110 are closely related. Compressed sensing relies on the fact that signals can often be sparsely represented in some universal basis, such as a Fourier basis. Compressed sensing is clearly related to optimal sensor placement because one can phrase the problem as selecting a measurement matrix that facilitates maximal compression. The primary difference is that optimal sensor placement algorithms typically rely on "tailored" bases rather than universal ones, i.e., a basis of biorthogonal modes111,112 computed from a training dataset, rather than a Fourier or other universal basis. Furthermore, compressed sensing has been used to an extent in the plasma physics field,113–117 though not to select optimal measurement points; rather, it has been used to avoid overfitting, provide data compression, or address ill-posedness. More sophisticated nonlinear methods of sparse sensing are also increasingly available.118 As with many high-dimensional l0 optimization problems, most methods of sparse sensing rely on greedy algorithms with submodular objectives. As far as the authors are aware, there has been little consideration in the plasma physics field of using these methods to choose diagnostic measurement locations in simulations or real experiments, and we consider this a useful opportunity for future work.
As a simple illustration, a particle-in-cell code is used to simulate the 1D two-stream instability, and a biorthogonal basis is learned from the data. This basis is used, alongside an initial set of 20 pointwise sensor locations, by a greedy sensor placement algorithm105 in order to discover the most informative locations in the phase space. Figure 2 shows the phase space reconstruction errors during another simulation of the two-stream instability, at a single snapshot in time, for 20 randomly placed sensors vs the 20 greedily optimized sensors. The optimized sensors clearly pick out the important peaks and troughs where the particles tend to bunch up in the phase space, resulting in much better reconstruction errors. Greedy algorithms are useful here because one is choosing 20 points from the set of all points in the full phase space grid; even for this simple problem, the number of candidate sensor subsets is combinatorially large.
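For readers who wish to experiment, the following sketch implements the greedy QR-pivoting strategy underlying placement algorithms of this type105 (our simplified illustration; not the exact code used to produce Fig. 2):

```python
import numpy as np
from scipy.linalg import qr

def greedy_sensor_placement(X_train, n_sensors, n_modes):
    """Greedy (QR column-pivoting) sensor placement: learn a tailored
    basis from training snapshots via the SVD, then rank candidate
    measurement locations (rows) by pivoted QR on the leading modes.
    Columns of X_train are snapshots; rows are candidate locations."""
    U, _, _ = np.linalg.svd(X_train, full_matrices=False)
    Ur = U[:, :n_modes]
    _, _, pivots = qr(Ur.T, pivoting=True)  # greedy row ranking
    return pivots[:n_sensors], Ur

def reconstruct(y, sensor_idx, Ur):
    """Least-squares estimate of the full state from sparse readings y."""
    coef, *_ = np.linalg.lstsq(Ur[sensor_idx, :], y, rcond=None)
    return Ur @ coef
```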
C. Gaussian process regression
Gaussian process regression is now a common tool across the plasma physics field. For instance, it has been used for fitting diagnostic profiles in fusion devices,119 accelerating the convergence of global transport simulations,120 solar wind event classification,121 and laser pulse shape optimization in laser-wakefield accelerators.122 There are a number of excellent introductions to Gaussian process regression available in the literature.123 For our purposes, it matters only that Gaussian process regression is regression, i.e., that a set of outputs is modeled, in a Bayesian framework, with a set of Gaussian processes with free parameters. The resulting parameter estimation is typically done via maximum likelihood with a penalization for the complexity (in other words, a reward for sparsity) of the parameters.123 The sparsity accounts for outliers and promotes parsimonious models. Common distributions used for sparsity promotion include Student's t, logistic, Laplace, spike-and-slab, and horseshoe distributions. Student's t was recently used for tokamak profile fits,124 and the sparse GP technique from Almosallam et al.125 was used for predicting implosion yields in inertial confinement fusion.126 To see how this relates back to sparse regression as formulated above, note that maximum a posteriori (MAP) estimation using the Laplace distribution corresponds to regression with the l1 norm.127 We expect sparse Gaussian process techniques to expand to new and high-dimensional applications in the plasma physics field, and the connection with sparse regression is fundamental.
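To make the MAP correspondence concrete (our illustrative derivation, consistent with Ref. 127): with a linear-Gaussian likelihood of noise variance $\sigma^2$ and independent Laplace priors $p(x_i) \propto \exp(-\lambda |x_i|)$, the negative log-posterior is, up to an additive constant,

$$-\log p(\mathbf{x} \mid \mathbf{b}) = \frac{1}{2\sigma^2}\,\|A\mathbf{x} - \mathbf{b}\|_2^2 + \lambda\,\|\mathbf{x}\|_1,$$

so MAP estimation under the Laplace prior is exactly l1-regularized least squares, with the prior scale $\lambda$ playing the role of the sparsity hyperparameter.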
III. SPECIFIC APPLICATIONS IN PLASMA PHYSICS
We have discussed three subfields of sparse regression for which there are myriad applications in the plasma physics field. A number of notable examples in the literature were pointed out, but the examples in the present work have so far been limited to illustrative toy problems. We now transition to demonstrating a few concrete and realistic applications of these methods in the literature, which are focused on fusion experiments because this is the present authors' subfield of mutual expertise. Section III B will dive considerably deeper into a discussion of sparse regression for stellarators, where we see significant opportunities for sparsity-promotion across a wide range of optimization problems.
A. Data driven models of the tokamak divertor
As we indicated in the introduction, the real-time control needs of the tokamak fusion concept are very significant. One of the most important challenges is divertor detachment control,128–130 since this strongly affects the large heat fluxes to the divertor and elsewhere. Accurately modeling divertor detachment requires a boundary transport plasma model, such as the coupled 2D fluid plasma and kinetic neutral transport code SOLPS-ITER.131 These codes are too computationally intensive to include in a real-time control loop, so reduced-order data-driven models are needed that both approximate the SOLPS-ITER solution and can be computed fast enough to be integrated into the closed loop. We have highlighted this problem because it illustrates an important use case for data-driven modeling and sparse system identification in the fusion community.
To begin to address this challenging problem, recent work61 reduces divertor detachment control via extrinsic impurity seeding to the more straightforward problem of controlling two pointwise plasma quantities with main-ion gas puff actuation. The SINDy method is used to generate linear and nonlinear models for the evolution of pointwise measurements of the electron density (upstream) and temperature (downstream). The models are computationally efficient enough to be both incorporated into a model-predictive control (MPC) loop and automatically retrained if model errors exceed a preset threshold. Figure 3 illustrates the geometry of the tokamak boundary, alongside a baseline setpoint scenario where a linear SINDy model is shown to accurately track the full SOLPS-ITER evolution of the pointwise measurements. In more advanced scenarios, the authors show that MPC with nonlinear models is required to track the true evolution of the pointwise measurements.
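As a schematic of this workflow (our toy example with invented dynamics; the actual models of Ref. 61 are trained on SOLPS-ITER data), PySINDy supports control inputs directly:

```python
import numpy as np
import pysindy as ps

# Toy stand-in for the divertor problem: two coupled scalar states driven
# by a scalar actuation u (e.g., a gas puff rate).
dt = 0.01
t = np.arange(0, 10, dt)
u = np.sin(t) ** 2                        # recorded actuation history
X = np.zeros((t.size, 2))
X[0] = [1.0, 0.5]
for k in range(t.size - 1):               # forward-Euler data generation
    x1, x2 = X[k]
    X[k + 1] = X[k] + dt * np.array([-0.5 * x1 + u[k],
                                     0.3 * x1 - 0.2 * x2])

# SINDy with control inputs: identify x_dot = f(x, u) from (X, u).
model = ps.SINDy(optimizer=ps.STLSQ(threshold=0.05))
model.fit(X, u=u, t=dt)
model.print()
```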
B. Stellarator optimization
A particularly interesting opportunity for sparse regression comes in the form of the many variants of stellarator optimization. Stellarator optimization is typically divided into two stages. The first is a configuration optimization using fixed-boundary MHD equilibrium codes to obtain MHD equilibria with desirable physics properties.132–134 After the optimal magnetic field is obtained in this first stage, complex coils must be designed to produce it,135–137 and this complexity raises the cost and difficulty of manufacturing. Both stages of stellarator optimization can be performed with a large number of degrees of freedom, and sparsity can be useful in various scenarios. Stage-2 optimization with coils and permanent magnets is particularly appealing for sparsity-promoting regularizations, since an optimization using the Biot–Savart law as the cost function will always be ill-posed.5
Recently, we have reformulated permanent magnet optimization for stellarators as sparse regression and provided new algorithms for solving this problem.138 Thanks to the connections with sparse regression, we were able to design greedy algorithms that compete with the state of the art while being significantly simpler in design, faster to compute, and guaranteed to generate solutions with advantageous engineering properties (binary, grid-aligned magnets).139 Figure 4 illustrates a novel permanent magnet solution for the four-period quasi-helically symmetric stellarator of Landreman and Paul,140 scaled to the 0.15 T on-axis magnetic field strength of the MUSE permanent magnet experiment.141 A similar approach appears feasible for other problems in the stellarator field, including winding-surface optimization135 and superconducting tile optimization.142
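A heavily simplified toy version of this type of greedy, binary magnet placement is sketched below (our illustration only; the actual algorithms of Refs. 138 and 139 handle realistic magnet grids, orientations, and engineering constraints). Each candidate magnet j contributes a fixed-magnitude moment of either sign to the field error through column j of a Biot–Savart-like matrix A:

```python
import numpy as np

def greedy_magnet_placement(A, b, n_magnets, m0):
    """Toy greedy binary placement: at each step, place the single magnet
    (index j, sign s, fixed magnitude m0) that most reduces the residual
    field error ||A m - b||_2^2, mirroring the binary, grid-aligned
    solutions discussed above."""
    n = A.shape[1]
    m = np.zeros(n)
    residual = -b.astype(float).copy()    # residual r = A m - b with m = 0
    for _ in range(n_magnets):
        best_j, best_s, best_gain = None, 0.0, 0.0
        for j in np.flatnonzero(m == 0):
            for s in (+m0, -m0):
                # Decrease in ||r + s*A_j||^2 relative to ||r||^2.
                gain = -2 * s * (A[:, j] @ residual) - s**2 * (A[:, j] @ A[:, j])
                if gain > best_gain:
                    best_j, best_s, best_gain = j, s, gain
        if best_j is None:
            break                         # no placement improves the error
        m[best_j] = best_s
        residual += best_s * A[:, best_j]
    return m
```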
Stage-1 stellarator optimization of the plasma boundary may even benefit from promoting sparsity; the space of Fourier modes describing the boundary can be chosen quite large. Additionally, there may be interesting and useful stellarator shapes that are dominantly described by relatively high-order Fourier harmonics. In practice, stage-1 convergence is quite challenging if one uses many Fourier modes (i.e., many sub-optimal local minima can appear), so often only the first few Fourier modes are considered in each angular direction. Even further, the optimization is frequently performed in multiple stages, starting with just a few modes, converging that solution, and using it as an initial condition for another optimization over additional Fourier degrees of freedom.143 By using sparsity-promotion to regularize the problem, we may be able to directly optimize in the higher-dimensional space of Fourier modes, finding new and improved stellarators while retaining some parsimony in the final configuration.
Finally, even more exotic stellarator optimization problems could be attempted with sparsity-promotion. A different potential stage-1 sparsity problem could be formulated to lose alpha particles at a sparse set of locations, corresponding to places for liquid metal collection or divertors, instead of having alpha losses that are uniformly nonzero or large in certain unfavorable directions.
IV. SUMMARY AND CONCLUSIONS
Sparse regression appears across science, including increasingly in plasma physics, and it is often indispensable for producing high-quality and interpretable results from high-dimensional optimization and parameter estimation. Recent work across the scientific community continues to improve the robustness of these methods for tasks such as system identification, compressed sensing, and other optimization problems. New algorithms increasingly address more advanced scenarios, such as constrained or high-dimensional regression.
Plasma physics and its subfields, such as nuclear fusion, laser wakefield acceleration, plasma propulsion, and heliophysics, regularly encounter challenges that can be addressed by formulating the problem as sparse regression. We posit that there are significant opportunities for future work in stellarator optimization, compressed sensing, and general parameter estimation, including Gaussian process regression.
ACKNOWLEDGMENTS
The authors thank Eduardo Paulo Alves for providing the two-stream instability simulation data. This work was supported by the U.S. Department of Energy under Award Nos. DE-FG02-93ER54197 and DE-AC05-00OR22725, through a grant from the Simons Foundation under Award No. 560651, and by the National Science Foundation under Grant No. PHY-2108384.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Alan Ali Kaptanoglu: Conceptualization (lead); Investigation (lead); Writing – original draft (lead). Christopher Hansen: Conceptualization (equal); Supervision (equal); Writing – review & editing (equal). Jeremy D. Lore: Investigation (supporting); Methodology (supporting). Matt Landreman: Conceptualization (equal); Software (equal); Supervision (equal). Steven L. Brunton: Conceptualization (equal); Supervision (equal); Writing – review & editing (supporting).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.