We study the 2D Orszag–Tang vortex magnetohydrodynamics (MHD) problem through the use of physics-constrained convolutional neural networks (PCNNs) for forecasting the density, ρ, and the magnetic field, B, as well as the prediction of B given the velocity field v of the fluid. In addition to translation equivariance from the convolutional architecture, other physics constraints were embedded: absence of magnetic monopoles, non-negativity of ρ, use of only relevant variables, and the periodic boundary conditions of the problem. The use of only relevant variables and the hard constraint of non-negative ρ were found to facilitate learning greatly. The divergenceless condition · B = 0 was implemented as a hard constraint up to machine precision through the use of a magnetic potential to define B = × A. Residual networks and data augmentation were also used to improve performance. This allowed for some of the residual models to function as surrogate models and provide reasonably accurate simulations. For the prediction task, the PCNNs were evaluated against a physics-informed neural network, which had the ideal MHD induction equation as a soft constraint. Several models were able to generate highly accurate fields, which are visually almost indistinguishable and have low mean squared error. Only methods with built-in hard constraints produced physical fields with · B = 0. The use of PCNNs for MHD has the potential to produce physically consistent real-time simulations to serve as virtual diagnostics in cases where inferences must be made with limited observables.

Magnetohydrodynamics (MHD) describes the low-frequency dynamics of a plasma, which is modeled as an electrically conducting fluid influenced by electromagnetic fields.1 MHD is used to describe a wide range of systems including turbulence in plasmas,2,3 quantum plasmas,4 Tokamaks,5–9 plasma detachment in magnetic nozzles,10 liquid metals,11,12 and intense ion beams in particle accelerators.13 MHD simulations are complex, describing systems with many evolving nonlinear fields and can be computationally expensive.

In this work, we utilize physics constrained neural networks (PCNNs) via a novel neural operator approach for building hard physics constraints into convolutional neural networks (CNNs),14 for the study of MHD. We focus on the Orszag–Tang vortex,15 a widely used test case for 2D MHD simulations.16–18 We generate synthetic MHD data using physics simulations and test various neural network-based methods for creating surrogate models of the MHD simulation, which are able to quickly predict the evolution of the data, such as using magnetic field and charge density values at one time step to predict those fields at subsequent time steps. We also test incorporating neural network-based methods as solvers within the physics model to speed up computationally expensive parts of the MHD simulation.

The use of PCNNs has the potential to provide physically consistent and accurate surrogates of MHD simulations, which run orders of magnitude faster than traditional approaches. Complex plasma processes could benefit from real-time data processing19 and real-time data analysis.20 The PCNN-based methods studied in this work, although not as accurate as physics models, are incredibly fast and therefore may be useful for real-time applications such as finding approximate solutions to inverse problems for operational systems with limited direct observations by providing a good initial guess, which is refined by computationally expensive physics models. Such ML-based methods, combined with robust model-independent control algorithms,21 as was demonstrated for intense electron beams in the LCLS free electron laser,22 have the potential to provide real-time feedback control for complex plasma physics systems.

We consider the case of ideal MHD where there is perfect conductivity (thus no magnetic diffusivity) and no viscosity. The governing equations of ideal MHD are given by Alfvén,1 Mocz et al.,17 Jackson,23 and Mocz et al.24,
D ρ D t = ρ · v ,
(1)
ρ D v D t = · ( p I + B B T ) ,
(2)
B t = · ( B v T v B T ) ,
(3)
· B = 0.
(4)
In addition, there is an equation of state for p. The variables ρ, p, v, and B are the mass density, total pressure, velocity, and magnetic fields, respectively, while
D D t = t + v ·
(5)
is the material derivative. Equations (1)–(4) correspond to the conservation of mass, the force equation, the ideal MHD induction equation (i.e., conservation of magnetic flux), and Gauss's law for magnetism, respectively. Also, the total pressure is a sum of the mechanical pressure of the fluid and the magnetic pressure: p = p fluid + B 2 2. Even though in 2 + 1 electromagnetism the magnetic field is a pseudo-scalar field,25,26 it is common in 2D MHD to take it as a pseudo-vector in order to have a better analogy to the 3D MHD equations and we do so here (see the  Appendix A for details of generalizing the ideal 3D MHD equations to arbitrary dimensions).

For what follows, we consider a monatomic, perfect gas.27 In this case, the equation of state becomes p fluid = ( γ 1 ) ρ u, where u is the gas's thermal energy density per mass and γ is the gas adiabatic index. We follow the example of previous works16,17 and let γ = 5 3 to have better analogy to the 3D case.

For the Orszag–Tang vortex, ρ and p are initially constant. We use the initial conditions at t = 0 of
ρ ( r , 0 ) = γ 2 4 π ,
(6)
p ( r , 0 ) = γ 4 π ,
(7)
v ( r , 0 ) = ( sin ( 2 π y ) , sin ( 2 π x ) ) ,
(8)
B ( r , 0 ) = 1 2 π 4 π ( cos ( 4 π x ) 2 + cos ( 2 π y ) ) ,
(9)
where = ( y , x ).

For this work, Orszag–Tang vortex data were generated using a physics simulation that utilizes a finite volume method together with constrained transport on a static mesh as described in Refs. 17 and 28. The physics code is used to generate large data sets, which include the temporal evolution of the fields ( ρ , v , B , p ) in Eqs. (1)–(4). We then study various neural network-based approaches for reproducing these fields and for capturing their dynamics, such as using field values at one time B ( x , y , t ) to predict their values at a subsequent time step B ( x , y , t + Δ t ).

For most of this work, we utilize convolutional neural networks (CNN), which take 128 × 128 pixel images as inputs. CNNs are used for this image-based ML because they have been demonstrated to have the best performance for image processing and image generation. When generating a 2D image, such as the mass density ρ ( t , x , y ) at time t for a 128 × 128 grid of values of (x, y), the CNN generates the entire image in one shot, and therefore, when referring to CNN-generated values such as ρ ( t , x , y ), we simply write ρ ( t ). In the one case where we use a dense neural network to map out individual values at individual (x, y) positions, we write out ρ ( t , x , y ).

For accurate predictions, one would expect low mean squared error (MSE) and high structural similarity (SSIM) (SSIM values are between −1 and 1 with a value of 1 for images that are exactly the same) between true and predicted images. In all of the results in this work, it was found that the MSE and SSIM were consistent; that is, as MSE increased, SSIM decreased and models with lower MSE also had higher SSIM values relative to each other. However, the sensitivity of MSE and SSIM was, as expected, very different. For example, Fig. 3 shows the true and predicted mass density ρ as predicted by several of the ML models at various time steps. Visually, the blue ρ predictions of all of the models look similar, but they clearly have very different levels of error as shown in the red sub-plots. In Fig. 4, this is numerically summarized by the MSE and SSIM. While all SSIM values remain within roughly 10% of each other, the MSE values differ by up to 1000%.

For the relatively simple problem of predicting mass density ρ, the networks having a simple positivity constraint of ρ 0 enforced with a ReLU output function had the best performance. For the more challenging problem of predicting the magnetic field B, some trade-offs were found between how easy it was to train a network in terms of MSE or SSIM accuracy of the B field prediction vs constraint violation. Networks with no constraints or only soft constraints were able to minimize a given cost function faster (in fewer training steps) than those with hard physics constraints, but produced nonphysical fields. This result is not very surprising because unconstrained networks (including PINNs, networks where the constraint is soft as part of the cost function) are free to generate completely arbitrary functions over the largest space of possibilities. On the other hand, the networks with the hard physics constraint of · B = 0, which was strictly enforced by generating B = × A, were the most restricted in terms of the families of functions that they could generate. Of all of the approaches considered, only the networks with built-in hard constraints generated physical fields (within numerical precision).

The simulated Orszag–Tang vortex data were generated using the finite volume method and constrained transport on a static mesh.17,28 The fluid is simulated over a unit square subject to spatial periodic boundary conditions. The procedure makes use of conserved quantities while also maintaining a discretization of the divergence-free condition, · B = 0. Furthermore, details of the simulation can be found in  Appendix B.

Some models that appear later will mimic the constrained transport of the simulation; thus, we discuss briefly how it functions.

In order to maintain · B = 0 in each time step, we use the constrained transport method on a static mesh.28 Each grid point represents the center of a cell, but now we utilize a staggered grid representation to also represent quantities that exist on the face centers and corners of cells. The constrained transport setup is shown in Fig. 1. The magnetic field is computed via the cell-face averaged magnetic field, b, which is related to the cell volume-averaged magnetic field, B, by
B i , j x = 1 2 ( b i + 1 2 , j x + b i 1 2 , j x ) , B i , j y = 1 2 ( b i , j + 1 2 y + b i , j 1 2 y ) .
(10)
FIG. 1.

Constrained transport mesh setup.

FIG. 1.

Constrained transport mesh setup.

Close modal
The quantities b are updated using the electric field at the corners of each cell in a way that preserves the following discretization of · B:
( · B ) i , j = b i + 1 2 , j x b i 1 2 , j x Δ x + b i , j + 1 2 y b i , j 1 2 y Δ y ,
(11)
where Δ x and Δ y are the side lengths of a cell. Note that throughout this work, we use a grid with Δ x = Δ y.

It is well known that deep neural networks can represent both functions and operators with arbitrary accuracy.29–31 Deep neural networks are now being utilized for a wide range of scientific research due to the availability of incredibly powerful high-performance computing,32 combined with their ability to represent complex functions over compact sets with the representation being learned directly from high dimensional data, such as large images.33 For example, in the field of plasma physics, deep neural networks have been applied to a wide range of problems,34,35 including the use of neural networks as surrogates for equations of state.36 Neural networks can take advantage of transfer learning for the design optimization of internal confinement fusion,37 and deep surrogate models can take advantage of intelligent iterative sampling of expensive simulations for faster training.38 

Several methods have been developed in an effort to incorporate physics within neural networks. The most general and most popular approach by far is that of physics informed neural networks (PINNs). In a PINNs approach, additional terms are added to the cost function used to train a neural network, typically in the form of a PDE, which describes the underlying physics such as electrodynamics39 and fluid mechanics.40 

Although very general, the main limitations of PINNs are a lack of any guarantee of respecting the physics as they use only soft constraints. A soft constraint is a condition the network attempts to fulfill. A hard constraint is one in which the NN automatically satisfies. Furthermore, in a PINNs approach, there is usually a trade-off between satisfying the physics constraint and in accurately generating the desired fields. For example, we consider the well-known physics constraint · B = 0. In a PINNs approach, a cost function term such as | | · B | | is trivially satisfied by B = C for any arbitrary constant vector C, which has absolutely nothing to do with the true magnetic field.

Recently, work has been done on incorporating hard physics constraints within PCNNs, such as the approach applied here to satisfy Maxwell's equations for electrodynamics,14 methods have been developed for physics-constrained Gaussian processes that enforce thermodynamic consistency by limiting the space of functions that are sampled,41 and PCNNs have also been implemented for fluid mechanics.42,43 Although PCNNs have now been applied for electrodynamics and fluid mechanics separately, comparatively little has been done to study their use for MHD.44 We carry out a proof-of-principal study of applying PCNNs to the 2D Orszag–Tang Vortex for predicting both the fluid density and the associated magnetic fields.

Many approaches to utilizing neural networks for physics applications do not incorporate any physics constraints. For example, if a NN is being used to construct an estimate B ̂ of a magnetic field B, typical un-constrained approaches utilize supervised learning in which the neural network's predictions are compared to the correct answer via a cost function such as
C = x , y , z | | B ̂ ( x , y , z ) B ( x , y , z ) | | 2 d xdydz .
(12)
By showing a network only physics simulation-based outputs or measurements, the network is inductively biased toward generating such fields. Typically, such networks can learn to reproduce correct field values relative to the cost function, but wildly violate the physics constraints of which they are unaware.
There are several ways to incorporate physics constraints into NNs. The constraints can be soft when they are added as part of the cost function, which is the typical approach for PINNs. For predicting B ̂, this would be done by utilizing the cost function
C = w 1 x , y , z | | B ̂ ( x , y , z ) B ( x , y , z ) | | 2 d xdydz + w 2 x , y z | · B ̂ ( x , y , z ) | 2 d xdydz ,
(13)
where increasing the weight w2 penalizes the network for generating fields with non-zero divergence. With soft constraints, there is no guarantee that the physics will be respected, and as described above, there is an unfortunate trade-off between w1 and w2 as increasing w2 usually leads the network to predict incorrect constant field values, which trivially have zero divergence.
A way to implement physics-constrained neural networks, PCNNs with hard constraints, was recently developed in Ref. 14, where rather than modeling the magnetic field with the NN directly, one can model the magnetic vector potential, A ( r , t ). Then, the physical constraint is automatically satisfied by construction [i.e., · ( × A ) = 0], up to numerical error. Unlike PINNs, this PCNN approach has no trade-offs between accuracy and constraint violation as it simplifies the cost function back to Eq. (12). In this paper, we implement the 2D version of Ref. 14 with
B ( r , t ) = A ( r , t ) ,
(14)
where A ( r , t ) is a scalar field, which ensures that Eq. (4) is followed.

Another benefit of using CNNs is that the translational symmetry of the physics, formally spatial translational equivariance, can be approximated through the discrete translational symmetry of the CNN architecture.45,46 We can also impose relatively simple hard physics constraints, such as ρ 0 by taking the activation function in the final layer of our network to be a ReLU [i.e., ReLU ( x ) = max ( 0 , x ) 0]. For periodic boundary conditions, padding to enforce spatial periodicity can be used. A summary of physics constraints and how they are enforced in this paper can be found in Table I.

TABLE I.

Various NN physics constraints.

Constraint Implementation Hard/soft
Divergence-free  B ( r , t ) = A ( r , t )  Hard 
Translation equivariance  CNN architecture  Hard 
Non-negativity (e.g., ρ 0 Final layer ReLU  Hard 
Periodic BC's  Padding  Soft 
Partial differential equation  Term in cost function  Soft 
Constraint Implementation Hard/soft
Divergence-free  B ( r , t ) = A ( r , t )  Hard 
Translation equivariance  CNN architecture  Hard 
Non-negativity (e.g., ρ 0 Final layer ReLU  Hard 
Periodic BC's  Padding  Soft 
Partial differential equation  Term in cost function  Soft 
Furthermore, the difference between utilizing NNs as operators vs functions is worth discussing. When using a CNN which utilizes images as inputs and directly maps those to new output images, the NN can be thought of as an operator that is directly mapping the values of a function over a domain D at time t to new values over that domain at t + Δ t,
{ B ( r , t ) | r D R 2 } { B ( r , t + Δ t ) | r D R 2 } ,
as if the network is an operator solving the governing PDE of the dynamics over the domain D. When using NNs to map individual points to values of interest at those points, the NN is acting as a function that is trained over the entire domain D,
( r , t ) B ( r , t + Δ t ) , r D R 2 .
Function-approximating is by far the more popular approach, especially for PINNs, because when using a NN to make such point-wise predictions automatic differentiation can be used to take derivatives of the network's outputs relative to its inputs, and so, it becomes trivially easy to implement any arbitrary PDE as a soft constraint.

The major advantages of using CNNs as operators that map entire images to other images are faster inference and greater generalizability. A trained CNN can map incredibly large images (or volumes) to their subsequent states within microseconds, whereas a function-approximating CNN would have to work its way through many individual points one batch at a time, which greatly slows down both inference and training. Because the CNN sees the entire global image, it can interpolate between various examples, whereas the function-approximating NN memorizes values at individual points and has to re-train for even small variations. The major advantages of using NNs for point-wise function approximation is auto-differentiation and smaller memory requirements because they can be relatively small dense networks that do not need to be able to handle large image or volume tensors.

Most of our NNs have a convolutional encoder–decoder architecture. The baseline architecture along with a description of the layers and parameters used can be seen in Fig. 2. This setup is based on the well-known VGG-style approach, which has been established for deep networks, in which several sets of two convolutional layers with 3 × 3 filters are followed by a max pool layer that is used for the encoder network.47 For some models, modifications to the initial and final layers were made and will be discussed later.

FIG. 2.

The baseline architecture for the convolutional NNs. The wide arrows represent several layers, with the exact meaning shown below the network diagram. The order of operation goes from left to right. The number of filters a convolution layer has is depicted in the box immediately following the arrow, unless stated otherwise.

FIG. 2.

The baseline architecture for the convolutional NNs. The wide arrows represent several layers, with the exact meaning shown below the network diagram. The order of operation goes from left to right. The number of filters a convolution layer has is depicted in the box immediately following the arrow, unless stated otherwise.

Close modal
For training, we incorporated L2 regularization with a cost function of the form
C = 1 N d i = 1 N d ( y i y ̂ i ) 2 + λ r j = 1 N w w j 2 ,
(15)
where yi is the ith data point for the variable being predicted, y ̂ i the model prediction for the variable (we will use the hat ̂ over a variable to denote it is a prediction), wj is the jth weight or bias, and λr is a hyperparameter. Nd and Nw are the number of data points and weights, respectively. The first term in Eq. (15) measures how accurate the model predictions are, while the second term is acting as a L2 regularizer for the network's weights. In the Bayesian framework, it is possible to interpret the second term as coming from having a Gaussian prior on the weights and λr being related to its variance. All the CNNs were created, trained, and tested using Tensorflow.48 For all training sessions, the Adam optimizer was used.49 Detailed model parameters, such as numbers of filters per layer, were chosen by quickly testing various setups on a small subset of the training data.

We had the models perform both forecasting and predicting tasks. For forecasting, the task was given certain input fields at t forecast a certain output field at t + Δ t. First, this was done for ρ and then B. For predicting, we looked at given the v field predicts the B field at the same time step, after which the models' predictions were compared to that of a PINN.

A summary of networks used, constraints, inputs, outputs, and training data is given in Table III.

For this study, several different datasets and ML approaches were used, which are summarized in Table II. Throughout the paper, to quantify the accuracy of our ML-based predictions, we utilized mean squared error (MSE) and structural similarity index (SSIM), as each has its own strengths and limitations. Our formulations of MSE and SSIM are discussed in more detail below. MSE is a popular metric for comparing numerical data, and MSE-based optimization problems are popular because of the many analytical results (such as least squares fits). A limitation of MSE is its sensitivity to small perturbations and to outliers, which becomes especially important for images. Regarding outliers, if even a single pixel of a generated high-resolution image is corrupted, while all other pixels are identical to the true values, a large MSE value will result. Regarding small perturbations, we consider a generated image that is identical to the true image, but the entire image is shifted by just a single pixel. While this difference may be unnoticed by the human eye, it will result in an incredibly high MSE.

TABLE II.

Neural network types and names.

Name Constraints Training data Inputs Outputs
ρ-CNN-rel/all  none  1 MHD  { ρ , p , B , v } / { ρ , v } images (t ρ images ( t + Δ t ) 
ρ-PCNN-rel/all  ρ = ReLU ( y ) 0  1 MHD  { ρ , p , B , v } / { ρ , v } images (t ρ images ( t + Δ t ) 
B-CNN-all  none  1 MHD  { B , v } images  B images ( t + Δ t ) 
B-PCNN-all  · B = · ( × A ) = 0  1 MHD  { B , v } images (t A images ( t + Δ t ) 
B-CNN-res/res-rand  none  1/50 MHD  { B , v } images (t Δ B images ( t + Δ t ) 
B-PCNN-res/res-rand  · B = · ( × A ) = 0  1/50 MHD  { B , v } images (t Δ A images ( t + Δ t ) 
B ( v )-CNN  none  1 MHD  { v } images (t B images (t
B ( v )-PCNN  · B = · ( × A ) = 0  1 MHD  { v } images (t A images (t
B ( v )-PINN  soft  1 MHD  { x , y , v ( x , y ) } points (t B ( x , y ) points (t
Name Constraints Training data Inputs Outputs
ρ-CNN-rel/all  none  1 MHD  { ρ , p , B , v } / { ρ , v } images (t ρ images ( t + Δ t ) 
ρ-PCNN-rel/all  ρ = ReLU ( y ) 0  1 MHD  { ρ , p , B , v } / { ρ , v } images (t ρ images ( t + Δ t ) 
B-CNN-all  none  1 MHD  { B , v } images  B images ( t + Δ t ) 
B-PCNN-all  · B = · ( × A ) = 0  1 MHD  { B , v } images (t A images ( t + Δ t ) 
B-CNN-res/res-rand  none  1/50 MHD  { B , v } images (t Δ B images ( t + Δ t ) 
B-PCNN-res/res-rand  · B = · ( × A ) = 0  1/50 MHD  { B , v } images (t Δ A images ( t + Δ t ) 
B ( v )-CNN  none  1 MHD  { v } images (t B images (t
B ( v )-PCNN  · B = · ( × A ) = 0  1 MHD  { v } images (t A images (t
B ( v )-PINN  soft  1 MHD  { x , y , v ( x , y ) } points (t B ( x , y ) points (t
Mean squared error (MSE) is defined by
MSE ( x , y ) = 1 N c N d i = 1 N d ( x i y i ) 2 ,
(16)
where Nc is the number of components in xi and yi. The second metric is the structural similarity index (SSIM) defined as
SSIM ( x , y ) = ( 2 μ x μ y + c 1 ) ( 2 σ x y + c 2 ) ( μ x 2 + μ y 2 + c 1 ) ( σ x 2 + σ y 2 + c 2 ) ,
(17)
where μx and μy are the means of x and y, respectively; σ x 2 and σ y 2 are the variances of x and y, respectively; σxy is the covariance of x and y; and c 1 = c 2 = 10 4 (their presence is meant to avoid issues that occur if u x 2 + μ y 2 = 0 or σ x 2 + σ y 2 = 0). The SSIM produces values in the range of [ 1 , 1 ]. An SSIM of 1 means that x and y match exactly, while lower values of the SSIM correspond to tensors that are less structurally similar.50 

Compared to MSE, SSIM captures a higher level more wholistic view of images, similarly to what we do with our eyes. In fact, interesting examples have been generated in which the MSE between two images is kept constant, while the SSIM is either maximized or minimized, resulting in images that are completely different or visually almost identical.51 A much more thorough comparison of MSE and SSIM, as well as an interesting discussion and set of examples, has been published.52 We present MSE and SSIM for all of the generated quantities to simultaneously summarize both the pixel-by-pixel and the neighborhood-dependent features of the predictions.

Padding to enforce periodic boundary conditions was attempted. That is, an array is expanded with extra rows and columns so that elements on opposite edges “see” one another. For example, given an array a periodic padding would produce ap,
a = [ 7 24 42 16 ] , a p = [ 16 42 16 42 24 7 24 7 16 42 16 42 24 7 24 7 ] .
(18)
The hope was that by incorporating this in the encoder part of our models that it would help reduce errors occurring at the boundary of our simulation domain. Unfortunately, this only yielded mild error reduction. For models that incorporated taking a curl in their architecture, using periodic padding helped the most. This is why the PCNN models in Sec. V C. still incorporate periodic padding in their model architecture.
The first task was to use 2D images of field values from a single time step t to predict the mass density at a subsequent time step ρ ( t + Δ ) via CNNs. We compared the use of a linear and ReLU output activation functions with ReLU output providing a hard physics constraint of ρ 0. The two models operated according to
ρ - CNN - all : { ρ , p , B , v } ( t ) N N ρ ̂ ( t + Δ t ) , ρ - PCNN - all : { ρ , p , B , v } ( t ) N N ReLU ρ ̂ ( t + Δ t ) .
where N N is our baseline architecture. The other two models are named “rel” because they used only the inputs that should be relevant according to Eq. (1),
ρ - CNN - rel : { ρ , v } ( t ) N N ρ ̂ ( t + Δ t ) , ρ - PCNN - rel : { ρ , v } ( t ) N N ReLU ρ ̂ ( t + Δ t ) .
Note that at each time step, we are inputting the fields on a 128 × 128 spatial grid; thus, a 128 × 128 × n input tensor gets inputted into a network (see Fig. 2).

To create training data, the physics-based MHD simulation ran from t = 0 to t = 1 with Δ t = 8 × 10 4 producing 1250 snapshots. The first 936 snapshot pairs or t [ 0 , 0.7488 ] were used for training of the neural networks and the last 313 snapshot pairs or t [ 0.7488 , 1 ] were used for testing. All ML models used early stopping and were trained using a learning rate of 5 × 10 4. Models were trained for 240 epochs with a batch size starting at 2 and doubling every 60 epochs, as recommended in Ref. 53.

It should be noted that we are training on a turbulent part of the Orszag–Tang vortex and then trying to predict the evolution of a smoother, more uniform part. Predictions of the models and their absolute errors at various times are shown in Fig. 3. The SSIMs and the MSEs of the model predictions on the test set are shown in Fig. 4.

FIG. 3.

The true value of ρ and the model predictions are shown in blue. The absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

FIG. 3.

The true value of ρ and the model predictions are shown in blue. The absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

Close modal
FIG. 4.

The SSIM and MSE on the test set for different models forecasting ρ. The sum of MSE over the test set for each model is displayed next to its label in the MSE plot.

FIG. 4.

The SSIM and MSE on the test set for different models forecasting ρ. The sum of MSE over the test set for each model is displayed next to its label in the MSE plot.

Close modal

The next problem was the use of the B and v field images at a time steps t to forecast the magnetic field at the next time step B ( t + Δ ). The generated magnetic fields should satisfy physics constraints. It is known that not preserving · B = 0 in numerical simulations leads to magnetic monopole numerical artifacts and the Lorentz force no longer being orthogonal to the magnetic field.17 

The same MHD simulation-based data and train/test split were used as in Sec. V A, but this time only the relevant primitive variables suggested by Eq. (3) were used as input. Our first model, B-CNN, is very similar to ρ-CNN-rel with two major differences: one, B-CNN takes in v and B at one time step and output B at the next time step. Two, B-CNN predicts b, the face-averaged magnetic field, at the next time step and then takes the average of b to produce B,
B - CNN : { B , v } ( t ) N N b ̂ ( t + Δ t ) ave . B ̂ ( t + Δ t ) .
(19)
We estimate b as an intermediate step because the discretization of the divergence in Eq. (11) is with respect to b, so it is easiest to compare the divergence of the models' predictions to the simulation's divergence if we predict b in our models. As an added benefit, it performed more accurately than when it was not included.
Our network that encodes the divergence-free constraint as a hard constraint within its architecture, B-PCNN, has a significantly different structure. The B-PCNN predicts the vector potential A, where b = A, and then numerically takes the curl of A with a custom made Tensorflow layer (also known as a lambda layer) and averages the result to produce B,
B - PCNN : { B , v } ( t ) N N A ̂ ( t + Δ t ) b ̂ ( t + Δ t ) ave . B ̂ ( t + Δ t ) .
(20)
Both models used the same hyperparameters that were used for forecasting ρ in the previous section. Comparisons of both models' predictions for Bx and By are shown in Figs. 5 and 6, respectively.
FIG. 5.

Predictions of Bx generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

FIG. 5.

Predictions of Bx generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

Close modal
FIG. 6.

Predictions of By generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

FIG. 6.

Predictions of By generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

Close modal

In Fig. 7, the models' adherence to the divergence-free condition is seen in two ways. First, we look at · B, as computed as in Eq. (11), for the models at certain time steps. In terms of · B, only the B-PCNN would generate physical fields (by construction), while the B-CNN created wildly nonphysical fields (by many orders of magnitude). Therefore, we also used a modified less stringent measure of divergence, referred to as relative divergence.17,54

FIG. 7.

Top three rows: stream plots of B for simulation and models (note: the density of lines does not represent magnitude) Bottom four rows: divergence and the logarithm of the relative divergence error [Eq. (21)] for the models. Note that · B was calculated using Eq. (11). In the top three rows, the field lines look similar because both networks are reproducing the magnetic fields accurately in terms of SSIM and MSE. In the bottom four rows, the relative divergence is different by orders of magnitude with only the B-PCNN respecting the physics constraints. Because they are plotted on the same color scales, only the very high relative divergence of the B-CNN is visible while that of the B-PCNN looks blank. Although qualitatively the vector fields of both networks look correct, the divergences of the fields they predict differ by many orders of magnitude.

FIG. 7.

Top three rows: stream plots of B for simulation and models (note: the density of lines does not represent magnitude) Bottom four rows: divergence and the logarithm of the relative divergence error [Eq. (21)] for the models. Note that · B was calculated using Eq. (11). In the top three rows, the field lines look similar because both networks are reproducing the magnetic fields accurately in terms of SSIM and MSE. In the bottom four rows, the relative divergence is different by orders of magnitude with only the B-PCNN respecting the physics constraints. Because they are plotted on the same color scales, only the very high relative divergence of the B-CNN is visible while that of the B-PCNN looks blank. Although qualitatively the vector fields of both networks look correct, the divergences of the fields they predict differ by many orders of magnitude.

Close modal
In relative divergence, the actual divergence at a single point is reduced relative to the amplitude of the field strength at that same point. Intuitively, this can be thought of as taking into account that it is difficult to generate physical fields, which actually satisfy · B = 0, and this violation will grow with field amplitude. Therefore, the following quantity is defined:
ϵ r d = | · B | Δ x Δ y 2 p = | · B | Δ x Δ y 4 / 3 ρ u + | B | 2 ,
(21)
in which the measure of divergence at large field values is attenuated by the field magnitude and fluid pressure.

The denominator in Eq. (21) comes from the idea that the effective magnetic field of interest is B eff = 2 p (recall the magnetic pressure is given by B 2 / 2). In both cases, the divergence of the PCNN was found to be many orders of magnitude lower than that of the CNN, with the lower bound on the PCNN's divergence given by the numerical precision used, which in our case was 10 9 for a float32, as shown in Fig. 8. Note that for both cases, we are using Eq. (10) to estimate the divergences in order for them to be directly comparable to the constraint transport simulation's notion of the divergence-free condition. Also, we let ϵ r d denote the mean of ϵrd taken over the whole grid at fixed time. A comparison of the MSE and SSIM is shown in Fig. 9.

FIG. 8.

The mean of the absolute value of the divergences of B [computed as in Eq. (11)], which we denote as ( 1 / 128 2 ) | | · B | | 1, at each time step. The top graph was created using the true values of b generated by the simulation.

FIG. 8.

The mean of the absolute value of the divergences of B [computed as in Eq. (11)], which we denote as ( 1 / 128 2 ) | | · B | | 1, at each time step. The top graph was created using the true values of b generated by the simulation.

Close modal
FIG. 9.

SSIM and MSE on the test set for the different models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot.

FIG. 9.

SSIM and MSE on the test set for the different models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot.

Close modal

1. Residual networks

If we consider the finite difference approximation of the derivative for the dynamics of B ( t ), we get
d B ( t + Δ t ) d t B ( t + Δ t ) B ( t ) Δ t = F ( t + Δ t ) .
(22)
For Δ t 1, the subsequent state B ( t + Δ t ) is a small perturbation of the previous state B ( t ) defined as
B ( t + Δ t ) = B ( t ) + Δ t F ( t + Δ t ) .
(23)
Therefore, we consider the use of a residual neural (ResNet) network that is tasked with the goal of predicting Δ t F ( t + Δ t ). The motivation here is that the network's job should be easier because if it now tasked with only generating a very small perturbation of a field rather than the entire field itself. This approach did dramatically improve performance, with the MSE being reduced by four orders of magnitude.
The structure of our residual neural network, which we call B-CNN-res, is as follows:
B - CNN - res :
(24)
{ B , v } ( t ) N N Δ E z ̂ ( t + Δ t ) ave . F ̂ ( t + Δ t ) × Δ t Δ t F ̂ ( t + Δ t ) + B ( t ) B ̂ ( t + Δ t ) ,
(25)
where Δ E z is the change in the electric field at the corners of the cell defined as
Δ E z i j ( t ) = 1 Δ x [ ( E i + 1 2 , j + 1 2 z ( t ) E i + 1 2 , j 1 2 z ( t ) ) E i + 1 2 , j + 1 2 z ( t ) E i 1 2 , j + 1 2 z ( t ) , ]
(26)
where B and Δ E z are related by Faraday's law.
 Appendix C gives a justification for this architecture. Furthermore, we tested a network architecture that predicted a divergence free update step, which we call B-PCNN-res, by again predicting a potential,
B - PCNN - res :
(27)
{ B , v } ( t ) N N A ̂ Δ E z ̂ ( t + Δ t ) ave . F ̂ ( t + Δ t ) × Δ t Δ t F ̂ ( t + Δ t ) + B ( t ) B ̂ ( t + Δ t ) .
(28)
Note that the actual update in the simulation is not divergence free in the discrete sense taken with a forward difference (see  Appendix C for details). However, by predicting a divergence-free update, we hope that our model does better at matching the relative divergence of B in the simulation computed as in Eq. (21) with the exception that now we compute · B as
( · B ) i , j = B i + 1 , j x B i , j x Δ x + B i , j + 1 y B i , j y Δ y .
(29)
We calculate · B in this way because we no longer have access to b as an intermediate step in our model architecture.

The hyperparameters for both models were different from previous sections. The models were trained with a learning rate of 5 × 10 3. The loss function was changed to be the squared error (SE), which is just the MSE without any normalizing constant. Training was run for 90 epochs. The batch size started at 2 and doubled every 30 epochs for a final batch size of 8. Early stopping was not utilized.

Since the change in B from one time step to the next tends to be small, we also made sure that the networks were doing significantly better than predicting zero for F ( t + Δ t ) [i.e., predicting B ̂ ( t + Δ t ) = B ( t )]. This is indeed the case. Predictions of the models and their absolute errors at various times are shown in Figs. 10 and 11. The SSIMs and the MSEs of the model predictions over time on the test set are shown in Fig. 12. The absolute error of the relative divergences of these models is displayed in Fig. 13.

FIG. 10.

Predictions of Bx generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

FIG. 10.

Predictions of Bx generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

Close modal
FIG. 11.

Predictions of By generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

FIG. 11.

Predictions of By generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.

Close modal
FIG. 12.

SSIM and MSE on the test set for the different ResNet models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot. The result of predicting B ̂ ( t + Δ t ) = B ( t ) at each time step is labeled “predicting zero” in each graph.

FIG. 12.

SSIM and MSE on the test set for the different ResNet models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot. The result of predicting B ̂ ( t + Δ t ) = B ( t ) at each time step is labeled “predicting zero” in each graph.

Close modal
FIG. 13.

The absolute error between the mean of the relative divergence of the simulation and mean of the relative divergence of the various ResNet models forecasting B for time steps in the test set. The sum of the absolute errors of the relative divergence means over the whole test set for each model is displayed next to its label. Note that · B was calculated as in Eq. (29).

FIG. 13.

The absolute error between the mean of the relative divergence of the simulation and mean of the relative divergence of the various ResNet models forecasting B for time steps in the test set. The sum of the absolute errors of the relative divergence means over the whole test set for each model is displayed next to its label. Note that · B was calculated as in Eq. (29).

Close modal

2. Data augmentation

NNs often require much data to learn properly; thus, often data augmentation is used in their training. Thus, to improve learning and see how well the NNs could generalize, we trained the ResNets on 50 ideal MHD simulations that started with random initial fluid velocity fields and magnetic vector potentials, but kept the same initial pressure and density as in the Orszag–Tang vortex. The NNs were then tested on the Orszag–Tang vortex, which they had never encountered before. Here, we take advantage of the fact that the neural operators work for a wide range of conditions, unlike PINNs which just approximate a particular solution of a PDE.

The 50 physics-based MHD simulations started with random initial v and A fields, where b = A, and B is computed from b via the same averaging operation as in Eq. (10), created by starting with a Gaussian noise field at t = 0 and then applying a Gaussian filter (i.e., a convolution with a normal distribution) to smooth things out, similar to what has been done in other works.55 Wrapping was used in the Gaussian filter to maintain spatial periodicity of the initial conditions. The parameters for the random field generation were chosen so that the values of fields in the initial conditions were of the same order of magnitude as that of the Orszag–Tang vortex's (on average). The simulation was run from t = 0 to 1 and the Orszag–Tang vortex test ran on the same interval. All of the simulations were combined into one training dataset.

Two neural network models were trained on these data: B-CNN-res-rand, which has the same architecture as B-CNN-res, and B-PCNN-res-rand, which has the same architecture as B-PCNN-res. These models were trained for 100 epochs and used a learning rate of 1 × 10 4. The SE was used again as the loss function. The batch size started at 16 and doubled every 25 epochs, with a final batch size of 128. Early stopping was not used.

The result of using these models to predict on the same part of the Orszag–Tang vortex that the other ResNets were using as their test set is displayed in Fig. 12. The absolute error of the relative divergences of these models is displayed in Fig. 13.

3. Surrogate modeling

Our last task for forecasting B was to embed our neural network models inside of the physics-based MHD Orszag–Tang simulation. Specifically, we ran the simulation using constrained transport and finite volume methods until we reached the test set. Then, we switched over to using one of our forecasting models to advance B instead of using constrained transport.

We slightly modified the simulation to be able to better handle predictions that come from a NN. Namely, all models that we tested would eventually cause nonphysical behavior in a term that looked at the difference between the total pressure and magnetic pressure extrapolated to the face of a cell
p f ( B x f ) 2 + ( B y f ) 2 2 ,
(30)
where the superscript f is meant to denote these quantities are being taken at the face of a cell. This extrapolation uses finite differences, and this can lead to errors where Eq. (30) becomes negative (which should not be physically possible). This does not happen with the normal simulation, but always happens when we try to introduce surrogate models into the simulation. The main issue here being that the calculation of a stabilizing diffusive term uses the square root of Eq. (30), which can lead to “not a number” (NaN) errors. Our fix for this was to take the absolute value of Eq. (30).

For this setup, we allowed the simulation to produce data up until where the test set starts. Then, we deactivated the part of the simulation that uses constrained transport to advance B and instead used B-CNN-res-rand or B-PCNN-res-rand. These models would take in v ( t ) and B ( t ) and then produce B ( t + Δ t ). The simulation would then be used to advance all other variables, incorporating B ( t + Δ t ) where relevant.

The B-CNN-res and B-PCNN-res models failed rapidly, quickly making predictions that resulted in the simulation becoming plagued with NaN errors even with the fix discussed above. One of the difficulties faced by this preliminary study of embedding the CNNs within the physics model was that we were using both v ( t ) and B ( t ) as NN inputs, but only predicting B ( t + Δ t ), while v ( t + Δ t ) was generated by the physics model itself. This disconnect resulted in the build-up of nonphysical numerical errors, but was greatly improved by the use of residual networks, which only predict small perturbations. Both the B-CNN-res-rand and B-PCNN-res-rand models were able to carry out the whole simulation without producing NaN errors. These models still ended up predicting some negative values for the cell-centered version of Eq. (30), but this value is not necessary to advance the simulation, so it did not lead to additional problems. The result of this in terms of the MSE and SSIM for all primitive variables is shown in Fig. 14. How the relative divergences differ is shown in Fig. 15. A qualitative comparison of the primitive variables ρ, p, vx, and By is shown in Fig. 16.

FIG. 14.

SSIM and log base 10 of the MSE on the test set for all primitive variables in the simulation produced using B-CNN-res-rand or B-PCNN-res-rand as a surrogate model. Note that the sum of the MSE at all time steps for each primitive variable is displayed next to its label in the MSE plots. Also note that the SSIM was calculated using the same data range for all primitive variables.

FIG. 14.

SSIM and log base 10 of the MSE on the test set for all primitive variables in the simulation produced using B-CNN-res-rand or B-PCNN-res-rand as a surrogate model. Note that the sum of the MSE at all time steps for each primitive variable is displayed next to its label in the MSE plots. Also note that the SSIM was calculated using the same data range for all primitive variables.

Close modal
FIG. 15.

The mean of the relative divergence of the simulation data, and the data generated using surrogate models, for time steps in the test set. The sum of the relative divergence means over the whole test set for each model is displayed next to its label. Note that · B was calculated as in Eq. (29).

FIG. 15.

The mean of the relative divergence of the simulation data, and the data generated using surrogate models, for time steps in the test set. The sum of the relative divergence means over the whole test set for each model is displayed next to its label. Note that · B was calculated as in Eq. (29).

Close modal
FIG. 16.

Qualitative comparisons of the primitive variables ρ, p, vx, and By for data generated by the simulation and the simulation using surrogate models at certain time steps. For this figure, B-CNN-res-rand has be abbreviated to CNN and B-PCNN-res-rand has be abbreviated to PCNN.

FIG. 16.

Qualitative comparisons of the primitive variables ρ, p, vx, and By for data generated by the simulation and the simulation using surrogate models at certain time steps. For this figure, B-CNN-res-rand has be abbreviated to CNN and B-PCNN-res-rand has be abbreviated to PCNN.

Close modal
For predictions, the convolution models had the form
B ( v ) - CNN : { v } ( t ) N N b ̂ ( t ) ave . B ̂ ( t )
(31)
and
B ( v ) - PCNN : { v } ( t ) N N A ̂ ( t ) b ̂ ( t ) ave . B ̂ ( t ) .
(32)
Both models had the same hyperparameters used for the models forecasting ρ. They were tested against B-PINN, a feedforward NN with a leaky ReLU activation ( α = 0.1) and 12 hidden layers with 256 neurons per layer. The model takes time and spatial co-ordinates as inputs and outputs the magnetic field
B - PINN : { t , x , y } N N B ̂ ( t , x , y ) .
(33)
During the fitting procedure for B-PINN, the following term is added to the cost function at Eq. (15)
C PDE = λ PDE 1 N d i = 1 N d | B ̂ i t + · ( B ̂ i v i T v i B ̂ i T ) | 2 ,
(34)
where λPDE is a hyperparameter. Note that in addition to the PDE-based term in the cost, this B-PINN network is also relying on B ( t ) field values produced by the physics-based MHD simulation for its training, as they are part of the cost function defined by Eq. (15). The network was trained with 120 epochs, starting with a learning rate of 2 × 10 3 and halving every 40 epochs. During training, the network attempts to obey the ideal MHD induction equation while also fitting the data and gets punished for large weights. Note that Eq. (3) implies
t · B = 0 .
(35)
If a function obeys Eq. (3) and the divergence-less condition is satisfied as an initial condition, it holds for the entire time. Thus, B-PINN is weakly driven to be divergenceless from both fitting the data and satisfying the PDE.

One advantage of PINNs is that by automatic differentiation, you can compute time/spatial derivatives efficiently and accurately, hence their use in looking for PDE solutions. They are also independent of meshes. On the other hand, unlike the CNNs, there is no translation equivariance built into B-PINN. Also, by approximating a particular solution of a PDE, it is less general than the neural operator, which can, in principle, approximate the solution operator of the PDE, thus solve for any given initial conditions.

The MSE of the models on the test data and ϵ r d is shown in Figs. 17 and 18, respectively. Both the B ( v )-CNN and B ( v )-PCNN have low ϵ r d on the training data, with B ( v )-PCNN matching the simulation nearly exactly from about t = 0.15 to before end of the training data at t = 0.75, then rising for the test data. In contrast, B-PINN had a steady relative divergence error of slightly less than 1% (Fig. 18).

FIG. 17.

MSE on the test set for the different models predicting B ( t ) given v ( t ). The sum of the MSE on the test set for each model shown in the legend.

FIG. 17.

MSE on the test set for the different models predicting B ( t ) given v ( t ). The sum of the MSE on the test set for each model shown in the legend.

Close modal
FIG. 18.

The mean relative divergence error on B predictions for different models at each time step. For the simulation, the volume-averaged B was used, and for the convolutional models, B was used instead of the face valued b, which satisfy Eq. (11), in order to have direct comparisons. The sum of ϵ r d on the test set for each model shown in the legend.

FIG. 18.

The mean relative divergence error on B predictions for different models at each time step. For the simulation, the volume-averaged B was used, and for the convolutional models, B was used instead of the face valued b, which satisfy Eq. (11), in order to have direct comparisons. The sum of ϵ r d on the test set for each model shown in the legend.

Close modal

We investigated ways to improve both the convolutional models and the PINNs. For the CNN and PCNN, we used data augmentation, with the models training on 50 simulations like earlier. Also, v was added as an input to the PINN, with the idea that access to it on the test data may improve predictions. These approaches only increased the error on the test set.

We briefly summarize the above results with a list of all of the various neural networks that were used and the names with which we refer to them, as shown in Table I. A summary of quantitative MSE and SSIM values for the various architectures is given in Table III.

TABLE III.

Summary of results. Bold indicates best performing model according to column metric. “Total” here means the sum over all time steps on the test set.

Forecasting ρ
Model Total MSE ( × 10 2 )
ρ-CNN-all  8.97     
ρ-CNN-rel  9.87     
ρ-PCNN-rel  1.47     
ρ-PCNN-all  3.08     
Forecasting ρ
Model Total MSE ( × 10 2 )
ρ-CNN-all  8.97     
ρ-CNN-rel  9.87     
ρ-PCNN-rel  1.47     
ρ-PCNN-all  3.08     
Forecasting B
Model Total MSE ( × 10 5) Total ( 1 / 128 2 ) | | · B | | 1 Total ϵ r , d ( × 10 4 )
B-CNN  35, 600  2.67 × 10 3  ⋯ 
B-PCNN  73, 700  1.84 × 1 0 5  ⋯ 
B-CNN-res  4.06  ⋯  7.706 
B-PCNN-res  5.21  ⋯  7.712 
B-CNN-res-rand  1.58  ⋯  6.93 
B-PCNN-res-rand  1.66  ⋯  5.38
Forecasting B
Model Total MSE ( × 10 5) Total ( 1 / 128 2 ) | | · B | | 1 Total ϵ r , d ( × 10 4 )
B-CNN  35, 600  2.67 × 10 3  ⋯ 
B-PCNN  73, 700  1.84 × 1 0 5  ⋯ 
B-CNN-res  4.06  ⋯  7.706 
B-PCNN-res  5.21  ⋯  7.712 
B-CNN-res-rand  1.58  ⋯  6.93 
B-PCNN-res-rand  1.66  ⋯  5.38
Predicting B
Model Total MSE Total ϵ r d
B ( v )-CNN  5.64  3.55   
B ( v )-PCNN  4.76  1.28   
B-PINN  1.99  2.58   
Predicting B
Model Total MSE Total ϵ r d
B ( v )-CNN  5.64  3.55   
B ( v )-PCNN  4.76  1.28   
B-PINN  1.99  2.58   

We first studied of the use of neural networks to forecast the mass density ρ of a subsequent time step ( t + Δ t ) based on field measurements from the previous time step (t). In this case, we compared using a CNN without any physics constraints, which takes all fields as input images at time t to predict ρ ( t + Δ t ). We also incorporate a hard physics constraint in enforcing ρ 0 by utilizing a ReLu activation function on the network's output, referred to as ρ-PCNN-all. For density forecasting, we also take into account Eq. (1) and train each of the above networks with only the relevant fields ρ ( t ) and v ( t ). These networks using only relevant data are referred to as ρ-CNN-rel and ρ-PCNN-rel.

As can be seen from Figs. 3 and 4, the models that performed best according to both the MSE and SSIM were the ones that used ReLU activation functions in their final layer. This is likely due to the fact that the ReLU function is enforcing the hard constraint ρ 0; thus, the model does not have to learn this, which facilitates the model learning quicker. We also tested enforcing non-negativity by using a linear output layer and then squaring its output. The MSE on the test set was nearly the same as with using a ReLU final layer. In terms of both MSE and SSIM, it is found that utilizing only the relevant data only slightly improves performance, while adding the hard physics constraint results in a dramatic improvement.

Figures 5 and 6 show the Bx and By fields generated by the B-CNN and B-PCNN relative to the true field values at several time steps. While the predictions of both models are very accurate and almost indistinguishable by eye, only the B-PCNN with hard physics constraints based on the approach in Ref. 14 generated physical fields with zero divergence, while the fields generated by the B-CNN had divergence values higher by a factor of 108, as seen in Figs. 7 and 8.

Zooming in on the absolute differences between true values and predictions in Figs. 5 and 6, a slightly higher error can be seen for the B-PCNN approach, which is quantified in terms of both MSE and SSIM in Fig. 9, which shows that the SSIM of both methods is very high 0.8, while the MSE of both methods is very low < 5 × 10 3. The slightly higher SSIM and slightly smaller MSE of the B-CNN predictions relative to those of the B-PCNN, given the same number of training steps, is due to two major factors.

First, a well-known issue with the B-PCNN approach on a mesh is that boundaries have to be handled carefully and even masked out, which was not done here, as can be seen by the visual artifacts that appear near the boundary of the simulation. This is due to taking finite-difference derivatives at a boundary which sample beyond the boundary (requiring periodic or zero padding). Using periodic padding in the encoder part of the B-PCNN architecture helped somewhat alleviate this, but would have to be carefully applied on the output of the PCNN so that when B is computed as the curl of A artificial numerical errors are not introduced. Second, while the B-CNN has no built in constraints and can generate arbitrary outputs, the B-PCNN is very highly constrained in that in can only generate fields that by construction satisfy the hard physics constraint of · B = 0.

While the accuracy of both model's predictions is very high in terms of both MSE and SSIM, only the B-PCNN generates physical fields (within numerical precision), as can be seen in Figs. 7 and 8. In Fig. 7, the pixel-wise | · B | values of the predicted fields are shown for several times steps on a fixed color scale. On that scale, while large | · B | values are clearly visible for B-CNN predictions, they are almost impossible to see for the B-PCNN predictions. This difference is quantified in Fig. 8, which shows the image-averaged | · B | values for all of the predicted test value time steps. While the B-PCNN | · B | values are 10 8, with a lower bound dictated by the numerical precision of the Tensorflow floating point 32 data type, the B-CNN physics constraint violation is catastrophic at 1, eight orders of magnitude higher.

As discussed above, one way to make this problem much easier for the CNN is to utilize a residual CNN, which takes into account our knowledge that this is a continuous dynamic process. Therefore, the magnetic field at a subsequent time step, B ( t + Δ ), should be close to its previous value B ( t ) for small values of Δ. The new field value should be a small perturbation of the form B ( t + Δ ) = B ( t ) + Δ B ( t ). It was found that by giving the network this strong hint and having it generate Δ B ( t ) directly, the accuracy of both the B-CNN and B-PCNN was greatly improved as seen in terms of MSE in Fig. 12, which decreased by four orders of magnitude.

In terms of · B, as before, only the B-PCNN could generate physical fields. However, because now only small perturbations are being generated by the network and are being added to model-based fields with zero divergence, the overall divergence of the subsequent fields became smaller for both networks as seen by considering the divergence of the ResNet's predicted magnetic field
· B ̂ ( t + Δ t ) = · ( B ( t ) + Δ t F ( t + Δ t ) ) = · B ( t ) 0 + Δ t 1 · F ( t + Δ t ) .
(36)
Because of large divergences of methods without hard physics constraints, researchers also consider the relative divergence quantity as discussed above
ϵ r d = | · B | Δ x Δ y 4 / 3 ρ u + | B | 2 ,
(37)
in which the actual divergence is divided by the field strength, thereby artificially lowering the physics violation for large field values. While not an actual physics constraint, ϵrd is useful in that the relative divergence is much smaller and is a unit-less quantity, which is less sensitive to the magnitude of the field being generated. Taking into account that the B-PCNN is the only network that was able to generate physical fields, in the remainder of the paper, we focus on this relative divergence measure, which is of interest for numerical simulations in which the hard physics constraint · B = 0 is not necessarily maintained, but for which some measure of divergence is still important.

In terms of ϵrd, the residual B-CNN-res and B-PCNN-res were found to have very similar values (besides a few random outliers) as each network easily found a correct small perturbation of the divergence-less B field from the previous time step provided by the simulation, as shown in Fig. 13. The ResNet setup is only predicting small perturbations, which are added to the correct simulation-based values whose divergence is zero, from previous states. Therefore, the overall divergence of the predicted fields will have contributions from only the small perturbations generated by the ResNets. That is why in this case, the “relative divergence” of the PCNN and CNN becomes very similar.

Next, it was shown that training on 50 random simulations, each of which was a slight perturbation of the original Orszag–Tang vortex, and then predicting on data from the Orszag–Tang vortex, yielded models that preformed even better than just training on data from the Orszag–Tang vortex alone, as also seen in Fig. 12. In this case, the networks had a much more rich set of data to learn from. In this residual approach, again with most of the output dominated by the known previous step's divergence-less fields, the relative divergences of both approaches were relatively low.

These results imply that this approach can be useful for a wide range of unseen initial conditions as long as the network is trained over a sufficiently large dataset with a large enough variety of field distributions. Furthermore, such a trained network should be useful for a completely different set of initial conditions if it is first quickly re-trained on just a few of the initial steps.

The only two models we were able to successfully apply to surrogate modeling were B-CNN-res-rand and B-PCNN-res-rand. The use of surrogate models had the greatest effect on B and v with errors increasing rapidly with time as can be seen in Fig. 14. The same trend is reflected in the SSIM. While ρ and p do start to diverge from the simulation as time goes on, this divergence is on a scale two orders of magnitude less than that of v and B.

In terms of relative divergences, B-PCNN-res-rand preformed the best overall.

For this more challenging problem of recursively applying the ML on its own outputs, the B-CNN-res-rand and the B-PCNN models both had high SSIM and low MSE, but in terms of relative divergences, the B-PCNN performance was much better, although after many steps both models diverged. Note that in this case, the use of relative rather than actual divergence makes artificially makes the B-PCNN look bad because, as seen in the bottom three rows of Fig. 16, as the MHD simulation becomes more diffuse over time, the magnitude of the magnetic field decreases significantly, which decreases the denominator or Eq. (37), while the numerator is relatively fixed in the B-PCNN case limited by numerical precision.

The final test of these approaches was to predict the magnetic field at a given time step based only on velocity field measurements from the same time step. Although it would have been possible to do this with a residual network approach using the previous time step's magnetic field, B ( t Δ ), this was not done in order to fully test the network's ability to match both field accuracy and the divergence-less condition, which as mentioned above is otherwise dominated by the correct B field values provided by the simulation. In this case, in addition to the CNNs, a point-wise PINN-based network was also tested.

All three networks performed well with low MSE as shown in Fig. 17, and visually indistinguishable field maps. As before, only the B-PCNN was able to produce divergence-less fields that matched the simulation, as shown in Fig. 18. For the convolutional models, B ( v )-PCNN outperformed B ( v )-CNN in both accuracy and upholding the divergence-less condition. The B-PINN performed better than the convolutional models with respect to MSE on the test data, but generated un-physical data that resulted in the worst violation of the physics constraints as can be seen in Fig. 18, despite having the explicit PDE constraint. Again the only approach able to generate physical field values was the PCNN. Furthermore, the PINN approach is slower to train and slower to generate images as it works on a point-by-point basis rather than seeing and generating entire images at once.

We showed that convolutional neural networks can be used for MHD, how certain physics constraints can be embedded and which ones led to improvements in performance. It was also seen that using ResNets and data augmentation could make a big difference. The forecasting and surrogate modeling tasks show that PCNNs could potentially be helpful to speed up simulations. The prediction task demonstrated that they may be of use in analyzing MHD phenomena when access to certain observables is not possible. Something similar has already been done with PINNs to study the solar coronal magnetic field.56 Deep generative encoder–decoder convolutional neural networks are also being combined with adaptive model-independent feedback to make robust surrogate models that generate all of the 15 2D projections of the 6D phase space of charged particle beams based only on a limited view of a single 2D measurement.57 Adaptive feedback methods have been combined with simulations to provide virtual views of complex dynamics based on limited data,58 and there is potential to develop such an approach for plasma systems by using fast MHD surrogates based on CNNs.

It is worth pointing out that for a fixed neural network architecture, a slightly different result can occur every time that the network is trained using a new random seed and new random initial weights and biases, which are sampled from probability distributions. In this work, we did re-train networks many times and none of the results were significantly changed. Furthermore, it is also true that various architecture choices such as the number of layers, the width of each layer, and the activation functions being used can all slightly change a network's performance. Despite this possibility of small variation, when our networks were trained with various activation functions or levels of regularization, the overall results still held. The overall main point is that the different approaches, independent of the details of the architecture used, have certain limitations, and the B-PCNN approach with hard physics constraints built into the architecture, based on the methods developed in Ref. 14, was the only method that generated physical fields that respected constraints.

Future work could go beyond ideal MHD to incorporating magnetic diffusion and fluid viscosity into the simulation. Also, the spatial resolution could be increased beyond 128 × 128. It was observed that the Orszag–Tang vortex picks up interesting detailed structures at higher resolutions. A systematic investigation into how robust the MHD CNNs/PCNNs and PINN models are to noise would be worth undertaking and inform their use in real world applications. Relatedly, the lack of interpretability of NNs makes uncertainty quantification crucial. The challenge is that many standard techniques would be prohibitively slow given the number of parameters the NNs have, and thus, other methods have been developed.59 Given their different strengths, finding effective ways of combining PCNNs and PINNs for MHD problems may prove quite fruitful. One such approach already investigated was the application of Physics Informed Neural Operators in the form of Fourier neural operators to MHD.44 

It would also be worthwhile to investigate using another type of constrained transport algorithm known as a centered difference scheme (CD).60 CD schemes preserve the following form of the divergence free condition:
( · B ) i , j = B i + 1 , j x B i 1 , j x 2 Δ x + B i , j + 1 y B i , j 1 y 2 Δ y = 0.
(38)
Therefore, such schemes allow us to avoid the added complications of having to predict b as an intermediary step or having to use relative divergence to quantify error. Finally, the forecasting tasks could likely be improved by using by using a recurrent structure, like a LSTM or GRU, at the beginning of the network architectures.

Employing PCNNs to study real-world MHD situations can help speed up simulations, as well as be an effective approach in cases with incomplete information or with problems that are not well-posed. Taking into account the trade-off between ease of training and generating physical fields, the choice of ML with or without hard physics constraints likely depends on the application of the user. If the goal is to simply create a fast-executing surrogate model that provides a fast qualitative view of a system's state, such as a display in a control room that provides intuition for an experimenter, then the use of ML without hard physics constraints may be sufficient with the benefit of faster training. If on the other hand the goal is to develop a physically consistent surrogate model with the goal of speeding up computationally expensive calculations in a physics model while generating physical fields that numerically satisfy all of the constraints, then ML with hard physics constraints should be used.

This work was supported by the Los Alamos National Laboratory Directed Research and Development (LDRD) Program Directed Research (DR) project “Charged Particle Beam Control and Diagnostics using Adaptive Machine Learning,” Project No. 20220074DR.

The authors have no conflicts to disclose.

Ari Bormanis: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Christopher Leon: Data curation (equal); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Alexander Scheinker: Conceptualization (lead); Data curation (equal); Formal analysis (equal); Funding acquisition (lead); Investigation (equal); Methodology (lead); Supervision (equal); Writing – original draft (supporting); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

The 3D MHD equations are often expressed in terms of curls and cross products, which are operations particular to three spatial dimensions. The equations can be put into forms that make generalization to arbitrary dimensions possible. Using Einstein summation, the induction equation in ideal 3D MHD is as follows:
B i t = ( × ( v × B ) ) i = ϵ kij ϵ klm j ( v l B m ) = j ( v i B j B i v j ) = [ · ( v B T B v T ) ] i ,
(A1)
hence Eq. (3). The force density coming from the magnetic field is
f B i = [ ( × B ) × B ] i = i ( B 2 2 ) + B j j B i = i ( B 2 2 ) + j ( B i B j ) = i ( B 2 2 ) + [ · ( B B T ) ] i ,
(A2)
which gives us the magnetic terms in Eq. (2).
1. Finite volume method
In order to maintain conservation of mass, momentum, and density, we will instead consider an equivalent system to Eqs. (1)–(3) recast into “conservative” form, which is to say, a system of the form
U t = · F ( U ) .
(B1)
where U is a vector of conservative variables and F a vector that is some function of them. The conservative variables in this case are ρ, ρ v, ρe, and B, where e is the total specific energy given by
e = u + | v | 2 2 + | B | 2 2 .
(B2)
With these variables, we can derive the equivalent system of PDEs
ρ t = · ( ρ v ) ,
(B3)
t ( ρ v ) = · ( ρ v v T + p I B B T ) ,
(B4)
t ( ρ e ) = · ( ρ e v + p v B ( v · B ) ) ,
(B5)
B t = · ( B v T v B T ) .
(B6)
The finite volume method is used to advance Eqs. (B3)–(B5) on a static grid by viewing each grid point as the center of a cell and keeping track of the cell volume-averaged values of ρ, ρ v, and ρe. The cells update their cell-averaged values by exchanging conservative quantities via fluxes through their faces. This scheme ensures that these conservative quantities are maintained to machine error. However, special care must be taken when advancing B in time to ensure that Gauss's law for magnetism, · B = 0, is also maintained. This necessitates the need of constrained transport (see Sec. III A).
2. Understanding the simulation data

To ensure the simulation is accurate, it must obey the Courant–Friedrichs–Lewy (CFL) condition, which enforces a maximum time step at each update in the simulation. Heuristically, this is the restriction that changes cannot propagate faster than the fastest possible wave speed, here the fast magnetosonic speed. We have modified the simulation to use a constant time step of Δ t = 0.0008 and the reason this time step is so small is to ensure that it never violates the restriction placed on it by the CFL condition for the whole simulation duration.

In constrained transport, we have the following update rules for the face-averaged magnetic field b:
b i + 1 2 , j x , n + 1 = b i + 1 2 , j x , n Δ t E i + 1 2 , j + 1 2 z , n E i + 1 2 , j 1 2 z , n Δ x ,
(C1)
b i , j + 1 2 y , n + 1 = b i , j + 1 2 y , n + Δ t E i + 1 2 , j + 1 2 z , n E i 1 2 , j + 1 2 z , n Δ y ,
(C2)
where Ez is the electric field at the corners of a cell. Note that we use a slightly more compact notation than in the rest of the paper. The time step is now the second superscript. This allows us to rigorously asses the update equation for B. First for Bx,
B i , j x , n + 1 = 1 2 ( b i + 1 2 , j x , n + 1 + b i 1 2 , j x , n + 1 ) = 1 2 ( b i + 1 2 , j x , n Δ t E i + 1 2 , j + 1 2 z , n E i + 1 2 , j 1 2 z , n Δ x ) + 1 2 ( b i 1 2 , j x , n Δ t E i 1 2 , j + 1 2 z , n E i 1 2 , j 1 2 z , n Δ x ) = 1 2 ( b i + 1 2 , j x , n + b i 1 2 , j x , n ) Δ t 2 ( E i + 1 2 , j + 1 2 z , n E i + 1 2 , j 1 2 z , n Δ x ) Δ t 2 ( E i 1 2 , j + 1 2 z , n E i 1 2 , j 1 2 z , n Δ x ) = B i , j x , n Δ t 2 ( Δ j E i + 1 2 z , n + Δ j E i 1 2 z , n ) = B i , j x , n + Δ t F i , j x , n ,
where, to simplify notation, we define the following quantities for any (i, j) in the mesh:
Δ j E i + 1 2 z , n = E i + 1 2 , j + 1 2 z , n E i + 1 2 , j 1 2 z , n Δ x , Δ i E j + 1 2 z , n = E i + 1 2 , j + 1 2 z , n E i 1 2 , j + 1 2 z , n Δ x .
(C3)
So we see that
F i , j x , n = 1 2 ( Δ j E i + 1 2 z , n + Δ j E i 1 2 z , n )
(C4)
and similar calculation shows that
F i , j y , n = 1 2 ( Δ i E j + 1 2 z , n + Δ i E j 1 2 z , n ) .
(C5)
Therefore, what we are predicting with a ResNet is the average of the change in electric fields at the corners of a cell. So, it makes sense that the neural networks that use averaging just before their output layer do better than those that do not. Now, we show that F is not divergence free in the discretized sense when taken as a forward difference
· F = F i + 1 , j x F i , j x Δ x + F i , j + 1 x F i , j x Δ y = 1 2 Δ x ( [ ( Δ j E i + 3 2 z + Δ j E i + 1 2 z ) + ( Δ j E i + 1 2 z + Δ j E i 1 2 z ) ] + [ ( Δ i E j + 3 2 z + Δ i E j + 1 2 z ) ( Δ i E j + 1 2 z , n + Δ i E j 1 2 z ) ] ) = 1 2 Δ x ( [ Δ j E i 1 2 z Δ j E i + 3 2 z ] + [ Δ i E j + 3 2 z Δ i E j 1 2 z ] ) = 1 2 ( Δ x ) 2 ( [ E i 1 2 , j + 1 2 z E i 1 2 , j 1 2 z ] [ E i + 3 2 , j + 1 2 z E i + 3 2 , j 1 2 z ] + [ E i + 1 2 , j + 3 2 z E i 1 2 , j + 3 2 z ] [ E i + 1 2 , j 1 2 z E i 1 2 , j 1 2 z ] ) = 1 2 ( Δ x ) 2 ( [ E i 1 2 , j + 1 2 z + E i + 3 2 , j 1 2 z + E i + 1 2 , j + 3 2 z ] [ E i 1 2 , j + 3 2 z + E i + 3 2 , j + 1 2 z + E i + 1 2 , j 1 2 z ] ) .
1.
H.
Alfvén
, “
Existence of electromagnetic-hydrodynamic waves
,”
Nature
150
,
405
406
(
1942
).
2.
W. H.
Matthaeus
, “
Turbulence in space plasmas: Who needs it?
,”
Phys. Plasmas
28
,
032306
(
2021
).
3.
J.
Cho
and
A.
Lazarian
, “
Compressible sub-Alfvénic MHD turbulence in low-β plasmas
,”
Phys. Rev. Lett.
88
,
245001
(
2002
).
4.
F.
Haas
, “
A magnetohydrodynamic model for quantum plasmas
,”
Phys. Plasmas
12
,
062117
(
2005
).
5.
S. Q.
Korving
,
G. T. A.
Huijsmans
,
J.-S.
Park
,
A.
Loarte
, and
JOREK Team
, “
Development of the neutral model in the nonlinear MHD code JOREK: Application to E × B drifts in ITER PFPO-1 plasmas
,”
Phys. Plasmas
30
,
042509
(
2023
).
6.
F.
Orain
,
M.
Bécoulet
,
G.
Dif-Pradalier
,
G.
Huijsmans
,
S.
Pamela
,
E.
Nardon
,
C.
Passeron
,
G.
Latu
,
V.
Grandgirard
,
A.
Fil
et al, “
Non-linear magnetohydrodynamic modeling of plasma response to resonant magnetic perturbations
,”
Phys. Plasmas
20
,
102510
(
2013
).
7.
S. C.
Jardin
,
N. M.
Ferraro
,
W.
Guttenfelder
,
S. M.
Kaye
, and
S.
Munaretto
, “
Ideal MHD limited electron temperature in spherical tokamaks
,”
Phys. Rev. Lett.
128
,
245001
(
2022
).
8.
V. A.
Izzo
,
D. G.
Whyte
,
R. S.
Granetz
,
P. B.
Parks
,
E. M.
Hollmann
,
L. L.
Lao
, and
J. C.
Wesley
, “
Magnetohydrodynamic simulations of massive gas injection into Alcator C-Mod and DIII-D plasmas
,”
Phys. Plasmas
15
,
056109
(
2008
).
9.
W. W.
Heidbrink
, “
Basic physics of Alfvén instabilities driven by energetic particles in toroidally confined plasmas
,”
Phys. Plasmas
15
,
055501
(
2008
).
10.
A. V.
Arefiev
and
B. N.
Breizman
, “
Magnetohydrodynamic scenario of plasma detachment in a magnetic nozzle
,”
Phys. Plasmas
12
,
043504
(
2005
).
11.
S.
Smolentsev
, “
Physical background, computations and practical issues of the magnetohydrodynamic pressure drop in a fusion liquid metal blanket
,”
Fluids
6
,
110
(
2021
).
12.
A.
Gill
,
M.
Nornberg
,
H.
Ji
, and
J. L.
Peterson
, “
Development of a diagnostic array for the measurement of velocity profiles across open-channel liquid metal flows
,” in
APS Division of Plasma Physics Meeting Abstracts
,
2007
, Vol.
49
.
13.
S.
Baranov
, “
MHD method of measuring high-power ion beam parameters
,”
Sov. J. Plasma Phys.
17
,
156
157
(
1991
).
14.
A.
Scheinker
and
R.
Pokharel
, “
Physics-constrained 3D convolutional neural networks for electrodynamics
,”
APL Mach. Learn.
1
,
026109
(
2023
).
15.
S. A.
Orszag
and
C.-M.
Tang
, “
Small-scale structure of two-dimensional magnetohydrodynamic turbulence
,”
J. Fluid Mech.
90
,
129
143
(
1979
).
16.
J. M.
Picone
and
R. B.
Dahlburg
, “
Evolution of the Orszag–Tang vortex system in a compressible medium. II. Supersonic flow
,”
Phys. Fluids B
3
,
29
44
(
1991
).
17.
P.
Mocz
,
M.
Vogelsberger
, and
L.
Hernquist
, “
A constrained transport scheme for MHD on unstructured static and moving meshes
,”
Mon. Not. R. Astron. Soc.
442
,
43
55
(
2014
).
18.
B.
Snow
,
A.
Hillier
,
G.
Murtas
, and
G. J. J.
Botha
, “
Shock identification and classification in 2D magnetohydrodynamic compressible turbulence–Orszag–Tang vortex
,”
Exp. Results
2
,
e35
(
2021
).
19.
J.
Choi
,
R.
Wang
,
R. M.
Churchill
,
R.
Kube
,
M.
Choi
,
J.
Park
,
J.
Logan
,
K.
Mehta
,
G.
Eisenhauer
,
N.
Podhorszki
et al, “
Data federation challenges in remote near-real-time fusion experiment data processing
,” in
Driving Scientific and Engineering Discoveries through the Convergence of HPC, Big Data and AI: 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Oak Ridge, TN, USA, August 26-28, 2020
(
Springer
,
2020
), pp.
285
299
.
20.
R.
Kube
,
R. M.
Churchill
,
C. S.
Chang
,
J.
Choi
,
R.
Wang
,
S.
Klasky
,
L.
Stephey
,
E.
Dart
, and
M. J.
Choi
, “
Near real-time streaming analysis of big fusion data
,”
Plasma Phys. Controlled Fusion
64
,
035015
(
2022
).
21.
A.
Scheinker
and
D.
Scheinker
, “
Bounded extremum seeking with discontinuous dithers
,”
Automatica
69
,
250
257
(
2016
).
22.
A.
Scheinker
,
A.
Edelen
,
D.
Bohler
,
C.
Emma
, and
A.
Lutman
, “
Demonstration of model-independent control of the longitudinal phase space of electron beams in the Linac-Coherent Light Source with femtosecond resolution
,”
Phys. Rev. Lett.
121
,
044801
(
2018
).
23.
J. D.
Jackson
,
Classical Electrodynamics
, 2nd ed. (
Wiley
,
1975
).
24.
P.
Mocz
,
R.
Pakmor
,
V.
Springel
,
M.
Vogelsberger
,
F.
Marinacci
, and
L.
Hernquist
, “
A moving mesh unstaggered constrained transport scheme for magnetohydrodynamics
,”
Mon. Not. R. Astron. Soc.
463
,
477
488
(
2016
).
25.
K. T.
McDonald
, “
Electrodynamics in 1 and 2 spatial dimensions
” (unpublished) (2019).
26.
D.
Boito
,
L. N. S.
de Andrade
,
G.
de Sousa
,
R.
Gama
, and
C. Y. M.
London
, “
On Maxwell's electrodynamics in two spatial dimensions
,”
Rev. Bras. Ensino Fís.
42
,
3
9
(
2020
).
27.
That is, an ideal gas with atoms having only 2 translational degrees of freedom and the gas has constant heat capacities for both constant volume and constant pressure.
28.
C. R.
Evans
and
J. F.
Hawley
, “
Simulation of magnetohydrodynamic flows: A constrained transport method
,”
Astrophys. J.
332
,
659
677
(
1988
).
29.
K.
Hornik
,
M.
Stinchcombe
, and
H.
White
, “
Multilayer feedforward networks are universal approximators
,”
Neural Networks
2
,
359
366
(
1989
).
30.
S.
Park
,
C.
Yun
,
J.
Lee
, and
J.
Shin
, “
Minimum width for universal approximation
,” in
International Conference on Learning Representations
, 2020.
31.
T.
Chen
and
H.
Chen
, “
Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems
,”
IEEE Trans. Neural Networks
6
,
911
917
(
1995
).
32.
J. L.
Peterson
,
B.
Bay
,
J.
Koning
,
P.
Robinson
,
J.
Semler
,
J.
White
,
R.
Anirudh
,
K.
Athey
,
P.-T.
Bremer
,
F.
Di Natale
et al, “
Enabling machine learning-ready HPC ensembles with Merlin
,”
Future Gener. Comput. Syst.
131
,
255
268
(
2022
).
33.
M.
Leshno
,
V. Y.
Lin
,
A.
Pinkus
, and
S.
Schocken
, “
Multilayer feedforward networks with a nonpolynomial activation function can approximate any function
,”
Neural Networks
6
,
861
867
(
1993
).
34.
R.
Anirudh
,
R.
Archibald
,
M. S.
Asif
,
M. M.
Becker
,
S.
Benkadda
,
P.-T.
Bremer
,
R. H. S.
Budé
,
C. S.
Chang
,
L.
Chen
,
R. M.
Churchill
,
J.
Citrin
,
J. A.
Gaffney
,
A.
Gainaru
,
W.
Gekelman
,
T.
Gibbs
,
S.
Hamaguchi
,
C.
Hill
,
K.
Humbird
,
S.
Jalas
,
S.
Kawaguchi
,
G.-H.
Kim
,
M.
Kirchen
,
S.
Klasky
,
J. L.
Kline
,
K.
Krushelnick
,
B.
Kustowski
,
G.
Lapenta
,
W.
Li
,
T.
Ma
,
N. J.
Mason
,
A.
Mesbah
,
C.
Michoski
,
T.
Munson
,
I.
Murakami
,
H. N.
Najm
,
K. E. J.
Olofsson
,
S.
Park
,
J. L.
Peterson
,
M.
Probst
,
D.
Pugmire
,
B.
Sammuli
,
K.
Sawlani
,
A.
Scheinker
,
D. P.
Schissel
,
R. J.
Shalloo
,
J.
Shinagawa
,
J.
Seong
,
B. K.
Spears
,
J.
Tennyson
,
J.
Thiagarajan
,
C. M.
Ticoş
,
J.
Trieschmann
,
J.
van Dijk
,
B.
Van Essen
,
P.
Ventzek
,
H.
Wang
,
J. T. L.
Wang
,
Z.
Wang
,
K.
Wende
,
X.
Xu
,
H.
Yamada
,
T.
Yokoyama
, and
X.
Zhang
, “
2022 review of data-driven plasma science
,”
IEEE Trans. Plasma Sci.
51
,
1750
(
2023
).
35.
A. D.
Bonzanini
,
K.
Shao
,
D. B.
Graves
,
S.
Hamaguchi
, and
A.
Mesbah
, “
Foundations of machine learning for low-temperature plasmas: Methods and case studies
,”
Plasma Sources Sci. Technol.
32
,
024003
(
2023
).
36.
K. L.
Mentzer
and
J. L.
Peterson
, “
Neural network surrogate models for equations of state
,”
Phys. Plasmas
30
,
032704
(
2023
).
37.
K. D.
Humbird
and
J. L.
Peterson
, “
Transfer learning driven design optimization for inertial confinement fusion
,”
Phys. Plasmas
29
,
102701
(
2022
).
38.
J. A.
Gaffney
,
K.
Humbird
,
M.
Kruse
,
E.
Kur
,
B.
Kustowski
,
R.
Nora
, and
B.
Spears
, “
Iterative sampling of expensive simulations for faster deep surrogate training
,”
Contrib. Plasma Phys.
63
,
e202200190
(
2023
).
39.
J.
Lim
and
D.
Psaltis
, “
MaxwellNet: Physics-driven deep neural network training based on Maxwell's equations
,”
APL Photonics
7
,
011301
(
2022
).
40.
M.
Raissi
,
A.
Yazdani
, and
G. E.
Karniadakis
, “
Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations
,”
Science
367
,
1026
1030
(
2020
).
41.
J. A.
Gaffney
,
L.
Yang
, and
S.
Ali
, “
Constraining model uncertainty in plasma equation-of-state models with a physics-constrained Gaussian process
,” arXiv:2207.00668 (
2022
).
42.
D.
Tretiak
,
A. T.
Mohan
, and
D.
Livescu
, “
Physics-constrained generative adversarial networks for 3D turbulence
,” arXiv:2212.00217 (
2022
).
43.
A. T.
Mohan
,
N.
Lubbers
,
M.
Chertkov
, and
D.
Livescu
, “
Embedding hard physical constraints in neural network coarse-graining of three-dimensional turbulence
,”
Phys. Rev. Fluids
8
,
014604
(
2023
).
44.
S. G.
Rosofsky
and
E. A.
Huerta
, “
Magnetohydrodynamics with physics informed neural operators
,”
Mach. Learn.
4
,
035002
(
2023
).
45.
R.
Wang
, “
Incorporating symmetry into deep dynamics models for improved generalization
,” in
International Conference on Learning Representations (ICLR)
,
2021
.
46.
A.
Bogatskiy
,
S.
Ganguly
,
T.
Kipf
,
R.
Kondor
,
D. W.
Miller
,
D.
Murnane
,
J. T.
Offermann
,
M.
Pettee
,
P.
Shanahan
,
C.
Shimmin
et al, “
Symmetry group equivariant architectures for physics
,” arXiv:2203.06153 (
2022
).
47.
K.
Simonyan
and
A.
Zisserman
, “
Very deep convolutional networks for large-scale image recognition
,” arXiv:1409.1556 (
2014
).
48.
M.
Abadi
,
A.
Agarwal
,
P.
Barham
,
E.
Brevdo
,
Z.
Chen
,
C.
Citro
,
G. S.
Corrado
,
A.
Davis
,
J.
Dean
,
M.
Devin
,
S.
Ghemawat
,
I.
Goodfellow
,
A.
Harp
,
G.
Irving
,
M.
Isard
,
Y.
Jia
,
R.
Jozefowicz
,
L.
Kaiser
,
M.
Kudlur
,
J.
Levenberg
,
D.
Mané
,
R.
Monga
,
S.
Moore
,
D.
Murray
,
C.
Olah
,
M.
Schuster
,
J.
Shlens
,
B.
Steiner
,
I.
Sutskever
,
K.
Talwar
,
P.
Tucker
,
V.
Vanhoucke
,
V.
Vasudevan
,
F.
Viégas
,
O.
Vinyals
,
P.
Warden
,
M.
Wattenberg
,
M.
Wicke
,
Y.
Yu
, and
X.
Zheng
, “
TensorFlow: Large-scale machine learning on heterogeneous systems
,” arXiv:1603.04467 (
2015
).
49.
D. P.
Kingma
and
J.
Ba
, “
Adam: A method for stochastic optimization
,” arXiv:1412.6980 (
2014
).
50.
Z.
Wang
,
A. C.
Bovik
,
H. R.
Sheikh
, and
E. P.
Simoncelli
, “
Image quality assessment: From error visibility to structural similarity
,”
IEEE Trans. Image Process.
13
,
600
612
(
2004
).
51.
Z.
Wang
and
E. P.
Simoncelli
, “
Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities
,”
J. Vision
8
,
8
8
(
2008
).
52.
Z.
Wang
and
A. C.
Bovik
, “
Mean squared error: Love it or leave it? a new look at signal fidelity measures
,”
IEEE Signal Process. Mag.
26
,
98
117
(
2009
).
53.
S. L.
Smith
,
P.-J.
Kindermans
, and
Q. V.
Le
, “
Don't decay the learning rate, increase the batch size
,” in
International Conference on Learning Representations
,
2018
.
54.
M.
Zhang
and
X.
Feng
, “
A comparative study of divergence cleaning methods of magnetic field in the solar coronal numerical simulation
,”
Front. Astron. Space Sci.
3
,
6
(
2016
).
55.
S. G.
Rosofsky
,
H.
Al Majed
, and
E. A.
Huerta
, “
Applications of physics informed neural operators
,”
Mach. Learn.
4
,
025022
(
2023
).
56.
R.
Jarolim
,
J. K.
Thalmann
,
A. M.
Veronig
, and
T.
Podladchikova
, “
Probing the solar coronal magnetic field with physics-informed neural networks
,”
Nat. Astron.
7
,
1171
1179
(
2023
).
57.
A.
Scheinker
,
F.
Cropp
, and
D.
Filippetto
, “
Adaptive autoencoder latent space tuning for more robust machine learning beyond the training set for six-dimensional phase space diagnostics of a time-varying ultrafast electron-diffraction compact accelerator
,”
Phys. Rev. E
107
,
045302
(
2023
).
58.
A.
Scheinker
and
S.
Gessner
, “
Adaptive method for electron bunch profile prediction
,”
Phys. Rev. Spec. Top. Accel. Beams
18
,
102801
(
2015
).
59.
A. F.
Psaros
,
X.
Meng
,
Z.
Zou
,
L.
Guo
, and
G. E.
Karniadakis
, “
Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons
,”
J. Comput. Phys.
477
,
111902
(
2023
).
60.
G.
Tóth
, “
The · B = 0 constraint in shock-capturing magnetohydrodynamics codes
,”
J. Comput. Phys.
161
,
605
652
(
2000
).