We study the 2D Orszag–Tang vortex magnetohydrodynamics (MHD) problem through the use of physics-constrained convolutional neural networks (PCNNs) for forecasting the density, ρ, and the magnetic field, B, as well as the prediction of B given the velocity field v of the fluid. In addition to translation equivariance from the convolutional architecture, other physics constraints were embedded: absence of magnetic monopoles, non-negativity of ρ, use of only relevant variables, and the periodic boundary conditions of the problem. The use of only relevant variables and the hard constraint of non-negative ρ were found to facilitate learning greatly. The divergenceless condition was implemented as a hard constraint up to machine precision through the use of a magnetic potential to define . Residual networks and data augmentation were also used to improve performance. This allowed for some of the residual models to function as surrogate models and provide reasonably accurate simulations. For the prediction task, the PCNNs were evaluated against a physics-informed neural network, which had the ideal MHD induction equation as a soft constraint. Several models were able to generate highly accurate fields, which are visually almost indistinguishable and have low mean squared error. Only methods with built-in hard constraints produced physical fields with . The use of PCNNs for MHD has the potential to produce physically consistent real-time simulations to serve as virtual diagnostics in cases where inferences must be made with limited observables.
I. INTRODUCTION
Magnetohydrodynamics (MHD) describes the low-frequency dynamics of a plasma, which is modeled as an electrically conducting fluid influenced by electromagnetic fields.1 MHD is used to describe a wide range of systems including turbulence in plasmas,2,3 quantum plasmas,4 Tokamaks,5–9 plasma detachment in magnetic nozzles,10 liquid metals,11,12 and intense ion beams in particle accelerators.13 MHD simulations are complex, describing systems with many evolving nonlinear fields and can be computationally expensive.
In this work, we utilize physics constrained neural networks (PCNNs) via a novel neural operator approach for building hard physics constraints into convolutional neural networks (CNNs),14 for the study of MHD. We focus on the Orszag–Tang vortex,15 a widely used test case for 2D MHD simulations.16–18 We generate synthetic MHD data using physics simulations and test various neural network-based methods for creating surrogate models of the MHD simulation, which are able to quickly predict the evolution of the data, such as using magnetic field and charge density values at one time step to predict those fields at subsequent time steps. We also test incorporating neural network-based methods as solvers within the physics model to speed up computationally expensive parts of the MHD simulation.
The use of PCNNs has the potential to provide physically consistent and accurate surrogates of MHD simulations, which run orders of magnitude faster than traditional approaches. Complex plasma processes could benefit from real-time data processing19 and real-time data analysis.20 The PCNN-based methods studied in this work, although not as accurate as physics models, are incredibly fast and therefore may be useful for real-time applications such as finding approximate solutions to inverse problems for operational systems with limited direct observations by providing a good initial guess, which is refined by computationally expensive physics models. Such ML-based methods, combined with robust model-independent control algorithms,21 as was demonstrated for intense electron beams in the LCLS free electron laser,22 have the potential to provide real-time feedback control for complex plasma physics systems.
A. Ideal MHD and the Orszag–Tang vortex
For what follows, we consider a monatomic, perfect gas.27 In this case, the equation of state becomes , where u is the gas's thermal energy density per mass and γ is the gas adiabatic index. We follow the example of previous works16,17 and let to have better analogy to the 3D case.
For this work, Orszag–Tang vortex data were generated using a physics simulation that utilizes a finite volume method together with constrained transport on a static mesh as described in Refs. 17 and 28. The physics code is used to generate large data sets, which include the temporal evolution of the fields in Eqs. (1)–(4). We then study various neural network-based approaches for reproducing these fields and for capturing their dynamics, such as using field values at one time to predict their values at a subsequent time step .
For most of this work, we utilize convolutional neural networks (CNN), which take 128 × 128 pixel images as inputs. CNNs are used for this image-based ML because they have been demonstrated to have the best performance for image processing and image generation. When generating a 2D image, such as the mass density at time t for a 128 × 128 grid of values of (x, y), the CNN generates the entire image in one shot, and therefore, when referring to CNN-generated values such as , we simply write . In the one case where we use a dense neural network to map out individual values at individual (x, y) positions, we write out .
II. SUMMARY OF RESULTS
For accurate predictions, one would expect low mean squared error (MSE) and high structural similarity (SSIM) (SSIM values are between −1 and 1 with a value of 1 for images that are exactly the same) between true and predicted images. In all of the results in this work, it was found that the MSE and SSIM were consistent; that is, as MSE increased, SSIM decreased and models with lower MSE also had higher SSIM values relative to each other. However, the sensitivity of MSE and SSIM was, as expected, very different. For example, Fig. 3 shows the true and predicted mass density ρ as predicted by several of the ML models at various time steps. Visually, the blue ρ predictions of all of the models look similar, but they clearly have very different levels of error as shown in the red sub-plots. In Fig. 4, this is numerically summarized by the MSE and SSIM. While all SSIM values remain within roughly 10% of each other, the MSE values differ by up to 1000%.
For the relatively simple problem of predicting mass density ρ, the networks having a simple positivity constraint of enforced with a ReLU output function had the best performance. For the more challenging problem of predicting the magnetic field B, some trade-offs were found between how easy it was to train a network in terms of MSE or SSIM accuracy of the B field prediction vs constraint violation. Networks with no constraints or only soft constraints were able to minimize a given cost function faster (in fewer training steps) than those with hard physics constraints, but produced nonphysical fields. This result is not very surprising because unconstrained networks (including PINNs, networks where the constraint is soft as part of the cost function) are free to generate completely arbitrary functions over the largest space of possibilities. On the other hand, the networks with the hard physics constraint of , which was strictly enforced by generating , were the most restricted in terms of the families of functions that they could generate. Of all of the approaches considered, only the networks with built-in hard constraints generated physical fields (within numerical precision).
III. SIMULATED DATA
The simulated Orszag–Tang vortex data were generated using the finite volume method and constrained transport on a static mesh.17,28 The fluid is simulated over a unit square subject to spatial periodic boundary conditions. The procedure makes use of conserved quantities while also maintaining a discretization of the divergence-free condition, . Furthermore, details of the simulation can be found in Appendix B.
Some models that appear later will mimic the constrained transport of the simulation; thus, we discuss briefly how it functions.
A. Constrained transport
B. Adding physics to neural networks
It is well known that deep neural networks can represent both functions and operators with arbitrary accuracy.29–31 Deep neural networks are now being utilized for a wide range of scientific research due to the availability of incredibly powerful high-performance computing,32 combined with their ability to represent complex functions over compact sets with the representation being learned directly from high dimensional data, such as large images.33 For example, in the field of plasma physics, deep neural networks have been applied to a wide range of problems,34,35 including the use of neural networks as surrogates for equations of state.36 Neural networks can take advantage of transfer learning for the design optimization of internal confinement fusion,37 and deep surrogate models can take advantage of intelligent iterative sampling of expensive simulations for faster training.38
Several methods have been developed in an effort to incorporate physics within neural networks. The most general and most popular approach by far is that of physics informed neural networks (PINNs). In a PINNs approach, additional terms are added to the cost function used to train a neural network, typically in the form of a PDE, which describes the underlying physics such as electrodynamics39 and fluid mechanics.40
Although very general, the main limitations of PINNs are a lack of any guarantee of respecting the physics as they use only soft constraints. A soft constraint is a condition the network attempts to fulfill. A hard constraint is one in which the NN automatically satisfies. Furthermore, in a PINNs approach, there is usually a trade-off between satisfying the physics constraint and in accurately generating the desired fields. For example, we consider the well-known physics constraint . In a PINNs approach, a cost function term such as is trivially satisfied by for any arbitrary constant vector C, which has absolutely nothing to do with the true magnetic field.
Recently, work has been done on incorporating hard physics constraints within PCNNs, such as the approach applied here to satisfy Maxwell's equations for electrodynamics,14 methods have been developed for physics-constrained Gaussian processes that enforce thermodynamic consistency by limiting the space of functions that are sampled,41 and PCNNs have also been implemented for fluid mechanics.42,43 Although PCNNs have now been applied for electrodynamics and fluid mechanics separately, comparatively little has been done to study their use for MHD.44 We carry out a proof-of-principal study of applying PCNNs to the 2D Orszag–Tang Vortex for predicting both the fluid density and the associated magnetic fields.
IV. PHYSICS-CONSTRAINED NEURAL NETWORKS
Another benefit of using CNNs is that the translational symmetry of the physics, formally spatial translational equivariance, can be approximated through the discrete translational symmetry of the CNN architecture.45,46 We can also impose relatively simple hard physics constraints, such as by taking the activation function in the final layer of our network to be a ReLU [i.e., ReLU]. For periodic boundary conditions, padding to enforce spatial periodicity can be used. A summary of physics constraints and how they are enforced in this paper can be found in Table I.
Various NN physics constraints.
Constraint . | Implementation . | Hard/soft . |
---|---|---|
Divergence-free | Hard | |
Translation equivariance | CNN architecture | Hard |
Non-negativity (e.g., ) | Final layer ReLU | Hard |
Periodic BC's | Padding | Soft |
Partial differential equation | Term in cost function | Soft |
Constraint . | Implementation . | Hard/soft . |
---|---|---|
Divergence-free | Hard | |
Translation equivariance | CNN architecture | Hard |
Non-negativity (e.g., ) | Final layer ReLU | Hard |
Periodic BC's | Padding | Soft |
Partial differential equation | Term in cost function | Soft |
The major advantages of using CNNs as operators that map entire images to other images are faster inference and greater generalizability. A trained CNN can map incredibly large images (or volumes) to their subsequent states within microseconds, whereas a function-approximating CNN would have to work its way through many individual points one batch at a time, which greatly slows down both inference and training. Because the CNN sees the entire global image, it can interpolate between various examples, whereas the function-approximating NN memorizes values at individual points and has to re-train for even small variations. The major advantages of using NNs for point-wise function approximation is auto-differentiation and smaller memory requirements because they can be relatively small dense networks that do not need to be able to handle large image or volume tensors.
A. NN architecture and training
Most of our NNs have a convolutional encoder–decoder architecture. The baseline architecture along with a description of the layers and parameters used can be seen in Fig. 2. This setup is based on the well-known VGG-style approach, which has been established for deep networks, in which several sets of two convolutional layers with 3 × 3 filters are followed by a max pool layer that is used for the encoder network.47 For some models, modifications to the initial and final layers were made and will be discussed later.
The baseline architecture for the convolutional NNs. The wide arrows represent several layers, with the exact meaning shown below the network diagram. The order of operation goes from left to right. The number of filters a convolution layer has is depicted in the box immediately following the arrow, unless stated otherwise.
The baseline architecture for the convolutional NNs. The wide arrows represent several layers, with the exact meaning shown below the network diagram. The order of operation goes from left to right. The number of filters a convolution layer has is depicted in the box immediately following the arrow, unless stated otherwise.
B. Cases examined
We had the models perform both forecasting and predicting tasks. For forecasting, the task was given certain input fields at t forecast a certain output field at . First, this was done for ρ and then B. For predicting, we looked at given the v field predicts the B field at the same time step, after which the models' predictions were compared to that of a PINN.
A summary of networks used, constraints, inputs, outputs, and training data is given in Table III.
V. RESULTS
For this study, several different datasets and ML approaches were used, which are summarized in Table II. Throughout the paper, to quantify the accuracy of our ML-based predictions, we utilized mean squared error (MSE) and structural similarity index (SSIM), as each has its own strengths and limitations. Our formulations of MSE and SSIM are discussed in more detail below. MSE is a popular metric for comparing numerical data, and MSE-based optimization problems are popular because of the many analytical results (such as least squares fits). A limitation of MSE is its sensitivity to small perturbations and to outliers, which becomes especially important for images. Regarding outliers, if even a single pixel of a generated high-resolution image is corrupted, while all other pixels are identical to the true values, a large MSE value will result. Regarding small perturbations, we consider a generated image that is identical to the true image, but the entire image is shifted by just a single pixel. While this difference may be unnoticed by the human eye, it will result in an incredibly high MSE.
Neural network types and names.
Name . | Constraints . | Training data . | Inputs . | Outputs . |
---|---|---|---|---|
ρ-CNN-rel/all | none | 1 MHD | images (t) | ρ images |
ρ-PCNN-rel/all | 1 MHD | images (t) | ρ images | |
B-CNN-all | none | 1 MHD | images | B images |
B-PCNN-all | 1 MHD | images (t) | A images | |
B-CNN-res/res-rand | none | 1/50 MHD | images (t) | images |
B-PCNN-res/res-rand | 1/50 MHD | images (t) | images | |
-CNN | none | 1 MHD | images (t) | B images (t) |
-PCNN | 1 MHD | images (t) | A images (t) | |
-PINN | soft | 1 MHD | points (t) | points (t) |
Name . | Constraints . | Training data . | Inputs . | Outputs . |
---|---|---|---|---|
ρ-CNN-rel/all | none | 1 MHD | images (t) | ρ images |
ρ-PCNN-rel/all | 1 MHD | images (t) | ρ images | |
B-CNN-all | none | 1 MHD | images | B images |
B-PCNN-all | 1 MHD | images (t) | A images | |
B-CNN-res/res-rand | none | 1/50 MHD | images (t) | images |
B-PCNN-res/res-rand | 1/50 MHD | images (t) | images | |
-CNN | none | 1 MHD | images (t) | B images (t) |
-PCNN | 1 MHD | images (t) | A images (t) | |
-PINN | soft | 1 MHD | points (t) | points (t) |
Compared to MSE, SSIM captures a higher level more wholistic view of images, similarly to what we do with our eyes. In fact, interesting examples have been generated in which the MSE between two images is kept constant, while the SSIM is either maximized or minimized, resulting in images that are completely different or visually almost identical.51 A much more thorough comparison of MSE and SSIM, as well as an interesting discussion and set of examples, has been published.52 We present MSE and SSIM for all of the generated quantities to simultaneously summarize both the pixel-by-pixel and the neighborhood-dependent features of the predictions.
A. Forecasting density
To create training data, the physics-based MHD simulation ran from t = 0 to t = 1 with producing 1250 snapshots. The first 936 snapshot pairs or were used for training of the neural networks and the last 313 snapshot pairs or were used for testing. All ML models used early stopping and were trained using a learning rate of . Models were trained for 240 epochs with a batch size starting at 2 and doubling every 60 epochs, as recommended in Ref. 53.
It should be noted that we are training on a turbulent part of the Orszag–Tang vortex and then trying to predict the evolution of a smoother, more uniform part. Predictions of the models and their absolute errors at various times are shown in Fig. 3. The SSIMs and the MSEs of the model predictions on the test set are shown in Fig. 4.
The true value of ρ and the model predictions are shown in blue. The absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
The true value of ρ and the model predictions are shown in blue. The absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
The SSIM and MSE on the test set for different models forecasting ρ. The sum of MSE over the test set for each model is displayed next to its label in the MSE plot.
The SSIM and MSE on the test set for different models forecasting ρ. The sum of MSE over the test set for each model is displayed next to its label in the MSE plot.
B. Forecasting the magnetic field
The next problem was the use of the B and v field images at a time steps t to forecast the magnetic field at the next time step . The generated magnetic fields should satisfy physics constraints. It is known that not preserving in numerical simulations leads to magnetic monopole numerical artifacts and the Lorentz force no longer being orthogonal to the magnetic field.17
Predictions of Bx generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of Bx generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of By generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of By generated by the B-CNN and B-PCNN models shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
In Fig. 7, the models' adherence to the divergence-free condition is seen in two ways. First, we look at , as computed as in Eq. (11), for the models at certain time steps. In terms of , only the B-PCNN would generate physical fields (by construction), while the B-CNN created wildly nonphysical fields (by many orders of magnitude). Therefore, we also used a modified less stringent measure of divergence, referred to as relative divergence.17,54
Top three rows: stream plots of B for simulation and models (note: the density of lines does not represent magnitude) Bottom four rows: divergence and the logarithm of the relative divergence error [Eq. (21)] for the models. Note that was calculated using Eq. (11). In the top three rows, the field lines look similar because both networks are reproducing the magnetic fields accurately in terms of SSIM and MSE. In the bottom four rows, the relative divergence is different by orders of magnitude with only the B-PCNN respecting the physics constraints. Because they are plotted on the same color scales, only the very high relative divergence of the B-CNN is visible while that of the B-PCNN looks blank. Although qualitatively the vector fields of both networks look correct, the divergences of the fields they predict differ by many orders of magnitude.
Top three rows: stream plots of B for simulation and models (note: the density of lines does not represent magnitude) Bottom four rows: divergence and the logarithm of the relative divergence error [Eq. (21)] for the models. Note that was calculated using Eq. (11). In the top three rows, the field lines look similar because both networks are reproducing the magnetic fields accurately in terms of SSIM and MSE. In the bottom four rows, the relative divergence is different by orders of magnitude with only the B-PCNN respecting the physics constraints. Because they are plotted on the same color scales, only the very high relative divergence of the B-CNN is visible while that of the B-PCNN looks blank. Although qualitatively the vector fields of both networks look correct, the divergences of the fields they predict differ by many orders of magnitude.
The denominator in Eq. (21) comes from the idea that the effective magnetic field of interest is (recall the magnetic pressure is given by ). In both cases, the divergence of the PCNN was found to be many orders of magnitude lower than that of the CNN, with the lower bound on the PCNN's divergence given by the numerical precision used, which in our case was for a float32, as shown in Fig. 8. Note that for both cases, we are using Eq. (10) to estimate the divergences in order for them to be directly comparable to the constraint transport simulation's notion of the divergence-free condition. Also, we let denote the mean of ϵrd taken over the whole grid at fixed time. A comparison of the MSE and SSIM is shown in Fig. 9.
The mean of the absolute value of the divergences of B [computed as in Eq. (11)], which we denote as , at each time step. The top graph was created using the true values of b generated by the simulation.
The mean of the absolute value of the divergences of B [computed as in Eq. (11)], which we denote as , at each time step. The top graph was created using the true values of b generated by the simulation.
SSIM and MSE on the test set for the different models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot.
SSIM and MSE on the test set for the different models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot.
1. Residual networks
The hyperparameters for both models were different from previous sections. The models were trained with a learning rate of . The loss function was changed to be the squared error (SE), which is just the MSE without any normalizing constant. Training was run for 90 epochs. The batch size started at 2 and doubled every 30 epochs for a final batch size of 8. Early stopping was not utilized.
Since the change in B from one time step to the next tends to be small, we also made sure that the networks were doing significantly better than predicting zero for [i.e., predicting ]. This is indeed the case. Predictions of the models and their absolute errors at various times are shown in Figs. 10 and 11. The SSIMs and the MSEs of the model predictions over time on the test set are shown in Fig. 12. The absolute error of the relative divergences of these models is displayed in Fig. 13.
Predictions of Bx generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of Bx generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of By generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
Predictions of By generated by the ResNet models B-CNN-res and B-PCNN-res shown in gold and green, along with the absolute errors of the predictions shown in red. The first row is data generated by the simulation. For the other rows, the snapshots in the first column are the trained models' predictions on a time step in the training data, while the next three columns are predictions on time steps in the test set.
SSIM and MSE on the test set for the different ResNet models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot. The result of predicting at each time step is labeled “predicting zero” in each graph.
SSIM and MSE on the test set for the different ResNet models forecasting B. Note that the sum of the MSE at all time steps for each model is displayed next to its label in the MSE plot. The result of predicting at each time step is labeled “predicting zero” in each graph.
The absolute error between the mean of the relative divergence of the simulation and mean of the relative divergence of the various ResNet models forecasting B for time steps in the test set. The sum of the absolute errors of the relative divergence means over the whole test set for each model is displayed next to its label. Note that was calculated as in Eq. (29).
The absolute error between the mean of the relative divergence of the simulation and mean of the relative divergence of the various ResNet models forecasting B for time steps in the test set. The sum of the absolute errors of the relative divergence means over the whole test set for each model is displayed next to its label. Note that was calculated as in Eq. (29).
2. Data augmentation
NNs often require much data to learn properly; thus, often data augmentation is used in their training. Thus, to improve learning and see how well the NNs could generalize, we trained the ResNets on 50 ideal MHD simulations that started with random initial fluid velocity fields and magnetic vector potentials, but kept the same initial pressure and density as in the Orszag–Tang vortex. The NNs were then tested on the Orszag–Tang vortex, which they had never encountered before. Here, we take advantage of the fact that the neural operators work for a wide range of conditions, unlike PINNs which just approximate a particular solution of a PDE.
The 50 physics-based MHD simulations started with random initial v and A fields, where , and B is computed from b via the same averaging operation as in Eq. (10), created by starting with a Gaussian noise field at t = 0 and then applying a Gaussian filter (i.e., a convolution with a normal distribution) to smooth things out, similar to what has been done in other works.55 Wrapping was used in the Gaussian filter to maintain spatial periodicity of the initial conditions. The parameters for the random field generation were chosen so that the values of fields in the initial conditions were of the same order of magnitude as that of the Orszag–Tang vortex's (on average). The simulation was run from t = 0 to 1 and the Orszag–Tang vortex test ran on the same interval. All of the simulations were combined into one training dataset.
Two neural network models were trained on these data: B-CNN-res-rand, which has the same architecture as B-CNN-res, and B-PCNN-res-rand, which has the same architecture as B-PCNN-res. These models were trained for 100 epochs and used a learning rate of . The SE was used again as the loss function. The batch size started at 16 and doubled every 25 epochs, with a final batch size of 128. Early stopping was not used.
3. Surrogate modeling
Our last task for forecasting B was to embed our neural network models inside of the physics-based MHD Orszag–Tang simulation. Specifically, we ran the simulation using constrained transport and finite volume methods until we reached the test set. Then, we switched over to using one of our forecasting models to advance B instead of using constrained transport.
For this setup, we allowed the simulation to produce data up until where the test set starts. Then, we deactivated the part of the simulation that uses constrained transport to advance B and instead used B-CNN-res-rand or B-PCNN-res-rand. These models would take in and and then produce . The simulation would then be used to advance all other variables, incorporating where relevant.
The B-CNN-res and B-PCNN-res models failed rapidly, quickly making predictions that resulted in the simulation becoming plagued with NaN errors even with the fix discussed above. One of the difficulties faced by this preliminary study of embedding the CNNs within the physics model was that we were using both and as NN inputs, but only predicting , while was generated by the physics model itself. This disconnect resulted in the build-up of nonphysical numerical errors, but was greatly improved by the use of residual networks, which only predict small perturbations. Both the B-CNN-res-rand and B-PCNN-res-rand models were able to carry out the whole simulation without producing NaN errors. These models still ended up predicting some negative values for the cell-centered version of Eq. (30), but this value is not necessary to advance the simulation, so it did not lead to additional problems. The result of this in terms of the MSE and SSIM for all primitive variables is shown in Fig. 14. How the relative divergences differ is shown in Fig. 15. A qualitative comparison of the primitive variables ρ, p, vx, and By is shown in Fig. 16.
SSIM and log base 10 of the MSE on the test set for all primitive variables in the simulation produced using B-CNN-res-rand or B-PCNN-res-rand as a surrogate model. Note that the sum of the MSE at all time steps for each primitive variable is displayed next to its label in the MSE plots. Also note that the SSIM was calculated using the same data range for all primitive variables.
SSIM and log base 10 of the MSE on the test set for all primitive variables in the simulation produced using B-CNN-res-rand or B-PCNN-res-rand as a surrogate model. Note that the sum of the MSE at all time steps for each primitive variable is displayed next to its label in the MSE plots. Also note that the SSIM was calculated using the same data range for all primitive variables.
The mean of the relative divergence of the simulation data, and the data generated using surrogate models, for time steps in the test set. The sum of the relative divergence means over the whole test set for each model is displayed next to its label. Note that was calculated as in Eq. (29).
The mean of the relative divergence of the simulation data, and the data generated using surrogate models, for time steps in the test set. The sum of the relative divergence means over the whole test set for each model is displayed next to its label. Note that was calculated as in Eq. (29).
Qualitative comparisons of the primitive variables ρ, p, vx, and By for data generated by the simulation and the simulation using surrogate models at certain time steps. For this figure, B-CNN-res-rand has be abbreviated to CNN and B-PCNN-res-rand has be abbreviated to PCNN.
Qualitative comparisons of the primitive variables ρ, p, vx, and By for data generated by the simulation and the simulation using surrogate models at certain time steps. For this figure, B-CNN-res-rand has be abbreviated to CNN and B-PCNN-res-rand has be abbreviated to PCNN.
C. Predicting the magnetic field
One advantage of PINNs is that by automatic differentiation, you can compute time/spatial derivatives efficiently and accurately, hence their use in looking for PDE solutions. They are also independent of meshes. On the other hand, unlike the CNNs, there is no translation equivariance built into B-PINN. Also, by approximating a particular solution of a PDE, it is less general than the neural operator, which can, in principle, approximate the solution operator of the PDE, thus solve for any given initial conditions.
The MSE of the models on the test data and is shown in Figs. 17 and 18, respectively. Both the -CNN and -PCNN have low on the training data, with -PCNN matching the simulation nearly exactly from about t = 0.15 to before end of the training data at t = 0.75, then rising for the test data. In contrast, B-PINN had a steady relative divergence error of slightly less than 1% (Fig. 18).
MSE on the test set for the different models predicting given . The sum of the MSE on the test set for each model shown in the legend.
MSE on the test set for the different models predicting given . The sum of the MSE on the test set for each model shown in the legend.
The mean relative divergence error on B predictions for different models at each time step. For the simulation, the volume-averaged B was used, and for the convolutional models, B was used instead of the face valued b, which satisfy Eq. (11), in order to have direct comparisons. The sum of on the test set for each model shown in the legend.
The mean relative divergence error on B predictions for different models at each time step. For the simulation, the volume-averaged B was used, and for the convolutional models, B was used instead of the face valued b, which satisfy Eq. (11), in order to have direct comparisons. The sum of on the test set for each model shown in the legend.
D. Other configurations
We investigated ways to improve both the convolutional models and the PINNs. For the CNN and PCNN, we used data augmentation, with the models training on 50 simulations like earlier. Also, v was added as an input to the PINN, with the idea that access to it on the test data may improve predictions. These approaches only increased the error on the test set.
VI. DISCUSSION
We briefly summarize the above results with a list of all of the various neural networks that were used and the names with which we refer to them, as shown in Table I. A summary of quantitative MSE and SSIM values for the various architectures is given in Table III.
Summary of results. Bold indicates best performing model according to column metric. “Total” here means the sum over all time steps on the test set.
Forecasting ρ . | |||
---|---|---|---|
Model . | Total MSE ( . | . | . |
ρ-CNN-all | 8.97 | ||
ρ-CNN-rel | 9.87 | ||
ρ-PCNN-rel | |||
ρ-PCNN-all | 3.08 |
Forecasting ρ . | |||
---|---|---|---|
Model . | Total MSE ( . | . | . |
ρ-CNN-all | 8.97 | ||
ρ-CNN-rel | 9.87 | ||
ρ-PCNN-rel | |||
ρ-PCNN-all | 3.08 |
Forecasting B . | |||
---|---|---|---|
Model . | Total MSE () . | Total . | Total ( . |
B-CNN | 35, 600 | ⋯ | |
B-PCNN | 73, 700 | ⋯ | |
B-CNN-res | 4.06 | ⋯ | 7.706 |
B-PCNN-res | 5.21 | ⋯ | 7.712 |
B-CNN-res-rand | ⋯ | 6.93 | |
B-PCNN-res-rand | 1.66 | ⋯ | 5.387 |
Forecasting B . | |||
---|---|---|---|
Model . | Total MSE () . | Total . | Total ( . |
B-CNN | 35, 600 | ⋯ | |
B-PCNN | 73, 700 | ⋯ | |
B-CNN-res | 4.06 | ⋯ | 7.706 |
B-PCNN-res | 5.21 | ⋯ | 7.712 |
B-CNN-res-rand | ⋯ | 6.93 | |
B-PCNN-res-rand | 1.66 | ⋯ | 5.387 |
Predicting B . | |||
---|---|---|---|
Model . | Total MSE . | Total . | . |
-CNN | 5.64 | 3.55 | |
-PCNN | 4.76 | ||
B-PINN | 2.58 |
Predicting B . | |||
---|---|---|---|
Model . | Total MSE . | Total . | . |
-CNN | 5.64 | 3.55 | |
-PCNN | 4.76 | ||
B-PINN | 2.58 |
A. Forecasting density
We first studied of the use of neural networks to forecast the mass density ρ of a subsequent time step based on field measurements from the previous time step (t). In this case, we compared using a CNN without any physics constraints, which takes all fields as input images at time t to predict . We also incorporate a hard physics constraint in enforcing by utilizing a ReLu activation function on the network's output, referred to as ρ-PCNN-all. For density forecasting, we also take into account Eq. (1) and train each of the above networks with only the relevant fields and . These networks using only relevant data are referred to as ρ-CNN-rel and ρ-PCNN-rel.
As can be seen from Figs. 3 and 4, the models that performed best according to both the MSE and SSIM were the ones that used ReLU activation functions in their final layer. This is likely due to the fact that the ReLU function is enforcing the hard constraint ; thus, the model does not have to learn this, which facilitates the model learning quicker. We also tested enforcing non-negativity by using a linear output layer and then squaring its output. The MSE on the test set was nearly the same as with using a ReLU final layer. In terms of both MSE and SSIM, it is found that utilizing only the relevant data only slightly improves performance, while adding the hard physics constraint results in a dramatic improvement.
B. Forecasting the magnetic field
Figures 5 and 6 show the Bx and By fields generated by the B-CNN and B-PCNN relative to the true field values at several time steps. While the predictions of both models are very accurate and almost indistinguishable by eye, only the B-PCNN with hard physics constraints based on the approach in Ref. 14 generated physical fields with zero divergence, while the fields generated by the B-CNN had divergence values higher by a factor of 108, as seen in Figs. 7 and 8.
Zooming in on the absolute differences between true values and predictions in Figs. 5 and 6, a slightly higher error can be seen for the B-PCNN approach, which is quantified in terms of both MSE and SSIM in Fig. 9, which shows that the SSIM of both methods is very high , while the MSE of both methods is very low . The slightly higher SSIM and slightly smaller MSE of the B-CNN predictions relative to those of the B-PCNN, given the same number of training steps, is due to two major factors.
First, a well-known issue with the B-PCNN approach on a mesh is that boundaries have to be handled carefully and even masked out, which was not done here, as can be seen by the visual artifacts that appear near the boundary of the simulation. This is due to taking finite-difference derivatives at a boundary which sample beyond the boundary (requiring periodic or zero padding). Using periodic padding in the encoder part of the B-PCNN architecture helped somewhat alleviate this, but would have to be carefully applied on the output of the PCNN so that when B is computed as the curl of A artificial numerical errors are not introduced. Second, while the B-CNN has no built in constraints and can generate arbitrary outputs, the B-PCNN is very highly constrained in that in can only generate fields that by construction satisfy the hard physics constraint of .
While the accuracy of both model's predictions is very high in terms of both MSE and SSIM, only the B-PCNN generates physical fields (within numerical precision), as can be seen in Figs. 7 and 8. In Fig. 7, the pixel-wise values of the predicted fields are shown for several times steps on a fixed color scale. On that scale, while large values are clearly visible for B-CNN predictions, they are almost impossible to see for the B-PCNN predictions. This difference is quantified in Fig. 8, which shows the image-averaged values for all of the predicted test value time steps. While the B-PCNN values are , with a lower bound dictated by the numerical precision of the Tensorflow floating point 32 data type, the B-CNN physics constraint violation is catastrophic at , eight orders of magnitude higher.
As discussed above, one way to make this problem much easier for the CNN is to utilize a residual CNN, which takes into account our knowledge that this is a continuous dynamic process. Therefore, the magnetic field at a subsequent time step, , should be close to its previous value for small values of Δ. The new field value should be a small perturbation of the form . It was found that by giving the network this strong hint and having it generate directly, the accuracy of both the B-CNN and B-PCNN was greatly improved as seen in terms of MSE in Fig. 12, which decreased by four orders of magnitude.
In terms of ϵrd, the residual B-CNN-res and B-PCNN-res were found to have very similar values (besides a few random outliers) as each network easily found a correct small perturbation of the divergence-less B field from the previous time step provided by the simulation, as shown in Fig. 13. The ResNet setup is only predicting small perturbations, which are added to the correct simulation-based values whose divergence is zero, from previous states. Therefore, the overall divergence of the predicted fields will have contributions from only the small perturbations generated by the ResNets. That is why in this case, the “relative divergence” of the PCNN and CNN becomes very similar.
Next, it was shown that training on 50 random simulations, each of which was a slight perturbation of the original Orszag–Tang vortex, and then predicting on data from the Orszag–Tang vortex, yielded models that preformed even better than just training on data from the Orszag–Tang vortex alone, as also seen in Fig. 12. In this case, the networks had a much more rich set of data to learn from. In this residual approach, again with most of the output dominated by the known previous step's divergence-less fields, the relative divergences of both approaches were relatively low.
These results imply that this approach can be useful for a wide range of unseen initial conditions as long as the network is trained over a sufficiently large dataset with a large enough variety of field distributions. Furthermore, such a trained network should be useful for a completely different set of initial conditions if it is first quickly re-trained on just a few of the initial steps.
C. Surrogate modeling
The only two models we were able to successfully apply to surrogate modeling were B-CNN-res-rand and B-PCNN-res-rand. The use of surrogate models had the greatest effect on B and v with errors increasing rapidly with time as can be seen in Fig. 14. The same trend is reflected in the SSIM. While ρ and p do start to diverge from the simulation as time goes on, this divergence is on a scale two orders of magnitude less than that of v and B.
In terms of relative divergences, B-PCNN-res-rand preformed the best overall.
For this more challenging problem of recursively applying the ML on its own outputs, the B-CNN-res-rand and the B-PCNN models both had high SSIM and low MSE, but in terms of relative divergences, the B-PCNN performance was much better, although after many steps both models diverged. Note that in this case, the use of relative rather than actual divergence makes artificially makes the B-PCNN look bad because, as seen in the bottom three rows of Fig. 16, as the MHD simulation becomes more diffuse over time, the magnitude of the magnetic field decreases significantly, which decreases the denominator or Eq. (37), while the numerator is relatively fixed in the B-PCNN case limited by numerical precision.
D. Predicting the magnetic field
The final test of these approaches was to predict the magnetic field at a given time step based only on velocity field measurements from the same time step. Although it would have been possible to do this with a residual network approach using the previous time step's magnetic field, , this was not done in order to fully test the network's ability to match both field accuracy and the divergence-less condition, which as mentioned above is otherwise dominated by the correct B field values provided by the simulation. In this case, in addition to the CNNs, a point-wise PINN-based network was also tested.
All three networks performed well with low MSE as shown in Fig. 17, and visually indistinguishable field maps. As before, only the B-PCNN was able to produce divergence-less fields that matched the simulation, as shown in Fig. 18. For the convolutional models, -PCNN outperformed -CNN in both accuracy and upholding the divergence-less condition. The B-PINN performed better than the convolutional models with respect to MSE on the test data, but generated un-physical data that resulted in the worst violation of the physics constraints as can be seen in Fig. 18, despite having the explicit PDE constraint. Again the only approach able to generate physical field values was the PCNN. Furthermore, the PINN approach is slower to train and slower to generate images as it works on a point-by-point basis rather than seeing and generating entire images at once.
VII. CONCLUSIONS
We showed that convolutional neural networks can be used for MHD, how certain physics constraints can be embedded and which ones led to improvements in performance. It was also seen that using ResNets and data augmentation could make a big difference. The forecasting and surrogate modeling tasks show that PCNNs could potentially be helpful to speed up simulations. The prediction task demonstrated that they may be of use in analyzing MHD phenomena when access to certain observables is not possible. Something similar has already been done with PINNs to study the solar coronal magnetic field.56 Deep generative encoder–decoder convolutional neural networks are also being combined with adaptive model-independent feedback to make robust surrogate models that generate all of the 15 2D projections of the 6D phase space of charged particle beams based only on a limited view of a single 2D measurement.57 Adaptive feedback methods have been combined with simulations to provide virtual views of complex dynamics based on limited data,58 and there is potential to develop such an approach for plasma systems by using fast MHD surrogates based on CNNs.
It is worth pointing out that for a fixed neural network architecture, a slightly different result can occur every time that the network is trained using a new random seed and new random initial weights and biases, which are sampled from probability distributions. In this work, we did re-train networks many times and none of the results were significantly changed. Furthermore, it is also true that various architecture choices such as the number of layers, the width of each layer, and the activation functions being used can all slightly change a network's performance. Despite this possibility of small variation, when our networks were trained with various activation functions or levels of regularization, the overall results still held. The overall main point is that the different approaches, independent of the details of the architecture used, have certain limitations, and the B-PCNN approach with hard physics constraints built into the architecture, based on the methods developed in Ref. 14, was the only method that generated physical fields that respected constraints.
Future work could go beyond ideal MHD to incorporating magnetic diffusion and fluid viscosity into the simulation. Also, the spatial resolution could be increased beyond 128 × 128. It was observed that the Orszag–Tang vortex picks up interesting detailed structures at higher resolutions. A systematic investigation into how robust the MHD CNNs/PCNNs and PINN models are to noise would be worth undertaking and inform their use in real world applications. Relatedly, the lack of interpretability of NNs makes uncertainty quantification crucial. The challenge is that many standard techniques would be prohibitively slow given the number of parameters the NNs have, and thus, other methods have been developed.59 Given their different strengths, finding effective ways of combining PCNNs and PINNs for MHD problems may prove quite fruitful. One such approach already investigated was the application of Physics Informed Neural Operators in the form of Fourier neural operators to MHD.44
Employing PCNNs to study real-world MHD situations can help speed up simulations, as well as be an effective approach in cases with incomplete information or with problems that are not well-posed. Taking into account the trade-off between ease of training and generating physical fields, the choice of ML with or without hard physics constraints likely depends on the application of the user. If the goal is to simply create a fast-executing surrogate model that provides a fast qualitative view of a system's state, such as a display in a control room that provides intuition for an experimenter, then the use of ML without hard physics constraints may be sufficient with the benefit of faster training. If on the other hand the goal is to develop a physically consistent surrogate model with the goal of speeding up computationally expensive calculations in a physics model while generating physical fields that numerically satisfy all of the constraints, then ML with hard physics constraints should be used.
ACKNOWLEDGMENTS
This work was supported by the Los Alamos National Laboratory Directed Research and Development (LDRD) Program Directed Research (DR) project “Charged Particle Beam Control and Diagnostics using Adaptive Machine Learning,” Project No. 20220074DR.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Ari Bormanis: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Christopher Leon: Data curation (equal); Formal analysis (equal); Investigation (equal); Writing – original draft (equal); Writing – review & editing (equal). Alexander Scheinker: Conceptualization (lead); Data curation (equal); Formal analysis (equal); Funding acquisition (lead); Investigation (equal); Methodology (lead); Supervision (equal); Writing – original draft (supporting); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: MHD EQUATIONS IN ARBITRARY DIMENSIONS
APPENDIX B: CONSTRAINED TRANSPORT SIMULATIONS
1. Finite volume method
2. Understanding the simulation data
To ensure the simulation is accurate, it must obey the Courant–Friedrichs–Lewy (CFL) condition, which enforces a maximum time step at each update in the simulation. Heuristically, this is the restriction that changes cannot propagate faster than the fastest possible wave speed, here the fast magnetosonic speed. We have modified the simulation to use a constant time step of and the reason this time step is so small is to ensure that it never violates the restriction placed on it by the CFL condition for the whole simulation duration.