Developments in machine learning promise to ameliorate some of the challenges of modeling complex physical systems through neural-network-based surrogate models. High-intensity, short-pulse lasers can be used to accelerate ions to mega-electronvolt energies, but to model such interactions requires computationally expensive techniques such as particle-in-cell simulations. Multilayer neural networks allow one to take a relatively sparse ensemble of simulations and generate a surrogate model that can be used to rapidly search the parameter space of interest. In this work, we created an ensemble of over 1,000 simulations modeling laser-driven ion acceleration and developed a surrogate to study the resulting parameter space. A neural-network-based approach allows for rapid feature discovery not possible for traditional parameter scans given the computational cost. A notable observation made during this study was the dependence of ion energy on the pre-plasma gradient length scale. While this methodology harbors great promise for ion acceleration, it has ready application to all topics in which large-scale parameter scans are restricted by significant computational cost or relatively large, but sparse, domains.

With the advent of chirped-pulse amplification came the prospect of using high-intensity, short-pulse lasers to accelerate particles to high energies.1–4 There are many proposed approaches for laser-driven ion acceleration (LIA),5–8 but the predominant mechanism studied to date has been that of target-normal sheath acceleration (TNSA).9 In TNSA, hot electrons generated by the ponderomotive force of the impinging laser propagate through the target and create a standing electric field, i.e., sheath field, on the far side of the target, which in turn accelerates ions. The generated ion beams are noted for their high brightness and spectral cutoff, high directionality and laminarity, and short-pulse duration beam properties.7,10–12 There are several applications of LIA to high-energy density physics, notably isochoric heating in inertial confinement fusion (ICF),13,14 radiography,15–17 and deflectometry.18–20 There are several promising future applications such as high-brightness injectors for accelerators,21,22 ion therapy,23 and radioisotope production.24 Generally, experiments have been limited to low repetition rates, though new technological advances promise significantly higher rates as the field matures.25 

An important motivation for this work is the recent progress in short-pulse laser technology, which allows the laser pulse to be carefully controlled and shaped in time. This capability will allow for unprecedented control of the experiment and potentially new approaches to ion acceleration.26 It is inspired by standard approaches in ICF, where nanosecond-long pulses are temporally shaped.27 However, adding such complexity to the problem enormously expands the parameter space we wish to explore. Simulations, which are necessary to study the detailed physics of the experiment either theoretically or as a synthetic diagnostic, remain relatively expensive, particularly the particle-in-cell (PIC) simulations that are used to study short-pulse laser-matter interactions.28–30

While artificial neural networks (NNs) were first proposed in 1957,31 they saw neither widespread appreciation nor application until the explosion of big data at the beginning of the 21st century.32,33 Whereas traditional machine learning (ML) algorithms (such as Random Forests) may saturate in capability when given large amounts of data, NNs seem to continue to improve as the amount of data increases.34 An arbitrarily wide layer of artificial neurons, also known as perceptrons, has been theoretically established as a universal approximator.35 Still, it was only with the reorientation toward deep, multilayer NNs that they really demonstrated their capabilities.36,37

NNs continue to see their primary application in industry, particularly with respect to classification problems,38 image analysis,39 and natural language processing.40 Industrial applications of modern machine learning greatly outpace scientific ones chiefly due to the difference in data scale. Where internet search engines may be able to analyze over a billion new images daily, most scientific endeavors depend on making do with orders of magnitude fewer viable datapoints over the entire course of a project, which can sometimes take years to collect. The first case provides uniform, densely-sampled, independent and identically distributed data, while in the second case we are dealing with nonuniform, sparsely sampled and small-scale data.

Nonetheless, there have been several successful applications of NNs to the natural sciences in recent years, whether analyzing data from the Large Hadron Collider,41 stabilizing beam properties at the LINAC Coherent Light Source42 and Advanced Light Source,43 or studying large ensembles of ICF simulations.44–46 As far as the authors are aware, this work is the first to apply NNs to LIA in general. Previous efforts have focused either on other ML techniques, such as genetic algorithms used as an optimization technique,47,48 or on a different physical regime of laser-plasma physics for feature identification and indirect measurement of experimental parameters.49 With the NN approach, we can make a surrogate model with a continuous mapping over parameter space as opposed to a pointwise parameter scan.

We demonstrate how we can train a multilayer NN on the characteristics of the simulation ensemble and the ability of the NN to investigate multidimensional, sparse parameter spaces. Creating a generalized function for nonlinear regression, we can reproduce the maximum ion energy curves Ei and hot electron temperature Te as functions of time and all relevant input parameters. This surrogate model can in turn be used to map out large swaths of parameter space to illustrate the dependence of these observable quantities on the inputs, something otherwise too expensive to achieve with a traditional parameter scan. These parameter studies can in turn be used to identify regions of interest for further elucidation, known typically as exploration/exploitation in standard ML literature.

In our case, the dataset is composed of scalar outputs extracted from over 1000 PIC simulations in 1D performed using the EPOCH PIC code.50 The basic dataset consists of several distinct ensembles of data in which we vary the initial laser intensity $I_0$ [W/cm$^2$], the laser pulse duration $\tau$ [fs], the pre-plasma gradient length scale $L_g$ [$\mu$m], the target foil thickness $D$ [$\mu$m], the target electron plasma density $n_0$ [#/cm$^3$], and the distance between laser antenna and target $d$ [$\mu$m] (effectively the simulation box size). An example of the problem setup can be seen in Fig. 1. The laser pulse envelope is defined as $I(t) = I_0 \exp(-t^2/\tau_0^2)$, where $\tau_0$ is related to the full-width at half-maximum (FWHM) value $\tau$ via $\tau_0 = \tau/(2\sqrt{\ln 2})$, and the target plasma as

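$$
n(z) =
\begin{cases}
n_0 \exp\left[(z - d)/L_g\right] & \text{for } d + L_g \ln(0.001) \le z < d, \\
n_0 & \text{for } d \le z < d + D, \\
0 & \text{elsewhere.}
\end{cases}
$$
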
FIG. 1.

Input parameters in consideration: simulation time t, laser intensity I0, pulse duration τ, foil thickness D, gradient length scale Lg, and target density n0.

The cutoff at $z = d + L_g \ln(0.001)$ corresponds to a density of $0.001\,n_0$, which defines a minimum plasma density of approximately $0.1\,n_c$ for the target plasma. Considering time $t$ as well, we generated a base dataset of 800 simulations with 6 input parameters. Time is treated as an equal parameter to the other inputs to allow for targeted exploration of the parameter space as well as data augmentation with respect to $t$, as described in Appendix A. The parameter $d$ is not used as a training parameter but is instead incorporated into the simulation time, so that for simulations with $d = 50\,\mu$m the laser pulse arrives at the target at the same time as for those with $d = 100\,\mu$m, i.e., the time axis is offset by $(50\,\mu\mathrm{m} - d)/c$. There is a minor difference in how thermalized the plasma is, given the later arrival of some pulses compared with others, but this was determined to be a negligible effect. An additional 307 simulations were generated to examine specific parameter subspaces, which we designate as ensembles E3–E5.
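The pulse envelope and target profile defined above can be written out as a short sketch. The function and argument names here are illustrative and not taken from the EPOCH input decks, and the check values use the E1-style parameters quoted in the text.

```python
import math

def pulse_intensity(t_fs, i0, tau_fwhm_fs):
    """Gaussian envelope I(t) = I0 * exp(-t^2 / tau0^2), with tau0 = tau / (2 sqrt(ln 2))."""
    tau0 = tau_fwhm_fs / (2.0 * math.sqrt(math.log(2.0)))
    return i0 * math.exp(-(t_fs / tau0) ** 2)

def target_density(z_um, n0, d_um, foil_um, lg_um, floor=1e-3):
    """Exponential pre-plasma ramp, flat foil of thickness D, vacuum elsewhere."""
    # log(floor) is negative, so the ramp starts in front of the foil surface at z = d.
    ramp_start = d_um + lg_um * math.log(floor)
    if ramp_start <= z_um < d_um:
        return n0 * math.exp((z_um - d_um) / lg_um)
    if d_um <= z_um < d_um + foil_um:
        return n0
    return 0.0

# Half a FWHM away from the peak, the intensity has dropped to I0 / 2.
half_max = pulse_intensity(14.5, 1.0, 29.0)

# At the ramp cutoff, a 100 n_c target has a density of 0.1 n_c.
cutoff_z = 100.0 + 0.5 * math.log(1e-3)
cutoff_density = target_density(cutoff_z, 100.0, 100.0, 5.0, 0.5)
```

Written this way, the sharp-interface case $L_g = 0$ needs no special handling, because the ramp interval is then empty and the exponential branch is never evaluated.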

As opposed to a full TNSA model, for which one would include a layer of contaminants on the far side of the target, we consider only a single target composed of electrons and deuterons, following the assumptions of the self-similar model by Mora.51 This was done to greatly reduce the complexity of the simulation, e.g., removing considerations of target foil material and contaminant layer properties, while still capturing the essential physics of TNSA. A commonly used quasi-empirical formulation of the Mora model known as the Fuchs model takes the following form:

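$$
E_{\max} = 2 T_h \left[\ln\left(t_p + \left(t_p^2 + 1\right)^{1/2}\right)\right]^2, \tag{1}
$$

where $E_{\max}$ is the maximum proton energy achieved, $T_h = m_e c^2\left[\sqrt{1 + a_0^2/2} - 1\right]$,9 $m_e$ is the electron mass, $c$ is the speed of light, $a_0^2 = 7.3\times10^{-19}\,\lambda^2[\mu\mathrm{m}]\,I_0[\mathrm{W/cm^2}]$ is the square of the normalized vector potential of the laser, $\lambda$ is the laser wavelength, $t_p = \omega_{pi} t_{acc}/\sqrt{2e}$ (with $e$ Euler's number), $\omega_{pi}^2 = 4\pi Z_i q^2 n_0/m_i$ is the square of the ion plasma frequency, $Z_i$ is the charge number, $m_i$ is the ion mass, $q$ is the fundamental charge, and $t_{acc} \approx 1.3\tau$ is related to the laser pulse duration.52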
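As a rough illustration of Eq. (1), the sketch below evaluates the Fuchs estimate in Gaussian (CGS) units. The rounded constants, the function names, and the choice of a deuteron mass of about twice the proton mass are assumptions made for this example; it is not the analysis code used in the study.

```python
import math

# Physical constants in Gaussian (CGS) units, rounded to four significant figures.
M_E = 9.109e-28      # electron mass [g]
M_P = 1.673e-24      # proton mass [g]
Q_E = 4.803e-10      # elementary charge [statC]
C_CGS = 2.998e10     # speed of light [cm/s]
MEC2_MEV = 0.511     # electron rest energy [MeV]

def critical_density(wavelength_um):
    """n_c = m_e * omega^2 / (4 pi q^2), returned in cm^-3."""
    omega = 2.0 * math.pi * C_CGS / (wavelength_um * 1e-4)
    return M_E * omega ** 2 / (4.0 * math.pi * Q_E ** 2)

def fuchs_emax_mev(i0_wcm2, tau_fs, n0_cm3, wavelength_um=1.0, z_i=1, m_i=2.0 * M_P):
    """Maximum ion energy from Eq. (1); m_i defaults to an approximate deuteron mass."""
    a0_sq = 7.3e-19 * wavelength_um ** 2 * i0_wcm2
    t_hot = MEC2_MEV * (math.sqrt(1.0 + a0_sq / 2.0) - 1.0)           # T_h [MeV]
    omega_pi = math.sqrt(4.0 * math.pi * z_i * Q_E ** 2 * n0_cm3 / m_i)
    t_acc = 1.3 * tau_fs * 1e-15                                       # [s]
    t_p = omega_pi * t_acc / math.sqrt(2.0 * math.e)                   # dimensionless
    return 2.0 * t_hot * math.log(t_p + math.sqrt(t_p ** 2 + 1.0)) ** 2

n_c = critical_density(1.0)
e_low = fuchs_emax_mev(1e18, 100.0, 100.0 * n_c)
e_high = fuchs_emax_mev(1e19, 100.0, 100.0 * n_c)
```

Because $T_h$ grows with $a_0$ and $t_p$ grows with $\tau$ and $n_0$, the estimate increases monotonically with each of these inputs, which makes it a convenient baseline against which to compare the simulation trends.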

An example of our physical scenario can be seen in Fig. 2. Here we have depicted the phase space diagrams for the electrons and deuterons at $t = 500$ fs for initial parameters of $I_0 = 1.6\times10^{19}$ W/cm$^2$, $\tau = 29$ fs, $D = 5\,\mu$m, $L_g = 0.5\,\mu$m, $n_0 = 100\,n_c$, and $d = 100\,\mu$m, taken from E1, where $n_c = m_e\omega^2/4\pi q^2$ is the critical plasma density and $\omega$ is the laser frequency. Additional physical and numerical parameters that are shared across all ensembles are the initial ion and electron temperatures $T_i = T_e = 100$ eV, the laser wavelength $\lambda = 1\,\mu$m, and the simulation box extent $[-d, 150\,\mu\mathrm{m}]$, where $d$ is either $50\,\mu$m or $100\,\mu$m for E1 and E2–E5, respectively. Spatial resolution was set to $\Delta z = 0.01\lambda$, and temporal resolution was automatically set by the PIC code. The maximum ion energy, prior to post-processing, is that of the most energetic ion in the simulation box [the solid circle in Fig. 2(b)]. Peak ion energy is a typical benchmark of experiments and of the simulations modeling them, cf. Fuchs et al.,52 although there is a chance that the leading population of the ion front could be too sparsely populated in a PIC simulation with a low particles-per-cell (ppc) count. However, this did not seem to be problematic in this study. The electron temperature, $T_e$, is extracted from the hot electron tail, as seen in Fig. 2(c).

FIG. 2.

Example phase space diagrams for (a) the electrons and (b) the deuterons at t=500 fs and the corresponding energy spectra (c) and (d) derived from the PIC simulation. The maximum ion energy curve corresponds to circled ions in the phase diagram (b). Initially ions at the front of the target are accelerated the most (dashed), but then are outpaced by ions accelerated by the electron sheath at the rear of the target (solid). The hot electron temperature fit Te is shown as well in (c).

For each time step, we extracted the maximum positive ion momentum, from which we derive the maximum ion energy, Fig. 2(b), and the hot electron temperature, Fig. 2(c). The ion energy relates to the momentum via $E_i(t) = 6.242\times10^{12}\, m_i c^2\left[\sqrt{1 + \left(p_{\max}(t)/m_i c\right)^2} - 1\right]$, where the prefactor converts joules to MeV. The total dataset gives 121,530 points, i.e., #sims × #timesteps, and this is after post-processing where spurious data are discarded. While not a small quantity on its own, for a multidimensional dataset in which several of the input parameters vary over several orders of magnitude, this number of datapoints was still statistically insufficient to generate reliable results using a NN for all input parameters. A more detailed discussion of the total dataset and data processing can be found in Appendix A.
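The momentum-to-energy conversion above can be sketched as follows, assuming the momentum is given in SI units (kg m/s) so that the prefactor converts joules to MeV. The constant and function names are illustrative.

```python
import math

J_TO_MEV = 6.242e12    # conversion factor from joules to MeV
C_SI = 2.998e8         # speed of light [m/s]
M_DEUTERON = 3.344e-27 # deuteron mass [kg]

def ion_energy_mev(p_max, m_i=M_DEUTERON):
    """Relativistic kinetic energy m c^2 (sqrt(1 + (p / m c)^2) - 1), in MeV."""
    x = p_max / (m_i * C_SI)
    # x^2 / (sqrt(1 + x^2) + 1) equals sqrt(1 + x^2) - 1 algebraically,
    # but avoids subtracting nearly equal numbers when the ion is non-relativistic.
    gamma_minus_one = x * x / (math.sqrt(1.0 + x * x) + 1.0)
    return J_TO_MEV * m_i * C_SI ** 2 * gamma_minus_one

# For a slow ion the result approaches the classical value p^2 / (2 m).
p_example = 1e-20
relativistic = ion_energy_mev(p_example)
classical = J_TO_MEV * p_example ** 2 / (2.0 * M_DEUTERON)
```

The rearranged form matters mainly at early times, when the fastest deuterons are still far from relativistic and the direct expression would lose precision.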

The full ensemble of simulations used to train the neural network comprises multiple focused samplings, both systematic and random. All simulations were generated using the EPOCH PIC code.50 The two largest ensembles were E1, composed of 300 simulations in which we vary only the intensity $I_0$ and pulse duration $\tau$, and E2, composed of 500 simulations in which we vary the intensity $I_0$, pulse duration $\tau$, target thickness $D$, target plasma density $n_0$, and pre-plasma gradient length $L_g$. The E1 simulations were run up to 3 ps, but it was found that many ion energy curves had not converged to a maximum value, so E2 and subsequent ensembles were run up to 5 ps.

The simulation box for E1 was $L_s = 200\,\mu$m long, with the laser antenna located at $z = 0$ and the target front located at $z = 50\,\mu$m, whereas for the larger simulation box in E2, $L_s = 250\,\mu$m, the target front was located at $z = 100\,\mu$m. This was done to capture the interaction between longer pulses and the pre-plasma, which expands toward the incoming laser. All 800 simulations in E1 and E2 were initially run with a particles-per-cell (ppc) count of 200.

A third ensemble, E3, focused on a region of parameter space to investigate a discontinuity that appeared there. In ensembles E1–E3, the input parameters $\tau$, $D$, and $n_0$ were sampled randomly from a uniform distribution on a linear scale across a specified range, while $I_0$ and $L_g$ were sampled uniformly in log-space, i.e., $\log_{10} I_0 = \mathrm{uniform}(x_{\min}, x_{\max})$. Two additional, systematic parameter scans, E4 and E5, were made to study the effect of the pre-plasma gradient length scale. The exact boundaries in parameter space for all the ensemble datasets are detailed in Table I and visualized in Fig. 3. The physical boundaries of our parameter space were chosen to test the limits of the parameter space of interest for TNSA, for both existing and future experimental facilities, e.g., low intensities for multipicosecond pulses.
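The mixed linear and logarithmic sampling described above can be sketched as follows. The ranges are those listed for E2 in Table I, while the dictionary keys, the function name, and the fixed seed are illustrative choices rather than details from the study.

```python
import math
import random

def sample_parameters(rng, linear_ranges, log_ranges):
    """Draw one simulation's inputs: uniform in the value or uniform in log10 of the value."""
    sample = {}
    for name, (lo, hi) in linear_ranges.items():
        sample[name] = rng.uniform(lo, hi)
    for name, (lo, hi) in log_ranges.items():
        exponent = rng.uniform(math.log10(lo), math.log10(hi))
        sample[name] = 10.0 ** exponent
    return sample

# E2 ranges from Table I.
LINEAR_RANGES = {"tau_fs": (20.0, 500.0), "D_um": (5.0, 25.0), "n0_nc": (80.0, 120.0)}
LOG_RANGES = {"I0_Wcm2": (1e17, 1e21), "Lg_um": (0.1, 10.0)}

rng = random.Random(0)  # fixed seed so the ensemble is reproducible
ensemble = [sample_parameters(rng, LINEAR_RANGES, LOG_RANGES) for _ in range(500)]
```

Sampling $I_0$ and $L_g$ in log-space places roughly equal numbers of simulations in each decade, which a linear draw over four orders of magnitude would not do.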

TABLE I.

Simulation datasets prepared for this study. For each ensemble, we list the number of simulations, the maximum simulation time $t_{\max}$ (ps), the pulse duration $\tau$ (fs), the intensity $I_0$ (W/cm$^2$), the target foil thickness $D$ ($\mu$m), the target density $n_0$ ($n_c$), the pre-plasma gradient length scale $L_g$ ($\mu$m), the simulation box size $L_s$ ($\mu$m), and the particles-per-cell count ppc (macroparticles per cell). Ensembles E1* and E2* include simulations from ER, for which $t_{\max} = 5$ ps, $L_s = 250\,\mu$m, and ppc = 1000.

| Ensemble | No. of sims | $t_{\max}$ (ps) | $\tau$ (fs) | $I_0$ (W/cm$^2$) | $D$ ($\mu$m) | $n_0$ ($n_c$) | $L_g$ ($\mu$m) | $L_s$ ($\mu$m) | ppc |
|---|---|---|---|---|---|---|---|---|---|
| E1* | 300 | 3 | [20, 200] | [$10^{18}$, $10^{20}$] | 5 | 100 | 0.5 | 200 | 200 |
| E2* | 500 | 5 | [20, 500] | [$10^{17}$, $10^{21}$] | [5, 25] | [80, 120] | [0.1, 10] | 250 | 200 |
| ER | 200 | 5 | [20, 500] | [$10^{17}$, $10^{21}$] | [5, 25] | [80, 120] | [0.1, 10] | 250 | 1000 |
| E3 | 100 | 5 | [75, 175] | [$10^{19}$, $5\times10^{20}$] | [5, 25] | [80, 120] | [0.1, 10] | 250 | 1000 |
| E4 | 99 | 5 | [50, 300] | [$10^{18}$, $10^{20}$] |  | 100 | [0, 7.5] | 250 | 1000 |
| E5 | 108 | 5 | [30, 400] | [$10^{18}$, $10^{20}$] | [1, 4] | 100 | [1, 10] | 250 | 1000 |

FIG. 3.

Scatter plot depiction of data ensembles as a function of input parameters I0 and τ. E1 is depicted in green, E2 in black, E3 in orange, E4 in red, and E5 in blue.

A revised dataset, ER, was generated to correct for simulations in E1 and E2 that were found to be numerically problematic. ER consists of 200 simulations with the same parameters as a selection of runs in E1 and E2, rerun with a higher ppc count and a larger box size. These runs were selected because they either suffered from excessive numerical heating or diverged too far from the NN prediction. The revised ensemble and its generation are discussed in further detail in Appendix B.

Various architectures were tested but the ones used for training the ion energy and electron temperature data are depicted in Fig. 4. The general structure remains the same as the basic form for a fully connected NN for regression analysis, a multilayer NN converging to a single output value. The post-processed and normalized data are injected into the first layer and then proceed through five internal layers. The electron temperature was found to train better if an additional internal layer was introduced.

FIG. 4.

The neural network architecture used in this study, where the ion energy data Ei was trained with a 6-layer network and the electron temperature Te with 7 layers.

To reduce the loss, i.e., the mean square error (MSE) between the NN-based surrogate model fit and the training data, we used data augmentation, a standard technique in deep learning. The dataset was augmented 20-fold by simple cubic interpolation. This generated a post-processed dataset of several million datapoints and resulted in a sufficiently low loss value. It is important to note that, while this rudimentary augmentation was sufficient to elucidate the dependency of the outputs on the time parameter $t$, the other variables remain relatively sparsely populated in parameter space. This deficiency becomes pronounced when attempting to extrapolate in any parameter besides $t$ using the trained NN. In this study, $t$ is naively treated on an equal footing with the other inputs, but it could be treated more rigorously via recurrent and time-series techniques. In addition to augmenting the data, we also normalized it. This is particularly important given that the parameters vary greatly with respect to one another in range and value. The specifics of data augmentation and normalization, which varied depending on the parameter in consideration, are discussed in Appendix A.
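To make the augmentation step concrete, the sketch below densifies a single time series 20-fold using a local four-point cubic (Lagrange) interpolant. This is a stand-in chosen so the example needs only the standard library; the text does not specify which cubic interpolation routine was used, and the function names are hypothetical.

```python
def cubic_lagrange(xs, ys, x):
    """Evaluate the cubic through the four samples surrounding x (needs len(xs) >= 4)."""
    n = len(xs)
    k = 0
    while k < n - 2 and xs[k + 1] <= x:   # locate the interval [xs[k], xs[k+1]] holding x
        k += 1
    start = min(max(k - 1, 0), n - 4)     # clamp the 4-point stencil at the curve ends
    total = 0.0
    for i in range(start, start + 4):
        term = ys[i]
        for j in range(start, start + 4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def augment(ts, values, factor=20):
    """Insert factor - 1 interpolated points inside every original time interval."""
    new_ts, new_values = [], []
    for k in range(len(ts) - 1):
        for m in range(factor):
            t = ts[k] + (ts[k + 1] - ts[k]) * m / factor
            new_ts.append(t)
            new_values.append(cubic_lagrange(ts, values, t))
    new_ts.append(ts[-1])                 # keep the final original sample
    new_values.append(values[-1])
    return new_ts, new_values

# Toy curve: six samples of a cubic polynomial, which the interpolant reproduces exactly.
coarse_t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
coarse_y = [t ** 3 - 2.0 * t for t in coarse_t]
fine_t, fine_y = augment(coarse_t, coarse_y, factor=20)
```

Note that interpolation of this kind adds points only along $t$ within each simulation, which is why it cannot relieve the sparsity in the remaining inputs.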

We also included dropout layers after the bulk of the internal layers and used both Lasso (L1) and Ridge (L2) regularization, though these provided only minor improvements. Dropout was set to 0.01 throughout all trainings, and we used $L_1 = 10^{-9}$, $L_2 = 5\times10^{-9}$ for the ion energy data and $L_1 = 10^{-10}$, $L_2 = 10^{-10}$ for the electron temperature. Batch size was 1024 for the augmented ion energy data and 2048 for the electron temperature. Using higher values of dropout and regularization would excessively slow down training and seemed to converge to less optimal solutions, while using no dropout or regularization seemed to lead to earlier overfitting.

Several activation functions were tested, and it was found that a combination of parametric rectified linear unit (PReLU) and sigmoid activation functions gave the best results. While the more standard ReLU function also achieved appreciable results, the PReLU function trained much faster and avoids the possibility of the “dead” ReLU. The layer weights were initialized with a normal distribution centered at zero with a default standard deviation of 0.05. For the ion energy data, we had a total of 199,170 trainable parameters and for the electron temperature 265,218, consisting of both weights and biases. The network is trained on each dataset over several hundred epochs until a satisfactory accuracy is achieved. The neural network model was implemented using the Keras-TensorFlow 2.1 API.53
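The remark about the “dead” ReLU can be illustrated with scalar versions of the activation functions and their gradients. This is a plain-Python sketch for intuition only; it is not the Keras implementation, and the slope value 0.25 is an arbitrary example.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: exactly zero for negative inputs."""
    return 1.0 if x > 0 else 0.0

def prelu(x, alpha):
    """PReLU: identity for positive inputs, slope alpha (a learned parameter) otherwise."""
    return x if x > 0 else alpha * x

def prelu_grad(x, alpha):
    """Gradient of PReLU with respect to x: stays nonzero for negative inputs if alpha != 0."""
    return 1.0 if x > 0 else alpha

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A neuron whose pre-activation is negative passes no gradient through ReLU,
# so its weights stop updating; PReLU still passes a gradient scaled by alpha.
negative_input = -2.0
dead_gradient = relu_grad(negative_input)
live_gradient = prelu_grad(negative_input, alpha=0.25)
```

Since the slope is learned during training, each PReLU layer contributes a small number of extra trainable parameters beyond the weights and biases.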

For this problem, we used supervised learning whereby we trained the neural network on the output data given specific inputs. For the optimizer, we use adaptive moment estimation (Adam) and for the loss function MSE. The data are separated into 5% test data, 20% validation data, and 75% training data. The test data are separately allocated to preserve the structure of the input curves while the validation and training data are shuffled to improve statistics and reduce overfitting.
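A minimal sketch of this split is shown below, under the assumption that the test set is formed from whole simulation curves while the remaining points are pooled and shuffled before being divided into validation and training data. The function name and data layout are hypothetical.

```python
import random

def split_dataset(curves, rng, test_frac=0.05, val_frac=0.20):
    """curves: one list of (inputs, output) samples per simulation.

    Returns (train_points, val_points, test_curves).
    """
    order = list(range(len(curves)))
    rng.shuffle(order)
    n_test = max(1, round(test_frac * len(curves)))
    # Test simulations are held out whole, so each curve keeps its time ordering.
    test_curves = [curves[i] for i in order[:n_test]]
    # The remaining simulations are flattened and shuffled point by point.
    pool = [point for i in order[n_test:] for point in curves[i]]
    rng.shuffle(pool)
    # Rescale so the validation share is val_frac of the full dataset.
    n_val = round(val_frac / (1.0 - test_frac) * len(pool))
    return pool[n_val:], pool[:n_val], test_curves

# Toy data: 100 simulations with 10 time samples each.
toy_curves = [[((sim, step), float(step)) for step in range(10)] for sim in range(100)]
train, val, test = split_dataset(toy_curves, random.Random(1))
```

Holding out complete curves makes it possible to plot the surrogate against an unseen simulation as a function of time, which a pointwise split would not allow.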

For both ion and electron data, we trained the network for 500 epochs using the TensorFlow backend. In both cases, it took about 12 s to complete one epoch on an NVIDIA V100 Volta GPU on LLNL's Lassen computing platform, for a total training time of about 1–2 h per network on its respective dataset ($E_i$ or $T_e$). It is simple to train a single network on both $E_i$ and $T_e$ concurrently, but we found better performance when they were kept separate. Slightly longer but tolerable training times were also achieved on a standard personal computer GPU such as an NVIDIA 1080 Ti. The neural networks using just E1 + E2 were trained up to 500 epochs, after which the NN can begin to overfit $E_i$ and suffer significant training noise for $T_e$. The learning rate for this case was lr = 0.001, the default for TensorFlow. Further details can be found in Appendix C.

Ours is a regression problem, and there is no objective figure of merit for the training process as there is for classification problems in ML. Therefore, a satisfactory error threshold must be determined through trial and error. The most successful NN surrogates typically had a final MSE of $5\times10^{-6}$ or less after training. Lower errors are possible, given that no strong signs of overfitting were observed in the loss history curves, but would have required much longer training times than used in this study. To put this into context, the output data, i.e., $E_i$ and $T_e$, are normalized to the range [0,1] and have standard deviations of approximately $\sigma_{E_i} = 0.1$ for $E_i$ and $\sigma_{T_e} = 0.01$ for $T_e$. This is for the entire ensemble of data, E1–E5. To give a more explicit reference point for the MSE loss of the final NN surrogate, the maximum normalized ion energy, i.e., that of the simulation which reached the largest ion energy in the entire ensemble, prior to the introduction of augmentation and noise, was $\tilde{E}_{i,\max} = 0.99$ with $\sigma_{\max} = 0.34$, and the minimum was $\tilde{E}_{i,\min} = 0.012$ with $\sigma_{\min} = 0.003$. It was found that losses greater than $10^{-5}$ would give unsatisfactory results when comparing the surrogate to the training data directly. These values reflect the approach to normalization as well as the data itself, so a different threshold should be expected for different data and approaches to data preparation. If anything, it was most surprising that the total model loss should be several orders of magnitude less than the standard deviation of values within the data, but this did not hinder the development of the surrogate model in any way.
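One way to read these numbers is to note that the MSE is a squared quantity, so its square root (the RMSE) is the value directly comparable to a standard deviation. The short calculation below uses only the figures quoted above; the variable names are illustrative.

```python
import math

mse_threshold = 5e-6                  # typical final MSE of the best surrogates
sigma = {"E_i": 0.1, "T_e": 0.01}     # standard deviations of the normalized outputs

rmse = math.sqrt(mse_threshold)       # about 2.2e-3 in normalized units

# Ratio of the typical prediction error to the spread of each output.
rmse_over_sigma = {name: rmse / s for name, s in sigma.items()}

# Equivalent comparison in squared units: MSE against the variance.
mse_over_variance = {name: mse_threshold / s ** 2 for name, s in sigma.items()}
```

On this reading, the typical error is roughly 2% of the spread in $E_i$ and roughly 20% of the spread in $T_e$, so the gap between loss and standard deviation is smaller than the raw MSE value suggests.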

Once the NN is trained, we can use it as a generalized function depending on the normalized input values. Unlike a parametric scan, which is discrete, the NN-trained surrogate model defines a continuous mapping between the inputs and the output.

To verify the fidelity of the surrogate, we examine individual ion energy curves. Plotted in Fig. 5 are the maximum ion energy $E_i$ for the deuterons at each time step and the corresponding hot electron temperature curves $T_e$. Three example cases are provided, with initial laser intensities $I_0 = 1.1\times10^{18}$, $1.1\times10^{19}$, and $5.1\times10^{20}$ W/cm$^2$, taken from ensembles E1 and E2, as well as additional variations in the other input parameters, with MSE values for individual simulations ranging between $10^{-7}$ and $10^{-4}$. The surrogate faithfully reproduced the training data except at low intensities, $I_0 \sim 10^{18}$ W/cm$^2$. A caveat regarding the data is that the small step visible in $E_i$ near the onset of ion acceleration is due to ions being directly accelerated by the laser on the front side of the target, as opposed to ions accelerated on the rear side by the TNSA mechanism. This is noted in the phase diagram of Fig. 2(b), where the ions in the dashed circle correspond to the bump in the distribution function shown in Fig. 2(d) (note that the higher energy ions correspond to the TNSA ions leaving the rear surface of the target).

FIG. 5.

$E_i$ and $T_e$ from PIC simulation (black) and the corresponding reproduction using the surrogate (red). Subplots (a) and (b) correspond to $I_0 = 1.1\times10^{18}$ W/cm$^2$, $\tau = 40.3$ fs, $n_0 = 100\,n_c$, $D = 5\,\mu$m, and $L_g = 0.5\,\mu$m; (c) and (d) to $I_0 = 1.1\times10^{19}$ W/cm$^2$, $\tau = 153$ fs, $n_0 = 97.4\,n_c$, $D = 20.7\,\mu$m, and $L_g = 1.4\,\mu$m; and (e) and (f) to $I_0 = 5.1\times10^{20}$ W/cm$^2$, $\tau = 246$ fs, $n_0 = 89.9\,n_c$, $D = 17.4\,\mu$m, and $L_g = 3.3\,\mu$m. The corresponding final MSE values for each simulation, with respect to the normalized quantities, are (a) $6.5\times10^{-5}$, (b) $1.7\times10^{-7}$, (c) $2\times10^{-4}$, (d) $3.1\times10^{-6}$, (e) $7.9\times10^{-4}$, and (f) $4.3\times10^{-5}$.

To verify the generalizability of the surrogate, we tested it on data that were used in neither training nor validation, as seen in Fig. 6. In Fig. 6(a), we test the surrogate on a curve with input parameters $I_0 = 3.26\times10^{19}$ W/cm$^2$, $\tau = 94$ fs, $L_g = 2.3\,\mu$m, $n_0 = 86.8\,n_c$, and $D = 5\,\mu$m, taken from ensemble E1. These datapoints are from a simulation originally sequestered into a test set that was part of neither the validation nor the training set. The surrogate model extrapolates well in time past the training limit of 5 ps and starts to diverge from the expected asymptotic value only after 7 ps. In Fig. 6(b), we consider another simulation curve that also lies outside the training parameter space with respect to pulse duration, i.e., $\tau = 700$ fs, whereas the NN was trained only up to $\tau = 500$ fs. The remaining parameters were $I_0 = 3\times10^{19}$ W/cm$^2$, $n_0 = 100\,n_c$, $D = 5\,\mu$m, and $L_g = 0.5\,\mu$m, a simulation not included in any training ensemble. The surrogate model reproduced the expected result relatively well and was able to extrapolate past the final simulation time in the training data ($t = 5$ ps). Even without further improvements, the NN is able to perform limited extrapolation beyond the training data.

FIG. 6.

(a) Example of the surrogate ensemble, $S_E$, matching untrained data from the test subset, extrapolating the $E_i$ curve in time past the trained data where there is still support from the ensemble ($t = 3$–$5$ ps) and beyond the upper limit of the original parameter set ($t > 5$ ps). (b) Extrapolation with the NN with respect to $t$ as well as to $\tau = 700$ fs, where the maximum of the dataset was $\tau = 500$ fs. The curves in (a) and (b) are weighted averages computed using Eq. (2), and the fill-between is the model variance.

NNs will inherently give a slightly different result after every training instance given that the weights are randomly initialized. This is exacerbated for small and sparse datasets. To compensate for this variation, we take the weighted average of an ensemble of ten surrogates trained with different weights. Each of the individual surrogates is weighted by the average inverse loss of the final 100 epochs, out of a total of 500–700 epochs, of its training history and normalized by the sum of the loss weights of all surrogates in the ensemble, i.e.,

$$
S_E(\Theta) = \frac{\sum_{i}^{n} w_i S_i(\Theta)}{\sum_{i}^{n} w_i}, \tag{2}
$$

where $S_E$, the surrogate ensemble, is the weighted average of surrogates $S_i$, $\Theta$ is the set of input parameters, $w_i = L_i^{-1}$ is the weight, $L_i$ is the average loss over the final 100 epochs of $S_i$, and $n = 10$ is the number of trained networks in consideration. Ensembling is a standard approach in other areas of ML, such as Random Forests, which take the average prediction of an ensemble of Decision Trees,34 and NNs can be composed into an ensemble in a similar fashion.54

The red lines in Figs. 6(a) and 6(b) are the result of this weighted average, and the red filled region indicates the standard deviation $\sigma = \sqrt{n\sum_{i=1}^{n} w_i \left(S_i - \bar{S}\right)^2 / \left[(n-1)\sum_{i=1}^{n} w_i\right]}$ for a weighted output of this ensemble of surrogates, where $\bar{S}$ is the mean of $S_i$. This ensemble approach gives us an idea of the variance in model predictions: as expected, the surrogate shows greater variance in Fig. 6(b), where we extrapolate in the input parameter $\tau$ ($\tau = 700$ fs, whereas the NN was trained only up to $\tau = 500$ fs), than in Fig. 6(a). It is important to note that the above $\sigma$ is not a metric of uncertainty in the dataset but rather just the variability in the results of the surrogate models. In addition, some caution should be taken when using ensembles of NN surrogates, as a weighted average might prioritize a surrogate that performs exceptionally well in an isolated region of parameter space while penalizing another surrogate that generalizes more successfully overall.
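The loss-weighted average of Eq. (2) and the weighted standard deviation can be sketched for a single query point as follows. The function name and the toy numbers are illustrative, and $\bar{S}$ is taken to be the unweighted mean of the individual predictions, as the text states.

```python
import math

def ensemble_prediction(predictions, final_losses):
    """Combine surrogate outputs S_i at one input point using weights w_i = 1 / L_i.

    predictions:  outputs of the n trained surrogates at the same inputs
    final_losses: average loss of each surrogate over its final training epochs
    Returns (weighted mean S_E, weighted standard deviation sigma).
    """
    n = len(predictions)
    weights = [1.0 / loss for loss in final_losses]
    w_sum = sum(weights)
    s_e = sum(w * s for w, s in zip(weights, predictions)) / w_sum
    s_bar = sum(predictions) / n
    weighted_sq = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, predictions))
    sigma = math.sqrt(n * weighted_sq / ((n - 1) * w_sum))
    return s_e, sigma

# With equal losses the result reduces to the ordinary mean and sample standard deviation.
mean_equal, sigma_equal = ensemble_prediction([1.0, 2.0, 3.0], [1e-5, 1e-5, 1e-5])

# A surrogate with a much lower loss pulls the weighted average toward its prediction.
mean_skewed, _ = ensemble_prediction([1.0, 2.0, 3.0], [1e-5, 1e-5, 1e-7])
```

The second call shows the behavior cautioned about above: one surrogate with a very low final loss can dominate the average regardless of how well it generalizes.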

An example correlation matrix for the normalized data can be found in Fig. 7(a), where ion energy Ei and electron temperature Te are most strongly correlated with the initial laser intensity and diminishingly less so with pulse duration, scale length, target density, and target thickness. In this case, we selected the peak values for Ei and Te for each simulation and found the correlation of the inputs with those values. The expectation that Te and Ei most strongly correlate with I0 is not surprising. Notable here is the lower dependence of Te on pulse duration τ as opposed to Ei, which along with I0 is related to the total laser energy and is expected to contribute to enhancing the sheath field. Likewise, the gradient length scale has a greater influence than pulse duration on ion energy, a phenomenon we will touch upon in this work.
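A correlation table of this kind can be sketched as follows, assuming the Pearson coefficient is the measure used (the text does not name it). The dictionary layout and function names are hypothetical, and the toy data are invented purely to exercise the code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def input_output_correlations(inputs, outputs):
    """Correlate every input column with every output column (one value per simulation)."""
    return {
        (in_name, out_name): pearson(in_values, out_values)
        for in_name, in_values in inputs.items()
        for out_name, out_values in outputs.items()
    }

# Toy example: the peak output rises with one input and is unrelated to the other.
toy_inputs = {"log10_I0": [18.0, 18.5, 19.0, 19.5, 20.0], "D": [5.0, 25.0, 10.0, 25.0, 5.0]}
toy_outputs = {"peak_Ei": [0.1, 0.2, 0.3, 0.4, 0.5]}
table = input_output_correlations(toy_inputs, toy_outputs)
```

Because the Pearson coefficient captures only linear association, a strongly peaked dependence on an input can produce a small coefficient even when that input matters a great deal.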

FIG. 7.

(a) Correlation matrix for the input parameters $I_0$, $\tau$, $L_g$, $n_0$, $D$ and the output parameters $E_i$ and $T_e$. (b)–(g) Parameter scans with baseline inputs of $t = 3$ ps, $I_0 = 10^{19}$ W/cm$^2$, $\tau = 100$ fs, $n_0 = 100\,n_c$, $L_g = 0.5\,\mu$m, and $D = 5\,\mu$m, varying one parameter at a time. Subplot (b) shows the dependence of $E_i$ on the intensity $I_0$, (c) on the pulse duration, (d) on the pulse duration but at time $t = t_0 + \tau$ to compensate for the later arrival of the peak, (e) on the initial plasma density $n_0$, (f) on the target foil thickness $D$, and (g) on the pre-plasma gradient length scale $L_g$.

With a trained surrogate, we can now perform parameter scans of our sample space. Given the correlation matrix in Fig. 7(a), where we show only the correlation between the outputs and the inputs, we can expect the surrogate to vary most strongly with $I_0$. Starting from a baseline set of parameters, $t = 3$ ps, $I_0 = 10^{19}$ W/cm$^2$, $\tau = 100$ fs, $n_0 = 100\,n_c$, $L_g = 0.5\,\mu$m, and $D = 5\,\mu$m, we vary these parameters one at a time within the prescribed limits of our ensemble. An $S_E$ model is used as in Fig. 6, with the fill-between denoting the standard deviation of the $S_E$ in each case. The results can be seen in Figs. 7(b)–7(g). As expected, the maximum ion energy $E_i$ depends most strongly on the laser intensity $I_0$, as seen in Fig. 7(b). The plateau with respect to $\tau$ seen in Fig. 7(c) may arise because we sample the ion energy at the same time for the same intensity, while the peak intensity of a longer laser pulse arrives later in time. To compensate for this, we add $\tau$ to our sampling time $t = 3$ ps, the results of which can be seen in Fig. 7(d), so that the peak intensity arrives at the target at approximately the same time. The plateau is less pronounced, but the energy gain still diminishes significantly for $\tau > 150$ fs, all other parameters remaining the same. For both intensity and pulse duration, increasing values mean more energy in the laser to drive acceleration, given that the other parameter is held constant, which makes the pulse duration plateau more peculiar. The ion energy has a very weak dependence on target density, as seen in Fig. 7(e), which is generally not a free parameter in experiments anyway. Foil thickness, Fig. 7(f), makes only a minor contribution except when the foil is very thin, $D < 5\,\mu$m. Notable is the peak in $E_i$ as a function of gradient length scale $L_g$, Fig. 7(g), which suggests potentially interesting physics in this region of parameter space.

Where the NN learned a strong dependency of the output data, e.g., on $I_0$, or where the parameter space was heavily sampled for a weak dependency, e.g., $D \approx 5\,\mu$m and $n_0 \approx 100\,n_c$, we see little variability in the NN prediction. Where the parameter space becomes more sparsely populated, the variance grows. An exception to this seems to be the gradient length scale, for which the variance is not noticeably reduced despite heavy sampling near $L_g \approx 0.5\,\mu$m. One possible explanation is the more strongly nonlinear dependence of the ion energy on $L_g$, which makes it more difficult for the NN to give a definite prediction, even where data are more plentiful.

Initial investigations of our dataset focus on the two laser parameters, $I_0$ and $\tau$, the product of which is proportional to the laser energy, i.e., $E_L \propto I_0 \tau r_0^2$, where $r_0$ is the laser spot size, which is not considered in 1D. For this study, we first considered our data subsets, E1–E5, separately to illustrate some limitations of sparse datasets. First considering only E1 and E2, we create a heat map of $E_i$ over the ranges $I_0 \in [1\times10^{19}, 5\times10^{20}]$ and $\tau \in [20, 200]$, with $t = 3$ ps, $n_0 = 100\,n_c$, $L_g = 0.5\,\mu\mathrm{m}$, and $D = 5\,\mu\mathrm{m}$, which can be seen in Fig. 8(a). We see the expected result that $E_i$ increases with increasing $I_0$ and $\tau$. Unexpectedly, we also see an unusual feature in the form of a streak in the region $\tau \in [120, 180]$, highlighted with the dashed box, which persisted after several retrainings of the network. This is an example of exploration/exploitation in the domain of NNs, but it proves to be more of a cautionary example. Retraining the NN on E1 + E2 + E3, where E3 is a denser subset of 100 simulations within $I_0 \in [0.1\times10^{20}, 5\times10^{20}]$ and $\tau \in [75, 175]$, we see that the feature mostly disappears, as depicted in Fig. 8(b). A different approach to regularization during the training process might also have helped prevent such structures from arising, although standard approaches such as Ridge and Lasso regression were tested and the feature persisted. While NNs provide a rapid way to explore parameter space, features such as those found in Fig. 8(a) require further investigation to validate whether they are real.

FIG. 8.

(a) Parameter scan of $E_i$ over $I_0 \in [0.1\times10^{20}, 5\times10^{20}]$ and $\tau \in [20, 200]$ for ensemble E1+E2. (b) Parameter scan of $E_i$ over $I_0 \in [0.1\times10^{20}, 5\times10^{20}]$ and $\tau \in [20, 200]$ for E1+E2+E3, with constant energy curves for $I\tau = 100$ (red), 65 (blue), and 35 (green). (c) The same constant energy curves as a function of $\tau$.


Having a sufficient mapping of the parameter space within the region $I_0 \in [1\times10^{19}, 5\times10^{20}]$ and $\tau \in [20, 200]$, we can investigate further. We take constant energy curves, by which we mean $I\tau = \mathrm{const.}$ in arbitrary units, as visualized in Fig. 8(b), where $I = I_0 \times 10^{-19}\ [\mathrm{fs}^{-1}]$. We have varied the effective energy as $I\tau = 100$, 65, and 35, shown also as a function of $\tau$ in Fig. 8(c). From these curves, we can conclude that, at least in the sub-ps regime, shorter, more intense laser pulses are more effective at accelerating ions than longer, less intense pulses within the parameter space encompassed by this study. A remnant of the feature highlighted in Fig. 8(a) seems to remain and can be seen in the bumps near $\tau = 125$ fs and $\tau = 175$ fs in Fig. 8(c) for $I\tau = 65$ (blue) and $I\tau = 100$ (red). However, this is likely still fictitious.
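As a minimal sketch of how points along such a constant energy curve could be generated, assume the normalization $I = I_0 \times 10^{-19}$ so that the product $I\tau$ is in arbitrary units. The helper name `constant_energy_curve` and the grid of pulse durations are hypothetical and are not taken from the original study.

```python
def constant_energy_curve(energy, taus_fs, i0_min=1e19, i0_max=5e20):
    """Return (I0 [W/cm^2], tau [fs]) pairs that satisfy I * tau = energy.

    Assumes I = I0 * 1e-19, so the product is in arbitrary units.
    Pairs whose intensity falls outside [i0_min, i0_max] are dropped,
    since the surrogate was only mapped inside that range.
    """
    points = []
    for tau in taus_fs:
        i0 = energy / tau * 1e19  # invert I * tau = energy for I0
        if i0_min <= i0 <= i0_max:
            points.append((i0, tau))
    return points


# Hypothetical grid of pulse durations in fs
taus = [20, 50, 100, 150, 200]
curves = {e: constant_energy_curve(e, taus) for e in (100, 65, 35)}
```

Each pair in `curves[e]` could then be passed to the surrogate to trace $E_i$ along a line of fixed effective laser energy.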

An interesting feature that arose during our exploration of parameter space using NNs is the peak in $E_i$ visible in Fig. 7(g) near $L_g \approx 0.3\,\mu\mathrm{m}$. While not as strong a tuning parameter as intensity or pulse duration, it does appear to span over 10 MeV in ion energy and thus is not a trivial contribution. It also suggests potentially unique coupling between the laser and the plasma in this regime.

To explore this feature, we performed two focused parameter scans, E4 and E5, varying parameters in a systematic fashion as opposed to selecting them randomly and, in particular, sampling $L_g$ in the region of interest. These results are portrayed in Fig. 9. Figure 9(a) shows an ensemble of NNs trained on E1 and E2, with the results of parameter scans E4 and E5 plotted as dots for different pulse durations. Plotted is the maximum ion energy $E_i$ at $t = 3$ ps with $D = 5\,\mu\mathrm{m}$ and $n_0 = 100\,n_c$. The fill-between corresponds to the standard deviation in the predicted results of the SE in each case about the mean value as a function of $L_g$.
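The mean curve and fill-between band described above can be sketched as follows, assuming each member of the ensemble returns one prediction per scan point. The function name `ensemble_band` is hypothetical, and the use of the population standard deviation is an assumption; the source does not state which estimator was used.

```python
from statistics import mean, pstdev


def ensemble_band(member_predictions):
    """Mean and +/- one standard deviation across ensemble members.

    member_predictions[k][j] is the prediction of member k at scan point j.
    Returns three lists (center, lower, upper), one value per scan point.
    """
    columns = list(zip(*member_predictions))  # regroup by scan point
    center = [mean(col) for col in columns]
    spread = [pstdev(col) for col in columns]  # assumed: population std
    lower = [c - s for c, s in zip(center, spread)]
    upper = [c + s for c, s in zip(center, spread)]
    return center, lower, upper


# Toy numbers only: three members evaluated at two values of L_g
center, lower, upper = ensemble_band([[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]])
```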

FIG. 9.

Parameter scan of Ei as a function of Lg and τ. (a) NN trained on E1 and E2. (b) trained on E1–E5, and (c) trained on E1 and E2 but retrained via transfer learning on E3–E5.


The scale length $L_g$ is fixed to $0.5\,\mu\mathrm{m}$ in E1 but varied in E2, potentially giving a heavy bias to values near $0.5\,\mu\mathrm{m}$. The basic ensemble of E1 and E2 predicts a peak in $E_i$ near $L_g \approx 0.2\,\mu\mathrm{m}$ and then an increasing value after a minimum at $L_g \approx 1$–$2\,\mu\mathrm{m}$. However, there is significant variation in the surrogate model with respect to $E_i$ throughout but particularly for $L_g > 2\,\mu\mathrm{m}$, likely due to early timeouts in individual simulations, since larger $L_g$ corresponds to more macroparticles in the PIC simulation. In addition, this limited training set gives a surrogate whose results can be off by approximately a factor of 2 near $L_g = 1\,\mu\mathrm{m}$, which is where the most interesting results may be found.

To test the capabilities of this application of NNs, we also included the parameter scans E4 and E5 as well as the dataset E3 in the training set of the networks. The results are depicted in Fig. 9(b). The standard deviation is much smaller for the SE trained on E1–E5, but while the SE predicts values closer to the points it was trained on, there is still some discrepancy as well as likely overfitting. In both Figs. 9(a) and 9(b), we observe a minimum in $E_i$ for $\tau \approx 50$ fs and peaks in $E_i$ for all three $\tau$ values near zero.

A third surrogate ensemble was trained and is shown in Fig. 9(c). In this case, we took the already trained model used in Fig. 9(a), trained only on E1+E2, froze the weights of all the layers except for the last two, and then retrained the model on the complete dataset E1–E5 but with a larger noise factor of 20% as well as stronger regularization settings of 20% dropout and $L_1 = L_2 = 1.5\times10^{-8}$. This technique is known as transfer learning and is typically used when a dataset, such as E4 and E5, is too small to effectively train an NN on its own. In these cases, one can take a model trained on a much larger, qualitatively comparable dataset, such as E1 and E2, where it has learned a good trend with respect to the data. By freezing the earlier layers of the NN, we can preserve the trends learned on the larger dataset but retune the results to a new dataset, as seen in Fig. 9(c). Here the peaks and minimum seen in Fig. 9(a) are preserved and brought much closer to the simulation results but without the overfitting seen in Fig. 9(b). This instance of transfer learning is discussed in further detail in Appendix C.

The generated dataset is limited in several respects, notably the particle-per-cell count (ppc = 200–1000), the simulation box size ($L_{sim} \leq 200\,\mu\mathrm{m}$), the simulation run time ($t_{max} = 5$ ps), the fact that we are modeling a single-species target per the self-similar model and not a target with a contaminant layer, and, most importantly, that the simulations are purely one dimensional. For future studies in the vein of this paper, we will need to generate more realistic data and find ways to bridge the gap between our simulation ensembles and experimental data. However, there are experiments where the physical parameters in consideration may tolerate a 1D approximation, and we compared the results of our NN surrogate with these.

Considering the results from several publications, we found seven experimental datapoints for which the laser parameters $I_0 \in [7\times10^{18}, 6.1\times10^{20}]\,\mathrm{W/cm^2}$, $\tau = 320$ fs, and $D \in [20, 30]\,\mu\mathrm{m}$ all fell approximately within our sampled parameter space,52,59 as detailed in Appendix D. For all these experiments, $\lambda = 1.053\,\mu\mathrm{m}$. The laser spot size was approximately 5–10 μm at focus for all these samples, but information on $n_0$ and $L_g$ was not provided. A standard expectation is that 1D simulations like those in the current ensemble will overestimate the hot electron temperature and subsequently the ion energy. This is because a 2D/3D hot electron cloud cools as it disperses while a 1D cloud cannot. The most important discrepancy between the simulation ensemble and the experimental dataset is that we modeled deuterons while the experiments studied protons. However, for the input parameters in consideration, the predicted maximum deuteron and proton energies differed by approximately 25% due to the dependence of $\omega_{pi}$ on the ion mass $m_i$ in the normalized acceleration time $t_p \propto \omega_{pi} t_{acc}$ per the Fuchs model. This scaling difference was relatively consistent over the parameters sampled and so was not seen as an irreconcilable problem for the purposes of comparison, although there will be a systematic bias in the inferred values. As a crude demonstration of a potential application of NNs, and not necessarily of the predictive power of the NNs in their current architecture or training data, we attempt to fit our NN to the data.

The results of this approximate fit can be seen in Fig. 10. In this figure, all the experimental points (X's) are approximated by the SE (red) by feeding it the experimental parameters for $I_0$, $\tau$, and $D$, setting $n_0 = 100\,n_c$, extracting the value of $E_i$ at $t \approx 4.5$ ps, and fitting $L_g$ to all the points via the method of least squares. The model fit for each experimental datapoint is shown via the red scatterplot, while a corresponding plot for the same parameters with $L_g = 0$ is shown with hollow red circles. The green fill-between curves represent the standard deviation of the SE for gradient length scale values $L_g = 0.0$, 0.1, and $0.175\,\mu\mathrm{m}$. The numerical values inferred for $L_g$ are plotted in Fig. 10(b).

FIG. 10.

(a) Experimental datapoints (black X's), SE approximation to experiments by fitting $L_g$ (red dots), SE approximation but with $L_g = 0$ (red circles), and standard deviation fill-between of the SE model as a function of $I_0$ for $L_g = 0$, 0.10, and $0.175\,\mu\mathrm{m}$ (green fill-between). (b) $L_g$ fit values corresponding to the SE approximations (red dots).


One conclusion that can be made is that the pre-plasma is an important quantity in the interaction between the laser and the target. It would behoove the community in general to carefully measure the laser prepulse and amplified spontaneous emission during experiments, which can then be used to model the pre-plasma. In Fuchs et al.,52 a linear pre-plasma of 3 μm was estimated in their simulations as opposed to the exponential pre-plasma used in our studies. Future studies will focus on this issue, i.e., varying the pre-plasma profile.

It is important to emphasize that this result is mostly just a demonstration of the potential capability of a simulation-trained NN as opposed to an actual reproduction of the experiments. The simulations used simply cannot exactly correlate with the physics present in an actual experiment. Likewise, simulation considerations like the plasma density $n_0$ and $L_g$ are not free parameters in an experiment. However, an NN used in such a manner can be used to inversely infer physical parameters that can be difficult to measure directly, e.g., the pre-plasma gradient length scale. A systematic parameter scan via simulation is expensive and slow and is not guaranteed to provide a satisfactory explanation of the experiment. An NN-based ensemble approach benefits from cumulative experience as new simulations inform trends learned from old ones and is fast to execute once it has been trained, allowing for almost instant comparison to experiment. An NN trained on cheap but realistic simulation data could be used as a synthetic diagnostic to extract information from such an experiment. Modern ML techniques will be an important factor in new scientific discoveries for the foreseeable future and will allow us to synthesize experimental, theoretical, and computational results into a single model, which has not been consistently possible until now.

In this paper, we have demonstrated how NNs can be used to explore and analyze data generated from an ensemble of simulations. The PIC simulations themselves were the most computationally expensive component of the study, but much of the work entailed preparing and processing the data for training the NN as well as finding the ideal NN architecture. In addition to modeling Ei and Te via surrogate models rather than running new PIC simulations, we were able to leverage qualities of NNs in several ways. As a continuous mapping of the parameter space, we were able to use the surrogate to map out regions of interest. In some cases, discovered features proved to be fictitious under greater scrutiny but in other cases they persisted. In addition, the NN surrogate can be used as a synthetic diagnostic to extract physical information from experimental data that can be modeled in a simulation but not directly observed.

The network presented in this paper is also relatively simple with respect to the state of the art in modern ML, meaning there are many possibilities for even more robust modeling. Future efforts will also make use of recurrent and convolutional techniques standard in modern deep learning. In the future, advanced tools such as MERLIN will be employed to launch and coordinate large ensembles of simulations to investigate regions of phase space in a recursively informed manner.55 The benefit of a data-centric approach is also cumulative, in that old data need not be discarded upon the completion of a research topic but can be used as the foundation for subsequent investigations. If the data are properly processed for the problem at hand, ever larger data pools will only improve the predictive capabilities of a neural network.

From a physics perspective, the interesting dependency on the pre-plasma gradient length scale also merits further investigation, particularly in higher dimensions. Subsequent steps in this research will be to apply such studies to more complex pulse shapes, incorporation of higher-dimensional simulations for transfer learning, and the exploration of more extreme parameter ranges (e.g., higher intensities, thinner targets) in which more complex acceleration mechanisms arise, such as radiation-induced transparency,56 breakout afterburner,57 and radiation pressure acceleration.58 

This work was funded under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344 with funding support from Laboratory Directed Research and Development under tracking code 20-ERD-048, and DOE Office of Science SCW1772 and Early Career Research Program under SCW1651. Computational work was performed with Livermore Computing and using LLNL Grand Challenge allocations. We would like to thank Kelli Humbird, Denise Hinkel, and Zoran Djordjević for useful discussions. This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Before the simulation data are submitted to the network for training, they are processed and reduced to the scalars of interest, such as the maximum ion energy and hot electron temperature. Examples of data before processing can be seen in Fig. 11. In Fig. 11(a), we present the maximum ion momentum curve for a sample simulation. What is clearly noticeable is that at $t = 2.5$ ps the logarithmic curve precipitously falls off. At that point, the fastest ions are leaving the simulation box and are no longer accounted for. To compensate for this, we simply truncate every output dataset ($E_i$ and $T_e$) at the time at which this highest momentum value is reached. This reduces the dataset from 100,694 datapoints to 78,321 for E1 + E2. The total E1–E5 dataset was reduced to 121,530 datapoints, where the number of datapoints is counted as #timesteps × #simulations.
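The truncation step can be sketched as below, assuming each simulation is stored as parallel lists sampled at the same times. The function name `truncate_at_peak` and the toy numbers are hypothetical; the actual post-processing code is not given in the source.

```python
def truncate_at_peak(times, p_max, *series):
    """Cut all time series at the step where p_max reaches its maximum.

    times, p_max and every entry of `series` are equally long lists.
    Everything after the first occurrence of the maximum momentum is
    dropped, because the fastest ions have left the box by then.
    """
    peak = max(range(len(p_max)), key=p_max.__getitem__)  # first index of the maximum
    cut = peak + 1
    return (times[:cut], p_max[:cut]) + tuple(s[:cut] for s in series)


# Toy curve: momentum rises until t = 2.5 ps, then drops as ions exit the box
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
p_max = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 2.0]
t_e = [0.1, 0.4, 0.9, 1.0, 1.0, 0.9, 0.9]
times_cut, p_cut, te_cut = truncate_at_peak(times, p_max, t_e)
```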

FIG. 11.

(a) Example ion momentum curve pmax, which corresponds to Ei, where the fast ions have left the simulation box (red circle) and subsequent falloff data are removed from the dataset (red X). The data after that point are removed. (b) Example electron energy spectrum data and an unsuccessful fit for Te.


In Fig. 11(b), we present the electron energy spectrum within the target at an arbitrary time, where we extract the electron temperature (red) from the energy spectrum (blue). This is an example of where the curve-fitting algorithm, Levenberg–Marquardt least squares, failed to properly fit the spectrum. To perform a curve fit for this spectrum is easy by hand but more challenging when automated for more than 100,000 datapoints. We only sampled the tail end of the spectrum to extract the hot electron temperature. Failed fits, where the extracted temperature is an unphysical value, were set to the initial electron temperature of Te,i=100eV. This was justified for failed fits prior to the arrival of the laser pulse, when the plasma is still cold, and was infrequent enough during the laser–target interaction time that the network would just view such instances effectively as noise.

Both the ion energy and electron temperature data are cubically interpolated with respect to time to increase the size of the dataset. This procedure is effectively a rudimentary form of data augmentation. For both datasets, we used a 20-fold interpolation factor. This increased the dataset size from 121,530 datapoints to 2,425,169. An example of this procedure can be seen in Fig. 12. Here, a sample electron temperature curve $T_e$ has been augmented in time. In addition, so that the NN does not learn locally monotonic features exaggerated by the time interpolation, we also introduce noise as $\mathrm{data}^* = (\mathbf{1} + \mathbf{noise}) \times \mathrm{data}$, where $\mathbf{1}$ is a unity vector and $\mathbf{noise}$ is a vector of Gaussian random noise centered at 0 with a baseline standard deviation set specifically for the data, on the order of 0.1, and multiplied elementwise with the data. The unity vector was included to avoid introducing spurious negative values and to weight the noise relative to the datapoint at time $t$, so as not to wash out the lowest values with an absolute noise threshold. The noise applied to the data varied: the maximum normalized ion momentum was given a noise value of 0.5% and its corresponding input parameters 0.1%, while the electron temperature was given 5% and its inputs 1%. For example, to output ion energies of $E_i = 10$ MeV and $E_i = 100$ MeV we would apply scaled noise factors of $\pm 0.05$ and $\pm 0.5$ MeV, respectively, but for a corresponding input intensity we could have $I_0 = (1 \pm 0.001)\times10^{19}\,\mathrm{W/cm^2}$.
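A minimal sketch of the multiplicative noise step, assuming the noise is drawn independently for each datapoint from a zero-mean Gaussian with relative standard deviation `sigma`. The function name `add_relative_noise` is hypothetical, and the cubic time interpolation that precedes this step is not shown.

```python
import random


def add_relative_noise(data, sigma, rng=None):
    """Return data* = (1 + noise) * data with noise ~ N(0, sigma).

    Because the noise scales with each value, small values are not
    washed out by an absolute noise floor and zeros stay at zero.
    """
    rng = rng or random.Random()
    return [(1.0 + rng.gauss(0.0, sigma)) * x for x in data]


# 0.5% relative noise on ion energies given in MeV (toy values)
rng = random.Random(0)
energies = [10.0, 100.0, 0.0]
noisy = add_relative_noise(energies, 0.005, rng)
```

With `sigma = 0.005`, the typical perturbation is about 0.05 MeV at 10 MeV and about 0.5 MeV at 100 MeV, matching the scaling described in the text.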

FIG. 12.

Example time augmentation of Te data where we have increased the original dataset (black) by 5× and added 20% noise for exaggerated effect (red).


The final step in preparing the data for submission to the neural network is to organize it, as depicted in Fig. 13. Figure 13(a) depicts all the post-processed ion energy curves stacked one after the other for the E1 and E2 simulations, and Fig. 13(b) depicts the first few stacked curves in that compilation. Parallel to the stacked output data (here the ion energy, although an analogous stack is made for the electron temperature), the corresponding times for each datapoint are stacked into an array, Fig. 13(c), as are the corresponding input values, e.g., intensity and pulse duration in Figs. 13(d) and 13(e). The same is done for the rest of the input parameters not shown. The data are organized so that when the network sweeps through each array from left to right during training, it associates every output value with the corresponding time and input parameters. Since there are many different inputs of various scales, all the parameters are also normalized. In this case they are simply normalized to [0,1], although at times it was found that the logarithmically distributed data, e.g., intensity and gradient length scale, were better trained on when first transformed as $\log_{10}(\tilde{x}+1)$ and then normalized to [0,1], where $\tilde{x} = x/\bar{x}$ and $\bar{x}$ is the mean value of the input parameter. For example, we linearly rescaled the pulse duration of the combined ensemble E1–E5 from [20,500] to [0,1] by subtracting the mean and dividing by the difference between the maximum and minimum value. For the intensity, i.e., $[10^{17}, 10^{21}]$, we instead converted the corresponding values to [17, 21] by taking the base-ten logarithm, which we then normalized linearly to [0,1].
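The two rescalings in the worked example can be sketched as follows. This is an illustration only: it assumes a min-max mapping that subtracts the lower bound, which is what places the endpoints exactly at 0 and 1, and the function names `minmax` and `log_minmax` are hypothetical.

```python
import math


def minmax(values, lo, hi):
    """Linearly map values from [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def log_minmax(values, lo, hi):
    """Take log10 first, then map [log10(lo), log10(hi)] onto [0, 1]."""
    logs = [math.log10(v) for v in values]
    return minmax(logs, math.log10(lo), math.log10(hi))


# Pulse duration in fs, rescaled linearly from [20, 500]
tau_scaled = minmax([20.0, 260.0, 500.0], 20.0, 500.0)

# Intensity in W/cm^2, rescaled logarithmically from [1e17, 1e21]
i0_scaled = log_minmax([1e17, 1e19, 1e21], 1e17, 1e21)
```

The logarithmic variant spreads the four decades of intensity evenly over the unit interval, so that $10^{19}\,\mathrm{W/cm^2}$ lands at the midpoint rather than near zero.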

FIG. 13.

A depiction of how data are organized prior to training, where the unnormalized data are presented for clarity. In (a), we have Ei values stacked in an array for E1 and E2. In (b), we focus in on the first few example Ei curves to show what the NN sees. In (c)–(e), we present several of the input parameters given to the NN, specifically t, I0, and τ.


While studying the electron temperature curves, we were also able to elucidate another capability of the surrogate model. A trained NN can produce the expected generalization of an output given the inputs without directly reproducing the training data if they diverge from that trend; otherwise one might find overfitting. During initial training efforts using data from E1 and E2, an $R^2$ metric was used to identify simulations that diverged significantly from the learned trend obtained with the NN model. Here $R^2 = 1 - \sum_i (y_i - f_i)^2 / \sum_i (y_i - \bar{y})^2$, where $y_i$ is the training datapoint, $f_i$ is the value predicted by the surrogate, and $\bar{y}$ is the mean of the observed data.
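As a minimal sketch, the $R^2$ metric defined above can be computed per simulation and compared against a threshold. The function name `r_squared`, the 0.5 cutoff used as a flag, and the toy curves are illustrative assumptions.

```python
def r_squared(y, f):
    """Coefficient of determination between data y and predictions f."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy temperature curve: the simulation keeps heating, the surrogate saturates
simulation = [1.0, 2.0, 3.0, 4.0, 5.0]
surrogate = [1.0, 2.0, 2.5, 2.5, 2.5]
score = r_squared(simulation, surrogate)
flagged = score < 0.5  # candidate for closer inspection
```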

The simulations flagged in this way are shown in Fig. 14, a histogram highlighting the simulations for which $R^2 < 0.5$, which were predominantly at lower intensities ($\sim 10^{18}\,\mathrm{W/cm^2}$). Upon closer inspection of the simulations for which $R^2$ was noticeably less than unity, we noticed that multiple simulations in this intensity range demonstrated unphysical numerical heating of the electrons, as evidenced in Fig. 15(a). In this figure, we see the expected spike in electron temperature due to acceleration by the short laser pulse (40 fs). However, the electron temperature continues to rise as a function of time after the laser energy has been deposited, indicating a numerical issue as the laser is no longer contributing at this point. By increasing the ppc count we were able to reduce numerical heating in most of these instances. At higher intensities, there is greater target heating due to the laser and thereby a larger Debye length, resulting in a more stable simulation. At lower intensities, numerical heating can compete with the laser-driven energy deposition, giving a non-negligible effect. This is not fully understood at this point and merits further investigation in the generation of future ensembles. There were approximately 127 simulations for which $R^2 < 0.5$ out of the 800 simulations in E1+E2. To correct for this, we generated an additional, revised ensemble, ER, of 200 simulations: the simulations that originally had low $R^2$ values were rerun with identical parameters except now with ppc = 1000 and $L_s = 200\,\mu\mathrm{m}$. For consistency, we also reran a further 73 simulations with $R^2 > 0.5$ but less than unity to ensure that the modification to ppc = 1000 was more equitably distributed throughout the parameter space and did not introduce a discontinuity in simulation phase space. The problematic simulations in E1 and E2 were subsequently replaced by the revised simulations from ER.
The altered data, in the form of E1* and E2*, were kept separate from the original data E1 and E2.

FIG. 14.

Simulations in E1+E2 for which $R^2 < 0.9$ (blue) and $R^2 < 0.5$ (orange).

FIG. 15.

(a) An example temperature curve from E1 where the NN (red) greatly diverges from the simulation (black). (b) An example of a corrected temperature curve from ER.


General temperature curves can be seen in Fig. 15, where Fig. 15(a) shows a simulation for which the extracted temperature diverges significantly from the expected result per the surrogate. While the final surrogate is generally sought for its reproductive and predictive capabilities, exceptions to its predictions can point us either to unexpected physics or to erroneous data, in this case the latter. Here, the particle-per-cell count is too low (ppc = 200) to properly resolve the interaction between the laser and the particles, and numerical heating dominates. An example of a corrected simulation can be seen in Fig. 15(b), where the model has come into closer agreement with the data.

While this turned out to be an unexpected application of the NN, it may not work in all circumstances. Ideally, outliers will not be learned by the NN before it begins to overfit, but it is possible that, given enough failures, the NN might learn the unphysical result, i.e., the increasing electron temperature in Fig. 15(a). Future applications of NNs to error and outlier analysis could make use of generative adversarial networks, which can be used to isolate and identify outlying data.

The neural networks for the examples using just E1 and E2 were trained for up to 500 epochs, after which the NN would begin to overfit $E_i$ and then suffer significant training noise for $T_e$. The learning rate for this case was lr = 0.001, the default for TensorFlow. For the more robust examples involving E1–E5, we retrained the NN for an additional 200 epochs with a smaller learning rate of lr = 0.0001. The improvement was not very significant, going from an MSE loss of about $5\times10^{-6}$ to a loss of about $3\times10^{-6}$ with retraining. For the instance of transfer learning shown in Fig. 9(c), we retrained the network for an additional 200 epochs. The training histories are shown in Fig. 16 for both $E_i$ and $T_e$.

FIG. 16.

Example training histories for Ei and Te. The neural networks for examples using just E1 and E2 were trained up to 500 epochs (a) and (c), after which the NN began to overfit Ei and then suffer significant training noise for Te. For the more robust examples involving E1-E5, we retrained the NN for an additional 200 epochs with a smaller lr value (b) and (d).


A calibration plot for the augmented training data and the predictions of the NN is shown in Fig. 17. The $E_i$ data are presented in blue and the $T_e$ data in red. We see good correlation between the expected values and the predicted values except for very small energies (note the logarithmic scaling). There are several possible reasons for this. Certainly, many of the points correspond to $E_i$ and $T_e$ values prior to the arrival of the laser at the target. Likewise, there may still be some simulations at low intensities for which numerical heating is present to some degree. This failure at small values could also be a consequence of the regularization process we used to minimize overfitting, since this puts an absolute minimum constraint on the loss function used. The points circled in Fig. 17(b) for the $T_e$ calibration plot simply identify the data where distribution fits were poor and the temperatures were arbitrarily set to $T_{e,i} = 100$ eV. This is not a concern for our studies, although higher particle numbers and spectral binning could be used in simulations to produce better electron temperature fits. However, this would come at a significant increase in simulation cost and data processing time.

FIG. 17.

Calibration plot between the simulation and NN-predicted values for (a) Ei and (b) Te. Failed temperature values set to initial Te,i=100eV circled in (b).


For fitting the surrogate model to experimental data, we used the following data provided in Table II. The data were completely captured in our parameter space for I0, τ, and D. We expect the TNSA mechanism to work as expected in this regime and so expect the surrogate to already understand the region in question. The surrogate derived values for Lg are given as well, assuming n0=100nc. Lg was derived by iteratively scanning Lg until the surrogate derived Ei value closely matched that from the experiment.
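The scan over $L_g$ can be sketched as a one-dimensional grid search that minimizes the squared error between the surrogate prediction and the measured energy. The callable `toy_surrogate` below is a hypothetical placeholder with an invented functional form, not the trained network, and the grid spacing is an assumption.

```python
def fit_lg(surrogate, e_exp, i0, tau, d, lg_grid):
    """Return the L_g on the grid whose prediction is closest to e_exp."""
    return min(lg_grid, key=lambda lg: (surrogate(i0, tau, d, lg) - e_exp) ** 2)


def toy_surrogate(i0, tau, d, lg):
    """Hypothetical stand-in for the trained surrogate (NOT the real model).

    The energy grows with sqrt(I0) and linearly with L_g; tau and d are
    accepted but unused so the call signature matches a fuller model.
    """
    return 3.0 * (i0 / 7e18) ** 0.5 * (1.0 + 5.0 * lg)


# Scan L_g from 0 to 0.3 micrometers in steps of 0.005 micrometers
lg_grid = [k * 0.005 for k in range(61)]

# Synthetic "measurement" generated with L_g = 0.1, then recovered by the scan
target = toy_surrogate(3e19, 320.0, 25.0, 0.1)
lg_fit = fit_lg(toy_surrogate, target, 3e19, 320.0, 25.0, lg_grid)
```

In practice the placeholder would be replaced by the ensemble prediction of $E_i$ at fixed $n_0$ and sampling time, and the grid could be refined around the best value.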

TABLE II.

Compilation of experimental datapoints used in Fig. 10 and the inferred $L_g$ values.

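| No. | $E_i$ (MeV) | $I_0$ (W/cm$^2$) | $\tau$ (fs) | $D$ (μm) | $r_0$ (μm) | $L_g$ (μm) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 [Ref. 52] | $7\times10^{18}$ | 320 | 25 | 8.25 | 0.0 |
| 2 | 9.1 [Ref. 52] | $1.9\times10^{19}$ | 320 | 25 | 8.25 | 0.078 |
| 3 | 10.9 [Ref. 52] | $2.66\times10^{19}$ | 320 | 25 | 8.25 | 0.084 |
| 4 | 12.4 [Ref. 52] | $3\times10^{19}$ | 320 | 25 | 8.25 | 0.107 |
| 5 | 20 [Ref. 59] | $4\times10^{19}$ | 320 | 20 | | 0.145 |
| 6 | 13 [Ref. 52] | $4\times10^{19}$ | 320 | 30 | | 0.136 |
| 7 | 20 [Ref. 52] | $6\times10^{19}$ | 320 | 25 | 8.25 | 0.163 |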
No.Ei(MeV)I0(W/cm2)τ (fs)D(μm)r0 (μm)Lg(μm)
3 [Ref. 527×1018 320 25 8.25 0.0 
9.1 [Ref. 521.9×1019 320 25 8.25 0.078 
10.9 [Ref. 522.66×1019 320 25 8.25 0.084 
12.4 [Ref. 523×1019 320 25 8.25 0.107 
20 [Ref. 594×1019 320 20 0.145 
13 [Ref. 524×1019 320 30 0.136 
20 [Ref. 526×1019 320 25 8.25 0.163 
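The Lg inference described above (scan Lg with the other inputs held fixed until the surrogate-predicted Ei matches the measured value) can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: `toy_surrogate` is a hypothetical stand-in with an arbitrary functional form, not the trained network, and `infer_lg`, the grid range, and the grid spacing are likewise illustrative choices. The numbers it produces are not expected to reproduce Table II.

```python
import numpy as np

def toy_surrogate(intensity, tau_fs, thickness_um, lg_um):
    """Hypothetical stand-in for the trained NN surrogate.

    The form below is arbitrary and chosen only so the example runs;
    tau_fs and thickness_um are accepted but unused here.
    Returns a peak ion energy in MeV.
    """
    return 3.0 * np.sqrt(intensity / 7e18) * (1.0 + 8.0 * lg_um)

def infer_lg(surrogate, e_target, intensity, tau_fs, thickness_um, lg_grid):
    """Scan lg_grid and return the (Lg, Ei) pair closest to e_target."""
    preds = np.array(
        [surrogate(intensity, tau_fs, thickness_um, lg) for lg in lg_grid]
    )
    best = int(np.argmin(np.abs(preds - e_target)))
    return lg_grid[best], preds[best]

# Assumed scan range and resolution for Lg, in micrometres.
lg_grid = np.linspace(0.0, 0.5, 501)

# Inputs taken from the second row of Table II: Ei = 9.1 MeV,
# I0 = 1.9e19 W/cm^2, tau = 320 fs, D = 25 um.
lg_fit, e_fit = infer_lg(toy_surrogate, 9.1, 1.9e19, 320.0, 25.0, lg_grid)
print(f"inferred Lg = {lg_fit:.3f} um, surrogate Ei = {e_fit:.2f} MeV")
```

A grid scan is used here because it makes no assumption about how the surrogate responds to Lg. If the response were known to be monotonic over the scan range, a bracketing root finder would reach the same answer with fewer surrogate evaluations.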
1. R. A. Snavely, M. H. Key, S. P. Hatchett, T. E. Cowan, M. Roth, T. W. Phillips, M. A. Stoyer, E. A. Henry, T. C. Sangster, M. S. Singh et al., “Intense high-energy proton beams from petawatt-laser irradiation of solids,” Phys. Rev. Lett. 85, 2945 (2000).
2. E. Esarey, C. B. Schroeder, and W. P. Leemans, “Physics of laser-driven plasma-based electron accelerators,” Rev. Mod. Phys. 81, 1229 (2009).
3. T. Tajima and J. M. Dawson, “Laser electron accelerator,” Phys. Rev. Lett. 43, 267 (1979).
4. R. M. Wilson, “Half of Nobel prize in physics honors the inventors of chirped pulse amplification,” Phys. Today 71(12), 18–21 (2018).
5. J. Denavit, “Collisionless plasma expansion into a vacuum,” Phys. Fluids 22, 1384 (1979).
6. S. J. Gitomer, R. D. Jones, F. Begay, A. W. Ehler, J. F. Kephart, and R. Kristal, “Fast ions and hot electrons in the laser–plasma interaction,” Phys. Fluids 29, 2679 (1986).
7. H. Daido, M. Nishiuchi, and A. S. Pirozhkov, “Review of laser-driven ion sources and their applications,” Rep. Prog. Phys. 75, 056401 (2012).
8. S. V. Bulanov, J. J. Wilkens, T. Zh. Esirkepov, G. Korn, G. Kraft, S. D. Kraft, M. Molls, and V. S. Khoroshkov, “Laser ion acceleration for hadron therapy,” Phys. Usp. 57, 1149–1179 (2014).
9. S. C. Wilks, A. B. Langdon, T. E. Cowan, M. Roth, M. Singh, S. Hatchett, M. H. Key, D. Pennington, A. MacKinnon, and R. A. Snavely, “Energetic proton generation in ultra-intense laser–solid interactions,” Phys. Plasmas 8, 542 (2001).
10. M. Borghesi, J. Fuchs, S. V. Bulanov, A. J. MacKinnon, P. K. Patel, and M. Roth, “Fast ion generation by high-intensity laser irradiation of solid targets and applications,” Fusion Sci. Technol. 49, 412 (2006).
11. L. Robson, P. T. Simpson, R. J. Clarke, K. W. D. Ledingham, F. Lindau, O. Lundh, T. McCanny, P. Mora, D. Neely, C.-G. Wahlström et al., “Scaling of proton acceleration driven by petawatt-laser–plasma interactions,” Nat. Phys. 3, 58 (2007).
12. J. Fuchs, T. E. Cowan, P. Audebert, H. Ruhl, L. Gremillet, A. Kemp, M. Allen, A. Blazevic, J.-C. Gauthier, M. Geissel et al., “Spatial uniformity of laser-accelerated ultrahigh-current MeV electron propagation in metals and insulators,” Phys. Rev. Lett. 91, 255002 (2003).
13. R. A. Snavely, B. Zhang, K. Akli, Z. Chen, R. R. Freeman, P. Gu, S. P. Hatchett, D. Hey, J. Hill, M. H. Key et al., “Laser generated proton beam focusing and high temperature isochoric heating of solid matter,” Phys. Plasmas 14, 092703 (2007).
14. P. K. Patel, A. J. MacKinnon, M. H. Key, T. E. Cowan, M. E. Foord, M. Allen, D. F. Price, H. Ruhl, P. T. Springer, and R. Stephens, “Isochoric heating of solid-density matter with an ultrafast proton beam,” Phys. Rev. Lett. 91, 125004 (2003).
15. C. Barty, M. Key, J. Britten, R. Beach, G. Beer, C. Brown, S. Bryan, J. Caird, T. Carlson, J. Crane et al., “An overview of LLNL high-energy short-pulse technology for advanced radiography of laser fusion experiments,” Nucl. Fusion 44, S266 (2004).
16. A. J. Mackinnon, P. K. Patel, M. Borghesi, R. C. Clarke, R. R. Freeman, H. Habara, S. P. Hatchett, D. Hey, D. G. Hicks, S. Kar et al., “Proton radiography of a laser-driven implosion,” Phys. Rev. Lett. 97, 045001 (2006).
17. G. Sarri, M. Borghesi, C. A. Cecchetti, L. Romagnani, R. Jung, O. Willi, D. J. Hoarty, R. M. Stevenson, C. R. D. Brown, S. F. James et al., “Application of proton radiography in experiments of relevance to inertial confinement fusion,” Eur. Phys. J. D 55, 299 (2009).
18. C. K. Li, F. H. Seguin, J. A. Frenje, J. R. Rygg, R. D. Petrasso, R. P. Town, P. A. Amendt, S. P. Hatchett, O. L. Landen, A. J. Mackinnon et al., “Measuring E and B fields in laser-produced plasmas with monoenergetic proton radiography,” Phys. Rev. Lett. 97, 135003 (2006).
19. C. K. Li, F. H. Séguin, J. A. Frenje, R. D. Petrasso, P. A. Amendt, R. P. J. Town, O. L. Landen, J. R. Rygg, R. Betti, J. P. Knauer et al., “Observations of electromagnetic fields and plasma flow in hohlraums with proton radiography,” Phys. Rev. Lett. 102, 205001 (2009).
20. P.-E. Masson-Laborde, S. Laffite, C. K. Li, S. C. Wilks, R. Riquier, R. D. Petrasso, G. Kluth, and V. Tassin, “Interpretation of proton radiography experiments of hohlraums with three-dimensional simulations,” Phys. Rev. E 99, 053207 (2019).
21. S. Steinke, J. Van Tilborg, C. Benedetti, C. G. R. Geddes, C. B. Schroeder, J. Daniels, K. K. Swanson, A. J. Gonsalves, K. Nakamura, N. H. Matlis et al., “Multistage coupling of independent laser-plasma accelerators,” Nature 530, 190 (2016).
22. W. P. Leemans, A. J. Gonsalves, H.-S. Mao, K. Nakamura, C. Benedetti, C. B. Schroeder, C. Tóth, J. Daniels, D. E. Mittelberger, S. S. Bulanov et al., “Multi-GeV electron beams from capillary-discharge-guided subpetawatt laser pulses in the self-trapping regime,” Phys. Rev. Lett. 113, 245002 (2014).
23. U. Masood, M. Bussmann, T. E. Cowan, W. Enghardt, L. Karsch, F. Kroll, U. Schramm, and J. Pawelke, “A compact solution for ion beam therapy with laser accelerated protons,” Appl. Phys. B 117, 41 (2014).
24. I. Spencer, K. W. D. Ledingham, R. P. Singhal, T. McCanny, P. McKenna, E. L. Clark, K. Krushelnick, M. Zepf, F. N. Beg, M. Tatarakis et al., “Laser generation of proton beams for the production of short-lived positron emitting radioisotopes,” Nucl. Instrum. Methods Phys. Res. B 183, 449 (2001).
25. T. Ma, “Superfast, superpowerful lasers are about to revolutionize physics,” Sci. Am. https://blogs.scientificamerican.com/observations/superfast-superpowerful-lasers-are-about-to-revolutionize-physics/ (2020).
26. J. Kim, A. J. Kemp, S. C. Wilks, D. H. Kalantar, S. Kerr, D. Mariscal, F. N. Beg, C. McGuffey, and T. Ma, “Computational modeling of proton acceleration with multi-picosecond and high energy, kilojoule, lasers,” Phys. Plasmas 25, 083109 (2018).
27. J. Lindl, “Development of the indirect‐drive approach to inertial confinement fusion and the target physics basis for ignition and gain,” Phys. Plasmas 2, 3933 (1995).
28. C. K. Birdsall and A. B. Langdon, Plasma Physics via Computer Simulation, 1st ed. (CRC Press, 1985).
29. T. Katsouleas and J. M. Dawson, “Unlimited electron acceleration in laser-driven plasma waves,” Phys. Rev. Lett. 51, 392–395 (1983).
30. L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” J. Comp. Phys. 73, 325–348 (1987).
31. F. Rosenblatt, “The perceptron: A perceiving and recognizing automaton,” Cornell Aeronautical Laboratory Technical Report No. 85-46-0-1, New York (1957).
32. J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks 61, 85–117 (2015).
33. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).
34. A. Géron, Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed. (O'Reilly Media, Inc., 2017).
35. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Control Signal Syst. 2, 303–314 (1989).
36. Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang, “The expressive power of neural networks: A view from the width,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (2017).
37. B. Hanin and M. Nica, “Products of many large random matrices and gradients in deep neural networks,” Commun. Math. Phys. 376, 287–322 (2020).
38. G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comp. 18, 7 (2006).
39. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM 60, 6 (2017).
40. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,” J. Mach. Learn. Res. 12, 2493–2537 (2011).
41. D. Guest, K. Cranmer, and D. Whiteson, “Deep learning and its application to LHC physics,” Annu. Rev. Nucl. Part. Sci. 68, 161–181 (2018).
42. C. Emma, A. Edelen, M. J. Hogan, B. O'Shea, G. White, and V. Yakimenko, “Machine learning-based longitudinal phase space prediction of particle accelerators,” Phys. Rev. Accel. Beams 21, 112802 (2018).
43. S. C. Leemann, S. Liu, A. Hexemer, M. A. Marcus, C. N. Melton, H. Nishimura, and C. Sun, “Demonstration of machine learning-based model-independent stabilization of source properties in synchrotron light sources,” Phys. Rev. Lett. 123, 194801 (2019).
44. K. D. Humbird, J. L. Peterson, B. K. Spears, and R. G. McClarren, “Transfer learning to model inertial confinement fusion experiments,” IEEE Trans. Plasma Sci. 48(1), 61 (2020).
45. V. Gopalaswamy, R. Betti, J. P. Knauer, N. Luciani, D. Patel, K. M. Woo, A. Bose, I. V. Igumenshchev, E. M. Campbell, and K. S. Anderson, “Tripled yield in direct-drive laser fusion through statistical modelling,” Nature 565, 581 (2019).
46. G. Kluth, K. D. Humbird, B. K. Spears, J. L. Peterson, H. A. Scott, M. V. Patel, J. Koning, M. Marinak, L. Divol, and C. V. Young, “Deep learning for NLTE spectral opacities,” Phys. Plasmas 27, 052707 (2020).
47. S. J. D. Dann, C. D. Baird, N. Bourgeois, O. Chekhlov, S. Eardley, C. D. Gregory, J.-N. Gruse, J. Hah, D. Hazra, S. J. Hawkes et al., “Laser wakefield acceleration with active feedback at 5 Hz,” Phys. Rev. Accel. Beams 22, 041303 (2019).
48. J. R. Smith, C. Orban, J. T. Morrison, K. M. George, G. K. Ngirmang, E. A. Chowdhury, and W. M. Roquemore, “Optimizing laser-plasma interactions for ion acceleration using particle-in-cell simulations and evolutionary algorithms,” New J. Phys. 22, 103067 (2020).
49. A. Gonoskov, E. Wallin, and A. Polovinkin, “Employing machine learning for theory validation and identification of experimental conditions in laser-plasma physics,” Sci. Rep. 9, 7043 (2019).
50. T. D. Arber, K. Bennett, C. S. Brady, A. Lawrence-Douglas, M. G. Ramsay, N. J. Sircombe, P. Gillies, R. G. Evans, H. Schmitz, A. R. Bell, and C. P. Ridgers, “Contemporary particle-in-cell approach to laser-plasma modelling,” Plasma Phys. Controlled Fusion 57, 113001 (2015).
51. P. Mora, “Plasma expansion into a vacuum,” Phys. Rev. Lett. 90, 185002 (2003).
52. J. Fuchs, P. Antici, E. d'Humières, E. Lefebvre, M. Borghesi, E. Brambrink, C. A. Cecchetti, M. Kaluza, V. Malka, M. Manclossi et al., “Laser-driven proton scaling laws and new paths towards energy increase,” Nat. Phys. 2, 48–54 (2006).
53. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467 (2016).
54. Z.-H. Zhou, J. Wu, and W. Tang, “Ensembling neural networks: Many could be better than all,” Artif. Intell. 137, 239–263 (2002).
55. J. L. Peterson, R. Anirudh, K. Athey, B. Bay, P.-T. Bremer, V. Castillo, F. Di Natale, D. Fox, J. A. Gaffney, D. Hysom et al., “Merlin: Enabling machine learning-ready HPC ensembles,” arXiv:1912.02892v1.
56. R. Mishra, F. Fiuza, and S. Glenzer, “Enhanced ion acceleration in transition from opaque to transparent plasmas,” New J. Phys. 20, 043047 (2018).
57. L. Yin, B. J. Albright, B. M. Hegelich, and J. C. Fernandez, “GeV laser ion acceleration from ultrathin targets: The laser break-out afterburner,” Laser Part. Beams 24, 291–298 (2006).
58. T.-Z. Esirkepov, M. Borghesi, S. V. Bulanov, G. Mourou, and T. Tajima, “Highly efficient relativistic-ion generation in the laser-piston regime,” Phys. Rev. Lett. 92, 175003 (2004).
59. M. Borghesi, D. H. Campbell, A. Schiavi, O. Willi, A. J. MacKinnon, D. Hicks, P. Patel, L. A. Gizzi, M. Galimberti, and R. J. Clarke, “Laser-produced protons and their application as a particle probe,” Laser Part. Beams 20, 269–275 (2002).