We present progress in utilizing a machine learning (ML) assisted optimization framework to study the trends in a parameter space defined by spectrally shaped, high-intensity, petawatt-class (8 J, 45 fs) laser pulses interacting with solid targets and give the first simulation-based overview of predicted trends. A neural network (NN) incorporating uncertainty quantification is trained to predict the number of hot electrons generated by the laser–target interaction as a function of pulse shaping parameters. The predictions of this NN serve as the basis function for a Bayesian optimization framework used to navigate this space. For post-experimental evaluation, we compare two separate NN models: one trained solely on data from experiments and the other trained only on ensemble particle-in-cell simulations. Reviewing the predicted and observed trends across the experiment-capable laser parameter search space, we find that both ML models predict a maximal increase in hot electron generation of approximately 12%–18%; however, no statistically significant enhancement was observed in the experiments. On direct comparison of the NN models, the average discrepancy is 8.5%, with a maximum of 30%. Since shot-to-shot fluctuations in the experiments affect the observations, we evaluate the behavior of our optimization framework by performing virtual experiments that vary the number of repeated observations and the noise level. Finally, we discuss the implications of such a framework for future autonomous exploration platforms in high-repetition-rate experiments.
I. INTRODUCTION
Machine learning (ML) coupled to new high-power, high-repetition-rate (HRR) laser systems promises to transform high energy density (HED) physics experimentation,1 and rapid progress is being made in algorithm-guided HRR laser-driven science.2–4 In traditional shot-per-hour experiments at laser facilities, scientists have been able to make informed decisions about input parameters (e.g., laser energy, laser pulse duration) during the hour-long interval between shots. Newer HRR lasers, such as Colorado State University's ALEPH,5 Lawrence Berkeley Laboratory's BELLA-PW,6 APOLLON,7 and the HAPLS at the Extreme Light Infrastructure8 facility, are capable of delivering petawatt-class laser pulses (tens of joules in tens of femtoseconds) at up to 10 Hz repetition rates. At such rates, it is no longer possible for scientists to make decisions between shots, and thus genetic algorithms and simplex methods2 or Bayesian optimization3,4 have been successfully demonstrated as means of sequential experimental optimization. These techniques allow evaluation of a given parameter space, often three-dimensional (3D) or greater, with little or no human intervention. It is now possible to search within non-intuitive operating regimes to find optima, and soon it will be possible to validate theory and simulations on-the-fly. The present work is a step toward the latter goal through the development of a new framework for ML-assisted guidance. Here, we perform HRR laser experiments with solid targets and employ a neural network (NN) as the underlying function for a parallel-evaluation (i.e., batched) Bayesian optimization (BO) framework. Although the framework ultimately utilizes BO, it is distinct from sequential optimization in that it gathers multiple samples simultaneously, with explicit selection of the balance between exploration and exploitation. In this work, we split experimental samples equally between the two in an attempt to maximize knowledge about the parameter space while simultaneously searching for optima. The underlying NN incorporates a novel uncertainty quantification method aimed at improving sampling decisions and advancing the goal of autonomous research. Compared to other model types, using an NN as the underlying function offers greater flexibility and scalability as algorithm-driven HRR scientific platforms grow in dataset size and complexity.9
Here, we demonstrate the use of such a framework to guide experimental sample acquisition on a petawatt-class laser facility. High-intensity ( ) laser interactions with solid targets generate megaelectron volt-energy electrons, ions, and kiloelectron volt to megaelectron volt x-rays, which are typically used to produce radiographs of static or dynamic phenomena,10 serve as probes, or act as sources to drive secondary samples to high temperature at high density.11–13 Our scientific goal here is to guide sampling through several rounds of experiments aimed at enhancing megaelectron volt electron yield, specifically through the use of laser pulse shaping. We then compare experimental results with ensemble one-dimensional particle-in-cell (PIC) simulations in order to evaluate the consistency of predicted trends in a three-dimensional parameter space. Using neural network (NN) models for this evaluation, we show that the simulations capture qualitatively similar trends. Finally, as the observed experimental sampling noise is high, we present a set of virtual experiments to study the expected behavior of the guidance framework in the presence of varying degrees of random noise.
II. EXPERIMENTS
Experiments were performed using the ALEPH laser system at Colorado State University, capable of delivering 30 J in 30 fs pulses at a fundamental wavelength of 800 nm, with a maximum repetition rate of 3.3 Hz. The nominal pulse width was measured at 45 fs under “best compression” settings. The system utilizes an f/25 final focusing optic with a beam diameter of 185 mm, resulting in a focal distance of 4.625 m. This configuration achieves a 45 μm FWHM focal spot that contains 70% of the laser energy. Typically, the experiments were operated with 8 ± 1 J of laser energy delivered to the target, though the source of this variation remains unknown. The peak irradiance at best compression was . The long focal length configuration minimizes the criticality of precise target positioning due to the extended Rayleigh range, approximately 2.5 cm, meaning the focal spot size changes gradually along the propagation direction. Despite some jitter in target position with high-repetition-rate targetry, the positioning accuracy is maintained well below 1 mm, ensuring minimal impact on laser focal irradiance and, consequently, on output variability. While factors such as wavefront distortions, laser pre-pulses, and intensity fluctuations can also affect target interactions, these were not characterized during these experiments.
The experiments were typically operated at a repetition rate of 0.2 or 0.5 Hz, allowing sufficient time for target raster and vibration settling between shots. The targets were 2.5 μm thick Al foils, set in a raster array of 10 rows by 27 columns, totaling 270 available targets per “batch” with each having a 2.5 mm opening diameter, as seen in Fig. 1. The primary diagnostic tool was the repetition-rate electron positron proton spectrometer (REPPS),14 a magnetically dispersing charged particle spectrometer. This device is a modified version of the EPPS15 and is used with a Lanex™ scintillator coupled with a Thorlabs CS2100-M camera to capture data at the high repetition rates. The spectrometer measures the hot electron distributions from the interaction at 13° from the laser axis, with the pinhole located 324 mm from the interaction point.
Since many high-intensity laser-driven physics phenomena such as proton acceleration16 and x-ray generation17 scale with hot electron number and temperature, we chose the hot electron production from laser–solid interactions as the optimization target. While scanning parameters such as pulse length and intensity is of general interest in experiments, the scalings for output parameters such as hot electron temperature18 are well established and depend largely on laser intensity. High-intensity short-pulse shaping has been an area of interest in recent years for both laser wakefield acceleration (LWFA)2,3 and solid target interactions4,19 as it potentially provides the ability to modify physics outputs such as electron beam charge, x-ray conversion efficiency, and maximum proton energy without explicitly varying the laser energy, focal spot, or target dimensions. Here, phase-only pulse shaping is accomplished through an acousto-optical programmable dispersive filter (AOPDF),20,21 commercially known as a DAZZLER™. The spectral phase φ(ω) is modified by a residual perturbation described by a polynomial derived from a Taylor expansion about the central frequency of the laser (ω0), φ(ω) = φ0 + φ1(ω − ω0) + (φ2/2!)(ω − ω0)² + (φ3/3!)(ω − ω0)³ + (φ4/4!)(ω − ω0)⁴, where ω0 is the center frequency and φ1 is the group delay. The higher-order components φ2, φ3, and φ4 are referred to as the group delay dispersion (GDD), third order dispersion (TOD), and fourth order dispersion (FOD), or GTF, i.e., GTF = (GDD, TOD, FOD), when referring to the pulse shape settings collectively. Another advantage of adjusting pulse shapes with an AOPDF is that it provides a very fast “knob” that is readily controlled via software.
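As an illustration of how the GTF coefficients map onto a temporal intensity profile, the following is a minimal sketch that applies the residual spectral phase to an idealized Gaussian spectrum (a stand-in for the measured ALEPH spectrum) and transforms back to the time domain via FFT. It is not the simulation pipeline used in this work; the function name and numerical parameters are illustrative assumptions.

```python
import numpy as np

def shaped_pulse_intensity(gdd, tod, fod, tau_fwhm=45.0, n=2**14, dt=0.5):
    """Temporal intensity of a pulse after adding residual spectral phase.

    gdd, tod, fod : residual phase coefficients in fs^2, fs^3, fs^4
    tau_fwhm      : transform-limited FWHM pulse duration in fs
    Returns (t, I) with t in fs and I normalized to the unshaped peak.
    """
    t = (np.arange(n) - n // 2) * dt                          # time axis, fs
    sigma_t = tau_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    field_t = np.exp(-t**2 / (4.0 * sigma_t**2))              # transform-limited field envelope

    # move to the frequency domain (baseband, i.e., relative to omega_0)
    E_w = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(field_t)))
    dw = 2.0 * np.pi / (n * dt)                               # rad/fs
    w = (np.arange(n) - n // 2) * dw                          # omega - omega_0

    # residual Taylor-expansion phase about omega_0 (the constant and
    # group-delay terms only shift the pulse and are omitted here)
    phi = gdd / 2.0 * w**2 + tod / 6.0 * w**3 + fod / 24.0 * w**4
    E_w_shaped = E_w * np.exp(1j * phi)

    field_shaped = np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(E_w_shaped)))
    intensity = np.abs(field_shaped) ** 2
    return t, intensity / np.abs(field_t).max() ** 2
```

A scan of gdd, tod, and fod within the limits of Table I with such a routine can be used to visualize how the shaped pulses depart from the transform-limited case.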
The examples shown in Fig. 1 demonstrate that the theoretical pulse shapes are no longer well described by the typical definitions of laser intensity or pulse length since the pulses are non-Gaussian. Considering the three-dimensional parameter space (assuming fixed laser energy and focal spot), brute force parameter scans could quickly become impractical. The total number of experiments scales as the number of samples per dimension (N) raised to the number of dimensions (M), i.e., N^M. For example, taking ten samples across each of the three dimensions would require 1000 experiments (target shots) for just one sample at each setting.
III. MACHINE-LEARNING-ASSISTED GUIDANCE FRAMEWORK
The intent of this ML-assisted framework is to identify potential optima for various outputs, such as hot electron production, and to understand the overall response topology of the GTF input parameter space for comparison with simulations. The general framework for ML-guided experiments, illustrated in Fig. 2, involves constructing a surrogate model that interpolates across the dataset and serves as the basis function for optimization. Bayesian optimization (BO) is then applied to this surrogate model to guide subsequent experiments.
Traditionally, BO strategies utilize Gaussian processes (GPs) as the surrogate model due to their ability to provide predictive mean and uncertainty estimates. However, GPs face computational challenges as they scale roughly with the cube of the number of data points, primarily due to the computational cost of matrix inversion required for predictions.22 Consequently, when dealing with very large and complex datasets, deep learning models become a more suitable choice for the surrogate model. Although deep learning models do not inherently provide uncertainty estimates, they can be adapted to do so and handle larger datasets more efficiently. A common empirical approach to derive uncertainty estimates from neural network (NN) models involves training multiple (typically 10–20) models with identical structures on the same dataset.23 The variation in predictions from this model ensemble, which arises from random initializations of NN weights at the start of training, can be used to calculate mean and variance estimates. While this method is effective, its major drawback is the significant computational resources required to train each model in the ensemble.
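For concreteness, a minimal sketch of this conventional ensemble approach is shown below. It uses scikit-learn's MLPRegressor purely for brevity; the library and hyper-parameters are illustrative assumptions, not those of this work.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def ensemble_predict(x_train, y_train, x_query, n_models=10):
    """Conventional deep-ensemble uncertainty: train identically structured
    networks from different random initializations and use the spread of
    their predictions as an empirical uncertainty estimate."""
    preds = []
    for seed in range(n_models):
        net = MLPRegressor(hidden_layer_sizes=(128, 128, 128),
                           activation="relu", random_state=seed, max_iter=2000)
        net.fit(x_train, y_train)
        preds.append(net.predict(x_query))
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)   # mean and 1-sigma estimate
```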
Here we employ an uncertainty quantification method24 based on “stochastic data centering” to achieve similar mean and variance predictions at significantly reduced computational cost. This is done by training a single model and then performing inference on datasets that have been randomly shifted to a new origin. This method has been shown to be a robust measure of uncertainty and has previously outperformed several state-of-the-art methods for black-box optimization problems. While a Gaussian process (GP) would likely suffice for this problem given its relatively small sample size and low dimensionality (i.e., mapping from three scalars to a single scalar), the framework is designed to accommodate larger and more complex datasets. This anticipatory design is in preparation for future work, which aims to utilize models capable of simultaneously processing and predicting multi-modal, non-scalar data, as in Anirudh et al.25
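The following is a heavily simplified, inference-side sketch of the shift-based idea: a single model trained on anchored inputs [c, x − c] is queried with many randomly drawn anchors c, and the spread of the resulting predictions serves as the uncertainty estimate. The anchored-input form and all names here are assumptions based on Ref. 24, not the implementation used in this work.

```python
import numpy as np

def anchored_predict(model, x_query, x_train, n_anchors=20, rng=None):
    """Illustrative 'stochastic data centering' inference (assumed form,
    following Ref. 24): re-predict with many random anchors drawn from the
    training inputs and use the spread as the uncertainty from one model."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(n_anchors):
        c = x_train[rng.integers(len(x_train))]        # random origin / anchor
        anchored = np.concatenate([np.broadcast_to(c, x_query.shape),
                                   x_query - c], axis=1)
        preds.append(model.predict(anchored))           # model trained on [c, x - c]
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```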
Since the data flow and control infrastructure to run our framework in-line with the experiments was not available, we elected to utilize an “offline” operating mode whereby a full set of samples was collected and the guidance framework was run during target replacement and chamber vacuum cycling. Batches of approximately 270 experiments (i.e., laser “shots”) were performed, and upon collecting the diagnostic images, the electron spectra were processed and fed to the guidance framework. With at least three repeat shots at a given setting, approximately 80 unique pulse shaping samples were collected in each batch, comprising 240 of the typical 270 targets available in each array. The target metric is the sum of all signal counts in the electron spectrum with an energy greater than 0.1 MeV. This signal is additionally divided by the delivered laser energy to account for the variations in laser energy (∼13%) observed throughout the dataset, which typically trended downward from the beginning to the end of each batch. The ML surrogate model maps from the spectral phase components to the integrated signal level (ISL), defined as this energy-normalized electron signal, effectively making electron production efficiency the optimization objective. Once the surrogate model has been trained on new data and its uncertainty has been evaluated, BO is performed on the surrogate. The framework utilizes the BoTorch26 package with the qExpectedImprovement (qEI) acquisition function. Expected improvement (EI) is commonly used for this purpose since it balances the trade-off between exploration (i.e., placing samples in high-uncertainty regions) and exploitation (i.e., placing samples in regions of known good performance) based on the improvement potential over the current best; using qEI additionally enables a “batch mode” in which the optimizer considers the effect of adding multiple new points to the dataset simultaneously. This process was then iterated over two batches (two rounds of ML-guidance), with the surrogate model trained on the cumulative dataset each time. The framework does not suggest samples that were already selected in previous batches; all suggested settings are new additions to the dataset. Practically, this means that if the optimal setting was found in an earlier batch, it would not be re-tested, allowing the model to continue exploring more of the parameter space.
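A minimal sketch of the batch-acquisition step is shown below. For brevity, a standard Gaussian process stands in for the NN surrogate; the qExpectedImprovement and optimize_acqf calls are standard BoTorch API, while the batch size, bounds, and placeholder objective are illustrative assumptions.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition.monte_carlo import qExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# X: (n, 3) normalized GTF settings already sampled; Y: (n, 1) relative ISL
X = torch.rand(80, 3, dtype=torch.double)
Y = 1.0 - (X - 0.5).pow(2).sum(dim=1, keepdim=True)   # placeholder objective

# surrogate model (a GP stands in here for the NN + uncertainty model)
gp = SingleTaskGP(X, Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

# batched ("q") expected improvement over the current best observation
acq = qExpectedImprovement(model=gp, best_f=Y.max())
bounds = torch.tensor([[0.0] * 3, [1.0] * 3], dtype=torch.double)  # normalized GTF box
candidates, _ = optimize_acqf(
    acq, bounds=bounds,
    q=40,               # half of an 80-sample batch devoted to exploitation;
    num_restarts=10,    # the other half is placed in high-uncertainty regions
    raw_samples=256,
)
```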
IV. RESULTS
The initial batch of pulse shapes was a quasi-informed grid scan. Here, 15 GTF settings were suggested by the framework when trained only on simulations. However, the simulations (as discussed further below) utilized larger GDD extremes than those deemed safe for laser operation. Therefore, only the TOD and FOD values were retained, and a grid scan was applied in GDD with five regularly spaced samples within the allowable limits, as shown in the exploration row of Table I. For subsequent rounds, the TOD and FOD values were further restricted to the “exploration” limits in the table to mitigate risk to the laser system. The intent of the first scan was to provide an initial global shape of the parameter space and establish a bounding box within which the framework is directed to explore. This first round of data was then used to update the surrogate model (NN), originally trained only on simulation data, before selecting the second round of experiments.
TABLE I. Minimum and maximum GTF values for the experimental grid search, the simulation ensemble, and the ML-guided “exploration” bounds.

|  | GDD lower (fs²) | GDD upper (fs²) | TOD lower (fs³) | TOD upper (fs³) | FOD lower (fs⁴) | FOD upper (fs⁴) |
|---|---|---|---|---|---|---|
| Experiment (grid search) | −4000 | 4000 | −160 000 | 240 000 | −400 000 | 1.4 × 10⁶ |
| Simulation | −20 000 | 20 000 | −200 000 | 200 000 | −2 × 10⁶ | 2 × 10⁶ |
| Exploration | −4000 | 4000 | −160 000 | 160 000 | −400 000 | 400 000 |
Within our framework, we chose a balanced (i.e., 50/50) division of the sample suggestions, allocating a specified fraction of settings to either reduce uncertainty in the model (i.e., explore) or test the existence of optima (i.e., exploit). So-called “standard candle” experiments, in which the laser pulse shape is set to its nominal settings (i.e., no phase shaping), were interspersed throughout the scripted batch (i.e., the list of pulse shape settings) to monitor the baseline performance and variation in the system over time. In the next round, 80 new samples are suggested by the ML-guided framework, and the data are collected and appended to the previous dataset for a subsequent round of experiments. While using a simulation-based model may initially bias the sample suggestions for the first round, the balanced sample division and subsequent model retraining allow the model to learn the experimental trends as the number of samples increases.
Figure 3(a) shows a plot of the semi-grid scan and two subsequent rounds of ML-guided sampling, where the ISL for each sample is normalized to the average of the baseline (or standard candle, red diamonds) settings (dashed blue line), a quantity we refer to as relative ISL. The blue shaded region represents the 1σ deviation from the mean of the standard candles, and nearly all samples sit within this band. Several outliers are also seen, which can be attributed to “hard hits,” i.e., energetic x-rays depositing their energy on the camera sensor within the electron data region. While a median filter was applied in the analysis routine, no attempt was made to deliberately remove or correct these points during the experiments, and they were used in model training for the guidance framework. If such outliers are infrequent, their impact should generally be minimized because the NN attempts to fit the largest number of samples in the set, effectively averaging out their effect on the models as sample numbers increase. In Fig. 3(b), the ISL from common samples is averaged and used to summarize the collected distributions from the three rounds of experiments. The mean of all distributions lies to the left of one, indicating that most pulse shaping parameters resulted in lower performance than the baseline settings. The majority of the top-performing spectra originate from a family of pulse shapes in the positive-TOD and negative-GDD region of parameter space, as discussed below.
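A minimal sketch of the signal processing described above is given below. The array names and the median-filter kernel width are assumptions; the analysis routine used during the experiments is not reproduced here.

```python
import numpy as np
from scipy.signal import medfilt

def integrated_signal_level(energy_mev, counts, laser_energy_j, e_min=0.1):
    """Sum spectrometer counts above e_min (MeV) and normalize by the
    delivered laser energy; a median filter suppresses narrow 'hard hit'
    spikes from x rays striking the sensor."""
    counts = medfilt(counts, kernel_size=5)
    return counts[energy_mev > e_min].sum() / laser_energy_j

def relative_isl(isl_shots, isl_candles):
    """Normalize shots to the mean of the interleaved standard-candle shots;
    the candles' 1-sigma spread gives the shaded band in Fig. 3(a)."""
    baseline = np.mean(isl_candles)
    return np.asarray(isl_shots) / baseline, np.std(isl_candles) / baseline
```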
V. DISCUSSION
A. Comparing experimental and simulation trends
To study the trends in the pulse shaping parameter space, an ensemble of 1D simulations (885 in total, presented here) was performed using the EPOCH particle-in-cell (PIC) code. The shaped pulse simulation ensemble is based on numerical and target conditions found in Djordjević et al.27 To generate spectrally shaped pulses, a spectral phase parameterized by GTF is applied to the 1D relative intensity spectrum from CSU ALEPH, and the shaped pulse is converted to the temporal domain via fast Fourier transform (FFT). In this ensemble, all laser pulses have approximately identical energy, to within 1%. Because the laser energy is effectively identical for all simulations, normalization by laser energy is not required, and the comparable quantity here is the sum of the particles contained in the electron distribution with energy greater than 0.1 MeV, as in the experiments. Both sets of data traverse the same parameter space since simulations initially guided the selection of bounds for the investigation. Note that in the case of the experiments, the GTF values are relative to the baseline values for the system, which had previously been optimized for laser pulse compression. For the simulations, GTF = (0,0,0) represents the idealized baseline pulse shape at best compression and 45 fs. The simulations encompass a much larger parameter space than the experiments. A comparison of the maximum and minimum GTF values from the experiment, the simulation, and the limits imposed on ML-guidance (“exploration”) is listed in Table I. Based on an idealized model of pulse shaping via spectral phase modification, the simulations explore the theoretical limits of what is possible with pulse shaping and its impact on electron production. In reality, extreme values of GTF could severely reduce the temporal stretch of the laser pulse in the CPA scheme, thereby risking damage to the laser system. The limits, especially on GDD, were chosen so that an acceptable amount of the laser spectrum is retained at the settings listed for “exploration” in the table.
B. Comparing surrogate models from simulations and experiments
To better visualize the trends in these spaces as would be seen by the optimization framework, we train a modest ensemble (ten networks each) on each dataset (simulations or experiments) to map from GTF to ISL. Since experimental and simulation units are not directly comparable, we normalize each output quantity to that predicted by each model at its origin, GTF = (0,0,0), for comparison and refer to this quantity as relative ISL. Further details of the data preprocessing and model training can be found in the Appendix. Since we aim to both compare trends and examine behavior of the framework as it was seen during the experiments, we retain outlier data points contained in the original experimental dataset as described above.
These models then provide a means of exploring the larger trends in the “exploration” parameter space and enable comparison of the experimental data to the simulations over this space. After training the models, we over-sample them on a fine grid (31 linearly spaced samples in each dimension, bounded by the “exploration” parameters in Table I) and compare the predictions, as seen in Fig. 4. Here, we show the average predicted relative ISL along the FOD dimension as a function of GDD and TOD for the models based on the simulation [Fig. 4(a)] and experimental [Fig. 4(b)] data. While there are clear discrepancies between the two models, there are some qualitative consistencies, such as the band of higher predicted output (bright regions) that runs from intermediate GDD toward increasing GDD and TOD, i.e., toward the top right of the plots. We also show the averaged ensemble uncertainty estimates along the same FOD axis for the simulation [Fig. 4(c)] and experimental [Fig. 4(d)] models. Here, the uncertainty is one standard deviation of the ensemble predictions at each point relative to the average prediction. The average relative uncertainty ranges in magnitude from a few percent to greater than 5%. Both models have lower average uncertainty where each predicts the highest outputs, and both contain their highest average uncertainties near the negative TOD boundary.
We utilize correlation matrices for each set of model predictions to examine the dependence of electron production on the pulse shaping (i.e., input) parameters, as seen in Fig. 5. Here, it can be seen that both cases show a positive correlation to ISL with the TOD parameter, while the GDD correlations are in opposition between the simulations and experiments, as would be expected from the plots in Fig. 4. In the simulation-based model, TOD is significantly more dominant than in the experiment-based model. Both models agree that correlation with FOD is considerably weaker than with the other two parameters, justifying the choice to project model predictions along the FOD axis in Fig. 4.
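The interrogation of the trained ensembles proceeds roughly as in the sketch below: over-sample on a 31³ grid within the “exploration” bounds of Table I, average along FOD for the maps of Fig. 4, and correlate the inputs with the predicted relative ISL for Fig. 5. The dummy_ensemble_predict function is a placeholder for the trained models, not a representation of their actual response.

```python
import numpy as np

def dummy_ensemble_predict(x):
    # placeholder standing in for the trained NN ensemble's averaged prediction
    return 1.0 + 4e-7 * x[:, 1] - 1e-8 * np.abs(x[:, 0])

gdd = np.linspace(-4_000, 4_000, 31)        # fs^2, "exploration" bounds (Table I)
tod = np.linspace(-160_000, 160_000, 31)    # fs^3
fod = np.linspace(-400_000, 400_000, 31)    # fs^4
G, T, F = np.meshgrid(gdd, tod, fod, indexing="ij")
X = np.stack([G.ravel(), T.ravel(), F.ravel()], axis=1)    # (31**3, 3) samples

rel_isl = dummy_ensemble_predict(X)
fod_averaged = rel_isl.reshape(31, 31, 31).mean(axis=2)    # GDD-TOD map, Figs. 4(a)/4(b)

corr = np.corrcoef(np.column_stack([X, rel_isl]), rowvar=False)
print(corr[-1, :3])   # correlation of relative ISL with GDD, TOD, FOD (cf. Fig. 5)
```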
Finally, we directly compare the overall predictions from the experiment and simulation models by showing the distribution of relative ISL over the over-sampled parameter space, as seen in Fig. 6(a). Both models have a mean prediction value below unity, indicating that the majority of pulse shapes are more likely to degrade, rather than enhance, the hot electron output. The simulation-based distribution is both wider and more pessimistic than the experimental distribution. They are, however, in reasonable agreement with respect to the predicted maximal improvement, at approximately 18% and 13% over the baseline for the simulation- and experiment-based models, respectively. In Fig. 6(b), we calculate a point-by-point squared error between the two predictions, (relative ISL_sim − relative ISL_exp)². Despite the discrepancy in the distribution shapes seen in Fig. 6(a), the models are in reasonable agreement on broad trends across the over-sampled space, having a mean error of 8.5%, as seen in Fig. 6(b), and a maximum error of 30%.
C. Simulation case examples
In Fig. 7, we compare three simulations drawn from the ensemble, showing high- and low-performing simulated electron spectra against that of the baseline (no pulse shaping) case in parts (a) and (b), respectively. Although the slopes of the distributions are nearly identical in (b), the electron spectrum with TOD equal to +160 000 fs³ contains 7% more particles with energies greater than 0.1 MeV compared to the nominal case, whereas the case with TOD of −80 000 fs³ contains 40% fewer. It is noted here that all simulations are initialized with identical target properties, such as macroparticle density and pre-plasma scale length. These changes in the electron spectra occur despite the peak intensity for the positive TOD interaction being approximately a factor of two lower in the simulations than that of the idealized best-compression pulse. While the full description of the mechanics is still under investigation using this 1D dataset in combination with 2D simulations, where the laser absorption is more realistically modeled, the implementation of positive TOD generally steepens the rising slope of the laser intensity. In the 1D simulations, the steepened edge benefits the laser absorption into electrons in the 0.1 MeV electron energy band because a short pre-plasma scale length is maintained for a longer duration. This leads to better ponderomotive acceleration at the critical density surface compared to the negative TOD case, where pre-pulses arrive before the main pulse. The electron spectra for the nominal and positive TOD cases are very similar at 0.1–0.5 MeV, so the difference primarily resides in the tail of the spectra at energies greater than this.
D. ML-guidance behavior in the presence of noise
While several of the higher performing settings were located by the ML-guidance framework, the improvement in averaged relative ISL is statistically insignificant. Given the large system noise over the course of the experiments, as represented by the 1σ error bars in Fig. 3(a), and the low sampling statistics, we aim to evaluate the expected behavior of our framework in such an environment. Some of this variance, especially the anomalously high values seen in Fig. 3(b), can be attributed to energetic x-rays depositing energy randomly on the camera sensor and spuriously elevating the inferred signal levels during operations. An example of such a spike is visible in the electron spectrum shown in Fig. 1 at approximately 1.25 MeV. Additionally, the laser energy and spectrum fluctuate throughout the experiments. While normalizing the output to the laser energy was our strategy to account for energy fluctuations, changes in the laser spectrum can also change the details of the pulse shape delivered to the target. The laser pulse shape was not measurable on-shot; however, the amplified spectrum is measured prior to entering the pulse compressor. We use the same model used in the simulations to apply a polynomial spectral phase to the measured laser spectrum, as shown in Fig. 8(a), and calculate the time-dependent laser intensity to illustrate the intensity profile differences that arise solely from changes in the laser spectrum at a given setting, as seen in Fig. 8(b). Within a set of three spectra from nominally identical experiments, the calculated intensity profiles can differ substantially. The predicted impact of such differences is under further investigation using PIC simulations.
We investigate the framework's behavior in the presence of variable system noise and varying numbers of repeat samples taken at each location (or GTF setting) through virtual experiments. The approach is identical to that shown in Fig. 2, but here we utilize a surrogate model trained on the full simulation dataset and restrict the search space to that used in the actual experiments. Here, we are not interested in the details of the model-predicted trends as a function of the input variables, but rather in using the model as the “ground truth” for the virtual experiments. The first round of samples collected in the virtual experiments consists of a grid scan sampling the corners, face centers, and origin of the cubic parameter space. When samples are drawn from this surrogate model, random noise with a fixed fractional standard deviation (i.e., the sigma of a normal distribution) is applied to each sample to represent output fluctuations from unknown sources (e.g., hard hits on the camera sensor, variations in the laser pulse shape). Additionally, the number of samples drawn at each pulse shape setting is varied to observe the effect of repeated sampling on the framework's evolution and the “experimental” outcomes. Akin to the experiments, we draw 80 unique GTF settings for each of the guided rounds, and a set of 30 “standard candle” experiments is drawn during each round to establish the average baseline performance. After all data from a round of experiments are collected, the samples are averaged and normalized to the average standard candle value.
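One draw in these virtual experiments can be sketched as follows, where surrogate is a placeholder for the simulation-trained “ground truth” model and the noise is applied as multiplicative normal fluctuations, as described above. Function and variable names are illustrative assumptions.

```python
import numpy as np

def virtual_observation(surrogate, gtf, noise_frac, n_repeats, rng):
    """Draw repeated noisy observations of one GTF setting from the ground-truth
    surrogate and return their average (multiplicative normal noise of
    fractional width noise_frac, averaged over n_repeats samples)."""
    truth = surrogate(gtf)
    samples = truth * (1.0 + noise_frac * rng.standard_normal(n_repeats))
    return samples.mean()

rng = np.random.default_rng(1)
surrogate = lambda gtf: 1.0 + 1e-7 * gtf[1]           # placeholder ground truth
candles = [virtual_observation(surrogate, np.zeros(3), 0.25, 3, rng)
           for _ in range(30)]                        # 30 standard candles per round
obs = virtual_observation(surrogate, np.array([0.0, 8e4, 0.0]), 0.25, 3, rng)
rel_obs = obs / np.mean(candles)                      # normalized as in the experiments
```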
A subset of the parameters investigated in this study and the trends against these parameters are summarized in the box plots shown in Fig. 9. The two plots represent two of the three rounds of virtual experiments: a grid scan and one round of ML-guided sampling. The results of the third round are similar to the second and are omitted for clarity. In each case, the distribution at 1% random noise with 27 repeat samples is considered the “ground truth” for each sampling set, as this represents near-perfect data and should reflect the ideal behavior of the guidance framework. For reference, dashed lines originating from this distribution in each case indicate the expected mean (blue dashed) and maximum value (green dashed) from the sampling. It is observed that the mean and maximum values in the ground truth distribution only marginally increase in round 1. This suggests that the grid scan effectively sampled near the optimal inputs. Consequently, the optimization framework is essentially only able to find comparable solutions, given that it does not re-sample settings that have been previously acquired.
With increasing noise and low repeat sampling (i.e., one or three repeat samples), the mean of the distribution fluctuates significantly compared to cases with near-perfect data. These distributions are also substantially wider than those with high sampling (27 repeat samples). When sampling is increased to this level, the distribution widths become similar to their low-noise counterparts, and the mean also stabilizes near the ground truth at a medium noise level of 25%. Results at extreme levels of noise indicate that even 27 repeat samples are not sufficient to obtain reliable guidance results. This suggests that the guidance framework requires a minimum number of repeat samples to reduce uncertainty during ML-guided operations; however, this is predicated on the assumption that the system noise is well represented by a normal distribution. It is also evident that increasing system noise yields inferred values both above and below the ground truth, even when the number of repeat samples is increased to 27. In the present experiments, the observed mean standard deviation across all samples is approximately 23%, indicating that the number of repeat samples used in the actual experiments is insufficient for statistically relevant optimization. Additionally, the statistics garnered from the experiments are limited, with only three samples per setting, and repeat samples were placed adjacent to each other in the shot sequences, which may not adequately capture the true system variance.
VI. SUMMARY
In this work, we demonstrate a new framework for machine learning guidance of high-intensity, petawatt-class laser–solid experiments. This framework is geared toward flexibility in deployment by being based on a neural network (NN) rather than a Gaussian process (GP), enabling operation with very large datasets and multi-dimensional inputs and outputs. With a balanced 50/50 exploration-to-exploitation split in the Bayesian optimization (BO) portion, the framework is designed to be capable of both model validation and the pursuit of optimization. Samples are collected in batches, and lists of new samples are suggested for subsequent rounds. Although the majority of the top-performing pulse shapes were selected by the framework, the mean observation noise across the dataset (23%) was higher than the mean improvement of the top ten samples (just 13%), indicating that the optimization outcome is highly uncertain.
Using separate NNs to model the 3D parameter space, we compare trends in hot electron generation as a function of group delay dispersion (GDD), third-order dispersion (TOD), and fourth-order dispersion (FOD) from ensemble particle-in-cell (PIC) simulations and experiments. Both models tend to prefer positive TOD values, show opposite correlations with GDD, and show minimal dependence on FOD. The simulation and experimental models agree on the magnitude of the maximal increase, at 18% and 13%, respectively. Comparing the same samples drawn from each NN model, the highest discrepancies occur at a level of approximately 30%, while the mean squared error between the two is approximately 8.5%, indicating reasonable agreement across the pulse shaping parameter space.
Finally, we study the framework's behavior in the presence of high experimental sampling noise by utilizing virtual experiments. These virtual experiments show a distribution evolution similar to that seen in the actual experiments, in that only minimal improvement is expected when initialized with a grid scan, consistent with either of the surrogate model predictions. To stabilize search trajectories using this framework, decreasing the system noise or increasing the number of repeat samples would be required. Additional improvements in the guidance framework could also potentially be realized by altering the acquisition function of the BO to account for varying levels of system noise.
Algorithm-guided experiments will be limited by the ability to control and stabilize the systematic variances of high-power, high-repetition-rate (HRR) laser experimental platforms as they evolve toward higher complexity. In the near-term, deployments of such a framework for guiding experiments should increase the number of samples at each setting to combat this issue. Ultimately, finer control and characterization of the system inputs (laser pulse shape,28 target position,29 focal spot quality, etc.) and their relation to physical outputs30 will lead to overall decreased uncertainty in the resulting data and increased efficiency in HRR experiments. Virtual experiments can be used to evaluate sampling choices and explore the performance of different acquisition functions, other BO parameters, or surrogate model hyper-parameter choices to determine the optimal framework design under various sampling conditions. Emphasis on increasing the precision of system control and decreasing measurement uncertainty will continue to improve algorithm capability in HRR laser experiments.
ACKNOWLEDGMENTS
This work was funded under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344 with funding support from the Laboratory Directed Research and Development Program under tracking code Nos. 23-ERD-035, 20-ERD-048, and 21-ERD-015, and the DOE Office of Science Early Career Research Program under No. SCW1651; DOE Office of Science Fusion Energy Sciences (Nos. DOE-SC SCW1720 and DOE-SC SCW1722). Experimental facility work was supported by the DOE Office of Science, Fusion Energy Sciences under Contract No. DE-SC0019076: the LaserNetUS initiative at CSU Advanced Beam Laboratory. This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
D. A. Mariscal: Investigation (lead); Methodology (equal); Validation (lead); Writing – original draft (lead); Writing – review & editing (lead). B. Z. Djordjevic: Software (lead); Writing – review & editing (equal). R. Anirudh: Methodology (lead); Software (lead). J. Jayaraman-Thiagarajan: Methodology (lead); Software (lead). E. S. Grace: Investigation (supporting). R. A. Simpson: Investigation (supporting). K. K. Swanson: Investigation (equal). T. Galvin: Methodology (supporting). D. Mittelberger: Methodology (supporting). J. Heebner: Methodology (supporting). R. Muir: Methodology (supporting). E. Folsom: Investigation (supporting). M. P. Hill: Investigation (supporting). S. Feister: Methodology (equal). E. Ito: Methodology (supporting). K. Valdez-Sereno: Methodology (supporting). J. J. Rocca: Resources (equal). J. Park: Investigation (supporting). S. Wang: Investigation (supporting). R. Hollinger: Investigation (supporting). R. Nedbailo: Investigation (supporting). B. Sullivan: Investigation (supporting). G. Zeraouli: Investigation (equal). A. Shukla: Methodology (supporting). P. Turaga: Methodology (supporting). A. Sarkar: Methodology (supporting). B. Van Essen: Methodology (supporting). S. Liu: Methodology (equal); Visualization (equal). B. Spears: Conceptualization (equal); Supervision (equal). P.-T. Bremer: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Visualization (equal). T. Ma: Conceptualization (equal); Funding acquisition (equal); Project administration (equal); Supervision (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX: ENSEMBLE MODEL TRAINING AND OVER-FITTING MITIGATION
We further describe the process used to train a modest ensemble of neural networks employed to represent the simulation and experimental datasets in this work.
We utilize a common neural network structure known as a multi-layer perceptron (MLP), which consists of an input layer followed by three hidden layers of 128 neurons each with the rectified linear unit (ReLU) activation function, and a single-neuron output layer with a linear activation function. The ten MLPs in each ensemble are trained on their respective datasets to map from group delay dispersion (GDD), third-order dispersion (TOD), and fourth-order dispersion (FOD) (collectively, GTF) to electron output, with random weight initialization for each individual model. As a measure against overfitting, we additionally apply L2 regularization to each of the three hidden layers. This regularization penalizes the network for assigning large weights,31 thereby reducing the likelihood of memorizing noise or generally overfitting.
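A sketch of this architecture is shown below, written in Keras as an illustrative choice; the framework and the regularization strength are assumptions, not necessarily those used in this work.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_mlp(l2_level=1e-4):
    """MLP matching the architecture described above: three 128-neuron ReLU
    layers with L2 weight regularization and a single linear output.
    (l2_level is a placeholder value, not the one used in the paper.)"""
    reg = regularizers.l2(l2_level)
    model = keras.Sequential([
        keras.Input(shape=(3,)),                        # GDD, TOD, FOD
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(1, activation="linear"),           # ISL output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```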
To expand our experimental and simulation datasets to a sufficient size for training, we augment the datasets to increase the total number of samples used in both cases. There are 230 unique GTF samples available from the averaged experimental dataset, and we apply up to 1% noise to both the input and output scalar values 30 times using a random sampling from a normal distribution, resulting in 6900 samples for the training process. For the simulation-based model, we start with 885 initial samples and apply the same augmentation strategy ten times to obtain 8850 total samples for training. The application of this noise serves to both increase our number of available samples and to alleviate the problem of model overfitting.
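The augmentation step can be sketched as follows, assuming normally distributed noise at the ~1% level (the exact noise model is an assumption):

```python
import numpy as np

def augment(x, y, n_copies=30, noise=0.01, rng=None):
    """Expand the dataset by replicating each sample n_copies times with
    ~1% multiplicative noise on both inputs and outputs, as described above."""
    rng = rng or np.random.default_rng(0)
    x_aug = np.repeat(x, n_copies, axis=0).astype(float)
    y_aug = np.repeat(y, n_copies, axis=0).astype(float)
    x_aug *= 1.0 + noise * rng.standard_normal(x_aug.shape)
    y_aug *= 1.0 + noise * rng.standard_normal(y_aug.shape)
    return x_aug, y_aug
```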
The data are MinMax scaled (i.e., each variable is scaled from 0 to 1) and then split into 75% for training and 25% for testing, with 10% of the training dataset withheld as a validation dataset. The validation dataset is not shown to the model during training; however, the state of the model at each training epoch is used to predict this dataset and monitor for improvement. Each model in the ensemble is allowed to train for as many epochs as needed, with an early stopping function set to monitor the validation loss. Within the early stopping monitor, a patience parameter of 10 means that if the monitored quantity (the validation loss) does not improve by at least 0.001 over 10 consecutive epochs, training ceases and the NN weights from the epoch with the best loss score are restored and saved. Using this early stopping scheme, we prevent the model from significantly overfitting the data, in which case the model would become extremely adept at predicting only the training data and would predict the test dataset poorly (i.e., the model would not generalize well). Example training curves are shown in Figs. 10(a) and 10(c) for the simulations and experiments, respectively. These curves show steady improvement in loss for both the training and validation datasets, with no obvious signs of overfitting, which would manifest as an upward-trending validation loss (orange curve) paired with a continually decreasing training loss (blue curve). Calibration plots are shown in Figs. 10(b) and 10(d). In these plots, the model predictions of ISL for the test dataset (the samples that were not seen during training) are compared against the “truth” values, visually demonstrating the model's accuracy. If a model were 100% accurate, every point (blue circle) would lie along the 45° (black dashed) line.
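Continuing the sketches above (build_mlp and augment), the scaling, splitting, and early stopping described here would look roughly like the following; the batch size and epoch cap are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras

# x_aug, y_aug, and build_mlp come from the sketches above.
# MinMax-scale inputs and outputs, split 75/25 train/test, and hold out 10%
# of the training set for validation-based early stopping.
x_scaled = MinMaxScaler().fit_transform(x_aug)
y_scaled = MinMaxScaler().fit_transform(y_aug.reshape(-1, 1))
x_tr, x_te, y_tr, y_te = train_test_split(x_scaled, y_scaled, test_size=0.25)

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           min_delta=0.001,
                                           restore_best_weights=True)
model = build_mlp()
model.fit(x_tr, y_tr, validation_split=0.10, epochs=10_000,
          batch_size=64, callbacks=[early_stop], verbose=0)
test_loss = model.evaluate(x_te, y_te, verbose=0)   # calibration uses these held-out samples
```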
The simulation and experimental models, each an ensemble of ten separate networks, have their predictions averaged after inference to smooth out high-frequency oscillations present in any individual model. Averaging the predictions from an ensemble of models has been shown to result in higher model accuracy, and the variance in predictions within each ensemble is used to empirically estimate the uncertainty of the averaged model.32