Electric field waveforms of light carry rich information about dynamical events on a broad range of timescales. The insight that can be reached from their analysis, however, depends on the accuracy of retrieval from noisy data. In this article, we present a novel approach to waveform retrieval based on supervised deep learning. We demonstrate the performance of our model by comparison with conventional denoising approaches, including the wavelet transform and Wiener filtering. The model leverages the nonlinearity of deep learning to achieve enhanced retrieval precision. The results open a path toward an improved understanding of physical and chemical phenomena in field-resolved spectroscopy.
I. INTRODUCTION
Direct access to the temporal evolution of few-cycle optical pulses in the early 2000s enabled the investigation of electronic processes in various media1–5 under the emerging field of attosecond science. Few-cycle pulses in the optical spectral range typically have pulse envelope durations of a few femtoseconds to tens of femtoseconds. To directly measure the carrier field of such pulses, a shorter gate is required in the sampling process.6 Attosecond streaking spectroscopy is considered the gold standard for sampling visible light. The approach employs attosecond extreme ultraviolet pulses, produced by high-harmonic generation (HHG), as a gate.4,7,8 The extreme nonlinearity required for HHG, however, is a significant disadvantage of the technique, since the process is highly inefficient. Moreover, the technique necessitates costly vacuum infrastructure, restricting its general accessibility. These limitations led to the emergence of novel approaches for characterizing few-cycle pulses, such as nonlinear photoconductive sampling (NPS),9–11 linear photoconductive sampling,12,13 TIPTOE,14–16 and electro-optic sampling (EOS) using visible and ultraviolet pulses,17,18 triggering a new era of spectroscopic19,20 and microscopic21,22 techniques. Nevertheless, such measurements gain ease of implementation at the cost of a worsened signal-to-noise ratio (SNR).11,16,23 The degraded SNR largely arises from weak signals measured at high gain, which makes them particularly susceptible to ambient electronic noise despite circuit grounding, shielding, and the selection of low-noise components.
A traditional denoising technique for waveform retrieval is the wavelet transform (WT).24 This linear time-frequency signal processing technique decomposes a given signal into a set of wavelet coefficients, called approximation and detail coefficients. It can be used to denoise data through iterative thresholding,25 where the approximation filters act as averaging filters and the detail filters extract high-frequency information. Here, soft thresholding26 is implemented: detail coefficients below a threshold are set to 0 to suppress the very high-frequency fluctuations typical of noise. The data are then transformed back at the end of the iterative process, producing the WT-denoised data.25,27 Another approach, based on a genetic algorithm, has recently been proposed for denoising terahertz signals.28 The method removes noise-induced spikes and oscillations in the transfer function (i.e., the complex ratio between sample and reference signals in the frequency domain). Machine learning, by contrast, does not rely on assumptions about the transfer function; its supervised-learning variant can yield enhancement of time-domain signals in attosecond spectroscopy.29,30
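As an illustration, a minimal sketch of WT denoising with soft thresholding is given below, using PyWavelets with the Symlets 8 family and two decomposition levels mentioned in Sec. II; the universal-threshold rule and the single-pass (rather than iterative) structure are illustrative assumptions, not necessarily the exact procedure used here.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="sym8", level=2):
    # Decompose into approximation and detail coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients
    # (median absolute deviation); universal threshold follows.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
    # Soft-threshold the detail coefficients; keep the approximation.
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```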
In this article, the extraction of few-cycle waveforms from measurements with low signal-to-noise ratios (SNRs) is demonstrated by employing a denoising algorithm based on a deep neural network architecture. Machine learning-based denoising (MLBD) is particularly beneficial in scenarios where noise is intrinsic to the data acquisition electronics and has found applications ranging from photonics,31–34 astrophysics, and astronomy35–38 to medicine.39–42 Here, a multilayer convolutional neural network (CNN) is trained on a large synthetic dataset (ntrain = 57 600 and ntest = 6400) to learn a mapping from noisy input to clean output. This procedure is akin to conventional denoising of data with pre-determined spectral filters, as in Fourier and wavelet filtering.43 Here, however, the filters are learned directly from the training data. Following a supervised learning approach, the CNN is presented with paired noisy (“Xtrain”) and clean (“Ytrain”) waveforms. This permits the model to learn the structure of signal vs noise without foreknowledge of the underlying mathematical relationship between the two.
In this article, a model capable of handling experimental measurements while being trained purely on a simulated dataset is developed and implemented. The model is compared with traditional signal processing methods, such as wavelet analysis and Wiener filtering, and, despite being trained on simulated data, is able to reconstruct ultrashort pulses from low-SNR experimental data.
II. MODEL PERFORMANCE
Figure 1 illustrates the performance of the model on samples from the test set (ntest = 6400). The coefficient of determination calculated for the model gives an overall score of R2 = 0.99. To assess the overall performance of the model, it is compared with the wavelet transform (WT) and Wiener filtering (WF) as denoising methods. In WT, the denoising is performed using two levels of filters and the Symlets 8 family of wavelets. Wiener filtering, on the other hand, takes into account the statistical properties of the noise present in the signal by estimating the power spectral density of the noise vs that of the signal, where the noise is evaluated as the average of the local variance of the input signal. A comparison of the root mean square (RMS) errors of the denoising methods is presented in Fig. 2. It is evident from the plot that the MLBD approach outperforms both the WT and the WF.
A sample of the waveforms extracted by the CNN model, Ypred, is shown in red. The noisy waveforms, Xtest, fed into the model are shown in light blue. The blue line represents the target waveforms, Ytest. The Pearson correlation coefficient r is calculated for each waveform with excellent results.
A histogram distribution representing the RMS error obtained by calculating √⟨(Ytest − Ŷ)²⟩, where Ŷ stands for the estimator results, showing that the performance of the machine learning model (red) surpasses that of the wavelet transform (blue) and Wiener filtering (gray). Note that the waveforms Ytest and Ŷ are normalized individually prior to calculating the RMS error to remove any error bias due to the decrease in amplitude when using the WT and WF.
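For reference, the WF baseline and the normalized RMS-error metric of Fig. 2 can be sketched as follows, assuming scipy.signal.wiener as the Wiener-filter implementation (it estimates the noise power as the average local variance when none is supplied, matching the description above); the window size and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import wiener

def rms_error(y_true, y_est):
    # Normalize each waveform individually before computing the RMS
    # error, as done for Fig. 2, to remove amplitude bias.
    y_true = y_true / np.max(np.abs(y_true))
    y_est = y_est / np.max(np.abs(y_est))
    return np.sqrt(np.mean((y_true - y_est) ** 2))

# Example: denoise one noisy waveform and score it against the target.
# y_wf = wiener(x_noisy, mysize=15)   # window size is an assumption
# rms_wf = rms_error(y_target, y_wf)
```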
We benchmark our model against laboratory-measured data and compare it to time-frequency Fourier filtering with zero padding, which increases the frequency resolution of the transformed signal. A broadband 4.2 fs white-light waveform spanning 500–1150 nm was obtained from the broadened output of a titanium-sapphire chirped pulse amplifier and is depicted in Fig. 3 (light blue). The waveform is measured by means of NPS, which relies on strong-field interaction (multiphoton absorption) to generate free carriers in a fused silica substrate.11 The free carriers generated by the interaction form a short gating event, which can be used to measure the electric field of a weak test pulse. Generally speaking, the NPS technique relies on transimpedance amplifiers9,11 because of the small signals obtained with the technique. The signal-to-noise ratio of the waveform in Fig. 3 remains somewhat low (SNR = 8.8), which can be attributed to several factors, such as laser intensity fluctuations, timing jitter, gain electronics, and spatial overlap variability. Nevertheless, the MLBD model manages to extract the waveform from the data, as shown in red, with a Pearson correlation coefficient of r = 0.97 calculated between the cleaned laboratory data and the model prediction. Physics-informed learning, e.g., imposing a limited frequency range commensurate with the incident light’s bandwidth, would further improve the SNR in the retrieval.
Model denoising laboratory-measured data (light blue). The cleaned laboratory data (blue) are processed using a super-Gaussian band-pass filter in both the time and frequency domains, as well as zero padding in the frequency domain. Obtaining the denoised plot (red) in the relevant frequency range required 0.13 ms on an Apple M1 Pro chip. The measured white-light waveform contains frequencies below 375 THz (800 nm), constituting a frequency range previously unexplored in the learning process [see Eq. (1)]. High-frequency noise above 1 PHz (300 nm) is excluded from the assessment.
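A frequency-domain version of such a super-Gaussian band-pass filter with zero padding might look like the following sketch; the passband edges, filter order, and padding factor are illustrative assumptions, and the time-domain windowing mentioned in the caption is omitted for brevity.

```python
import numpy as np

def supergauss_bandpass(signal, dt, f_lo=150e12, f_hi=1.0e15, order=6, pad=4):
    # Zero padding to pad * len(signal) refines the frequency grid.
    n = pad * len(signal)
    spec = np.fft.rfft(signal, n=n)
    freqs = np.fft.rfftfreq(n, d=dt)
    f0 = 0.5 * (f_lo + f_hi)              # passband center
    bw = 0.5 * (f_hi - f_lo)              # passband half-width
    # Super-Gaussian window: flat passband, steep roll-off.
    window = np.exp(-(((freqs - f0) / bw) ** (2 * order)))
    return np.fft.irfft(spec * window, n=n)[: len(signal)]
```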
III. DATA SYNTHESIS, AUGMENTATION, AND PRE-PROCESSING
The synthetic waveforms are generated as Gaussian-envelope carrier oscillations,

E(t) = A exp[−(t − tR)²/(2σR²)] cos[2πωR(t − tR) + CEPR], (1)

where A → [0, 1] is selected uniformly at random. The subscript R denotes a randomly generated value from a range of values: tR → [−65, 65] (fs), σR → [5, 35] (fs), ωR → [375, 750] (THz), and CEPR → [0, 2π). This methodology randomly generates a set of waveforms with different arrival times tR, widths σR, central frequencies ωR, and phases CEPR. Figure 4 depicts a sample of the randomly generated waveforms (blue) following the expression in Eq. (1). Random Gaussian noise is added to the generated waveforms to form the noisy waveforms (light blue). The total sample size generated by the method is n = 64 000 waveforms. In preparation for the learning process, the dataset is randomly split into a training set containing ntrain = 57 600 waveforms and a test set containing ntest = 6400 waveforms. Note that the ntest = 6400 waveforms are “held out” during the learning process; they are not re-split from n = 64 000 in every learning iteration.
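A minimal sketch of this data synthesis, following Eq. (1) with the parameter ranges quoted above, is shown below; the time grid, number of sample points, and noise amplitude are assumptions, and ωR is treated as an ordinary frequency in cycles per unit time (hence the 2π factor), consistent with the 375 THz ↔ 800 nm correspondence in Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
t = np.linspace(-100.0, 100.0, 512)  # time axis in fs; grid is an assumption

def random_waveform(noise_std=0.1):  # noise level is an assumption
    A = rng.uniform(0.0, 1.0)                    # amplitude
    t_r = rng.uniform(-65.0, 65.0)               # arrival time tR (fs)
    sigma_r = rng.uniform(5.0, 35.0)             # width sigmaR (fs)
    nu_r = rng.uniform(0.375, 0.750)             # central frequency (PHz = 1/fs)
    cep_r = rng.uniform(0.0, 2.0 * np.pi)        # carrier-envelope phase
    clean = (A * np.exp(-((t - t_r) ** 2) / (2.0 * sigma_r ** 2))
             * np.cos(2.0 * np.pi * nu_r * (t - t_r) + cep_r))
    return clean + rng.normal(0.0, noise_std, t.size), clean

pairs = [random_waveform() for _ in range(64_000)]
X, Y = map(np.array, zip(*pairs))  # noisy inputs X, clean targets Y
```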
Randomly selected waveforms from the training dataset. Xtrain represents the noisy data, and Ytrain represents the clean data. The dataset comprises a variety of waveforms with different characteristics. Note that the plotted waveforms are normalized.
In ML, a feature represents a quantifiable attribute or characteristic of a given phenomenon.44 In our case of measured waveforms, each sampled point is considered a feature, and an ML model must learn to identify the locations where the field exists in the measurement. During a measurement, some of the sampled points fluctuate only by noise, while others fluctuate by both signal and noise (i.e., where the field exists in time). To mitigate any statistical biasing that may be introduced into the model by the relative difference between these two types of fluctuations in the dataset, we employ the common preprocessing step known as feature scaling. This step involves normalizing the value ranges assigned to the features within a dataset, enhancing the performance and accuracy of the algorithm.45 In this article, the data are scaled using MaxAbsScaler() such that Xscaled = X/max(|X|), which scales all sampled numerical features—time-domain sample points—to the range [−1, 1). This approach increases the visibility of the noise away from the pulse center but ensures that each sampled point is presented with an equal probability of lying in the range [−1, 1) over the training dataset.
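The scaling step maps directly onto scikit-learn’s MaxAbsScaler, which divides each column (time-domain sample point) by its maximum absolute value over the training set. A short sketch follows; the array shapes and the stand-in random data are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Rows are waveforms, columns are time-domain sample points (features).
X_train = np.random.randn(57_600, 512)  # stand-in for the noisy waveforms

scaler = MaxAbsScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
# Apply the same per-feature scaling to held-out data:
# X_test_scaled = scaler.transform(X_test)
```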
IV. MODEL STRUCTURE AND TRAINING
(a) An illustration of the model structure with a total of 1 million trainable parameters. The input layer contains the preprocessed and normalized data Xtrain. Layer 2 performs a 1D convolution with a kernel to generate 32 feature maps (4 are shown for clarity). Layer 3 performs 1D max pooling with a kernel to decimate the vectors from layer 2. Two steps of convolution and pooling are performed to extract the features from Xtrain. The data are then deconvolved before passing the information to a fully connected (dense) layer to generate a predicted output Ypred. (b) Model learning curve. The model employs the Adam algorithm to minimize a loss calculated as the mean squared error, shown in blue. The validation set loss is plotted in red.
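A Keras sketch consistent with the architecture described in the caption is given below: two convolution/pooling stages, a transposed-convolution (“deconvolution”) stage, and a dense output layer. Kernel sizes, channel counts, strides, activations, and the 512-sample input length are assumptions, chosen here so that the trainable-parameter count lands near the quoted 1 million; this is not the exact published model.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_samples = 512  # assumed waveform length

model = keras.Sequential([
    keras.Input(shape=(n_samples, 1)),
    # First convolution + pooling stage: 32 feature maps, then decimation.
    layers.Conv1D(32, kernel_size=9, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # Second convolution + pooling stage.
    layers.Conv1D(64, kernel_size=9, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # "Deconvolution" (transposed convolution) back to the input length.
    layers.Conv1DTranspose(4, kernel_size=9, strides=4, padding="same",
                           activation="relu"),
    layers.Flatten(),
    # Fully connected output layer producing the predicted waveform Ypred.
    layers.Dense(n_samples),
])
model.summary()  # ~1.07 million trainable parameters with these choices
```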
By iterative minimization of the mean squared error, the model learns to map the noisy input onto the desired denoised output. In this model, iterative minimization is achieved by employing the Adam optimization algorithm,48 a variant of stochastic gradient descent. The model is trained locally on a MacBook Pro (Apple M1 Pro chip) in batches of 128 samples for 100 epochs—cycles through the collection of batches—for a total elapsed time of ∼500 s. We note that the training time could be significantly reduced with training-optimized hardware,49,50 but this was not necessary for the scope of this work. A hold-out test set comprising 10% of the synthetic data is used only to evaluate the loss at the end of each epoch; it is used neither to adapt the neural network weights nor to tune hyper-parameters, such as batch size, learning rate, and dropout.
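Continuing the sketch above, the training step described here could read as follows; the variable names for the scaled arrays are assumptions.

```python
# Adam optimizer minimizing the mean squared error, batches of 128,
# 100 epochs; the held-out set is only evaluated, never trained on.
model.compile(optimizer="adam", loss="mse")
history = model.fit(
    X_train_scaled, Y_train,
    batch_size=128,
    epochs=100,
    validation_data=(X_test_scaled, Y_test),
)
```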
V. CONCLUSION
Machine learning-based denoising is a powerful approach for tackling intricate noise patterns that are difficult to address using conventional methods that rely on linear filters, such as Fourier filtering or the wavelet transform. The strength of MLBD lies in its nonlinearity, which allows the ML model to capture nonlinear relationships in the data and to adapt to novel noise patterns when generalizing to new, unseen data. This is a result of the end-to-end learning procedure, which allows the model to learn directly from the noisy data. While the model in this article is trained using a generic white noise distribution, by training on a diverse and large dataset, the model was capable of denoising complex laboratory-measured waveforms, which were not explicitly seen during training, nor were they strictly white noise. Consequently, MLBD is a versatile and efficient tool, with benefits across an extensive range of applications in which poor SNR is encountered for measured waveforms that are not easily generated experimentally.36,37 Additionally, the advent of various inference acceleration hardware allows sufficiently large neural networks to produce cleaned results at rates ranging from 3851 to 100 K49 inferences per second, thus keeping abreast of incoming single shots of high repetition rate lasers. These inference accelerators52,53 can be leveraged for MLBD in real-time measurements of signals that may otherwise remain undetected, and do so with sufficiently short latency for use in real-time, high repetition rate pulse shaping feedback systems. Accelerating MLBD in such a way could inspire new approaches to 3D spatiotemporal real-time adaptive laser pulse shaping in many important fields, from medical applications to semiconductor processing.
ACKNOWLEDGMENTS
N.A. acknowledges support from the Max Planck Society via the IMPRS for Advanced Photon Science. N.A. is part of the Max Planck School of Photonics supported by BMBF, Max Planck Society, and Fraunhofer Society. R.N.C.’s work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Field Work Proposal 100643 “Actionable Information from Sensor to Data Center.” M.F.K.’s work at SLAC was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Grant No. DE-AC02-76SF00515 and by the Chemical Sciences, Geosciences, and Biosciences Division (CSGB).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Najd Altwaijry: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead). Ryan Coffee: Validation (supporting); Visualization (supporting); Writing – review & editing (supporting). Matthias F. Kling: Funding acquisition (lead); Resources (lead); Supervision (lead); Validation (supporting); Visualization (supporting); Writing – review & editing (lead).
DATA AVAILABILITY
The data and model that support the findings of this study are available at https://github.com/ananajd/MLBD.