In recent years, there has been an increased demand for elaborate monitoring techniques in laser material processing. This has been driven by the need for fast and cost-efficient quality assurance processes. At the same time, ultrashort-pulsed (USP) laser radiation has emerged as a promising technology for creating intricate microstructures in lithium-ion battery graphite anodes due to its high precision and negligible thermal impact. However, the integration of process monitoring in USP laser applications for graphite anode structuring is still unexplored. There is a lack of clarity on suitable sensors, observable parameters, and extractable process-relevant insights. The presented study addressed this gap by demonstrating the capability of state-of-the-art photodiode-based monitoring systems in collecting process-relevant data and deriving valuable insights. A sensor equipped with three photodiodes was employed to address these challenges. Exploratory data analysis and machine learning methodologies were leveraged to develop a data pipeline for processing the acquired information. The data were used to train convolutional neural networks that could accurately predict the focal position. At the same time, the limitations of traditional regression approaches could be shown. The findings advanced the understanding of the possibilities of process monitoring in USP laser applications and emphasized the significance of data-driven approaches in optimizing manufacturing processes.
I. INTRODUCTION AND STATE OF THE ART
Laser material processing has recently become well-established in various areas across production due to its multiple advantages, such as high precision, elevated production rates, flexibility, and almost wear-free operation.1 Alongside the advancements in the laser technology, process monitoring methods have been developed to meet the demands of both research and industry. These methods aim to enhance the process understanding and enable the inline quality evaluation, potentially reducing or eliminating the need for time-consuming and expensive downstream quality inspections.2 Widely used inline monitoring systems detect radiation emitted from the process zone with, for example, pyrometers, thermography cameras, or photodiodes.2 To date, these monitoring systems have primarily been employed in continuous wave (CW) or nanosecond pulsed laser welding processes due to limitations in the sensor hardware for data recording.3 Additionally, ultrashort-pulsed (USP) laser beam sources that can be operated with pulse repetition rates (PRRs) in the high kilohertz or megahertz range have previously been either insufficiently powerful or financially prohibitive for a widespread industrial use.4 However, recent developments have led to a steady increase in the output power of these USP laser beam sources and a corresponding decrease in their performance-related costs, paving the way for broader industrial applications.4 Moreover, newly developed process monitoring systems exhibit high sampling frequencies, making them suitable for USP laser processes operating at high PRR.
Lithium-ion batteries (LIBs) are the predominant energy storage solution for various applications, including portable consumer electronics and electromobility. However, designing automotive LIBs involves a trade-off between high energy density, which enables long-range capabilities, and a high power density, which facilitates fast charging and discharging.5 Laser structuring addresses this trade-off by promoting lithium-ion diffusion through microscopic channels in the electrodes.6–8 These channels can be implemented as grids,9,10 lines,11,12 or hole structures.13,14 Research has demonstrated that laser electrode structuring significantly enhances the fast-charging capabilities of lithium-ion batteries without causing cell degradation and extends their lifespan.13–15 These advantages are especially pronounced when applying laser structuring to graphite anodes16 and thick or highly compacted electrodes.17,18 Additionally, laser structuring accelerates the electrolyte wetting of the electrodes during battery production.12,19,20
The monitoring of the focus position in laser structuring is a cornerstone for advancing LIB anode fabrication. A precise focus control is essential for creating defined microstructures with the desired high aspect ratios,21 even with occurring process variations.
Photodiode-based process monitoring systems typically generate signals from filtered spectral emissions or reflections from the process zone.22 These systems can be integrated into the optical path of the scanner optics, ensuring a coaxial position at every scanner location.9 Three photodiodes covering different wavelength spectra are predominantly used.9 One photodiode covers a wavelength spectrum within the 380–750 nm range, measuring the radiation intensity of the plasma plume. The second photodiode detects radiation at the wavelength emitted from the laser beam source, providing information about the intensity of the reflected laser radiation.9 The third photodiode captures radiation between 1200 and 1700 nm, offering insights into the thermal conditions of the surface.9
Photodiode-based systems have been mainly employed in CW laser welding for process monitoring and characterization. Kaplan et al. demonstrated that photodiode monitoring systems with a 20 kHz sampling frequency can detect keyhole blowouts and gather information about the melt pool size.5 However, dynamic melt pool events happened too quickly for the sensor system.22 They also found that plasma and temperature signals did not differ significantly in the information content for the investigated welding process, as both were primarily generated from the vapor plume.22 Chianese et al. employed photodiodes with a 50 kHz sampling rate for the process observation during the external contacting of battery cells. Using various machine learning (ML) methods, they achieved a classification accuracy of 97% for variations in the weld penetration depth and part-to-part gaps. However, changes in the experimental conditions, such as in the copper foil thickness, reduced the prediction accuracy to 92%, suggesting that combining different sensor systems and using data fusion might be necessary for a complete process and weld quality evaluation.10 Grabmann et al. showed that a photodiode system (50 kHz sampling rate) detecting three wavelength spectra can be employed to monitor the laser beam welding of copper foils using a laser source emitting 515 nm wavelength radiation.8 The results indicated that the photodiode signal intensity was influenced by the geometric position of the foils (focal and component plane offsets). Additionally, gaps between the copper foils were detected due to an increased thermal resistance, resulting in higher heat accumulation at the top foil. Introducing a graphite-coated copper foil into the stack revealed a significant increase in the near-infrared signal, regardless of the position of the coated foil.23
In addition to CW laser welding, photodiode-based monitoring has been applied to nanosecond pulsed laser processes. Ho et al. proposed a model for depth measurements using the photodiode data from a drilling process with laser pulses having a pulse duration of 6 ns at a PRR of 15 Hz.11 The estimated ablation depth was inversely proportional to the intensity of the plasma emission measured by the photodiode.24 Further investigations by Ho et al. with a photodiode sampling frequency of 1 MHz revealed that, as the number of pulses increased, the material volume removed per pulse and the plasma emission intensity decreased, confirming the inverse relationship.25 They also proposed a focus position detection method based on the photodiode voltage signal variation over the pulse number, thereby enabling focus monitoring and control. This model could significantly reduce the pulses needed to drill through a 1 mm thick stainless steel plate.25
Few publications have addressed inline monitoring for USP laser material processing. Kunze and Schmitt presented an inline topography measurement system based on low-coherence interferometry, detecting a depth error of 0.3–3 μm compared to laser confocal microscope measurements.26 Zechel et al. incorporated an optical coherence tomography system into the scanning optics to assess the depth and width of structures on electrical steel created with USP laser radiation (500 fs pulse duration, 10 kHz PRR) to enhance the efficiency of an electrical machine.27 Kacaras et al. developed a model to determine the focus position by analyzing structure-borne acoustic emissions recorded by a piezoelectric sensor during USP laser structuring. Using pulses with a duration of 6 ps at a PRR of 20 kHz, the acoustic signal analysis via short-time Fourier transformation (STFT) identified frequencies dependent on the focal position, enabling inline monitoring.28 Leyendecker et al. predicted the surface roughness of specimens structured with 10 ps pulses at a PRR of 500 kHz using a photodiode sensor and microphone to capture process emissions, combined with ML algorithms for data evaluation.16 This system used three photodiodes to collect radiation in the visible, infrared, and laser radiation wavelengths, allowing a high precision and a robust surface quality prediction.29 Sun et al. analyzed drilling processes in turbine blades made of a nickel-based alloy (GH4169) with coatings using a femtosecond laser with a PRR of 1 kHz.17 Real-time process monitoring detected stage transitions during drilling with a classification model. An accuracy of 95.22% could be achieved using two photodiodes at a sampling rate of 200 kHz to collect time-series data for statistical feature calculation and classification.30
This study focused on developing a system to estimate the focal position during ultrashort-pulsed laser structuring using photodiode data. The primary goal was to improve drilling quality. Experimental data were collected with a USP laser source and a state-of-the-art photodiode system. These data were then used for exploratory data analysis and to train a model for accurate focal position prediction. A key contribution of this study is the creation of a precise methodology for predicting the focal position. This framework is expected to enhance quality control in USP laser structuring.
II. EXPERIMENTS
A. Materials
The anode used in this study was a double-sided graphite coating on a copper foil produced by CustomCells GmbH (Germany). The properties of the materials used are detailed in Table I.
Table I. Characteristics of the laser-structured anode.
Designation/type | Anode |
---|---|
Thickness after calendering | 152 μm |
Thickness of the copper current collector | 14 μm |
Coating | Double-sided |
Active material content | 96% (±0.5%) |
Capacity | 3.5 mAh/cm² (±3%) |
Coating mass loading | 20.84 mg/cm² (±3%) |
Density | 1.51 g/cm³ (±3%) |
Porosity of the graphite | 33% (±3%) |
Collector material | Copper foil, blank |
B. Experimental setup
For the experiments, a laser cell (Coherent ExactMark 230) equipped with a USP laser system (Rapid NX, Coherent Inc., USA) emitting pulses with a duration of 10 ps was used. A 2D scan head (intelliSCAN III 20, SCANLAB GmbH, Germany) equipped with an F-theta lens performed the beam deflection and focusing. A sensor system consisting of three photodiodes (SmartSense+, Coherent Inc., USA) integrated into the optical path was used for process monitoring. More information regarding the laser and sensor systems can be found in Table II.
Table II. Characteristics of the laser and sensor systems.
Laser system | Coherent Rapid NX |
---|---|
Operation mode | Pulsed |
Central emission wavelength, λ | 1064.5 nm |
Max. laser power, P | 2.5 W at 50 kHz |
Pulse duration, τ | 10 ps |
Pulse repetition rate, fr | 50 kHz–1 MHz |
Pulse energy, EP | 50 μJ at 50 kHz |
Spot diameter, df | 35.25 μm |
Beam quality factor, M² | 1.2 |
Focal length | 170 mm |
Rayleigh length | ±0.76 mm |

Sensor system | Coherent SmartSense+ |
---|---|
Wavelength spectrum of photodiode 1 (back reflection) | 1050–1100 nm |
Wavelength spectrum of photodiode 2 (plasma) | 380–750 nm |
Wavelength spectrum of photodiode 3 (temperature) | 1200–1700 nm |
Max. sampling rate | 5 MHz |
Gain factor of photodiode 1 (back reflection) | +60.5 dB |
Gain factor of photodiode 2 (plasma) | +26.5 dB |
Gain factor of photodiode 3 (temperature) | +72.6 dB |
Data points per pulse | 100 |
Figure 1 shows an image of the setup used for the structuring task, while Fig. 2 presents a schematic drawing illustrating the process monitoring setup. During the experiments, laser pulses were generated by the laser beam source. The pulses were directed to the workpiece via a scanning optics composed of lenses and mirrors. Upon reaching the surface, the laser radiation was partially absorbed and partially reflected, depending on the material properties. The absorbed laser radiation generated plasma and heated the surface. The reflected laser radiation as well as the plasma and thermal radiation traveled back through the optical path to the photodiodes, where the radiation intensity was measured.
Fig. 2. Schematic drawing of the setup and the sensor system. Own representation based on Erikson et al. (Ref. 31).
C. Experimental design
The materials and setup described were used throughout all experiments. Furthermore, all parameters of the sensor system and the laser beam source were kept constant. The PRR was set to the lowest possible value of 50 kHz, with each pulse having an energy of 50 μJ. The sampling frequency of all photodiodes was adjusted to 5 MHz. This configuration allowed the maximum number of data points per pulse achievable with the hardware used. For all test series, the same microstructure was introduced into the material. The microstructure consisted of nine holes arranged in a three-by-three matrix with a distance of 200 μm between neighboring drillings as can be seen in Fig. 3.
All pulses used for one drilling were emitted in one pulse train with a constant PRR. Between measurements, both the number of pulses per drilling and the initial focus position were varied. The focus position was changed from −1.5 to +1.5 mm in 12 steps of 0.25 mm by adjusting the distance between the lens and the material surface via the stage. Additional measurements were taken further outside this range, at −3, −2, +2, and +3 mm.
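The focus-position grid can be written out explicitly. The following is a minimal sketch that assumes both end points of the ±1.5 mm range were measured; the function name and the comparison with the Rayleigh length from Table II are illustrative and not part of the original study.

```python
STEP_MM = 0.25             # step width stated in the text
N_STEPS = 12               # 12 steps span -1.5 mm to +1.5 mm
RAYLEIGH_LENGTH_MM = 0.76  # from Table II


def focus_offsets_mm():
    """Return all initial focus positions (mm) in ascending order."""
    # Assumption: both end points are included, giving 13 fine positions.
    fine = [-1.5 + i * STEP_MM for i in range(N_STEPS + 1)]
    coarse = [-3.0, -2.0, 2.0, 3.0]  # additional outer positions
    return sorted(fine + coarse)


offsets = focus_offsets_mm()
within_rayleigh = [z for z in offsets if abs(z) <= RAYLEIGH_LENGTH_MM]
print(len(offsets), len(within_rayleigh))
```

Under these assumptions, the grid holds 17 positions, of which 7 lie within one Rayleigh length of the focus.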
III. DATA PROCESSING
A. Data acquisition
For data acquisition, several measurements were conducted, with one measurement corresponding to one matrix. In all measurements, three photodiodes were used to capture the intensity of the emissions during the structuring process. Each photodiode measured the light intensity in a different spectrum (see Table II), resulting in three time-series data files per measurement. In addition, the data on the number of pulses per drilling and the initial focus position were collected for each measurement as metadata. Figure 4 shows an example of the acquired signal from the photodiodes.
B. Data preparation
First, data from the three photodiodes and the associated metadata were collected and saved in a consistent scheme. Each measurement was stored as a row in a table that encompassed all measurements. Second, to reduce the amount of data, all data points corresponding to laser-off periods (e.g., between drillings) were removed. Periods of inactivity were identified using rising edge detection in the plasma and infrared data. The final data point of a pulse train was calculated by multiplying the number of pulses in the pulse train by the number of data points per pulse, which was determined by the ratio of the sampling rate to the PRR. This approach allowed for an accurate derivation of process duration from the time-series data. The processed data were stored in a table format. Each row contained the measured values for a single pulse train, including the corresponding measurement number, drilling number, initial focus position, number of pulses constituting the pulse train, and spectrum.
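The trimming step can be illustrated with a short sketch. It assumes a simple threshold crossing as the rising-edge criterion; the threshold value, the function names, and the synthetic signal are hypothetical, since the text does not specify how the edge detection was implemented.

```python
SAMPLING_RATE_HZ = 5_000_000  # sensor sampling rate (Table II)
PRR_HZ = 50_000               # pulse repetition rate used in the experiments


def points_per_pulse(sampling_rate_hz=SAMPLING_RATE_HZ, prr_hz=PRR_HZ):
    """Number of data points recorded per laser pulse."""
    return sampling_rate_hz // prr_hz


def first_rising_edge(signal, threshold):
    """Index of the first sample at or above the threshold that follows a
    sample below it (or the start of the signal); None if there is none."""
    for i, value in enumerate(signal):
        if value >= threshold and (i == 0 or signal[i - 1] < threshold):
            return i
    return None


def extract_pulse_train(signal, n_pulses, threshold):
    """Keep only the samples that belong to one pulse train."""
    start = first_rising_edge(signal, threshold)
    if start is None:
        return []
    length = n_pulses * points_per_pulse()
    return signal[start:start + length]


# Synthetic example: 30 laser-off samples, three pulses, 50 laser-off samples.
one_pulse = [1.0] * 10 + [0.0] * 90
raw = [0.0] * 30 + one_pulse * 3 + [0.0] * 50
train = extract_pulse_train(raw, n_pulses=3, threshold=0.5)
print(points_per_pulse(), len(train))
```

With a sampling rate of 5 MHz and a PRR of 50 kHz, each pulse covers 100 data points, so a pulse train of three pulses is trimmed to 300 samples.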
Further data preprocessing depended on the model architecture described in Sec. III C.
C. Modeling of the focus position
A polynomial regression model and two convolutional neural networks (CNNs) were implemented and compared with regard to their abilities to determine the focus position. Based on the photodiode data, all models utilized supervised learning to assess deviations of the focus position. This approach was enabled by systematically collecting data on the distance between the lens and the material surface for each measurement. Variations in the distance during the experiments allowed for measurements with different offsets from the focus position, and the corresponding ground truth data on these offsets were recorded.
The objective of all models was to determine the offset from the focus position, denoted as Δz. In this context, negative Δz values indicated an underfocus, while positive values represented an overfocus. A Δz value of zero signified that the laser beam's minimal spot was on the material surface. Only the initial pulses of each pulse train were used to ensure accurate data on the focal offset. During the drilling process, the laser beam shifts out of focus as each pulse ablates the material, resulting in an overfocus. The model performance was evaluated using the mean absolute error (MAE) metric, calculated by comparing the ground truth with the predicted focus position of the model.
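As a brief illustration of the evaluation metric, the MAE is the average of the absolute differences between the true and the predicted Δz values. The numbers below are made up for illustration and are not measured data.

```python
def mean_absolute_error(true_mm, predicted_mm):
    """Mean absolute error between ground truth and predicted focus offsets."""
    if not true_mm or len(true_mm) != len(predicted_mm):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_mm, predicted_mm))
    return total / len(true_mm)


# Illustrative values only (mm); negative = underfocus, positive = overfocus.
true_dz = [-0.5, 0.0, 0.25, 1.0]
pred_dz = [-0.25, 0.0, 0.5, 0.5]
mae = mean_absolute_error(true_dz, pred_dz)
print(mae)
```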
Due to variations in the model types and the respective input data, the number of samples available for training and testing varied. However, the data set was split into 60% training, 20% validation, and 20% testing subsets for all models. After training and hyperparameter optimization, the test data set was used once to assess the final performance. An additional special data set was employed to further demonstrate the generalization capabilities of the model. This data set included two initial focus positions not present in the original data set and, thus, entirely novel to the models. Although the test data set contained unseen samples, these samples had focus positions also represented in the training data.
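A 60/20/20 split of this kind could be sketched as follows. The shuffling, the seed, and the rounding rule are assumptions; the text does not describe how the authors performed the split.

```python
import random


def split_indices(n_samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train, validation, and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random shuffling
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(774)
print(len(train), len(val), len(test))
```

With 774 samples in total, this rounding gives 464, 155, and 155 samples, and with 2319 samples it gives 1391, 464, and 464, which agrees with the sample counts reported in Table III.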
1. Short-time Fourier transform convolutional neural network
The short-time Fourier transform convolutional neural network (STFT CNN) utilized spectrograms to determine the initial focus position. The STFT was applied to calculate spectrograms from 2000 data points derived from the first 20 pulses of each pulse train. Spectrograms were generated for all three wavelength spectra measured by the sensor system. The following parameters were used for the STFT, balancing the time and frequency resolution:
Segment size: 500 data points.
Segment overlap: 250 data points.
Padding: none.
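To make these parameters concrete, the sketch below computes a magnitude spectrogram with a plain discrete Fourier transform. The Hann window, the one-sided spectrum, and the synthetic 50 kHz test signal are assumptions, since the text specifies only the segment size, the overlap, and the padding.

```python
import cmath
import math

SAMPLING_RATE_HZ = 5_000_000
SEGMENT = 500           # data points per segment
OVERLAP = 250           # data points shared by neighboring segments
HOP = SEGMENT - OVERLAP


def stft_magnitude(signal, segment=SEGMENT, hop=HOP):
    """Magnitude spectrogram without padding.

    Returns a list of rows (frequency bins), each holding one value per segment.
    """
    n_bins = segment // 2 + 1  # one-sided spectrum (assumed)
    # Hann window (assumed; the window type is not given in the text).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / segment) for n in range(segment)]
    twiddle = [cmath.exp(-2j * math.pi * m / segment) for m in range(segment)]
    columns = []
    for start in range(0, len(signal) - segment + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(segment)]
        column = []
        for k in range(n_bins):
            acc = 0j
            for n in range(segment):
                acc += frame[n] * twiddle[(k * n) % segment]
            column.append(abs(acc))
        columns.append(column)
    return [list(row) for row in zip(*columns)]


# Synthetic stand-in for 20 pulses: a 50 kHz sine sampled at 5 MHz (2000 points).
signal = [math.sin(2 * math.pi * n / 100) for n in range(2000)]
spec = stft_magnitude(signal)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
peak_hz = peak_bin * SAMPLING_RATE_HZ / SEGMENT
print(len(spec), len(spec[0]), peak_hz)
```

Under these assumptions, 2000 data points yield 7 segments with 251 frequency bins each at a frequency resolution of 10 kHz, so the PRR of 50 kHz falls into bin 5.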
The resulting spectrograms (as shown in Fig. 8 in the Appendix) from the three spectra were combined into a three-dimensional tensor, with the spectra serving as the channel dimension. Each spectrogram was resized to a square shape of 64 × 64 pixels, resulting in a four-dimensional input tensor shape (batch size, 64, 64, 3). In this context, “batch size” corresponds to the number of samples processed simultaneously, “64” denotes the pixel resolution of the spectrograms, and “3” reflects the tri-channel configuration derived from the photodiodes. The CNN architectures are shown in Fig. 5. Detailed information on the properties of the model can be found in Table IV in the Appendix.
Fig. 5. Schematic drawing of the CNN architectures: (a) STFT CNN and (b) 1D CNN.
The architecture in Fig. 5(a) employed a sequence of 2D convolutional layers (Conv2D) and maximum pooling layers (MaxPool2D), which were arranged in alternating blocks. It began with two convolutional layers (Conv2D 1-1 and Conv2D 1-2), followed by a maximum pooling layer (MaxPool2D 1). This pattern was repeated with another set of convolutional layers (Conv2D 2-1 and Conv2D 2-2) and a second maximum pooling layer (MaxPool2D 2). A flatten layer then preceded two fully connected dense layers (Dense 1 and Dense 2), culminating in a linear output layer.
The architecture in Fig. 5(b) mirrored the first architecture but utilized 1D convolutional layers (Conv1D) and maximum pooling layers (MaxPool1D). It started with Conv1D 1-1 and Conv1D 1-2, followed by MaxPool1D 1. A second set of convolutional layers (Conv1D 2-1 and Conv1D 2-2) and a maximum pooling layer (MaxPool1D 2) followed the same structure. The network then flattened the data and processed it through two dense layers, ending with a linear output layer.
Both architectures were designed to capture hierarchical features from the input data, with the 2D variant being more suitable for spatial data and the 1D variant for temporal or sequence data. The alternating pattern of convolutional and pooling layers is a common design that reduces the dimensionality while preserving essential features, which are then used for classification or regression tasks in the dense layers.
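The effect of the pooling layers on the tensor shapes can be traced with a few lines of code. This sketch uses the filter counts and pooling kernels listed in the Appendix and assumes that the pooling stride equals the pooling kernel size; the helper names are hypothetical.

```python
def conv_same(shape, filters):
    """A convolution with 'same' padding keeps the spatial size and only
    changes the channel count."""
    return shape[:-1] + (filters,)


def max_pool(shape, pool):
    """Max pooling with a stride equal to the pool size (assumed)."""
    return tuple(dim // pool for dim in shape[:-1]) + (shape[-1],)


def trace(input_shape, filters, pool):
    """Follow a sample through two conv-conv-pool blocks and the flatten layer."""
    shape = input_shape
    for _ in range(2):
        shape = conv_same(shape, filters)
        shape = conv_same(shape, filters)
        shape = max_pool(shape, pool)
    flat = 1
    for dim in shape:
        flat *= dim
    return shape, flat


# STFT CNN: 64 x 64 spectrograms, 3 channels, 32 filters, pooling kernel (4,4).
stft_shape, stft_flat = trace((64, 64, 3), filters=32, pool=4)
# 1D CNN: 100 data points, 3 channels, 64 filters, pooling kernel (2).
cnn1d_shape, cnn1d_flat = trace((100, 3), filters=64, pool=2)
print(stft_shape, stft_flat, cnn1d_shape, cnn1d_flat)
```

Under these assumptions, the flatten layer of the STFT CNN passes 512 values to the first dense layer, while that of the 1D CNN passes 1600 values.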
2. One-dimensional convolutional neural network
The one-dimensional convolutional neural network (1D CNN) utilized one-dimensional time-series data to determine the focus position per pulse. Specifically, 100 data points from each photodiode for each pulse were used as the input for the model. Only the first three pulses of each pulse train were considered to ensure an accurate ground truth. In the constructed model, each of the three photodiodes was processed as an individual channel. This approach structured the input data into a tensor with the shape (batch size, 100, 3), where “batch size” corresponded to the number of samples processed simultaneously, “100” denoted the temporal resolution of the signal, and “3” reflected the tri-channel configuration derived from the photodiodes. The overall architecture of the 1D CNN mirrored that of the STFT CNN, with modifications to accommodate the one-dimensional input. The architecture is illustrated in Fig. 5(b). Detailed properties of the model are provided in Table VI in the Appendix.
3. Polynomial regression
The regression model employed five features, calculated for each pulse, to determine the focus position. These features were manually selected following an exploratory data analysis. Only the first pulse of each pulse train was considered to guarantee a correct ground truth. The features for each pulse were calculated based on the 100 data points per pulse and originated from different photodiodes. The features used in the model are illustrated in Fig. 6.
The values of these five features served as the input of the model. A fourth-degree polynomial was employed, which, when considering interaction terms, yielded a model with 127 parameters. This complexity was managed by applying L2-regularization, which adds a penalty term to the loss function to prevent overfitting. The regularization strength, denoted by the hyperparameter lambda, was set to 1. This value and the polynomial degree were determined iteratively by comparing the training and validation losses of the polynomial regression model to find the balance that minimized the error on unseen data. The features were selected by plotting the feature values of multiple pulses with different initial focus positions.
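The size of the polynomial model can be checked by enumerating its terms. One count that is consistent with the stated 127 parameters is 126 monomials (including the constant term) plus a separately fitted intercept; this decomposition is an assumption and is not stated in the text. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

N_FEATURES = 5
DEGREE = 4


def monomial_exponents(n_features, degree):
    """Exponent tuples of all monomials with a total degree of 0..degree,
    including interaction terms."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            terms.append(tuple(combo.count(i) for i in range(n_features)))
    return terms


def expand(features, terms):
    """Evaluate every monomial for one feature vector."""
    values = []
    for exponents in terms:
        value = 1.0
        for x, e in zip(features, exponents):
            value *= x ** e
        values.append(value)
    return values


terms = monomial_exponents(N_FEATURES, DEGREE)
n_terms = len(terms)        # equals comb(N_FEATURES + DEGREE, DEGREE)
n_parameters = n_terms + 1  # assumed: one additional intercept
print(n_terms, comb(N_FEATURES + DEGREE, DEGREE), n_parameters)
```

The large number of terms relative to the 464 training samples listed in Table III illustrates why a regularization penalty was needed.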
IV. RESULTS
The performance of the three models varied. The best performance was achieved with the STFT CNN, followed by the 1D CNN and the polynomial regression model. The MAE achieved by each model on the respective test data set and the number of samples in all data sets are displayed in Table III. The table also shows the performance on the special data set with focus positions unknown to the models. Figure 7 provides a more granular view of the model accuracies across the different focus positions in the form of box plots, illustrating the distribution of the prediction errors and the consistency of each model. The lower prediction errors of the STFT CNN across most real focus positions confirm its robustness in focus position estimation tasks.
Table III. Comparison of the model performances.
Quantity | STFT CNN | 1D CNN | Polynomial regression |
---|---|---|---|
MAE on the test data set (mm) | 0.138 | 0.193 | 0.455 |
MAE on the special data set (mm) | 0.178 | 0.188 | 0.408 |
Training samples | 464 | 1391 | 464 |
Validation samples | 155 | 464 | 155 |
Test samples | 155 | 464 | 155 |
Special samples | 16 | 48 | 16 |
Comparing the test accuracies of the models, a substantial difference in the accuracies between the two CNNs and the polynomial regression could be identified. This illustrates the limitations of classical feature selection approaches and shows the strength of deep learning methods for monitoring tasks. The performance discrepancies may also be attributed to the varied types of input data utilized. Furthermore, the sample sizes generally differed. While the polynomial regression model exclusively utilized the initial pulse from each pulse train, the CNNs incorporated either multiple pulses per sample (STFT) or additional samples derived from the first three pulses (1D). Consequently, despite originating from the same base data set, the final training, validation, and test data sets were distinct for each model.
As described in Sec. III C, each STFT sample contains information on 20 consecutive pulses. Compared to the other models discussed, which take only a single pulse as an input, the STFT CNN can therefore extract more detailed features from each sample, including information in the frequency domain across multiple pulses. This higher level of information allowed the STFT CNN to achieve the highest accuracy within the comparison.
The outcomes of this study have led to the identification of a methodology capable of accurately modeling focus positions. This advancement represents a significant step in precision monitoring, with the potential to streamline the calibration process and improve the overall capability of the system.
V. CONCLUSION
This study not only showcases the potential of photodiode-based monitoring systems in USP laser applications for graphite anode structuring, but also illuminates the methodological advancements achieved. This research focused on the innovative methodology developed for predicting focus positions with high precision. Machine learning models, particularly CNNs, played a crucial role in surpassing the capabilities of traditional polynomial regression models. The STFT CNN emerged as the frontrunner, showing a mean absolute error of 0.138 mm, followed by the 1D CNN with a mean absolute error of 0.193 mm, and the polynomial regression model with a mean absolute error of 0.455 mm.
The method described in the paper can be employed in two ways, depending on the specific needs of the application.
During the initial setup or calibration phase, the method can be used to determine the optimal focal point for the laser structuring process. By analyzing the photodiode data and using the trained model, operators can accurately identify the best focus position to achieve the desired microstructures.
During the production stage, the method can continuously monitor the focal position in real-time. Any deviations from the optimal focus can be detected promptly, allowing for immediate adjustments. This ensures consistent quality and precision throughout the manufacturing process.
The findings emphasize the methodological step forward, as data-driven approaches not only enhance process monitoring, but also refine the calibration and optimization of manufacturing processes. The elaborated methodology, characterized by its adaptability and precision, marks a significant step in the application of USP lasers. It underscores the synergy between advanced sensor technology and machine learning techniques.
ACKNOWLEDGMENTS
We sincerely thank the Bavarian Research Foundation (BFS) for funding our research. The results presented in this work have been achieved in the project Tramik (Grant No. AZ-1501-21). The authors gratefully acknowledge Coherent Inc. for supplying the laser cell ExactMark 230 equipped with the laser beam source Rapid NX and the SmartSense+ monitoring system.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Pawel Garkusha: Conceptualization (lead); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (lead); Project administration (equal); Software (equal); Validation (equal); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). Benjamin Kasper: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (supporting); Software (lead); Validation (equal); Visualization (supporting); Writing – review & editing (supporting). Christian Geiger: Formal analysis (supporting); Writing – review & editing (equal). Christian Bernauer: Formal analysis (supporting); Writing – review & editing (supporting). Lovis Wach: Formal analysis (supporting); Writing – review & editing (supporting). Michael Kick: Formal analysis (supporting); Writing – review & editing (supporting). Michael F. Zaeh: Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (equal).
APPENDIX: ADDITIONAL FIGURES AND MODEL PARAMETERS
Figure 8 shows an example of a spectrogram computed by using STFT.
Tables IV and V show the architecture and the hyperparameters of the STFT CNN, and Tables VI and VII show those of the 1D CNN.
Table IV. Architecture of the STFT CNN.
Layer | Properties |
---|---|
Conv2D 1-1, Conv2D 1-2, Conv2D 2-1, and Conv2D 2-2 | Kernel: (16,16), filter: 32, activation: ReLU, and padding: same |
MaxPool2D 1 and MaxPool2D 2 | Kernel: (4,4) |
Dense 1 | Size: 256, activation: ReLU, and dropout: 0.1 |
Dense 2 | Size: 256 and activation: ReLU |
Linear (output) | Size: 1 and activation: linear |
Table V. Hyperparameters of the STFT CNN.
Parameter | Values |
---|---|
Epochs | 128 |
Batch size | 16 |
Loss function | Mean squared error |
Optimizer | Adam (β1 = 0.9, β2 = 0.999, ) |
Learning rate | 0.0005 (adaptive, with decay to 0.0001, if validation loss did not improve after eight epochs) |
Table VI. Architecture of the 1D CNN.
Layer | Properties |
---|---|
Conv1D 1-1 and Conv1D 2-1 | Kernel: (50), filter: 64, activation: ReLU, and padding: same |
Conv1D 1-2 and Conv1D 2-2 | Kernel: (20), filter: 64, activation: ReLU, and padding: same |
MaxPool1D 1 and MaxPool1D 2 | Kernel: (2) |
Dense 1 | Size: 256, activation: ReLU, and dropout: 0.1 |
Dense 2 | Size: 256 and activation: ReLU |
Linear (output) | Size: 1 and activation: linear |
Table VII. Hyperparameters of the 1D CNN.
Parameter | Values |
---|---|
Epochs | 64 |
Batch size | 16 |
Loss function | Mean squared error |
Optimizer | Adam (β1 = 0.9, β2 = 0.999, ) |
Learning rate | 0.0005 (adaptive, with decay to 0.0001, if validation loss did not improve after five epochs) |