Separating lithium metal foil into individual anodes is a critical process step in all-solid-state battery production. When nanosecond-pulsed laser cutting is used, a characteristic, quality-decisive cut edge geometry forms depending on the chosen parameter set. This cut edge can be characterized by micrometer-scale imaging techniques such as confocal laser scanning microscopy. Currently, the experimental determination of suitable process parameters is time-consuming and biased by the human measurement approach, while no methods for automated quality assurance are known. This study presents a deep-learning computer vision approach for the geometry characterization of lithium foil laser cut edges. The convolutional neural network architecture Mask R-CNN was implemented and applied for categorizing confocal laser scanning microscopy images showing defective and successful cuts, achieving a classification precision of more than 95%. The algorithm was trained for automatic pixel-wise segmentation of the quality-relevant melt superelevation along the cut edge, reaching segmentation accuracies of up to 88%. The influence of the training data set size on the classification and segmentation accuracies was assessed, confirming the algorithm’s industrial application potential due to the low number of 246 or fewer original images required. The segmentation masks were combined with topography data of cut edges to obtain quantitative metrics for the quality evaluation of lithium metal electrodes. The presented computer vision pipeline enables the integration of an automated image evaluation for quality inspection of lithium foil laser cutting, promoting the industrial production of all-solid-state batteries with lithium metal anodes.
I. INTRODUCTION
The pursuit of improved energy storage solutions drives efforts to commercialize lithium metal battery (LMB) technologies as potential substitutes for conventional lithium-ion batteries (LIBs). Adopting lithium metal as an anode active material promises next-generation battery types with increased specific energies and energy densities (Placke et al., 2017). This capability originates from lithium’s exceptional theoretical specific capacity of 3 860 mAh g−1 and its low electrochemical potential of −3.04 V vs the standard hydrogen electrode (Xu et al., 2014). However, electrochemical hurdles, such as a low Coulombic efficiency and the safety-critical formation of lithium dendrites, have thus far precluded the utilization of lithium metal anodes in conjunction with liquid electrolytes (Lin et al., 2017). Therefore, several post-lithium-ion battery chemistries leveraging lithium metal anodes, particularly combined with solid electrolytes, are extensively researched. These technologies include, among others, inorganic and organic all-solid-state batteries (ASSBs), lithium-sulfur batteries (LSBs), and lithium-air batteries (LABs) (Varzi et al., 2020). While significant advancements in the cell chemistry and design of LMBs have been made, a notable gap in research addressing their industrialization remains (Frith et al., 2023; Tan et al., 2022; and Xu et al., 2020b). To facilitate the commercial availability of LMBs, manufacturing systems, production processes, and quality assurance measures have to be developed.
Cutting out anodes of a specified geometry from lithium metal coil substrates with typical thicknesses in the low micrometer range is one of the critical process steps in industrial LMB production (Duffner et al., 2021 and Schnell et al., 2018). In laboratory-scale LMB manufacturing, lithium metal substrates are manually separated using hand tools, such as scissors or punches (Stumper et al., 2023). As lithium metal adheres to mechanical cutting devices (Jansen et al., 2018) due to its plastic deformation at low strain rates (Grady, 1980), the cutting tools require successive cleaning. Progressive tool contamination complicates the transfer of fine blanking from conventional LIB to LMB production (Jansen et al., 2019) due to the decreasing cut edge quality. Proposed techniques to maintain blade cleanliness, such as applying special coatings (Weber, 2019) or sacrificial interlayers (Backlund, 1977), are intricate to implement in high-throughput industrial production lines.
Thus, laser cutting is favorable given its non-contact, wear-free, and flexible working principle (Duffner et al., 2021). In the realm of LIB production, nanosecond-pulsed laser systems are preferentially utilized for electrode cutting (Kriegler et al., 2021; Lee and Suk, 2020; and Lutey et al., 2015). This established application has positioned nanosecond-pulsed laser radiation as a promising choice for separating lithium metal substrates in LMB manufacturing. Moreover, it was recently demonstrated that laser pulses in the nanosecond range enable the separation of lithium metal substrates at exceptional cutting speeds of more than 5 m s−1 (Kriegler et al., 2022).
Beam-matter interaction in the short-pulsed laser processing of metals is characterized by absorption, heat conduction, melting, melt expulsion, evaporation, and plasma formation (Leitz et al., 2011). It was demonstrated that depending on the selected process parameters, melt displacement results in a raised edge along the cutting kerf when lithium metal substrates are cut by nanosecond laser radiation (Jansen et al., 2018 and Kriegler et al., 2022). This melt superelevation represents a surface feature of critical importance, as it is suspected to promote lithium dendrite growth. Such dendrites are needlelike structures that extend from the electrode surface, causing battery short-circuiting when piercing the separator layer.
It was demonstrated in the literature that inhomogeneities on the surface of lithium metal anodes promote fluctuations in local current densities as they influence the contact area to adjacent layers (Gireaud et al., 2006). Consequently, melt superelevations draw an increased influx of lithium ions and are subject to accelerated lithium deposition rates during lithium plating, referred to as current focusing (Krauskopf et al., 2020). Moreover, if a critical current density is surpassed during lithium stripping, a self-reinforcing mechanism successively accumulates voids and increases the current density. Hence, the increased local current densities during plating may initiate lithium dendrite formation when exceeding a characteristic value, ultimately causing cell death by electrical short circuits (Kasemchainan et al., 2019). In addition, lithium hydroxide may form in the heat-affected zone around the cut edge by reaction with water residues (Bocksrocker, 2022 and Jansen et al., 2018). It is theorized that an uneven lithium surface composition can lead to non-uniform ionic surface conductivities, which, in turn, encourage the emergence of dendritic lithium depositions (He et al., 2019). Thus, choosing process parameters suitable to diminish melt formation is paramount to prevent lithium dendrite formation. Furthermore, detecting defective laser cuts contributes to quality-controlled LMB production. Whereas in the laboratory-scale fabrication of prototype LMBs the cut edge quality is typically not controlled, the efficient industrial production of LMBs with consistent performance characteristics demands scalable and quality-assured separation processes.
Optically inspecting the cut edge of lithium metal substrates using imaging techniques combined with automated image analysis accelerates the laborious identification of feasible process parameters and allows product quality control. The automatic feature extraction from digital images is commonly referred to as computer vision. The underlying methods can be used, among others, for image classification, object detection, semantic segmentation, and instance segmentation (Lin et al., 2014). Recently, the machine learning subfield of deep learning has gained increasing interest in processing image data. Deep multilayered neural networks learn implicit relations within data sets, expanding the detection capabilities compared to conventional image analysis methods (LeCun et al., 2015) like thresholding techniques (Ng, 2006). Their high versatility renders deep learning algorithms robust against fluctuating production environments, such as changing lighting conditions, and allows their transfer to modified production scenarios (Smith et al., 2021). These advantages, in combination with the emerging abundance of computing resources, have promoted deep learning based on neural networks as a leading computer vision method (Chai et al., 2021 and Deng, 2014).
Convolutional neural networks (CNNs) consisting of convolutional, pooling, and fully connected layers represent deep, feed-forward networks well-suited for computer vision tasks (LeCun et al., 2015). From the various evolving CNN architectures (Bharati and Pramanik, 2020 and Guo et al., 2016), the algorithm type and implementation parameters must be selected according to the envisaged application-specific trade-off between runtime and accuracy (Huang et al., 2017). The region-based CNN (R-CNN), initially introduced as an algorithm for bounding-box object detection (Girshick et al., 2014), and its extensions (Girshick, 2015 and Ren et al., 2015) served as the basis for Mask R-CNN. Mask R-CNN complements object detection with instance segmentation, creating pixel-to-pixel segmentation masks for each region of interest (ROI) (He et al., 2017), standing out for its high accuracy (Bharati and Pramanik, 2020). Thus, Mask R-CNN offers a comprehensive solution for detecting and classifying objects through bounding boxes and pixel-level segmentation.
Data acquisition is often elaborate for industrial computer vision tasks, particularly when considering the large amount of annotated data demanded by CNNs (Krizhevsky et al., 2017). Therefore, a CNN can be pre-trained with abundant data to learn low-level, data-unspecific features. Following this initial phase, domain-specific fine-tuning can be applied to repurpose the learned features to a target data set and task (Girshick et al., 2014). Such a transfer learning approach (Yosinski et al., 2014) alleviates the scarcity of task-specific training data (Bengio, 2012).
Applying Mask R-CNN and comparable computer vision algorithms was proposed within a multitude of domains, including agriculture (Gené-Mola et al., 2020; Gonzalez et al., 2019; and Qiao et al., 2019), infrastructure (Guo et al., 2021 and Xu et al., 2022), medicine (Anantharaman et al., 2018 and Ronneberger et al., 2015), and materials science (Masubuchi et al., 2020). Although CNNs exhibit exceptional detection capabilities, published research regarding their utilization for computer vision in industrial production is limited (Wuerschinger et al., 2020). This scarcity may partly be attributed to the industries’ reluctance toward the black-box approach. Nonetheless, the potential of CNNs in industrial quality control was showcased several times.
Various CNNs were applied for the automated visual inspection of friction stir welds using camera and topography images (Hartl et al., 2019). Courtier et al. (2021) used a CNN to classify laser-cut stainless steel samples according to the applied cutting speed. In additive manufacturing, CNNs were applied for characterizing surface defects in scanning electron microscopy images of samples produced by selective laser melting (Wang et al., 2022). For femtosecond laser processing, CNNs were implemented to predict process parameters (Mills et al., 2019) and to detect beam misalignments (Xie et al., 2019) using camera images. Furthermore, CNNs were utilized in battery production to classify laser weld defects of battery safety vents in digital images (Yang et al., 2020a and Yang et al., 2020b). Moreover, the approach was extended by a pixel-level localization of weld defects using semantic segmentation networks (Yang et al., 2022 and Zhu et al., 2021).
Assigning pixel-level masks by instance segmentation allows the derivation of quantitative values on the location, shape, and size of objects, rendering it an excellent method for feature evaluation in microscopy images. Therefore, this study addresses the applicability of CNN-based computer vision for parameter selection and quality assurance of lithium metal laser cutting in ASSB production. Mask R-CNN is proposed as a feasible algorithm for classifying and segmenting confocal laser scanning microscopy (LSM) images showing cut edges of lithium metal foils separated by laser radiation. The segmentation masks are applied to gain quantitative information on the cut edge geometry by combining them with topography data.
II. METHODOLOGY AND EXPERIMENTAL APPROACH
A. Sample fabrication
Battery-grade lithium metal foils (China Energy Lithium, China) with a thickness of 50 μm were processed using a nanosecond-pulsed fiber laser (SP-200P-A-EP-Z-L-Y, TRUMPF formerly SPI, Germany) emitting radiation with a wavelength of 1060 nm. The laser source allowed the adjustment of the pulse waveform and enabled average output powers of up to 200 W. The laser beam was deflected by a high-speed galvanometric scanning unit (Superscan IV-30, Raylase, Germany) and focused via a telecentric F-theta lens (S4LFT2163/126, Sill Optics, Germany) with a focal length of 163 mm to a spot radius of approximately 14 μm. Due to the high reactivity of lithium metal, the samples were enclosed in a container filled with dry air, which allowed the laser beam to enter via a transparent laser window. Cuts of 10 mm length were produced using 288 parameter combinations, varying the laser power, the pulse repetition rate, the pulse duration, and the laser beam scanning velocity. The experimental plan encompassed a wide range of process parameters and is detailed in Table V in the Appendix. Depending on the process parameters used, material removal was based on melt expulsion and evaporation, leading to a characteristic melt superelevation along the cutting kerf. The experimental setup and cause-effect relationships between process parameters and cutting kerf features are detailed in a previous publication (Kriegler et al., 2022).
B. Image acquisition
Images of the laser cuts in the lithium metal samples were obtained using LSM (VK-X 1000, Keyence, Japan) at a 480-fold magnification, resulting in a captured image region of approximately 702 × 527 μm2. The cutting kerfs were manually centered in the microscope’s image field. The samples were evaluated at approximately 5 mm from the incision point to exclude process instabilities at the start of the laser cut. The illumination strength was automatically determined by the microscope’s measurement software (VK.H2X, Keyence, Japan).
A total of 246 color images with a resolution of 1024 pixels × 768 pixels were recorded (see Fig. 7) using the complementary metal-oxide-semiconductor (CMOS) sensor integrated into the LSM. Additionally, the topography of the electrode surface was captured using the confocal laser height measurement function of the LSM with a laser beam wavelength of 661 nm. The acquisition frame rate was 15 Hz and the vertical scan step size was 0.75 μm. The images were tilt-corrected and the workpiece surface was referenced to zero height. The height information was exported to a comma-separated values (csv) file format.
C. Data preparation
The color images were converted to the joint photographic experts’ group (jpg) format. Subsequently, the data set was artificially augmented by 180° rotation and horizontal mirroring to quadruple the data stock to 984 images. Data augmentation by modifying images on a pixel level, for instance, by altering the brightness, was disregarded, as constant illumination during image acquisition ensured consistent image quality, which can equally be expected in industrial production. Each of the 984 images from the data stock was assigned to one of the three classes (see Table I) and ground-truth labeled with polygon lines by a human expert. The open-source software LabelMe (Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, USA) was used to segment the melt superelevations.
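The augmentation step can be illustrated with a minimal sketch, assuming the images are available as NumPy arrays; the study’s actual tooling is not published, and the polygon annotations would have to undergo the same geometric transforms (omitted here):

```python
import numpy as np

def augment(image):
    """Quadruple the data stock: original, rotated, mirrored, and both."""
    rotated = np.rot90(image, k=2)    # 180° rotation
    mirrored = np.fliplr(image)       # horizontal mirroring
    return [image, rotated, mirrored, np.rot90(mirrored, k=2)]
```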
| Identifier | Class 1 (defective cut) | Class 2 (successful cut/regular melt) | Class 3 (successful cut/irregular melt) |
|---|---|---|---|
| Exemplary class image | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Class description | Class 1 images show a defective cut with a non-continuous cutting kerf and fused melt superelevations. Defective cuts can occur for various reasons, such as unsuitable process parameter selection, a laser malfunction, or a workpiece positioning error. Workpieces corresponding to class 1 images are allocated as rejects for industrial lithium metal battery production. | Class 2 images show a continuous cutting kerf with clearly separated melt rims at its sides and even melt superelevations with a quasi-constant width, characteristic of a stable cutting process. It is assumed that class 2 cuts can be accepted within industrial lithium metal battery production if the topography deviations are in a reasonable range. | Class 3 images show a continuous cutting kerf with clearly separated melt rims at its sides but with melt superelevations irregular in width and shape, indicating an unstable process behavior. It is assumed that class 3 cuts can be accepted within industrial lithium metal battery production if the topography deviations are in a reasonable range. However, the inconsistency in the product quality may render the corresponding process parameter set rather undesirable. |
| Data set size | 336 images | 105 images | 483 images |
The labeled images were converted to the common objects in context (coco) format (Lin et al., 2014), saved as javascript object notation (json) files, and used for training, validation, and testing of the CNN. Sixty images were selected for model testing and excluded from model training/validation, with equal shares of class 1, class 2, and class 3 images. From the remaining 924 images, 700 and 224 were assigned to the training and validation data sets, respectively. The validation data set was used for hyperparameter tuning, particularly for early-stage detection of overfitting. Smaller data sets with 22/7, 44/14, 88/28, 175/56, and 350/112 training/validation images were created by randomly removing images to test the influence of the training/validation data quantity.
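A minimal sketch of this partitioning is given below; the dictionary `images_by_class`, the fixed seed, and the helper name are illustrative assumptions, as the study does not report its splitting code:

```python
import random

def split_data(images_by_class, n_test_per_class=20, n_val=224, seed=42):
    """Partition augmented images into test, validation, and training sets."""
    rng = random.Random(seed)   # assumed seed for a reproducible split
    test, rest = [], []
    for paths in images_by_class.values():
        shuffled = paths[:]
        rng.shuffle(shuffled)
        test += shuffled[:n_test_per_class]   # 3 x 20 = 60 test images
        rest += shuffled[n_test_per_class:]
    rng.shuffle(rest)
    return test, rest[:n_val], rest[n_val:]   # 60 / 224 / 700 images

# Smaller training sets were derived by random removal, e.g.:
# training_350 = random.Random(42).sample(training, 350)
```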
In another experiment, all rotated and mirrored variants of the original test images were removed from the training/validation set to evaluate the influence of data augmentation on model performance. The resulting 744 images for training and validation were again divided into data sets of 22/7, 44/14, 88/28, 175/56, and 350/112 training/validation images. This modified data augmentation approach is referred to as limited data augmentation, and the data sets are referred to by their number of training images throughout this work.
D. Implementation of the convolutional neural network
Code implementation and execution were realized on a standard personal computer (see Table II). Computational analysis was performed using Python 3.7 with the TensorFlow 2.0 and Keras 1.15 frameworks. An open-source version of Mask R-CNN (Waleed, 2017) with a ResNet101 (He et al., 2016) backbone was selected as the model basis due to its high accuracy, providing bounding boxes and semantic feature masks. No particular hyperparameter optimization, for example, of the learning rate, was performed as this study focused on model application to an industrial use case. A learning rate of 10−4 was chosen for all experiments based on explorative preliminary tests. The image characteristics and the model hyperparameters for training are summarized in Table III. A transfer learning approach was followed using a pre-trained Mask R-CNN model (Waleed, 2017) with initial weights based on 35 000 images from the generic coco data set (Lin et al., 2014), reducing the training effort.
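The following sketch outlines how such a transfer-learning setup can look with the open-source implementation (Waleed, 2017); the configuration values mirror Table III, while the class name, the file paths, the data set objects, and the choice of trainable layers are illustrative assumptions rather than the study’s published code:

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class LithiumCutConfig(Config):
    NAME = "lithium_cut_edges"   # assumed project name
    BACKBONE = "resnet101"
    NUM_CLASSES = 1 + 3          # background + classes 1-3 (Table I)
    IMAGES_PER_GPU = 1           # batch size of 1 (Table III)
    LEARNING_RATE = 1e-4
    WEIGHT_DECAY = 1e-4
    LEARNING_MOMENTUM = 0.9

config = LithiumCutConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")

# Initialize with coco-pre-trained weights; the head layers are excluded
# because their shapes depend on the number of classes.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# dataset_train and dataset_val are assumed to be mrcnn.utils.Dataset
# instances prepared from the coco-format annotations (Sec. II C).
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE, epochs=100, layers="all")
```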
| CPU^a | RAM^b | GPU^c | VRAM^d | CUDA capability^e | Operating system |
|---|---|---|---|---|---|
| Intel(R) Core(TM) i5-6400 | 16 GB | NVIDIA GTX 1070 | 8 GB | 6.1 | Windows 10 |
^a CPU, central processing unit.
^b RAM, random-access memory.
^c GPU, graphical processing unit.
^d VRAM, video random-access memory.
^e The compute unified device architecture (CUDA) allows the usage of GPUs for general-purpose computing.
| Image dimensions | Image size | Original image type | Learning rate | Weight decay^a | Momentum^a | Batch size | Epochs |
|---|---|---|---|---|---|---|---|
| 1024 pixels × 768 pixels | ≈2.1 MB | png^b | 10−4 | 1 × 10−4 | 0.9 | 1 | 100 |
^a The weight decay and momentum were chosen according to He et al. (2017).
^b png, portable network graphics.
E. Metrics for algorithm performance assessment
The classification performance was assessed using the precision (P), which relates the correct classifications to all positive predictions in the confusion matrix of predicted and true classes, and the segmentation performance was assessed using the intersection over union (IoU) between the predicted and the ground-truth segmentation masks. Furthermore, the data set training time and the image test time were recorded to assess the applicability of the methodology in industrial production.
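The defining equations are not reproduced in this version of the text; assuming the standard definitions, which are consistent with the confusion matrix in Table VI and a pixel-wise mask comparison, they read

$$
P = \frac{TP}{TP + FP}, \qquad
\mathrm{IoU} = \frac{\lvert M_{\mathrm{pred}} \cap M_{\mathrm{gt}} \rvert}{\lvert M_{\mathrm{pred}} \cup M_{\mathrm{gt}} \rvert},
$$

where TP and FP denote the numbers of true and false positive classifications, and M_pred and M_gt denote the predicted and the ground-truth segmentation masks.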
F. Quantitative evaluation of the cut edges
The melt height, the melt width, and the width of the cutting kerf were defined as quality features of the cut edge; they were determined automatically from the segmentation masks and, for reference, manually by a human expert.
For an automatic quantitative assessment of the laser cut edges, the following steps were carried out using the segmentation masks (a code sketch follows the list):
1. Creation of binary segmentation matrices (size: 1024 pixels × 768 pixels) from the segmentation masks by setting every value corresponding to the segmentation mask to 1 and all other values to 0.
2. Extraction of LSM topography data by exporting height matrices (size: 1024 pixels × 768 pixels) containing the height values at each spatial position.
3. Segmentation matrix correction using the topography data by removing all pixels with height values below the reference plane from the segmentation matrices (i.e., setting their value to 0).
4. Segmentation matrix correction by deletion of the 50 uppermost and 50 lowermost pixel rows, corresponding to approximately 34 μm at the top and the bottom of the image, to reduce the influence of segmentation mask inaccuracies at the image margins.
5. Multiplication of the binary matrices with the width of one pixel (0.686 μm), line-wise summation of the matrix values for the left/right melt superelevation, and extraction of the mean melt width.
6. Calculation of the standard deviation of the melt width using the line values to allow an assessment of the homogeneity of the melt superelevation.
7. Determination of the mean kerf width by line-wise subtraction of the innermost point of the left melt superelevation from the innermost point of the right melt superelevation and calculation of the mean value of all rows.
8. Multiplication of the binary matrices with the height values from the topography data, line-wise summation of the matrix values for the left/right melt superelevation, and calculation of the mean melt height and its corresponding standard deviation.
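A condensed sketch of these steps for one melt superelevation is given below, assuming the binary segmentation matrix `mask` and the height matrix `height` are available as NumPy arrays of shape 768 × 1024 (rows × columns); the function and variable names are illustrative:

```python
import numpy as np

PIXEL_WIDTH_UM = 0.686   # lateral size of one pixel in um
MARGIN_ROWS = 50         # rows deleted at the top and bottom (step 4)

def melt_metrics(mask, height):
    """Derive width and height statistics for one melt superelevation."""
    valid = mask.astype(bool) & (height > 0)     # steps 1-3: keep mask pixels above the reference plane
    valid = valid[MARGIN_ROWS:-MARGIN_ROWS, :]   # step 4: trim the image margins
    h = height[MARGIN_ROWS:-MARGIN_ROWS, :]
    widths = valid.sum(axis=1) * PIXEL_WIDTH_UM  # steps 5-6: line-wise melt width in um
    melt_heights = h[valid]                      # step 8: height values within the mask
    return (widths.mean(), widths.std(),
            melt_heights.mean(), melt_heights.std())

# The mean kerf width (step 7) follows analogously from the line-wise
# distance between the innermost mask columns of the left and right melt.
```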
To obtain reference values for the automatically determined metrics, the topography data was analyzed by a human expert using the microscope’s software module (Multifile Analyzer, Keyence, Japan). Average profile lines were generated by calculating the mean of 768 profile lines perpendicular to the cutting kerf. The distance between the profile lines was 0.686 μm, corresponding to the size of one pixel. The processing pipeline is summarized in Fig. 1 and detailed in Table VII with exemplary intermediate and final images for the three defined classes.
III. RESULTS AND DISCUSSION
A. Image classification
Mask R-CNN was used for classifying the images into defective cuts (class 1) and successful cuts (classes 2 and 3). Such a classification can be useful for identifying a rough parameter window or for detecting production defects. At this point, no distinction was made between class 2 and class 3 images since these classes do not indicate production defects but allow conclusions on the process behavior. Reducing the effort for image acquisition, data preparation, and model training is crucial for implementing computer vision applications in industrial production. The required amount of annotated data for reaching a certain precision with a CNN is generally unknown. Thus, the influence of the data set size on the classification precision was evaluated by training the model with six data sets containing 22–700 training images randomly selected from the data stock.
The test data set contained 20 images each of defective cuts (class 1), successful cuts with a regular melt superelevation (class 2), and successful cuts with an irregular melt superelevation (class 3). The training time per image was approximately 2.2 min, and the absolute training time scaled virtually linearly with the number of training images. Accordingly, absolute training times between 0.81 and 25.1 h resulted for 22 and 700 training images, respectively (see Table IV).
Figure 2 shows the learning curves over the number of epochs for the six different data set sizes. The training loss converges to values between 0.1 and 0.2, indicating a successful training process for all data set sizes. Convergence was already reached for the smallest data set of 22 training images, but lower training losses are reached after fewer epochs when images are added to the training data set. Also, the standard deviation between three different training runs decreases when more training images are used, indicating an enhanced reproducibility of the results by reducing the dependence on individual images.
As expected, while the training loss is reduced over the number of training epochs, the precision of the model for object classification increases (see Fig. 3). While class 2 and class 3 objects were largely classified correctly even for small training data sets, class 1 images were misclassified more frequently, which is also visible in the confusion matrices presented in Table VI. The lower classification accuracy for class 1 images might result from the lower number of images showing defective cuts in the data set (336 class 1 images vs 588 class 2/class 3 images). Also, an assignment to class 1 is sometimes not unambiguous even for a human expert, since the color images impede determining whether a continuous cut is present in some cases. When using 175 training images or more, the precision significantly exceeded 90% and reached 98.3% for 700 training images, with a correct classification of all class 2/class 3 images in two of the three training runs.
The decreasing standard deviation of the training loss and the precision between the separate training runs underlines the relevance of individual images for the training process. Conversely, this indicates that carefully selecting training images covering a broad range of object peculiarities can further boost the algorithm performance despite a low number of original images.
For the data sets of 22–350 training images, an additional training run was performed in which no mirrored or rotated versions of the test images were included in the training data, limiting the artificial data augmentation (compare Sec. II C). This allowed studying whether the usage of mirrored or rotated versions of an original image for both training and testing caused overfitting. The precision gained by testing with solely unseen original images was within or in proximity to the standard deviation range of the results gained using the comprehensive data augmentation approach (see Fig. 8 and Table IV). However, for 22–88 training images, the limited data augmentation approach led to a higher precision.
For 175 and 350 training images, the limited data augmentation yielded slightly lower precisions than the conventional approach. This might be due to the higher chance of having mirrored or rotated image versions in the training data set when the comprehensive data augmentation approach is used, causing an adaptation to image-specific features. Thus, marginal overfitting might have occurred for the comprehensive data augmentation approach. Nevertheless, the high precisions reached render data augmentation a feasible method to reduce the necessary amount of original data for the presented use case.
The fast convergence and comparably high precision achieved with small data sets are presumably a consequence of the model pre-training and the rather simple classification task with only two object classes. The Mask R-CNN model returns a class probability for detected objects in an image, which can be used to reduce misclassifications. Therefore, the RPN threshold, which was set to 0.7 within this study, can be increased to achieve a quasi-zero false negative rate for the classification as a defective cut (class 1) (Gené-Mola et al., 2020 and Xu et al., 2020a). Thus, if an image cannot be unambiguously classified, a manual post-inspection by an operator can be triggered in industrial battery production.
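An illustrative decision rule building on the returned class probabilities is sketched below; the result layout follows the open-source implementation (Waleed, 2017), and the stricter acceptance threshold of 0.9 is an assumed value that would require calibration:

```python
DEFECT_CLASS_ID = 1
CONFIDENCE_THRESHOLD = 0.9   # stricter than the 0.7 threshold used here

def route_image(result):
    """result = model.detect([image])[0] with 'class_ids' and 'scores'."""
    for class_id, score in zip(result["class_ids"], result["scores"]):
        if class_id == DEFECT_CLASS_ID:
            return "reject"                # defective cut (class 1)
        if score < CONFIDENCE_THRESHOLD:
            return "manual inspection"     # ambiguous: trigger operator review
    return "accept"
```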
B. Segmentation of the melt superelevation
Mask R-CNN is not only suitable for object classification but also returns a pixel-level object segmentation mask. Figure 4 demonstrates that as few as 44 training images yielded an IoU of 74.4% after model training for 100 epochs. The high IoU confirms the capability of Mask R-CNN to achieve an accurate segmentation even with a scarce data set, as also shown for a use case from the medical domain (Anantharaman et al., 2018). When 22 training images were used, the IoU standard deviation, especially for class 1, was still high, owing to the dependence on the individual images in the training data set. However, when increasing the training image number, the standard deviation between individual training runs diminished significantly.
This substantiates the negligible influence of individual images in the data set, allowing a high reproducibility to be achieved. The highest IoU of 87.9% was obtained for the largest training data set containing 700 images, while the standard deviation only amounted to 0.1%.
The 60 test images were separately labeled by two experts to estimate the human segmentation reproducibility. Both experts were equally instructed on the LabelMe software and the relevant image features. The segmentation of one image took around 1–2 min for a trained expert. The mean IoU between the manually created image masks was 90.7%, resulting from individual labeling preferences and a limited accuracy when adjusting the polygon line in a reasonable amount of time. Thus, the inherent inaccuracy in human-based image segmentation might limit the training progress and explains why an IoU of 100% between human-generated and automatically generated segmentation masks is practically unattainable. However, the IoU of 87.9% for all classes using 700 training images approached the human masking accuracy, which underlines the suitability of the chosen algorithm for the underlying segmentation task. Furthermore, when only considering correctly classified images, even higher IoUs resulted, with almost no differences between the separate classes at higher image numbers (see Fig. 9).
The test time for object detection and segmentation amounted to approximately 850 ms per image, corresponding to a frame rate of 1.18 frames per second. This renders the presented approach a reasonable basis for online applications considering that a standard personal computer and no dedicated high-performance computing resource was used (see Table II). A further test time reduction for enabling inline quality control could be accomplished by increasing the computational performance or reducing the image resolution to increase the signal-to-noise ratio. The appropriate resolution must be selected based on the sensor unit being used, which for inline applications might be a high-resolution camera system or a laser-triangulation sensor working at a high sampling rate.
Another approach for image size reduction could be image cropping to remove nonrelevant image sections, such as parts of the background at the image borders. As with the classification precision, no correlation between the achieved IoU and the data augmentation approach was found.
The relevant achieved performance metrics for object classification and image segmentation are summarized in Table IV. The high precision and the IoU approximating the human labeling accuracy render the conducted Mask R-CNN implementation feasible for its application in industrial battery production.
| No. of training images | Absolute training time (h) | Precision (P)^a,b | Precision (P) with limited data augmentation^b | Intersection over union (IoU)^a,b | Intersection over union (IoU) with limited data augmentation^b |
|---|---|---|---|---|---|
| 22 | 0.81 | 70.0 ± 7.0% | 81.7% | 63.4 ± 6.1% | 71.0% |
| 44 | 1.55 | 86.1 ± 2.1% | 91.2% | 74.4 ± 2.6% | 78.9% |
| 88 | 3.01 | 85.6 ± 5.1% | 85.0% | 74.5 ± 5.0% | 74.3% |
| 175 | 5.87 | 97.2 ± 2.1% | 93.3% | 85.7 ± 1.5% | 81.8% |
| 350 | 11.91 | 96.1 ± 1.0% | 90.0% | 85.4 ± 0.6% | 79.9% |
| 700 | 25.09 | 98.3 ± 0% | — | 87.9 ± 0.1% | — |
^a Mean value and standard deviation of three test runs with a random set of training and validation images.
^b After 100 epochs.
As this study did not concentrate on algorithm optimization but on the method’s applicability in industry, a further improvement of the results is presumably achievable through comprehensive hyperparameter tuning (He et al., 2017). Yet, despite the absence of such tuning, a low number of training images already yielded a high precision (P) and IoU within this study.
C. Determination of quantitative quality features
The parameter selection for laser micro-machining is complex due to the high number of process parameters and their complex interdependencies. The shape of the melt superelevation makes it possible to investigate the process behavior and is a quality-relevant feature for the laser cutting of lithium metal within all-solid-state battery production (Jansen et al., 2018 and Kriegler et al., 2022). Therefore, using the segmentation masks for gaining quantitative values characterizing the melt superelevation supports process design. Furthermore, an automated inspection of the cut edge quality allows process fluctuations to be detected, enabling corrections. Figure 5 shows exemplary binarized images with the segmentation masks for the melt superelevations as well as the height images resulting from overlaying the topography images with the binarized segmentation masks. While the geometry metrics of the class 2 image possess a low standard deviation, high standard deviations for the melt width, the melt height, and the kerf width characterize the cutting kerf in the class 3 image. The automatic assessment of geometry parameters was validated by comparison to measurements performed by human experts (see Sec. II F). Most of the automatically determined melt heights, melt widths, and kerf widths within the test images deviated by less than 20% from the human-based measurements [Figs. 6(a)–6(c)].
The slight divergence of the values can be explained by the differing measurement approaches. A systematic downward deviation in the automatically determined melt heights is visible in Fig. 6(a). For automatic measurement, the mean melt height is determined by calculating a mean of all pixel height values that are part of the segmentation mask.
In contrast, in the human-based measurement approach, an average cross section is created by calculating mean height values for each image column. Furthermore, as the manual measurements are based on averaged cross sections, the transition from a melt superelevation to the surrounding bulk material is defined either at the position where the averaged cross section line undershoots the reference plane or by visual criteria in the color images. Thus, since the manual measurements partly depend on the subjective perception of the human expert, errors might arise, especially for class 3 images, where the irregular melt complicates boundary identification [see Figs. 6(b) and 6(c)]. The lowest melt heights and widths detected were below 10 and 40 μm, respectively. Correlating these values to the underlying process parameters enables the selection of feasible parameter sets. Additionally, a low kerf width, which might be the consequence of unsuitable process parameters, for example, an insufficient laser power or an excessive scanning velocity, can be used to predict a transition from class 2/class 3 to class 1.
The mere consideration of the mean melt height and the mean melt width does not allow for a classification into class 2 or class 3, as a homogeneous melt superelevation might have the same quantitative mean values as an irregular melt superelevation. Extracting the standard deviation of the melt height, the melt width, and the kerf width allows an alternative classification approach [see Figs. 6(d)–6(f)]. Class 2 is characterized by a low standard deviation of all quantitative values, while class 3 shows a higher standard deviation.
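Such an alternative classification could take the form of a simple threshold rule, sketched here with hypothetical limits that would have to be calibrated against labeled cut edges:

```python
def classify_successful_cut(std_melt_width, std_melt_height, std_kerf_width,
                            limits_um=(10.0, 5.0, 5.0)):
    """Distinguish class 2 from class 3; inputs in um, limits hypothetical."""
    regular = (std_melt_width < limits_um[0]
               and std_melt_height < limits_um[1]
               and std_kerf_width < limits_um[2])
    return "class 2 (regular melt)" if regular else "class 3 (irregular melt)"
```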
Since separate segmentation masks are generated for the left and the right melt superelevations, individual quantitative values are calculated for each side. Thereby, directionalities in melt formation, for example, resulting from a beam misalignment, can be detected.
The execution of the entire test pipeline, including the segmentation process and the determination of quantitative values, took around 1.3 s per image. Thus, the data set of 60 images could be analyzed in less than 1.5 min, while a human-based inspection would take up to one hour. The developed pipeline covering object detection, instance segmentation, and the determination of quantitative values is not limited to the presented task but can be transferred to other laser micro-machining tasks. As nanosecond-pulsed laser cutting is industrially well established (Meijer, 2004 and Mishra and Yadava, 2015), for example, in the photovoltaic (Bovatsek et al., 2010), packaging (Lutey et al., 2013), and battery industries (Kriegler et al., 2021), a high potential for applying the presented approach arises. Moreover, the proposed method could be used for inspecting the output of other process steps in battery manufacturing, such as the detection of electrode defects (Badmos et al., 2020), the characterization of laser-structured LIB electrodes (Hille et al., 2023), or the analysis of three-dimensionally structured ceramic solid electrolyte layers (Kriegler et al., 2023).
IV. CONCLUSION AND FUTURE WORK
In this study, a computer vision pipeline was presented, allowing the qualitative and quantitative assessment of lithium metal foil cut edges separated by laser radiation for quality inspection in all-solid-state battery production. The state-of-the-art deep learning algorithm Mask R-CNN was implemented for detecting and segmenting the melt superelevations along cut edges in color images recorded by confocal laser scanning microscopy. A total of 246 images were captured showing cut edges stemming from laser cutting with various parameter sets, and the data set was artificially augmented. The classification ability of the algorithm was used to distinguish between defective and successful laser cuts. The relation between the training data set size and the classification accuracy was discussed, showing a precision of more than 95% for 175 or more training images. The low number of original training images required to reach high classification precisions underlines the high applicability of the approach for industrial use cases where data acquisition is complicated. The melt superelevation was automatically segmented, reaching an intersection over union between 63.4% and 87.9% depending on the number of training images. These values approach the human-based segmentation repeatability of 90.7%, substantiating the high suitability of Mask R-CNN for feature extraction. The segmentation masks were employed to determine quantitative values characterizing the geometry of the cutting kerf and the melt superelevations, which enabled an automated cut edge quality evaluation allowing conclusions on the suitability of a parameter set for lithium metal laser cutting. The presented pipeline can be easily transferred to quality inspection for other micro-machining applications. Furthermore, the approach’s high versatility makes it applicable to other imaging techniques providing three-dimensional information, such as white light interferometry. Overall, the developed approach supports the production of high-quality all-solid-state batteries by facilitating the selection of feasible process parameters and automated quality assurance for the laser cutting of lithium metal foils.
Future works may include implementing the method for a continuous quality inspection by transferring the approach to feasible inline sensor systems and increasing the algorithm’s computational efficiency. Moreover, a transfer of the approach to related applications is targeted, considering other laser processes, materials, and quality characteristics.
ACKNOWLEDGMENTS
This work was funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 03XP0184l (ProFeLi). The authors gratefully acknowledge this support. The authors thank Elena Jaimez-Farnham for manual image labeling.
AUTHOR DECLARATIONS
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Author Contributions
Johannes Kriegler: Conceptualization (lead); Data curation (equal); Formal analysis (lead); Investigation (lead); Methodology (lead); Validation (equal); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Tianran Liu: Data curation (lead); Formal analysis (equal); Investigation (equal). Roman Hartl: Validation (equal); Writing – review & editing (equal). Lucas Hille: Formal analysis (equal); Writing – review & editing (equal). Michael F. Zaeh: Funding acquisition (lead); Project administration (lead); Supervision (lead); Writing – review & editing (equal).
DATA AVAILABILITY
Additional information, such as image data, the software code of the trained Mask R-CNN, and the software code for the generation of quantitative quality indicators, is available upon reasonable request.
APPENDIX: SUPPLEMENTARY EXPERIMENTAL RESULTS
Exemplary images of a lithium metal foil sample obtained using confocal laser scanning microscopy are shown in Fig. 7.
The evolution of the precision (P) for training data sets with limited data augmentation, i.e., excluding augmented versions of the test images from the training and validation data sets, is shown in Fig. 8.
The evolution of the intersection over union (IoU) considering only correctly classified images is shown in Fig. 9.
An overview of the experimental series in which the training, test, and validation images were created is presented in Table V.
Confusion matrices for the classification of lithium laser cut images are presented in Table VI.
The image processing pipeline is illustrated in Table VII using exemplary class images.
| Experimental series | No. of images | Pulse duration (ns) | Pulse repetition rate (kHz) | Laser power (W) | Scanning speed (mm s−1) | Experiment repetitions |
|---|---|---|---|---|---|---|
| I | 12 | 261, 508, 820, 1220 | 200 | 100 | 1000 | 3 |
| II | 96 | 29, 108, 177, 261 | 800 | 100 | 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400 | 3 |
| III | 48 | 261 | 200, 500, 750, 1000 | 50, 100, 150, 200 | Maximum | 3 |
| IV | 84 | 29 | 1000, 1500, 2000, 2500, 3000, 3500, 4000 | 50, 100, 150, 200 | Maximum | 3 |
| V | 48 | 13 | 2950, 3350, 3750, 4150 | 50, 100, 150, 200 | Maximum | 3 |
| | Predicted positive | Predicted negative |
|---|---|---|
| Actually positive | True positive (TP) | False negative (FN) |
| Actually negative | False positive (FP) | True negative (TN) |

| No. of training images | True class | Run 1: pred. class 1 | Run 1: pred. class 2/3 | Run 2: pred. class 1 | Run 2: pred. class 2/3 | Run 3: pred. class 1 | Run 3: pred. class 2/3 | Precision after 100 epochs^a |
|---|---|---|---|---|---|---|---|---|
| 22 | Class 1 | 3 | 17 | 11 | 9 | 0 | 20 | 70.0 ± 7.0% |
| 22 | Class 2/class 3 | 0 | 40 | 1 | 39 | 0 | 40 | |
| 44 | Class 1 | 14 | 6 | 18 | 2 | 12 | 8 | 86.1 ± 2.1% |
| 44 | Class 2/class 3 | 1 | 39 | 8 | 32 | 0 | 40 | |
| 88 | Class 1 | 16 | 4 | 15 | 5 | 13 | 7 | 85.6 ± 5.1% |
| 88 | Class 2/class 3 | 9 | 31 | 1 | 39 | 0 | 40 | |
| 175 | Class 1 | 18 | 2 | 17 | 3 | 20 | 0 | 97.2 ± 2.1% |
| 175 | Class 2/class 3 | 0 | 40 | 0 | 40 | 0 | 40 | |
| 350 | Class 1 | 18 | 2 | 17 | 3 | 18 | 2 | 96.1 ± 1.0% |
| 350 | Class 2/class 3 | 0 | 40 | 0 | 40 | 0 | 40 | |
| 700 | Class 1 | 19 | 1 | 20 | 0 | 19 | 1 | 98.3 ± 0.0% |
| 700 | Class 2/class 3 | 0 | 40 | 1 | 39 | 0 | 40 | |
^a Mean value and standard deviation of three test runs with a random set of training and validation images.
| Processing step | Class 1 | Class 2 | Class 3 |
|---|---|---|---|
| Original image^a | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Image labeled by a human expert | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Binarized version of an image labeled by a human expert | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Automatically segmented image with bounding box and pixel-level mask | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Binarized automatically segmented image | (image not reproduced) | (image not reproduced) | (image not reproduced) |
| Binarized automatically segmented image corrected using height values | No quantitative evaluation because categorized as defective laser cut | (image not reproduced) | (image not reproduced) |
| Binarized automatically segmented image corrected by deletion of the 50 uppermost and lowermost pixel lines | No quantitative evaluation because categorized as defective laser cut | (image not reproduced) | (image not reproduced) |
| Automatically segmented height image created by overlaying the binarized segmentation mask to the topography data | No quantitative evaluation because categorized as defective laser cut | (image not reproduced) | (image not reproduced) |
^a Scale bars are inserted to facilitate the reader’s understanding of the dimensions but are not included in the automatically processed images.