Incomplete cuts during laser fusion cutting result in a closed kerf, preventing the workpiece from detaching from the sheet and resulting in rework or rejection. We demonstrate the use of a vision transformer for image classification to detect cut interruptions during laser fusion cutting of steel and aluminum. For steel, where characteristic events precede an incomplete cut, we additionally attempt to predict cut interruptions before they occur. To build a dataset for training, cutting experiments are carried out with a 4 kW fiber laser, forcing incomplete cuts by varying process parameters such as laser power and feed rate. The thermal radiation from the process zone during cutting is captured at an image size of 256 × 256 px² and sample rates of up to 20 × 10³ fps. The kerf is recorded with a spectral sensitivity between 400 and 700 nm, without external illumination, which enables the melt to be observed in the visible spectrum. The vision transformer model splits the image into patches, which are linearly embedded, combined with a position embedding, and fed to a standard transformer encoder. For training the model, a set of images was labeled for the respective classes of a complete, incomplete, and impending incomplete cut. With the trained model, incomplete cuts in steel and aluminum can then be recognized and impending incomplete cuts in steel can be predicted in advance.
I. INTRODUCTION
Laser fusion cutting is a well-established process that uses a focused laser beam to continuously melt the material at the cut front in the kerf; the melt is then expelled by an inert gas jet.1 Of interest for industrial applications is the ability to cut sheets at the highest possible feed rate without compromising the cut quality. Typical quality losses are the adhesion of burr to the bottom edge of the sheet, the formation of irregular striation patterns on the kerf, and incomplete cuts. With increasingly unsuitable parameters, e.g., too high a feed rate or too low a laser power, an incomplete cut occurs. The melt is not expelled from the kerf, or the laser does not pierce through the material, resulting in a resolidified melt pool that leaves the kerf sealed.
At higher feed rates, the angle of the kerf at the bottom part increases, resulting in a larger area irradiated by the laser.2 With increasing temperature at the surface of the cut front, local vaporization occurs with high vapor temperatures, which disturbs the melt flow and may cause an interrupted striation pattern on the kerf edge.3 When parts of the melt flow do not separate from the kerf and solidify along the bottom edge of the sheet, burr forms.4 The burr formation during fiber laser cutting is influenced by melt flow instabilities, such as an interrupted melt wave and vaporization.5 In addition, a reduced surface tension of the melt, caused by an increased sulfur content in stainless steel, leads to a lower susceptibility to burr formation, as less melt adheres to the bottom edge of the sheet.6
One approach to guarantee good cut quality is to optimize the laser cut parameters themselves. Modern laser machines typically use a set of process parameters optimized for the specific processed material. Neural networks (NNs) are capable of predicting and optimizing suitable process parameters.7–10 Quality control is often still performed by an experienced user and relies on feedback from inspection after production. As an alternative, the process can be monitored and controlled with various sensor concepts to achieve the desired quality. Machine learning (ML) has received significant attention for monitoring a variety of laser processes.11–17 Specifically, for laser cutting, an increasing number of studies on predicting the cutting quality with ML models have been published recently.18 Furthermore, the possibility of deducing the process parameters from images of the cutting edge using a convolutional neural network (CNN) model was demonstrated.19 However, sensor-based concepts with ML models for quality monitoring, particularly for laser fusion cutting, are still relatively rarely reported. Approaches using a custom CNN model, as well as an NN model based on geometric features, show that burr formation can be estimated from images taken coaxially from the process zone by a high-speed camera.20,21 Similarly, incomplete cuts can be detected with a CNN, again from images taken coaxially by a high-speed camera.22 In addition to optical sensors, microphones have also been used successfully to detect incomplete cuts, with a random convolution kernel transform model distinguishing complete and incomplete cuts from the audio signal caused by laser cutting.23 In this contribution, an approach is demonstrated for the detection of cut interruption during laser fusion cutting of steel and aluminum by a vision transformer (ViT) model.
II. METHODS
A. Monitoring setup and cutting experiments
The illustration in Fig. 1 depicts the setup including the cutting head with the attached high-speed camera to monitor the kerf during the laser cutting process. To create a database of images from cuts that the ViT can learn from, experiments were carried out using a 4 kW multimode fiber laser (YLS-4000, IPG Photonics) and a cutting head (HP SSL, Precitec), with a wavelength of 1070 nm, an M² of 8.5, and a focal point diameter of 200 μm. The emitted laser beam, shown as a purple-red line, was collimated, transmitted through the dichroic mirror, and focused onto the sheet in the process zone. The thermal radiation emitted from the melt, shown as a green dashed line, spread partly toward the cutting head, was reflected by the dichroic mirror, and was focused on the high-speed camera. The aim of this setup was to record solely the thermal radiation from the top view during the laser cutting process. The cutting head, including the monitoring unit, was moved over the sheet on a conventional 2D flatbed machine. To determine the relationship of the melt pool dynamics between complete and incomplete cuts, the process radiation in the kerf was recorded using a high-speed camera (Fastcam Mini AX50, Photron). The CMOS chip of the camera had a pixel size of 20 × 20 μm² and a relative spectral sensitivity in the visible (VIS) spectrum. With a focal length of 200 mm for both the processing lens and the focusing lens placed in front of the camera, the pixel size of the captured image was identical to the pixel size of the camera, resulting in a resolution of 1270 ppi. The field of view was limited by the nozzle with a diameter of 2 mm. Due to reflections of the melt at the edge of the nozzle, the recorded image was subsequently cropped to include only the melt pool, to a width and height of 51 × 51 px² (i.e., 1.02 × 1.02 mm²). This was intended to train the model to classify images of the melt pool and prevent detection of reflections at the nozzle.
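The image scale quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a simple two-lens relay whose magnification is the ratio of the two focal lengths; all variable names are illustrative and not taken from the original work.

```python
# Sanity check of the imaging geometry; names are illustrative only.
MM_PER_INCH = 25.4

pixel_pitch_um = 20.0      # camera pixel pitch, from the text
f_processing_mm = 200.0    # focal length of the processing lens, from the text
f_camera_mm = 200.0        # focal length of the lens in front of the camera
crop_px = 51               # edge length of the cropped image in pixels

# Assumption: magnification of the relay = camera-side / process-side focal length.
magnification = f_camera_mm / f_processing_mm

# Size on the sheet that is covered by one camera pixel.
object_pixel_um = pixel_pitch_um / magnification

# Pixels per inch on the sheet and edge length of the cropped field of view.
ppi = MM_PER_INCH * 1000.0 / object_pixel_um
crop_mm = crop_px * object_pixel_um / 1000.0

print(f"{ppi:.0f} ppi, crop edge {crop_mm:.2f} mm")
```

With equal focal lengths the magnification is 1, so one pixel corresponds to 20 μm on the sheet, which reproduces both the stated resolution and the stated crop size.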
The laser cutting process was recorded without external illumination to ensure only thermal radiation from the cutting kerf was recorded. To visualize the thermal radiation for mild steel, stainless steel, and zinc coated steel, the kerf was captured at 20 × 10³ fps, with an exposure time of 20 μs. Due to significant differences in the amount of radiation emitted by the aluminum melt compared to steel caused by different emission coefficients, temperatures, surface conditions, and areas in the kerf, the exposure time had to be adjusted. Therefore, in order to visualize the thermal radiation for cuts in aluminum, the exposure time had to be set to 1.0 ms and the sampling rate accordingly to 1 × 10³ fps.
Illustration of the monitoring system with the cutting head and high-speed camera.
As part of the experimental design, the cutting parameters laser power, feed rate, and gas pressure were varied in order to obtain both incomplete and complete cuts, as shown in Table I. For 1.0 mm mild steel and aluminum, the assist gas pressure was set to 10 bar instead of 16 bar, since otherwise no incomplete cut could be induced. The gas pressure was varied between 8 and 18 bar in mild steel. Incomplete cuts were caused by too low laser power, too high feed rates, and too low assist gas pressure. The experimental cuts were carried out on mild steel, stainless steel, zinc coated steel, and aluminum, resulting in 355 cutting tests.
Cutting parameters.
Material | Sheet thickness (mm) | Laser power (kW) | Feed rate (mm s⁻¹) | Gas pressure (bar) |
---|---|---|---|---|
Mild steel | 1, 3, 5, 10 | 2–4 | 5–300 | 8–18 |
Stainless steel | 1.5 | 2–4 | 60–220 | 16 |
Zinc coated steel | 1, 3 | 2–4 | 40–300 | 16 |
Aluminum | 1, 3, 5 | 2–4 | 30–300 | 10, 16 |
B. Creating the dataset
Three datasets were created for the detection of incomplete cuts. The first dataset, containing only complete and incomplete cuts without distinguishing between the types of metal, resulted in two classes. The second dataset, containing complete and incomplete cuts for each metal, resulted in eight classes, and the third dataset, containing complete cuts, impending incomplete cuts, and incomplete cuts for all steels, resulted in three classes. Figure 2 shows the sheet surface and sample images for the labeled monitored images of a complete cut, an impending incomplete cut, and an incomplete cut in mild steel. In order to differentiate between complete and incomplete cuts, the ViT was trained using supervised learning on a dataset of labeled example images for these events. All images in which a complete cut occurred were labeled as a complete cut. For an incomplete cut, only images from the part of the trial in which the cut was incomplete, e.g., after the drives had accelerated, were labeled. For impending incomplete cuts, the images of the transition from a complete cut to an incomplete cut were labeled. Such a transition did not occur for all interrupted cuts, for example, in the case of insufficient laser power, where the laser did not cut through the material during the entire cutting trial. The labeling process was applied accordingly to all materials used. Note that only complete and incomplete cuts were labeled for aluminum, as no events indicating an impending incomplete cut were observed, in contrast to the steels. Twenty cutting trials were set aside for subsequent testing of the trained ML model. Table II shows the number of cutting trials and the labeled frames for training and validation. For all three datasets, 20 000 training frames and 5000 validation frames were randomly selected for each class, to achieve an 80% training and 20% validation split and to balance the number of frames across the classes. Table III shows the number of cutting trials and frames of the separate cutting trials, which were subsequently tested by the trained ML model.
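The class-balanced random selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sampling without replacement and disjoint training and validation frames within each class, and the function name `balanced_split` and the toy class sizes are hypothetical.

```python
import random

def balanced_split(frames_by_class, n_train=20_000, n_val=5_000, seed=0):
    """Draw the same number of training and validation frames per class.

    Assumption: frames are sampled without replacement, so the training and
    validation frames of a class do not overlap.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for label, frames in frames_by_class.items():
        if len(frames) < n_train + n_val:
            raise ValueError(f"class {label!r} has too few frames")
        picked = rng.sample(frames, n_train + n_val)
        train[label] = picked[:n_train]
        val[label] = picked[n_train:]
    return train, val

# Toy example: frame indices stand in for images, with reduced class sizes.
frames_by_class = {"cc": list(range(1000)), "ic": list(range(300))}
train, val = balanced_split(frames_by_class, n_train=80, n_val=20)
print({k: (len(train[k]), len(val[k])) for k in train})
```

Drawing a fixed number of frames per class yields the 80%/20% split and balances the classes regardless of how many frames were originally recorded for each of them.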
Visualization of a complete cut, an impending incomplete cut, and an incomplete cut with an illustrative image of the sheet metal and labeled recorded images.
Number of cutting trials and frames for training and validation of complete cuts (cc), incomplete cuts (ic), and impending incomplete cuts (iic).
Material (class) | No. of cutting trials | No. of frames |
---|---|---|
Mild steel (cc) | 91 | 4 087 397 |
Mild steel (ic) | 36 | 931 734 |
Stainless steel (cc) | 32 | 704 242 |
Stainless steel (ic) | 11 | 63 645 |
Zinc coated steel (cc) | 37 | 1 382 401 |
Zinc coated steel (ic) | 29 | 270 815 |
Aluminum (cc) | 72 | 155 990 |
Aluminum (ic) | 27 | 28 117 |
Steel (iic) | 45 | 46 874 |
Number of cutting trials and frames for testing of complete cuts (cc), incomplete cuts (ic), and impending incomplete cuts (iic).
Material (class) | No. of cutting trials | No. of frames |
---|---|---|
Mild steel (cc) | 4 | 215 072 |
Mild steel (ic) | 4 | 92 066 |
Stainless steel (cc) | 1 | 16 027 |
Stainless steel (ic) | 1 | 4 196 |
Zinc coated steel (cc) | 2 | 69 369 |
Zinc coated steel (ic) | 2 | 6 145 |
Aluminum (cc) | 3 | 12 439 |
Aluminum (ic) | 3 | 1 012 |
Steel (iic) | 6 | 4 997 |
The images were then transformed and normalized to fit the ViT model. All images were resized to 224 × 224 px². Afterward, each image from the training data was augmented to improve generalization and to prevent overfitting of the model. For this purpose, each training image was randomly rotated by 90°, 180°, or 270° and flipped both horizontally and vertically. No augmentation was applied during validation and testing. The images were then converted to tensors, and the pixel values were normalized with the mean values 0.485, 0.456, and 0.406 and the standard deviations 0.229, 0.224, and 0.225 for the respective color channels and, finally, grouped into batches of 32.
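The augmentation and normalization steps can be illustrated with a small, dependency-free sketch. It operates on nested lists rather than tensors and is not the pipeline used in the study. The helper names are hypothetical, and two details are assumptions: "no rotation" is included as a fourth rotation option, and each flip is applied with a probability of 0.5.

```python
import random

MEAN = (0.485, 0.456, 0.406)   # per-channel mean values, from the text
STD = (0.229, 0.224, 0.225)    # per-channel standard deviations, from the text

def rot90(img):
    """Rotate a 2D list of pixels by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]

def augment(img, rng):
    """Random rotation by a multiple of 90 degrees, then random flips."""
    for _ in range(rng.choice((0, 1, 2, 3))):   # assumption: 0 = no rotation
        img = rot90(img)
    if rng.random() < 0.5:                      # assumption: probability 0.5
        img = [row[::-1] for row in img]        # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1]                         # vertical flip
    return img

def normalize(pixel_rgb):
    """Normalize one RGB pixel with values in [0, 1] channel by channel."""
    return tuple((v - m) / s for v, m, s in zip(pixel_rgb, MEAN, STD))

image = [[1, 2], [3, 4]]
print(augment(image, random.Random(0)))
print(normalize((0.5, 0.5, 0.5)))
```

Rotations by multiples of 90° and flips only permute pixels, so they require no interpolation and leave the intensity distribution of the melt pool image unchanged.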
C. Vision transformer model
Illustration of the vision transformer (a), the transformer encoder (b), and multihead attention (c). (Refs. 24 and 25).
The classification head was implemented by an MLP with only one hidden layer.
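The patch-splitting step of the ViT mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the patch size of 16 px is not given in this section and is chosen here as a common value, and the function name `split_into_patches` is hypothetical. Linear embedding, position embedding, and the encoder are not shown.

```python
def split_into_patches(img, patch):
    """Split a square 2D image (list of rows) into non-overlapping patches.

    Patches are returned in row-major order, each flattened to a list.
    """
    size = len(img)
    if size % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            patches.append([img[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

IMAGE_SIZE = 224   # input size, from the text
PATCH_SIZE = 16    # assumption: not stated in this section

# Synthetic single-channel image whose pixel value encodes its position.
img = [[r * IMAGE_SIZE + c for c in range(IMAGE_SIZE)] for r in range(IMAGE_SIZE)]
patches = split_into_patches(img, PATCH_SIZE)
print(len(patches), len(patches[0]))   # number of patches, pixels per patch
```

Under this assumption, a 224 × 224 px² image yields a sequence of 14 × 14 patches, each of which would be flattened and linearly embedded before entering the transformer encoder.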
Last, a training loop was created to train and optimize the weights of the model for several epochs. During training, the images were passed forward through the model, and the class of each image was predicted. The loss quantified the deviation between the actual class and the class predicted by the model. The cross-entropy loss with an integrated softmax activation function was used as the loss function. Subsequently, backpropagation was performed to calculate the gradients of the loss, and the weights were updated accordingly. The Adam algorithm was employed as the optimizer, with a learning rate of 1 × 10⁻⁴ and exponential decay rates of 0.9 and 0.999. Finally, the validation data were passed through the model with the adjusted weights, and the respective loss and accuracy were calculated. For the training process, pretrained weights for the ViT, which were trained on ImageNet1K_V1, were downloaded from the PyTorch library. All experiments were performed within the PyTorch 2.3 framework.
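To make the loss and the update rule concrete, the following sketch implements a cross-entropy loss with the softmax folded in and a single Adam step in plain Python. It is a toy illustration, not the training code of the study: the class scores are treated directly as the trainable weights, and the value `eps=1e-8` is an assumption that is not stated in the text.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy loss with integrated softmax (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def adam_step(w, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the moment estimates m, v and step t."""
    state["t"] += 1
    t = state["t"]
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * gi
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * gi * gi
        m_hat = state["m"][i] / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)   # bias-corrected second moment
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_w

# Toy example: three class scores (cc, iic, ic), true class is index 0.
logits, target = [2.0, 0.5, -1.0], 0
probs = softmax(logits)
# Gradient of the loss with respect to the scores: softmax minus one-hot.
grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
state = {"m": [0.0] * 3, "v": [0.0] * 3, "t": 0}
updated = adam_step(logits, grad, state)
print(cross_entropy(logits, target), cross_entropy(updated, target))
```

In the first Adam step the bias-corrected moments cancel to roughly the sign of the gradient, so each score moves by about the learning rate of 1 × 10⁻⁴ and the loss decreases slightly.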
III. RESULTS AND DISCUSSION
Figure 4 depicts the accuracy achieved on the training and validation datasets by the ViT model over 100 epochs of training. The first dataset, which consists of all complete and incomplete cuts and thus two classes, achieved a validation accuracy of 99.93%. Up to epoch 80, a strong fluctuation in the validation accuracy can be observed. However, toward 100 epochs, the training accuracy approaches the validation accuracy, indicating a very high level of detection for complete and incomplete cuts. The second dataset, consisting of complete and incomplete cuts of the respective materials, achieves an accuracy of 98.82%. The validation accuracy is more stable and smooth, although a small discrepancy between the training and validation accuracy remains. The third dataset, which consists of complete cuts, impending incomplete cuts, and incomplete cuts for all steels, achieves an accuracy of 99.18%. However, even after 100 epochs, a strong fluctuation of the validation accuracy can be observed, while the training accuracy increases steadily and smoothly. These are indications of low generalization of the features by the model and of overfitting.
Accuracy of the training for the ViT model for the dataset of all complete and incomplete cuts (a), complete and incomplete cuts for the respective material (b), and for complete cuts, impending incomplete cuts, and incomplete cuts in steel (c).
In order to evaluate the detection performance of the model, the images of the 20 set-aside cutting trials, which the model has not seen before, are classified by the three trained ViT models. Figure 5 shows the confusion matrix, which summarizes the correctly and incorrectly predicted images by comparing the true labels with the labels predicted by the ViT for a complete and an incomplete cut in the test frames (cf. Table III). As can be observed, the tests yield a remarkably high level of detection. In a total of 416 596 frames, complete and incomplete cuts are detected with 99.94% accuracy. The performance of the second ViT model is evaluated in detail in the confusion matrix in Fig. 6. Both complete and incomplete cuts were detected for the corresponding materials mild steel, stainless steel, zinc coated steel, and aluminum. Of the 416 596 frames, 388 352 were correctly labeled, corresponding to an accuracy of 93.22%. The accuracy is thus 5.6 percentage points lower than during training, as expected from the curve progression and the gap between the training and validation accuracy. However, a detection rate of over 90% is still remarkable. Furthermore, it can be observed that the majority of mislabeled frames are material-specific, i.e., the material rather than the cut state is confused. For complete cuts in mild steel, 9116 images were predicted as complete cuts in stainless steel, and 8509 images were predicted as complete cuts in zinc coated steel.
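The figures quoted for the second model can be reproduced from the frame counts given in the text. The short sketch below uses only those counts; the variable names are illustrative.

```python
def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole

total_frames = 416_596        # test frames, from the text
correct_frames = 388_352      # correctly labeled frames, second model
reported_training = 98.82     # accuracy reported for the second dataset (%)

acc_test = percent(correct_frames, total_frames)
drop = reported_training - acc_test

# Mild steel complete cuts that were assigned to another steel (still "cc").
material_confusions = 9_116 + 8_509
errors = total_frames - correct_frames
share = percent(material_confusions, errors)

print(f"test accuracy: {acc_test:.2f} %")
print(f"drop: {drop:.1f} percentage points")
print(f"share of errors from these two confusions: {share:.1f} %")
```

The two mild steel confusions named in the text alone account for well over half of all misclassified frames, which supports the observation that most errors concern the material and not the cut state.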
Confusion matrix of the ViT model for the dataset of all incomplete cuts (ic) and complete cuts (cc).
Confusion matrix of the ViT model for the dataset complete cuts (cc) and incomplete cuts (ic) for aluminum, mild (M) steel, stainless (S) steel, and zinc coated (Z) steel.
The detection of the ViT for complete cuts, impending incomplete cuts, and incomplete cuts in steel is shown in the confusion matrix in Fig. 7. Out of 408 142 images in steel, 397 681 images were labeled correctly. The lower accuracy of the model of 97.44% can also be attributed to a certain degree of overfitting. Furthermore, it can be observed that impending incomplete cuts in particular are often misclassified. Out of 4997 images with an impending incomplete cut, 1737 images were predicted as a complete cut and 71 as an incomplete cut. This suggests that impending incomplete cuts are not detected before every incomplete cut, and features in the melt that indicate an impending incomplete cut do not occur prior to every incomplete cut.
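The counts above also allow the per-class detection rate (recall) of impending incomplete cuts to be estimated, a value that is not stated explicitly in the text and is derived here only from the quoted numbers. Variable names are illustrative.

```python
iic_total = 4_997     # frames labeled as impending incomplete cut (iic)
iic_as_cc = 1_737     # iic frames predicted as complete cut
iic_as_ic = 71        # iic frames predicted as incomplete cut

# Frames of the iic class that were predicted as iic.
iic_correct = iic_total - iic_as_cc - iic_as_ic
iic_recall = 100.0 * iic_correct / iic_total

# Overall accuracy of the third model for comparison.
overall_accuracy = 100.0 * 397_681 / 408_142

print(f"iic recall: {iic_recall:.1f} %")
print(f"overall accuracy: {overall_accuracy:.2f} %")
```

The comparison shows why the overall accuracy alone is a weak indicator here: the iic class contributes only a small fraction of the test frames, so its considerably lower recall barely affects the overall figure.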
Confusion matrix of the ViT model for the dataset of all complete cuts (cc), impending incomplete cuts (iic), and incomplete cuts (ic) for steel.
Figure 8 illustrates the ViT detection for a complete and incomplete cut in zinc coated steel and aluminum. Shown are a sample image of the sheet surface, monitored images of the melt from the kerf, and the corresponding ViT detection during the cutting process, each in the center of the experimental cuts. For zinc coated steel, the ViT correctly predicted both the material and the complete and incomplete cut. In aluminum, the complete cut and incomplete cut are detected correctly. However, a complete cut in aluminum is occasionally predicted as mild steel or zinc coated steel.
Class prediction by ViT for a complete cut [(a) and (c)] and an incomplete cut [(b) and (d)] for zinc coated steel (a) and (b) and aluminum (c) and (d).
Despite the lower accuracy for the third dataset, the ability to detect impending incomplete cuts is demonstrated here. Figure 9 shows experimental cuts in mild steel, the monitored images, and the ViT prediction for a complete cut, an impending incomplete cut, and an incomplete cut. The initial complete cut is entirely predicted as such by the ViT. The small melt pool area in the kerf, as shown in the monitored image, indicates the precise detection by the ViT. In the second complete cut, the ViT occasionally predicts impending incomplete cuts, although a complete cut is clearly visible. At the beginning of the impending incomplete cut, while a complete cut is still visible, the ViT likewise alternates between the two classes. The impending incomplete cut is then detected by the ViT. During the transition into the incomplete cut, an impending incomplete cut and an incomplete cut are recognized alternately until an incomplete cut is present. Finally, the incomplete cut in mild steel is unambiguously detected. The ViT presumably classifies it on the basis of the significantly increased melt pool area and occasional plasma formation, which can be seen in the monitored images. The model requires 6–9 ms to classify a single frame. At the maximum feed rate of 300 mm/s, this corresponds to a travel distance of 1.8–2.7 mm, over which impending incomplete cuts can, therefore, be predicted.
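The relation between classification time and travel distance is a simple product of feed rate and latency. The sketch below reproduces the stated range and, as an additional illustration not taken from the text, counts how many frames are recorded at 20 × 10³ fps while one frame is being classified. Function names are hypothetical.

```python
def travel_mm(latency_ms, feed_mm_per_s):
    """Distance the cutting head travels during one classification."""
    return feed_mm_per_s * latency_ms / 1000.0

def frames_elapsed(latency_ms, fps):
    """Number of frames recorded while one frame is being classified."""
    return fps * latency_ms / 1000.0

FEED_MAX = 300.0      # maximum feed rate in mm/s, from the text
FPS_STEEL = 20_000    # sampling rate for steel in fps, from the text

for latency in (6.0, 9.0):   # classification time per frame in ms
    print(f"{latency:.0f} ms: {travel_mm(latency, FEED_MAX):.1f} mm, "
          f"{frames_elapsed(latency, FPS_STEEL):.0f} frames")
```

Since far more frames are recorded than can be classified in the same time, a real-time implementation would presumably classify only a subset of the frames; this is an inference from the numbers and not a statement from the study.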
Class prediction by ViT for a complete cut [(a) and (b)], an impending incomplete cut (c), and an incomplete cut (d) for steel.
IV. CONCLUSIONS
In this contribution, the ability of a ViT to detect complete and incomplete cuts in mild steel, stainless steel, zinc coated steel, and aluminum is demonstrated, together with a first approach to predicting incomplete cuts in steel. To build the dataset, the thermal radiation of the melt pool was recorded in situ during the laser cutting process using a high-speed camera in the visible spectral range with a sampling rate of up to 20 kHz. Three datasets were created: first, for the detection of complete and incomplete cuts; second, with the extension to the respective metal; and, finally, for complete cuts, impending incomplete cuts, and incomplete cuts in steel. With the trained ViT models, the classes of the first dataset were detected with an accuracy of 99.94%, those of the second dataset with an accuracy of 93.22%, and those of the third dataset with an accuracy of 97.44%, which highlights the ability of the ViT to monitor the cut quality during laser cutting.
ACKNOWLEDGMENTS
This research was funded by the German Federal Ministry of Education and Research, project “Platform for AI-based sensor data analysis,” Grant No. 13FH013KI2.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Max Schleier: Conceptualization (equal); Data curation (equal); Writing – original draft (equal). Cemal Esen: Supervision (equal). Ralf Hellmann: Conceptualization (equal); Project administration (equal); Supervision (equal).