Lenses are fundamental elements in many optical applications. However, various aberrations are inevitably present in lenses, which affect the focused intensity distribution and degrade optical imaging. Accurately predicting the aberrations of a lens is therefore of great significance. Nevertheless, quantitatively measuring the aberrations of a lens, especially when multiple aberrations are present simultaneously, is a challenging task. In this paper, we propose a method based on a designed deep residual network, called Y-ResNet, to measure the astigmatism and coma of a lens simultaneously. The Y-ResNet was trained on the focused intensity patterns of a Gaussian beam passing through a lens with astigmatism and coma. The trained network can accurately predict the aberration coefficients of the lens, achieving 0.99 specificity, 0.925 precision, 0.9382 recall, and a 0.9406 F1-score for astigmatism and 0.99 specificity, 0.956 precision, 0.98 recall, and a 0.954 F1-score for coma. Notably, even if only part of the intensity distribution of the light spot is captured, the network can still estimate the coma of the lens with an accuracy of over 90% and can identify astigmatism aberration features. This work provides a feasible deep-learning-based approach for correcting beam-pattern distortions caused by aberrations.
I. INTRODUCTION
The transmission of laser beams through optical systems is important for the interaction between lasers and matter as well as for fields such as optical communication.1 Obtaining a focused beam with a specific intensity distribution is necessary for laser material processing and other applications. Transmission theory can be used to simulate the intensity distribution of a light beam as it propagates through an ideal optical system.2 However, aberrations in the optical system are typically ignored in such simulations, even though they can alter the intensity distribution and impact laser applications.
Astigmatism and coma, as representative forms of off-axis aberrations, have become increasingly important in the design of wide-field optical systems.3 With increasing astigmatism, the sharpness of beam patterns around their edges deteriorates. Astigmatism distorts the ideal intensity distribution into a complex and slender diffraction pattern, which becomes more pronounced as the level of astigmatism increases. Coma, named for its elongated comet-like structure, produces a complex and asymmetric diffraction pattern with a bright center around which residual spots are arranged in a triangular pattern. Addressing the shape distortion of beam patterns resulting from these aberrations is a critical challenge, particularly in imaging and related fields. Additionally, accurately estimating the aberrations is crucial for correcting them and controlling the beam patterns.
While traditional optical-field transmission theory can simulate the propagation of a light beam through an optical system with aberrations, acquiring the aberration parameters from a known intensity distribution of a beam is challenging, particularly when multiple aberrations exist simultaneously. With the increase in computational power and data volume, deep learning technology has developed rapidly and offers a potential solution to this problem. Deep Convolutional Neural Networks (DCNNs)4,5 have been widely used in various tasks, including face alignment,6 object identification,7,8 image segmentation,9 fault diagnosis,10,11 and target detection.12,13 However, as the number of network layers increases, DCNNs can suffer from degradation.14 In 2015, He et al.15 at Microsoft proposed Deep Residual Networks (ResNet), which introduce a deep residual learning framework to address the degradation problem and enhance robustness. ResNet has been applied to classification tasks, such as ImageNet and CIFAR-10, and has achieved excellent results, broadening the use of deep learning in both academic and applied research. ResNet is mainly used for classification, such as disease detection and classification,16 garbage image classification,17 and medical image classification.18–20
Möckl et al.21 simulated point spread function (PSF) images with Zernike aberrations at different focal positions using vector diffraction theory; a neural network trained on these 100 000 images showed good performance in predicting aberration coefficients and verified that realistic simulations of complex PSF images carry aberration phase information. Vanberg et al.22 demonstrated that deep CNNs can accurately sense the wavefront from focal-plane images by predicting modal Zernike coefficients on two datasets. Ma et al.23 used in-focus and out-of-focus intensity images as the input of AlexNet and retrieved the corresponding Zernike coefficients from the intensity images. Wang et al.24 proposed phase-retrieval deep convolutional neural networks (PRDCNNs) based on the U-Net architecture, which measure the aberration structure directly and show good robustness against fluctuations. These studies provide evidence that DCNNs can extract aberration information from intensity images. Methods for estimating aberrations using DCNNs have recently been applied to aberrations caused by optical systems,25,26 biological imaging,27,28 and turbulence.29,30
In order to obtain multiple aberration coefficients simultaneously from a known beam intensity distribution and to use them for controlling beam patterns, we propose a new residual network, called Y-ResNet, that simultaneously estimates and identifies the two aberration coefficients from diffracted intensity patterns. In this study, a dataset of diffraction intensity patterns of a single lens with both coma and astigmatism is constructed by diffraction integration. The results demonstrate the effectiveness of our method: it accurately distinguishes the two types of aberrations and estimates their classes simultaneously. This approach overcomes the limitations of Zernike-polynomial-based methods and provides more comprehensive and accurate aberration estimates. Estimating aberrations with deep learning is also more efficient than traditional wavefront-aberration31–33 estimation methods. Compared to previous work with datasets of 100 000 images,21–23 we obtain high accuracy with a much smaller dataset of 6561 images and can make separate estimates for two different aberrations simultaneously.
II. METHODS
A. Basic principle
This section outlines the fundamental principles of the propagation of a Gaussian beam through a lens with astigmatism and coma aberrations. Under the paraxial condition,34 the ABCD transfer matrix is used to represent the transmission of a laser beam through an optical system.
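As a concrete example, for a thin lens of focal length f followed by free-space propagation over a distance z, which is the configuration considered in this work (f = 2 × 10−1 m, z = 1.5 × 10−1 m, Table I), the system matrix is the product of the matrices of the individual elements, (A B; C D) = (1 z; 0 1)(1 0; −1/f 1) = (1 − z/f z; −1/f 1).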
We simulate the light intensity images using the intensity expression in Eq. (5). The ranges of the astigmatism and coma coefficients are each divided into 81 classes, i.e., 81 values taken evenly from 1 × 10−5 to 5 × 10−5. The intensity distribution patterns for different aberration coefficients are shown in Fig. 2.
Intensity distribution patterns of focused beam with different aberration coefficients.
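Since Eq. (5) is not reproduced here, the following Python sketch only illustrates how such patterns can be generated numerically; the single-FFT Fresnel propagation, the Seidel-type aberration phase, and the grid and aperture sizes are our assumptions rather than the paper's exact model.

```python
# Minimal sketch (assumed model, not the paper's exact Eq. (5)): a Gaussian beam
# passes a thin lens carrying astigmatism (c_astig) and coma (c_coma) phase terms
# and is then propagated a distance z with a single-FFT Fresnel integral.
import numpy as np

lam, f, z, w0 = 632.8e-9, 0.2, 0.15, 2e-3      # parameters from Table I
k = 2 * np.pi / lam
N, width = 1024, 20e-3                          # grid points and aperture width (assumed)
x = np.linspace(-width / 2, width / 2, N)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2

def focused_pattern(c_astig, c_coma):
    """Intensity at distance z for given astigmatism/coma coefficients (assumed Seidel-type phase)."""
    field = np.exp(-r2 / w0**2)                                  # incident Gaussian beam
    lens = np.exp(-1j * k * r2 / (2 * f))                        # ideal thin-lens phase
    aber = np.exp(1j * k * (c_astig * X**2 + c_coma * X * r2))   # assumed aberration phase terms
    u0 = field * lens * aber
    q = np.exp(1j * k * r2 / (2 * z))                            # quadratic phase of the Fresnel kernel
    u1 = np.fft.fftshift(np.fft.fft2(np.fft.fftshift(u0 * q)))   # single-FFT Fresnel propagation
    return np.abs(u1)**2
```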
B. Network
Deep convolutional networks combine different characteristics, which can be enriched by increasing the depth of the network. However, degradation problems can occur as the depth increases, leading to a decrease in accuracy. In many cases, redundant layers exist in the network, and ideally these layers would perform an identity mapping so that their inputs and outputs are the same. To address this issue, He et al.15 proposed ResNet, which replaces several layers of the original network with residual blocks. The residual blocks enable information to flow directly from one layer to another while bypassing the intermediate layers, so the network can retain the original information and effectively prevent degradation. A residual block acts as a shortcut connection, enabling the network to learn residual functions instead of directly learning the desired output. The introduction of ResNet has significantly improved the performance and accuracy of deep convolutional networks.
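As a minimal sketch of such a block (the deep-learning framework and layer sizes are our assumptions; the paper describes the block only conceptually):

```python
# Minimal sketch of an identity residual block: two convolution layers plus a
# shortcut connection, so the block learns a residual F(x) and outputs F(x) + x.
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    shortcut = x                                            # skip connection carrying the input forward
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                         # residual addition
    return layers.Activation("relu")(y)
```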
In order to estimate the aberration coefficients of astigmatism and coma, we developed a neural network called Y-ResNet based on the classical deep residual network (ResNet).15 Figure 4(a) illustrates the architecture of the network. The inputs are single-channel intensity patterns of 224 × 224 pixels. To accommodate the network architecture, the input images are padded with a zero-padding layer to 230 × 230 pixels. The initial stage of Y-ResNet is a convolutional layer with a 7 × 7 kernel and a 2 × 2 stride for down-sampling, with ReLU as the activation function. To prevent overfitting and enhance training efficiency, a batch normalization (BN) layer is incorporated. Additionally, a max-pooling layer with a 7 × 7 window and a 2 × 2 stride, which reduces the number of training parameters, is connected to the zero-padding layer to form a basic convolution unit. For all the ResNet variants with different depths (18, 34, and 50 layers), four “convolutional blocks” form the main structure of the network. For ResNet-18, each convolutional block contains two identity blocks, and each identity block comprises two convolution layers, as shown in Fig. 4(c). Each “conv” shown in Fig. 4 consists of a convolution layer followed by batch normalization, as shown in Fig. 4(e). For ResNet-34 and ResNet-50, the four convolutional blocks contain three, four, six, and three residual units, respectively, as shown in Figs. 4(b) and 4(d). In order to extract global information and mitigate overfitting, an average-pooling layer is connected after the convolutional blocks to aggregate information from the previous layers. After passing through all the convolution and pooling layers, the information in the input images is fully extracted and selected, and it is then fed into a fully connected layer with a dropout rate of 0.0002. The outputs of the network are two separate fully connected layers that give the estimated astigmatism and coma coefficients, respectively.
The structure of Y-ResNet for determining the two aberration coefficients. (a) Main structure of the network; (b) Y-ResNet-34; (c) Y-ResNet-18; (d) Y-ResNet-50; (e) conv unit.
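The following sketch outlines a Y-ResNet-18-style layout corresponding to the description above, reusing the identity_block sketched earlier; the framework, filter counts, downsampling between stages, and softmax output activations are our assumptions, while the 230 × 230 padding, 7 × 7 convolution and pooling, dropout rate, and the two parallel output branches follow the text.

```python
# Sketch of a Y-ResNet-18-style network with two parallel classification heads.
from tensorflow.keras import layers, Model

def build_y_resnet18(n_classes_astig=81, n_classes_coma=81):
    inp = layers.Input((224, 224, 1))                        # single-channel intensity pattern
    x = layers.ZeroPadding2D(3)(inp)                         # 224 x 224 -> 230 x 230
    x = layers.Conv2D(64, 7, strides=2, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=7, strides=2)(x)
    for stage, filters in enumerate([64, 128, 256, 512]):    # four "convolutional blocks"
        if stage > 0:
            x = layers.Conv2D(filters, 1, strides=2)(x)      # downsampling between stages (assumed)
        for _ in range(2):                                   # two identity blocks per stage (ResNet-18)
            x = identity_block(x, filters)
    x = layers.GlobalAveragePooling2D()(x)                   # average pooling aggregates global information
    x = layers.Dropout(0.0002)(x)                            # dropout rate stated in the text
    out_astig = layers.Dense(n_classes_astig, activation="softmax", name="astigmatism")(x)
    out_coma = layers.Dense(n_classes_coma, activation="softmax", name="coma")(x)
    return Model(inp, [out_astig, out_coma])
```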
All three Y-ResNet models are trained with the Adam optimizer at a learning rate of 0.0002 and binary cross-entropy as the loss function. We train the networks for 50 epochs, and the learning rate is reduced by a factor of 0.1 at the 15th and 25th epochs.
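A minimal sketch of this training configuration (the Keras API and one-hot label encoding are our assumptions; the 9:1 split follows Sec. III A):

```python
# Sketch of the training setup: Adam (lr = 0.0002), binary cross-entropy, 50 epochs,
# learning rate reduced by a factor of 0.1 at the 15th and 25th epochs.
import tensorflow as tf

model = build_y_resnet18()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
              loss="binary_crossentropy",
              metrics=["accuracy"])

def schedule(epoch, lr):
    return lr * 0.1 if epoch in (15, 25) else lr    # learning-rate decay points

# model.fit(train_images, [train_astig_labels, train_coma_labels],  # labels assumed one-hot
#           validation_split=0.1, epochs=50,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)])
```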
III. EXPERIMENT AND RESULTS
Based on the formula in Sec. II A, we explored the diffraction patterns of a Gaussian beam passing through a lens with astigmatism and coma and propagating a certain distance z. In order to provide the datasets to the neural network, we employed MATLAB as a simulation tool. The beam parameters are presented in Table I.
Basic parameters of the Gaussian beam.
Beam parameter | Numerical value
---|---
λ (wavelength) | 632.8 × 10−9 m
f (focal length) | 2 × 10−1 m
Z (propagation distance) | 1.5 × 10−1 m
w (beam waist) | 2 × 10−3 m
C6 (astigmatism coefficient) | 1 × 10−4–5 × 10−4
C3 (coma coefficient) | 1 × 10−4–5 × 10−4
A. Classification results based on different datasets
The size of the dataset plays a crucial role in determining the performance of neural networks. We employ four different datasets to examine the impact of dataset size on the network’s ability to predict the aberration coefficients. Group A: 81 astigmatism classes and 81 coma classes, jointly yielding 6561 intensity patterns. Group B: 67 astigmatism classes and 67 coma classes, jointly yielding 4489 intensity patterns. Group C: 58 astigmatism classes and 58 coma classes, jointly yielding 3364 intensity patterns. Group D: 51 astigmatism classes and 51 coma classes, jointly yielding 2601 intensity patterns. All intensity pattern images are resized to 224 × 224 pixels. Each dataset is split into training and validation sets in a 9:1 ratio, as sketched below for Group A.
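This sketch assumes the focused_pattern function from the Sec. II A sketch and the 81-value coefficient grid described there; in practice each pattern would be resized to 224 × 224 and written to disk rather than held in memory.

```python
# Sketch of assembling Group A: an 81 x 81 grid of coefficient pairs with a 9:1 split.
import numpy as np

astig_values = np.linspace(1e-5, 5e-5, 81)   # 81 evenly spaced astigmatism coefficients
coma_values = np.linspace(1e-5, 5e-5, 81)    # 81 evenly spaced coma coefficients

images, astig_labels, coma_labels = [], [], []
for i, ca in enumerate(astig_values):
    for j, cc in enumerate(coma_values):
        images.append(focused_pattern(ca, cc))   # 6561 simulated intensity patterns in total
        astig_labels.append(i)                    # class index of the astigmatism coefficient
        coma_labels.append(j)                     # class index of the coma coefficient

# 9:1 split into training and validation indices
perm = np.random.permutation(len(images))
n_train = int(0.9 * len(images))
train_idx, val_idx = perm[:n_train], perm[n_train:]
```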
The prediction performance is evaluated with the mean absolute error (MAE), root-mean-square error (RMSE), mean-square error (MSE), the correlation coefficient Rxy, and the accuracy: the better the Y-ResNet-34 performance, the lower the values of MAE, RMSE, and MSE and, conversely, the higher the values of Rxy and accuracy.
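A small sketch of these evaluation quantities (standard definitions; treating Rxy as the Pearson correlation between predicted and true coefficient values is our assumption):

```python
# Error and correlation metrics used to compare the four dataset groups.
import numpy as np

def evaluate(pred, true):
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    mae = np.mean(np.abs(err))                 # mean absolute error
    mse = np.mean(err**2)                      # mean-square error
    rmse = np.sqrt(mse)                        # root-mean-square error
    rxy = np.corrcoef(pred, true)[0, 1]        # correlation coefficient Rxy
    return mae, mse, rmse, rxy
```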
Figures 5 and 6 illustrate the results of the four metrics for the four datasets. It can be clearly seen from Fig. 5 that the values of the three types of errors in Group A are lower than those of the other three groups. Figure 6 also shows that Group A exhibits higher Rxy values than the other groups for both aberration coefficients. Based on this analysis, the dataset with 6561 training samples gives the best performance with Y-ResNet-34.
Measurement standard of classification on Y-ResNet-34. (a) Astigmatism coefficients; (b) Coma coefficients.
Figure 7 shows the accuracy for the two types of aberration with Y-ResNet-34. For astigmatism, the network achieves good performance, whereas for coma the accuracy is relatively low, with the highest accuracy being only 84.8%. This means that the network estimates coma coefficients less accurately than astigmatism coefficients.
B. Performance comparison between different neural networks
In Sec. III A, we found a suitable input sample (6561 images covering 81 classes for each aberration) for determining the aberration coefficients, and we now feed this sample into different networks (Y-ResNet-18, Y-ResNet-34, and Y-ResNet-50) to verify the prediction accuracy for the two aberration parameters. Simultaneously, to address the low prediction accuracy for coma, we implemented an optimization approach: the input images are cropped to a size that retains only the bright pattern and removes the black borders, as shown in Fig. 8.
Intensity patterns before and after cropping. (a) Before cropping; (b) after cropping.
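A minimal sketch of such a cropping step (the threshold value and the use of a bounding box around above-threshold pixels are our assumptions):

```python
# Crop an intensity pattern to the bright region, discarding the dark borders,
# and resize it back to the 224 x 224 network input size.
import numpy as np
from PIL import Image

def crop_to_bright_region(intensity, threshold=0.05):
    mask = intensity > threshold * intensity.max()          # pixels belonging to the bright pattern
    rows, cols = np.where(mask)
    cropped = intensity[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    img = Image.fromarray((255 * cropped / cropped.max()).astype(np.uint8))
    return np.asarray(img.resize((224, 224)))                # back to the Y-ResNet input size
```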
These metrics are computed from four counts: (a) True Positive (TP): samples of a given aberration-coefficient class correctly predicted as that class. (b) True Negative (TN): samples of other classes correctly predicted as not belonging to that class. (c) False Negative (FN): samples of a given class incorrectly predicted as another class. (d) False Positive (FP): samples of other classes incorrectly predicted as the given class. A higher specificity indicates that the classifier is better at identifying negative samples; higher recall and precision indicate that it is better at identifying positive samples; and the F1-score combines the precision and recall of the classifier, giving a balanced performance evaluation. Together, these four evaluation functions measure the performance of the network on both positive and negative samples.
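In terms of these counts, the metrics follow their standard definitions: specificity = TN/(TN + FP), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1-score = 2 × precision × recall/(precision + recall).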
We employ Y-ResNet-18, Y-ResNet-34, and Y-ResNet-50 on the same datasets to verify if the coefficients are well predicted. Four metrics for characterizing two aberration coefficients by three networks are shown in Fig. 9.
Analysis of the four metrics for the three networks. (a) Astigmatism coefficients; (b) coma coefficients.
Figure 9 shows that Y-ResNet-18 achieves better classification of the aberration coefficients than Y-ResNet-34 and Y-ResNet-50, which means that, for the 6561 intensity patterns, a network with fewer layers fits the dataset better. Moreover, the evaluation metrics of specificity, precision, recall, and F1-score remain significantly high, above 0.954 for the coma coefficients and above 0.925 for the astigmatism coefficients. For aberration classification with both Y-ResNet-34 and Y-ResNet-50, the specificity values are above 0.99.
In the classification process, FP or FN can be close to zero, in which case precision or recall alone approaches 1 and becomes uninformative. To overcome this limitation, the F1-score, the harmonic mean of precision and recall, is introduced; it is highest when precision and recall are both high and close to each other.
Based on the values of these metrics, it can be concluded that the network exhibits excellent performance in predicting the values of each aberration coefficient. The high values of specificity, precision, recall, and F1-score indicate that the network is able to accurately classify the aberration coefficients, further emphasizing its effectiveness in this task.
C. Performance on aberration estimation from cropped intensity patterns
In some circumstances, the detector cannot capture the intensity pattern completely. In view of this situation, the ability to recognize aberrations from a partial intensity pattern is crucial. Therefore, in this section, we crop the intensity patterns to different sizes to test the capability of the designed network to recognize the aberration coefficients. This analysis helps assess the model’s performance and its ability to extract meaningful features from incomplete or partial intensity patterns, providing insight into its robustness in practical applications.
Figure 10 illustrates the cropped intensity pattern images with different sizes, where each subsequent image is obtained by clipping 1/7 of the original image. It can be observed that as the intensity pattern size decreases, the distortion caused by astigmatism gradually reduces, while the distortion caused by coma increases. To analyze the accuracy of the network in predicting astigmatism and coma coefficients using different cropped sizes, the cropped intensity patterns are input into the Y-ResNet-50 model for training. The learning rate, loss function, and optimizer remain the same as in Sec. III B.
As the size of the intensity pattern decreases, the accuracy of network predictions gradually decreases to 90%. This indicates that as the intensity pattern size decreases, the captured image features also diminish, resulting in a decrease in prediction accuracy. Furthermore, Fig. 10 also demonstrates that the accuracy of astigmatism coefficients noticeably decreases with the cropping of intensity patterns. This observation aligns with our understanding that cropped intensity patterns provide less information for accurately estimating astigmatism coefficients.
Overall, these findings highlight the impact of intensity pattern size on network accuracy and emphasize the importance of considering the size and quality of captured intensity patterns to achieve reliable predictions of astigmatism and coma coefficients.
IV. CONCLUSION
In this paper, we propose a neural network called Y-ResNet for accurately predicting the aberration of a Gaussian beam as it propagates through a single lens over a certain distance. The Y-ResNet network is specifically designed to extract and estimate aberration coefficient features from intensity patterns. Unlike traditional ResNet models, Y-ResNet has an output layer with two parallel branches, enabling separate classification of astigmatism coefficients and coma coefficients.
The results demonstrate the high accuracy of astigmatism coefficient estimation, with a rate of over 96% achieved with different dataset sizes. Additionally, after removing irrelevant information from the intensity patterns, the estimation accuracy of coma coefficients reaches as high as 98% with different layers of ResNet. This proves that it is feasible to extract and estimate the aberration coefficients from the intensity patterns. To further evaluate the ability of the network to predict aberrations, we conducted experiments with intensity pattern cropping of various sizes. The accuracy of aberration prediction gradually decreases as the cropping size becomes smaller, but it consistently remains above 90%. This suggests that even with smaller cropped patterns, the neural network can still extract relevant features, albeit with reduced influence from astigmatism.
These findings demonstrate the effectiveness of the Y-ResNet model in predicting aberration coefficients and highlight the model’s robustness in handling intensity patterns of different sizes. The proposed approach holds promise for accurate aberration estimation and contributes to the advancement of aberration analysis in Gaussian beam propagation through lenses. We have only estimated the aberrations carried by the intensity images at the simulation level and have not confirmed the feasibility of the method experimentally. Future work could focus on changing different types of incident beams for different applications or try to estimate three or more types of aberrations simultaneously.
ACKNOWLEDGMENTS
This work was supported by the Scientific Research Fund of Liaoning Provincial Education Department (Grant No. LJKMZ20220620) and the 2023 Central Government guidance for local science and technology development funds (basic research on free exploration) (Grant No. 2023JH6/100100066).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jinyang Jiang: Conceptualization (equal); Formal analysis (equal); Software (equal); Writing – original draft (equal). Xiaoyun Liu: Funding acquisition (equal); Writing – original draft (equal). Yonghao Chen: Conceptualization (equal); Formal analysis (equal). Siyu Gao: Formal analysis (equal). Ying Liu: Formal analysis (equal). Yueqiu Jiang: Funding acquisition (lead).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.