We propose a machine learning-based technique to identify the crystallographic characteristics responsible for the generation of crystal defects. A convolutional neural network was trained with pairs of optical images that display the characteristics of the crystal and photoluminescence images that show the distributions of crystal defects. The model was trained to predict the existence of crystal defects at the center pixel of a given image from its optical features. Prediction accuracy and separability were enhanced by feeding the model three-dimensional data and by data augmentation. The prediction was successful, with a high area under the receiver operating characteristic curve of over 0.9. Likelihood maps showing the distributions of the predicted defects closely resemble the true distributions. Using the trained model, we visualized the regions most important to the predicted class by gradient-based class activation mapping. The extracted regions were found to contain mostly particular grains whose boundaries changed greatly during crystal growth, as well as clusters of small grains. Because the features of optical images are often complex and difficult to interpret, this technique is beneficial in providing a rapid, statistical analysis of various crystal characteristics. The interpretations can help us understand the physics of crystal growth and the effects of crystallographic characteristics on the generation of detrimental defects. We believe that this technique will contribute to the development of better fabrication processes for high-performance multicrystalline materials.
I. INTRODUCTION
In multicrystalline materials, the formation of crystal defects, i.e., imperfections in the atomic arrangement, has a profound effect on material behavior. A variety of crystal defects can be found in multicrystalline materials: point defects, such as vacancies and impurities; line defects, including edge and screw dislocations; and surface defects, which divide the solid crystal into regions with different structures and crystallographic orientations. The presence of these various crystal defects changes the mechanical, physical, and optical properties of crystalline materials.1–8 Mechanical strength depends strongly on the density of grain boundaries; point defects reduce both the thermal conductivity of metals and the absorption of light around certain wavelengths; and plastic deformation is governed by dislocations.4–8 While the increase in mechanical strength has enabled a variety of applications, the deterioration in electrical and thermal properties has raised concerns. Hence, understanding and controlling the defects of crystalline materials is attracting attention as a route to fabricating materials with suitable properties. While many studies have been conducted to control defect generation,9,10 they often rely on series of physical experiments and human-based analysis over limited areas. If we could analyze the crystallographic characteristics holistically with the aid of machine learning, it would be possible to eliminate human bias or preconceptions and conduct a comprehensive analysis.
Multicrystalline silicon is employed as a model material in this study. Multicrystalline silicon is a common material for solar cells owing to its abundance and affordability. However, the electrical properties of the cells, particularly their conversion efficiencies, tend to be limited by dislocation clusters with high dislocation densities of up to 10⁷ cm⁻² acting as recombination centers for photogenerated carriers.11–14 Thus, it is essential to understand the generation mechanism of dislocation clusters and to control them. Typically, dislocation clusters form near grain boundaries during silicon ingot casting owing to the relaxation of strain energy.15 While the generation of dislocations has been studied through observation of silicon ingots and their stress distributions during growth, the relationships between the dislocations and other crystalline characteristics, including grain boundary arrangements, grain size distributions, and crystallographic orientations, have not been clarified.16,17
In this study, we construct a machine learning model that reproduces the behavior of the crystal and analyze its internal structure to understand the physical phenomena behind that behavior. We address various crystallographic characteristics in order to gain a deep and comprehensive understanding of the generation of dislocation clusters. A machine learning model consisting of convolutional and dense layers is trained with optical images containing information about the crystalline characteristics and photoluminescence (PL) images displaying the distributions of dislocation clusters. The construction of the model and its outcomes are reported, and we provide insights from the analysis of the model.
II. METHODS
Multicrystalline silicon wafers were used in this study. To associate the crystalline structures with the generation of dislocation clusters, a machine learning model performs binary classification, predicting the presence of dislocation clusters from given images of the wafers. The pre-trained model was then analyzed to identify the structures relevant to the generation of dislocation clusters.
A. Collecting optical and photoluminescence images
The model was trained using two types of images: optical and PL images. The images were taken from silicon wafers cut out of a single ingot by a diamond wire saw and textured using alkaline solutions. Texturing removes contamination and damage caused by the sawing process and creates pyramid-like structures consisting of {111} planes. The wafer size was 15.6 × 15.6 cm², and the thickness of each wafer was ∼180 μm, cut with an interval of 290 μm. Optical images were taken using an in-house apparatus.18,19 The apparatus consists mainly of an imaging photometer (ProMetric IP-PMY29), a rotating collimated light source for illumination, and a stage. Irradiation of the wafers generates reflection patterns specific to the crystallographic orientation, which we integrate into full-sized optical images of 500 × 500 pixels. An example of the optical images is shown in Fig. 1(a), in which the grain boundaries, grain sizes and shapes, and crystallographic orientations (represented by the color differences) are distinguishable.
Example of collected images: (a) optical image, (b) acquired PL image, and (c) binarized PL image.
For the same wafers, PL images were taken. The images were resized to 500 × 500 pixels so that each pixel of the PL image corresponds to the same location in the optical image. The PL images were then prepared for training by binarization. The images were binarized with a fixed threshold and reconstructed into the image shown in Fig. 1(c), with white areas representing dislocations and black areas representing non-dislocations. The spatial resolution is limited by the pixel size, which is about 300 μm. Empirically, there is a marked decrease in PL intensity at dislocation densities of 10⁵ cm⁻², where 10 or more dislocations exist per pixel. Since it is difficult to estimate the dislocation density from the PL intensity with high accuracy, only the presence or absence of significant dislocation generation in each pixel is treated by thresholding.
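This preparation can be summarized in a short sketch. The following Python snippet, written with NumPy and Pillow, resizes a PL image and binarizes it with a fixed threshold; the threshold value and file handling here are illustrative placeholders, not the exact parameters used in the study.

```python
# A minimal sketch of the PL-image preparation, assuming the raw PL image
# is available as a grayscale file. The threshold is an illustrative value.
import numpy as np
from PIL import Image

def binarize_pl(path: str, threshold: float = 0.5, size: int = 500) -> np.ndarray:
    """Resize a PL image to size x size pixels and binarize it.

    Returns 1 (white, dislocation) where the normalized PL intensity falls
    below the threshold and 0 (black, non-dislocation) elsewhere.
    """
    img = Image.open(path).convert("L").resize((size, size))
    intensity = np.asarray(img, dtype=np.float32) / 255.0
    # Dislocations quench the PL signal, so low intensity maps to label 1.
    return (intensity < threshold).astype(np.uint8)
```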
B. Model design
Our proposed model is a convolutional neural network (CNN)20 consisting of three convolutional layers and one dense layer. We chose a CNN because it provides translational invariance and the ability to extract a variety of features directly from the image without prior parameter selection. The convolutional layers extract the key features from the input optical images. Then, based on the binarized PL images, the dense layer classifies the given images into outputs between 0 and 1, representing whether the center pixel of the given image contains dislocations. The hyperparameters of the model were adjusted to maximize the area under the curve (AUC) on the test data. Since the class balance in the training dataset is not ideal, accuracy alone may not reflect the actual model performance; a model can score highly simply by overfitting to the majority class. By using the AUC as an indicator, we confirmed that the prediction was not random and determined the hyperparameters.
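As a concrete illustration, a minimal TensorFlow/Keras version of such a network is sketched below. The three-layer convolutional layout, the single sigmoid output, and the AUC metric follow the text; the filter counts, kernel sizes, and pooling are assumptions made for illustration, not the tuned hyperparameters of the study.

```python
# A sketch of the 2D model: three convolutional layers and one dense layer.
# Filter counts and kernel sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_2d_model(input_shape=(101, 101, 1)) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", name="conv2D_1"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", name="conv2D_2"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", name="conv2D_3"),
        layers.GlobalAveragePooling2D(),
        # One sigmoid unit: the likelihood that the center pixel of the
        # input patch contains dislocations.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["binary_accuracy",
                           tf.keras.metrics.AUC(name="auc")])
    return model
```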
In our study, two CNNs were constructed with different dimensions: 2D and 3D.21,22 In the 2D model, the inputs were single optical images, trained with labels from the corresponding PL images. It should be noted that balancing the class distribution is an essential step in constructing the dataset. As can be seen in Fig. 1(c), the original images contain far more non-dislocation areas than dislocation areas; only 0.6% of the pixels contain dislocations. This means that a model predicting “non-dislocation” for every single pixel would still achieve an accuracy of 99.4%. To avoid this, we randomly selected images to create a training dataset containing ∼25% dislocations, as sketched below. The architecture is shown in Fig. 2. The dataset contained 37 953 images, which were split into training and validation datasets at a ratio of 80:20, consisting of 30 362 and 7592 images, respectively. Test data were taken from separate wafers. The architecture of the 3D model is shown in Fig. 3; this model takes five optical images stacked in the growth direction and predicts the dislocations of the top wafer. Here, the model learns not only from the crystal structures, such as grain boundaries and orientations, but also from their changes during crystal growth. The dataset contained 21 426 images, split into training and validation datasets of 17 140 and 4286 images, respectively.
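The balancing step can be sketched as follows, assuming center-pixel labels have already been extracted from the binarized PL images. The target fraction and random seed are illustrative choices.

```python
# A sketch of the class-balancing step: keep all dislocation patches and
# subsample non-dislocation patches so that ~25% of the set is positive.
import numpy as np

def balance_dataset(patches, labels, positive_fraction=0.25, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    # Number of negatives needed to reach the target class ratio.
    n_neg = int(len(pos) * (1 - positive_fraction) / positive_fraction)
    neg = rng.choice(neg, size=min(n_neg, len(neg)), replace=False)
    idx = rng.permutation(np.concatenate([pos, neg]))
    return patches[idx], labels[idx]
```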
C. Grad-CAM, providing visual explanations of the model
The analysis of the model is performed by gradient-based class activation mapping (Grad-CAM).23 Generally, the interpretations that a machine learning model makes are not visible to the eye. This is referred to as the “black-box problem,” where the lack of interpretability and visibility limits the model’s trustworthiness.24–26 Recent developments in interpretability have suggested using the weights of the pre-trained network to visualize the model’s reasoning in producing the output. While this technique has provided more trust and reliability in neural networks, we believe that visualizing the model’s reasoning on complicated problems may provide extended benefits, such as teaching us something that humans are not yet aware of. Here, we use gradients of the pre-trained model to weight the feature maps by their influence on the class score and generate coarse heatmaps highlighting the most discriminant areas of the given images.
The obtained coarse highlights of the discriminant regions are overlaid on the original images to visualize the most important regions. A graphical explanation of the Grad-CAM is shown in Fig. 4.
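A compact sketch of this procedure for the 2D model defined earlier is given below, following the standard Grad-CAM formulation; the normalization and resizing steps are common choices rather than the exact post-processing used in the study.

```python
# A minimal Grad-CAM sketch: gradients of the class score with respect to a
# convolutional feature map are spatially averaged to weight its channels.
import numpy as np
import tensorflow as tf

def grad_cam(model, patch, layer_name="conv2D_3"):
    """Return a [0, 1] heatmap of the regions driving the prediction."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    x = tf.convert_to_tensor(patch[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        fmaps, score = grad_model(x)
    grads = tape.gradient(score, fmaps)
    weights = tf.reduce_mean(grads, axis=(1, 2))      # channel weights
    cam = tf.einsum("bijc,bc->bij", fmaps, weights)   # weighted feature sum
    cam = tf.nn.relu(cam)[0]                          # positive influence only
    cam /= tf.reduce_max(cam) + 1e-8                  # normalize to [0, 1]
    heat = tf.image.resize(cam[..., tf.newaxis], patch.shape[:2])
    return heat.numpy().squeeze()
```

The heatmap is upsampled to the input size so that it can be overlaid directly on the optical image, as in Fig. 4.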
III. RESULTS
A. Results of training on the 2D model
Figure 5 shows the predicted likelihoods for two test wafers along with the ground truth for the 2D prediction. The reliability of the model can be assessed from the receiver operating characteristic (ROC) curve shown in Fig. 6, which plots the true positive rate vs the false positive rate.27,28 The AUC is a standard summary statistic for ROC curves; the larger the area under the red curve, the better the model separates the two classes. Note that the model was trained on 30 362 images randomly selected from nine wafers (Nos. 502, 506, 509, 513, 516, 519, 523, 526, and 541 in the supplied datasets in the repository29), each shaped (101, 101). For testing, two full-sized wafers were used. Each (101, 101) area was sequentially extracted, and the existence of dislocations at its center pixel was predicted. The outputs were integrated into predicted likelihood maps of size (400, 400). Furthermore, we improved the predictions by data augmentation. Data augmentation is a method to increase the amount of data when the available data are assumed insufficient to train the network. In many cases, simple transformations, such as rotating or flipping the original image, filtering, or desaturating, are effective.30,31 Augmentation techniques are selected on the condition that they are “safe,” i.e., that they preserve the labels after transformation.30 Although these slightly altered images carry essentially the same information, the neural network treats them as novel data, which helps improve the model’s generalizability and trustworthiness. In our case, the original luminance profiles were rotated in 90° increments, increasing the data to four times the original size. The model was trained and tested on the augmented data and compared with the non-augmented training.
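This ×4 augmentation is straightforward to sketch: each patch is rotated by 0°, 90°, 180°, and 270°, and the center-pixel label is reused, since rotation about the patch center does not move that pixel. The array layout below assumes the 2D case.

```python
# A sketch of the x4 rotation augmentation for (N, 101, 101, 1) patches.
import numpy as np

def augment_x4(patches: np.ndarray, labels: np.ndarray):
    """Return the patches rotated by 0/90/180/270 degrees with labels tiled."""
    rotated = [np.rot90(patches, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated), np.tile(labels, 4)
```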
Ground truth and predicted likelihoods in grayscale and colormap for two wafers for the 2D model. Predicted likelihoods are shown for two models: one trained with non-augmented inputs and the other trained with ×4 augmented inputs.
ROC curve for test data predicted on the 2D model pre-trained with (a) non-augmented and (b) augmented inputs.
B. Results of training on the 3D model
Figure 7 shows the predicted likelihoods for two test wafers for the 3D model with non-augmented and augmented inputs. The ROC curve is displayed in Fig. 8. Note that training was performed on 21 426 image sets of five wafers stacked in the growth direction (Nos. 461, 481, 502, 521, and 543 for 3D in the supplied datasets in the repository29), each shaped (101, 101, 5). Testing was performed on two full-sized wafer sets, and the existence of dislocations at the center pixel of the top wafer was predicted. These predictions were integrated into likelihood maps of size (400, 400). The training dataset size, prediction accuracy, and test AUC are listed in Table I, which compares the 2D and 3D models with and without augmentation.
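The assembly of a full-wafer likelihood map can be sketched as a sliding-window prediction, assuming the five aligned 500 × 500 optical images are stacked along the last axis. For simplicity, the wafer stack is treated here as a five-channel input; a model built with true 3D convolutions would require an extra channel axis on each batch.

```python
# A sketch of sliding-window prediction: every (101, 101, 5) patch is
# classified and its output written to the corresponding map position.
import numpy as np

def predict_likelihood_map(model, stack, patch=101):
    """stack: (500, 500, 5) optical images aligned along the growth axis."""
    out = stack.shape[0] - patch + 1            # 500 - 101 + 1 = 400
    likelihood = np.zeros((out, out), dtype=np.float32)
    for i in range(out):
        # One row of overlapping patches, predicted as a single batch.
        batch = np.stack([stack[i:i + patch, j:j + patch, :]
                          for j in range(out)])
        likelihood[i] = model.predict(batch, verbose=0).ravel()
    return likelihood
```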
Ground truth and predicted likelihoods in grayscale and colormap for two wafers for the 3D model. Predicted likelihoods are shown for two models: one trained with non-augmented inputs and the other trained with ×4 augmented inputs.
ROC curve for test data predicted on the 3D model pre-trained with (a) non-augmented and (b) augmented inputs.
Training dataset size, accuracy, and test AUC for 2D and 3D models.
Model | Augmentation | Training dataset size | Binary accuracy | Test AUC
2D    | None         | 30 362                | 0.975           | 0.964
2D    | ×4           | 121 645               | 0.960           | 0.961
3D    | None         | 21 426                | 0.979           | 0.979
3D    | ×4           | 85 704                | 0.988           | 0.938
C. Results of analysis by Grad-CAM
Grad-CAM was applied to the 2D and 3D models trained with the augmented data. In Fig. 9, we show an example of the highlights for the 2D model; the highlights are created for each of the three convolutional layers. Figure 10 shows an overlay of the true and predicted dislocation clusters from the 3D model on the corresponding optical image. For the same model, Fig. 11 shows examples of the highlights obtained for nine regions.
Highlights of discriminant regions for each layer in the 2D CNN. Shown in grayscale is the given optical image. The colormaps show the overlays of highlights for conv2D_1, conv2D_2, and conv2D_3 layers, respectively.
Overlay of ground truth and predicted dislocation clusters on a given optical image. Predicted dislocations are displayed in red and true dislocations in magenta.
Highlights of discriminant regions for nine areas in the 3D CNN.
IV. DISCUSSION
A. Performance of prediction
Our models achieve a high AUC of over 0.93, successfully separating the dislocation areas from the rest. The likelihood maps show that the main dislocation areas are predicted correctly, although the model predicts a somewhat larger dislocation area than the actual one. Data augmentation does not change the prediction scores significantly, but the number of pixels with intermediate likelihood values declines after augmentation, seen as the reduction of green and yellow areas in the colored likelihood maps. This indicates that, with data augmentation, the model has become more confident in its predictions, presumably because the variety and abundance of data provided by rotating the input images helped the model generalize better. Moreover, the 3D model shows higher accuracy and better separability than the 2D model. Its likelihood maps are markedly clearer and show better contrast, indicating that the model has cleanly separated areas with and without dislocation clusters.
From the likelihood maps and the ROC curve, it can be said that our model has been sufficiently trained to undergo analysis.
B. Visualization by Grad-CAM
Here, we discuss the visualizations from the 3D model with augmented inputs, which is expected to be the most reliable. In Fig. 11, the regions where changes in the input pixel values affect the predicted class are distinctly observed. For regions 1, 4, and 5, the highlights are concentrated on a selected area. In regions 6 and 9, the highlights are broader, and the model appears to attend to multiple complex structures. The areas where dislocations existed are commonly highlighted but do not seem to have the greatest impact. Remarkably, none of the highlights covered the center pixels alone. Although the model was trained on the dislocation presence at the center pixel of the given image, the characteristics of areas far from the center are often more discriminative. We also found that when the model was fed slightly shifted or 90°-rotated images, the same grains were highlighted. Data augmentation thus appears to have provided rotation invariance in addition to the intrinsic translational invariance of the CNN. As a result, the model acts as if it were tracking several specific grains or areas, which indicates the presence of characteristics largely responsible for the generation of dislocations.
Shown in Fig. 12 are the visualizations compared with the changes in crystal structure for four regions. In region A, the grain boundary at the highlighted area arches due to the growth of an adjoining crystal. In region B, the model focuses on an area that is pushed outward by an enlarging grain. In region C, the highlights lie near two shrinking grains at the center and above the complex structures at the bottom of the image. In region D, the highlights cover several clusters of small grains and extending or narrowing grain boundaries. These changing crystallographic structures are of great importance in the generation of dislocation clusters, and the model has successfully extracted the key structures.
Grad-CAM visualizations and the optical images fed into the model. The optical images are aligned in the direction of crystal growth.
Dislocation generation in multicrystalline materials depends not only on the complexity of the microstructure and the diversity of grain boundaries but also on the material manufacturing process. Even for materials science experts, it is difficult to determine the presence or absence of dislocations from optical images. The results of this study will provide experts with insights into the fundamental nature of this complex problem and make a significant contribution to the deepening of scientific knowledge.
V. CONCLUSIONS
We developed a method to identify and analyze dislocation clusters, a type of crystal defect that is detrimental to solar cells. This was carried out through feature extraction from optical images by convolutional neural networks and an analysis based on gradient-based class activation mapping. This technique outperforms conventional human-based analysis of crystal structures in the time required and in its elimination of human bias. Remarkably, areas far from the center of the given images were often more discriminative than the center pixels themselves. A close look at the highlights showed that the model bases its reasoning on significant changes in grain boundaries and grain shapes or on complex small structures. This indicates that a machine learning model is extremely effective at statistically extracting tendencies from images, which is valuable for analyzing large amounts of crystallographic data whose image features are difficult to interpret. We believe that this technique will become a guideline for handling large datasets to clarify the mechanisms of crystal growth. Future work seeks a comprehensive understanding of the generation of crystal defects through increased data from various crystals.
ACKNOWLEDGMENTS
This work was supported by JST CREST (Grant No. JPMJCR17J1).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Kyoka Hara: Conceptualization (equal); Data curation (lead); Investigation (lead); Methodology (lead); Visualization (lead); Writing – original draft (lead). Takuto Kojima: Conceptualization (lead); Investigation (supporting); Supervision (equal); Validation (lead). Kentaro Kutsukake: Conceptualization (equal); Investigation (supporting); Methodology (equal); Supervision (supporting). Hiroaki Kudo: Conceptualization (supporting); Investigation (supporting); Methodology (supporting); Supervision (supporting). Noritaka Usami: Funding acquisition (lead); Supervision (lead); Writing – review & editing (lead).
DATA AVAILABILITY
The data that support the findings of this study are openly available in GitHub at https://github.com/UsamiCREST/DC_prediction.29