Machine learning methods can be used to predict the properties of materials from their structure. This can be particularly useful in cases where other standard methods for finding material properties are time and resources consuming to use on large sample spaces. In this work, we study the strength of α-quartz crystals with a porous layer created by simplex noise as the shape of porosity. We train a neural network to predict the yield stress of these systems under both shear and tensile deformation. Molecular dynamics simulations are used for a randomly selected sample of possible structures in order to generate the ground truth to be used as the training data. We employ deep convolutional neural networks (CNNs) which are commonly used when dealing with image or image-like data since the input data for the problem in hand are a binary 2D structure of the porous layer of the systems. The trained CNN can predict the yield stress of a system based on the geometry of that given system, with much higher precision compared to a baseline polynomial regression method. Saliency maps created with the trained model show that the model predictions are most sensitive to altering structures near high-stress regions, indicating that the model makes predictions based on reasonable physics.
Modeling of mechanical failure phenomena is important both in materials research and for our fundamental understanding of nature.1 Machine learning has opened new paths to problem-solving where traditional simulation approaches have been insufficient.2 Thus, modeling and machine learning can be used in tandem to understand mechanical failure phenomena, both in a practical way to predict when and how fractures may occur and to discover mechanisms in nature. On a microscopic scale, molecular dynamics (MD) simulations can be used to directly model the failure of materials.3–10 Such simulations incorporate the scale of individual atoms and the breaking of atomic bonds, which is the fundamental ingredient in mechanical failures. However, it is often necessary to model spatially large systems and a large number of simulations to obtain insights from MD simulations. The computational cost of such simulations can be prohibitively high, and the amount of data may be too large to inspect manually. Therefore, surrogate methods for predicting yielding behavior from the geometrical structure may be needed, as improved analysis tools for exploring natural phenomena. Deep learning has proved to be able to capture relevant and informative features of a wide variety of data types11,12 and predict the desired output for different regression or classification problems. Finding a mapping between a material's structure and a given physical property of that system is an example of a problem that could be approached with machine learning methods such as neural networks. A well-trained neural network can be significantly faster and more efficient for computing the relevant properties than running simulations.
Several studies have used continuum models to investigate the prediction of mechanical properties of composite materials using neural networks, for example predicting the full loading curves of composite materials under a mechanical load.13 Similarly, neural networks have predicted the 2D stress and strain fields from an image of the geometry and the loading mode and magnitude,14,15 or from an alternative input form such as ologs, the building blocks that define a material's microstructure.12
On the molecular scale, neural networks have been used to predict the path of dynamic cracks16 and to predict Young's modulus of silica glass.17 A study of graphene kirigami systems using MD showed that a convolutional neural network could accelerate the search for kirigami structures with a high stretchability.5 In a follow-up study, the same physical system was investigated using a generative autoencoder-based procedure.18 Diffusion-based models have also been used for inverse design of systems with cracks15 and hierarchically organized materials.19,20
A challenge when training these methods is the choice of a generator of reasonable geometries for training samples. Many studies assign grid points to be either occupied by an elastic material or unoccupied. In such systems, the number of possible configurations grows exponentially with the number of grid points and surpasses the number of atoms in the universe with less than 300 grid points. Thus, there is clearly a need for efficient sampling to create a training dataset that contains a representative subset of physically relevant and stable geometries. This is particularly important to make sure that there exist training samples that are sufficiently similar to the samples that the ML method will face in production.
Here, we explore how neural networks can be used to aid the understanding of a porous interface in an α-quartz crystal representing natural rocks. The crystals studied have randomly generated geometrical defects in their structure, which affects their mechanical properties. We have limited our study to only a thin, disordered layer as a first step toward full three-dimensional models and as a simplified model for surface processes such as interface adhesive strength or the frictional strength of a disordered surface. We use MD simulations to predict the ground truth mechanical strength. These pairs of features (geometries) and labels (yield stresses) are used to train a convolutional neural network model for the strength of the α-quartz interfaces when subjected to tension and shear. In order to extend the span of the training data and, thus, improve the generalizability of the model, we employ an active learning strategy inspired by the work by Hanakata et al.5 Here, we apply it to a 3D mechanical system and sample geometries generated by simplex noise21 to obtain structures with extended correlations. We show that a simple convolutional neural network can accurately model and predict the yield strength of pre-cracked α-quartz interfaces generated by simplex noise, and that this model is generalizable to samples not generated by simplex noise. Finally, we use saliency maps22 to show that the salient features of the ML model align with our intuition of mechanical failure, namely, that it is due to stress concentration near sharp corners. A similar approach is used in the work by Hsu et al.23 where they have used Grad-Cam to interpret how their model perceives the geometry of the systems. This modeling approach can be useful for simultaneously controlling various material properties, for example, to obtain a fixed relationship between the mechanical strength and the gas permeability of a nanoporous silica sample. This can be further developed for the inverse design of structures with desired properties without the need for a huge number of simulations on a wide domain of possible systems.
For exploring the mechanical failure properties of silica interfaces, we prepared a block of α-quartz measuring 140 × 140 × 70 Å3 as shown in Fig. 1(a). We introduced a thin porous interface on this α-quartz block by removing particles from a subspace located at the middle of the system, spanning the xz-space with a height of along the y-axis. The height of was chosen to be a tenth of the system height (z direction). It is also Å larger than the cutoff radius of the interatomic potential we are using for the MD simulations,24 which ensures that the voids do not immediately collapse due to adhesion.
The porous layer located in the middle section of the α-quartz blocks determines the strength of the material in response to shear or tensile deformation. We use Simplex noise,21 an improved version of Perlin noise25 to generate the porous layer. Simplex noise is a gradient-based procedural noise algorithm that can generate geometries that appear similar to natural terrains and surfaces. It is widely used for generating scenery in video games and animations.
Given a set of points on a 2D grid, the Simplex algorithm can be used to calculate scalar values for each point, which can be interpreted as a surface . The shape of this surface can be controlled by the scale and octaves parameters of the simplex noise mathematical function, with a higher scale resulting in larger structures and a higher number of octaves resulting in structures on multiple scales [Figs. 1(b) and 1(c)]. We apply a threshold to the Simplex values to transform this smooth 3D surface to a 2D step function following and 0 otherwise, for some threshold. This step-like Simplex noise can be represented as a binary 2D image. For creating samples, we have chosen scales and octaves with steps of 1 for both cases and thresholds .
To assess the mechanical strength of the Simplex noise-generated structures, we ran MD simulations for α-quartz systems under tensile and shear deformation. Before the deformation stage in order to ensure the stability of the systems, we performed a relaxation stage using a Nose/Hoover thermostat and barostat keeping the systems at a temperature of 300 K and a pressure of 1 bar. In both cases, we used a strain rate of 0.01 ps–1, periodic boundary conditions, and temperature control using a Langevin thermostat with a damping time of 100 ps and Velocity Verlet integration with a time step of 0.5 fs. The temperature was kept at 300 K while applying true tensile or shear deformation until failure. The yield stress was measured as the maximum stress during shearing or tension, on stress' strain curves smoothed using a Savitzky–Golay filter.26 All MD simulations were run using the open-source MD simulator LAMMPS.27,28 Four snapshots of the system structure under shear deformation are shown in Fig. 1(a). The two topmost snapshots are from the start and shearing stage of the simulation, and the bottom two snapshots show the failure of the interface as it fails mechanically. A representative loading curve is shown below the system configurations, and the rapid drop in force after yield indicates a brittle failure of the system. This is the expected behavior for this α-quartz model.9 These simulations produced a dataset containing 1000 systems with yield stress values in the range [0.5, 4] GPa and porosities in the range [0, 0.5]. We call this as the base dataset, and this dataset is used to train the initial neural network model. The porosity and yield stresses of the base dataset simulations are shown in Fig. 2(a). Unsurprisingly, porosity is highly correlated with strength. As a first model, we fit a third-degree polynomial to the yield stress vs porosity for the base dataset [also shown in Fig. 2(a)]. The predictions of this model are plotted vs the true values on unseen test data in Fig. 2(b).
We now apply a machine learning procedure for an accelerated evaluation of the strength of various porous silica interfaces. This problem can be formulated as a regression problem with images as input data and yield stress as the continuous output property. The image representing each system in our dataset is a 200 × 100 binary grid of the porous central layer of the bulk crystal, and the yield stress was produced using MD simulations. Since our input data are images where spatial correlations are expected to be important for the yield stress values, convolutional neural networks (CNNs) are a reasonable choice of model architecture. We have chosen a deep convolutional neural network of four convolutional layers and one fully connected layer, similar to a previous study on graphene kirigami.5 The convolutional layers are followed by a max pooling operation. We have chosen a 9 × 9 convolutional kernel and periodic padding by pixels. The choice of periodic padding is done to impose some periodicity on the model.
We expect the CNN to capture more of the physical information from the images of the porous cross section compared to a simple polynomial fit to porosity. Before entering the neural network, the labels are, therefore, changed by subtracting the predicted strength made by the third-degree polynomial model. That is, the labels for the ML algorithm are the residuals of the polynomial model of the yield stress vs porosity. Using this method, we try to restrict the CNN to model only the information hidden in the material structure beyond the porosity contribution.
We train the neural network using an Adam optimizer with a learning rate of 0.0001. We apply dropout between the last convolutional layer and the fully connected layer to prevent overfitting by making training harder for the model, avoiding co-adaption, and resulting in a more generalized model.29 We use a weight decay of 0.01. Mean squared error (MSE) was used as the loss function. We applied a 10-fold cross-validation in the training process. The training was carried out in 2000 epochs, and the error on the validation dataset did not show signs of overfitting. The performance of this CNN model is shown in Fig. 2(c) where model predictions are plotted vs the true yield stresses of unseen test data that were created after model training. This graph shows a clear improvement on the polynomial model [Fig. 2(b)], and the CNN model captures most of the variability of the yield stress that is not explained by the polynomial model.
To further improve the yield stress prediction model, we apply an active learning approach.30 Active learning is particularly useful where acquiring data are expensive, and increasing the model quality with a moderate amount of data is desirable. In the present work, we devised a method to iteratively generate additional samples based on variance from the polynomial fit model and use these samples to train new generations of the model.
For each generation, we randomly create 1 × 106 geometries with porosity in the range [0, 0.5] using the same approach as for the base dataset. We find the predicted strength of all these samples by passing them through the trained ML model. These samples are then divided into ten evenly spaced bins, ranging from the lowest to the largest predicted deviation from the polynomial model. We randomly select ten samples from each of these bins to realize through MD simulations. This binning procedure ensures a more uniform selection of data as measured by the deviation from the polynomial model than would result from the procedure generating the base dataset. The realized samples are added to the training dataset, the model is trained in the same way as for the first generation, and we continue to the next generation.
The model was trained for 50 generations. To evaluate the effectiveness of the active learning process, we compare the active learning-based model to another model trained on the same number of samples, but where all the training data were generated in the same way as the base dataset. The evolution of the mean square error values through generations averaged over ten-folds of the cross-validation process with and without active learning is compared in Fig. 3(a). These values are lower when active learning is applied, and the standard deviation suggests a meaningful overall advantage of active learning. This method also improved the efficiency of model training, as a lower number of epochs is required for the model to reach the minimum loss with active learning, Fig. 3(b).
To further explore what the trained CNN has learned about the relationship between structure and yield stresses, we use it to find the strongest and weakest systems in a randomly generated dataset. We generated 3000 random samples with simplex noise with porosity equal to 0.10 ± 0.01 and predicted the yield stress for these samples using the trained CNN after 50 generations. Although these samples lie in a small range of porosity values, they differ significantly in terms of geometry since they have been generated with the full range of scale and octaves. We expect these differences to affect the yield stress of the systems. Figure 3(c) shows the geometries that were predicted to be the weakest and strongest samples in this sample set under shear deformation. Numerous small and irregular voids tend to result in systems with the lowest yield stress. Systems with medium-sized, almost round, and dispersed voids are the strongest systems. These observations fit our physical understanding of the relationship between the geometry of the porous layer and the strength of the system under shear and tensile deformation, regardless of the value for porosity.
Figure 4 presents a sample of the strongest systems predicted by both the tensile deformation and the shear deformation model at a porosity of 0.1 ± 0.01. The main feature of these geometries, when compared to the geometries shown in Fig. 1, is that the porosity is concentrated in a few relatively regularly distributed blobs. There also seems to be a slight difference in the geometries depending on the type of deformation applied to the system. Systems with higher yield stress under shear stress seem to have voids roughly aligned on a line, whereas systems with higher yield stress under tension have a more scattered placement spreading the voids out in two dimensions. Note that these patterns are qualitative observations that can be investigated in more detail.
To get a glimpse of what physics our model has learned, we use saliency maps,22 to evaluate what parts of the geometries determine the yield strength. A saliency map represents the degree of importance of each pixel on the model's output. Here, we compare the saliency maps with the von Mises stress fields measured with MD simulations. Figure 3(d) shows saliency maps and von Mises stresses, for example systems under shear. Regions of high saliency and high von Mises stress coincide, which indicates that the salient features of our ML model are the same as the salient features of the physical system. This is an indication that the information extracted from only the image representing the porous layer in the system to predict the yield stress is physically relevant and meaningful. In addition to that, saliency maps may be capable of capturing more than the stress field. For example, the points inside the voids can also get non-zero values, meaning that adding atoms to empty spaces with high saliency values results in a larger change in the strength of the modified geometry.
To investigate the model generalizability, we test it on systems with a different structure from the training data. We have created 400 samples with one circular void. The location and size of the void in each system are chosen randomly. Figure 5(a) shows the model performance measured by RMSE, bias, and R2 for these out-of-sample data. The predictions are more stable, and bias values are closer to zero over generations in the active learning case than in the no active learning case. RMSE and R2 are not significantly different for this set of samples.
We have shown that convolutional neural networks can model the relationship between system geometries with random properties. These models can capture more complex spatial features of the systems with defects compared to the polynomial fit of system strength to porosity. Saliency maps show how correspondence with von-Mises stress fields, which can be interpreted as a measure of the physical understanding of the CNN. The active learning approach used in this work had a minor advantage, mostly for out-of-sample data. In general, this idea can be extended further to produce a more efficient pipeline of training neural networks to map material structures to their physical properties, and also to create generative models14,18 on the same basis to design material structures with given properties. This pipeline can also be generalized to more complex systems like the interaction of rough surfaces represented as a 2 + 1 dimensional surface model or 3-dimensional disordered porous systems.
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 945371 and from The Research Council of Norway Project No. 287084.
Conflict of Interest
The authors have no conflicts to disclose.
Fahimeh Najafi: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal). Henrik Andersen Sveinsson: Formal analysis (equal); Investigation (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal). Christer Dreierstad: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal). Hans Erlend Bakken Glad: Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal). Anders Malthe-Sørenssen: Formal analysis (equal); Investigation (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal).