Neural network potentials (NNPs) can greatly accelerate atomistic simulations relative to ab initio methods, allowing a broader range of structural outcomes and transformation pathways to be sampled. In this work, we demonstrate an active sampling algorithm for training an NNP that reproduces microstructural evolution with accuracy comparable to density functional theory, exemplified by structure optimizations of a model Cu–Ni multilayer system. We then use the NNP, in conjunction with a perturbation scheme, to stochastically sample structural and energetic changes caused by shear-induced deformation, demonstrating the range of intermixing and vacancy migration pathways made accessible by the speedups the NNP provides. The code to implement our active learning strategy and NNP-driven stochastic shear simulations is openly available at https://github.com/pnnl/Active-Sampling-for-Atomistic-Potentials.
INTRODUCTION
Atomic-scale simulations enable the examination of microstructural evolution in crystalline solids. Density functional theory (DFT), in particular, is routinely used to simulate the effects of deformation and to establish diffusion characteristics of point defects, such as vacancies. However, the associated high computational cost restricts such ab initio simulations to idealized systems. Interatomic many-body potentials, parameterized for specific elemental metals and binary and ternary alloys, are often used in lieu of ab initio calculations.1–3 While such potentials can provide good agreement with experiment,4–6 it is nontrivial to modify them to accommodate variations in chemical composition. Alternatively, the versatility of neural network potentials (NNPs) could enable more accurate atomic-scale simulations of larger, more realistic systems (such as those containing grain boundaries) and longer-timescale molecular dynamics (MD) simulations, while avoiding the computational restrictions inherent to ab initio methods. Additionally, an NNP fitted to a particular atomistic system can be used as a starting point to train a new NNP on a different system, reducing the cost of fitting NNPs to novel atomic systems.
NNPs can be formulated to scale linearly with system size, and a well-trained NNP can produce results with errors comparable to the level of theory at which the training set was calculated.7,8 Moreover, a well-trained NNP can accurately capture a variety of properties related to defects in crystalline structures.9 The goal of training an NNP is to approximate the relation between potential energy and relative atomic positions that constitute the potential energy surface (PES). Neural networks have proven to be useful fitting functions for high-dimensional nonlinear systems, making them a practical mechanism for describing complex interactions inherent in a PES.10,11 For more information, we direct readers to Kocer et al.’s overview of NNPs,12 Pinheiro et al.’s broader survey of machine learning (ML) potentials, which includes a section on NNPs,13 and Westermayr et al.’s perspective on integrating ML methods into computational chemistry and material science.14
The quality of the NNP fit explicitly relies on the given training set. Training a neural network with an imbalanced training set will lead to a highly accurate representation of the PES in over-represented samples and low accuracy in under-represented samples.15 Such biased NNPs have limited predictive ability and require additional effort, via specialized sampling schemes or additional data collection, to compensate for training set biases.16 While it would be efficient to repurpose reported data to train an NNP, computational studies set up to investigate mechanisms and PESs for specific phenomena necessarily span a limited space of structural and compositional complexity. This focus on specific phenomena translates into imbalanced datasets that are not suited for training a generalizable NNP.
An alternative method is to generate a new, carefully selected dataset for the sole purpose of training an NNP. The selection of configurations for a training set is often a laborious process, requiring a considerable number of trial-and-error steps.17 Furthermore, obtaining robust, unbiased chemical training sets has proven difficult due to the large configuration space occupied by many-atom systems. Castin et al.18,19 outlined an a priori strategic sample selection approach for training NNPs. While such an approach was shown to produce accurate vacancy migration energies in crystalline materials, including Cu–Ni, the training set compilation scheme requires targeted ab initio simulations to be performed prior to training, increasing computational costs. Moreover, the authors advise that the validation set be of similar size to the training set to accurately capture the NNP error for novel configurations, further adding to the cost. Shaidu et al.20 developed a self-consistent scheme that couples DFT and MD in an evolutionary algorithm to compute structures on-the-fly during NNP training, while several studies21,22 have reported on the autonomous creation of large training sets based on optimization of the entropy of atomic descriptors. Such autonomously compiled training sets have been shown to improve the robustness of NNPs toward out-of-distribution predictions compared with NNPs trained on expert-curated training sets. Nonetheless, these methods require targeted ab initio simulations to be performed, which may not be practicable with limited computational resources.
A number of studies have examined automated and/or on-the-fly strategies for sampling from pre-existing datasets. Tsymbalov et al.23 used a Gaussian process to sample molecules from the QM9 dataset and found that only 7500 molecules were needed to reach a root-mean-square error (RMSE) of 2 kcal/mol in the prediction of absolute energy, compared to 15 000 using random sampling. Xu et al.24 applied an enhanced self-organizing, incremental neural network to construct training datasets with minimum redundancy for several small-molecule examples, notably reducing redundancy in the ANI-1 dataset. Alternatively, active learning incrementally increases the size of the training set during network training by targeted sampling of configurations that show high model uncertainty. A number of active learning strategies have been reported involving on-the-fly sampling25–30 and uncertainty prediction.31–36
In this work, we generate an NNP that accurately simulates shear-induced deformation and point defect behavior in Cu–Ni multilayers. To this end, we train a modified version of the continuous-filter convolutional neural network (CF–CNN) of Schütt et al., called SchNet,37,38 on configurations generated at the DFT level. We chose to implement the SchNet NNP due to its ability to produce a smooth potential energy surface and energy-conserving force field,39 and its demonstrated accuracy in various applications.11,40–44
To avoid bias in the network resulting from an imbalanced dataset, we employ an active sampling strategy during NNP training. We then introduce an NNP-driven stochastic simulation scheme to produce a set of evolutionary pathways for a well-defined precursor system. Utilizing the computational speedups provided by the NNP, this sampling scheme is used to produce an assortment of novel evolutionary pathways. We demonstrate the use of our methods on a set of Cu–Ni multilayers with and without defects (vacancies and vacancy complexes), and obtain a variety of structural configurations that were not included in the NNP’s training set, but were validated against DFT. The code to implement our active learning strategy and perform NNP-driven stochastic shear simulations is openly available at https://github.com/pnnl/Active-Sampling-for-Atomistic-Potentials.
METHODS
Data acquisition
To train the NNP, we consider the energetic and structural changes in a Cu–Ni multilayer under shear strain. The Cu–Ni multilayer was represented using a periodic model with a cubic supercell of 5 × 5 × 5 u.c.³ [u.c. corresponds to the crystallographic face-centered cubic (FCC) cell]. The Cu and Ni layers consisted of five atomic planes each, stacked in the [001] direction and joined by a planar interface. Atomically abrupt interfaces in model systems allow for detailed characterization of interfacial phenomena, including intermixing45 and diffusion.46,47 Such model structures can be synthesized via well-controlled, layer-by-layer deposition techniques,45–49 such as molecular beam epitaxy.46,47 Because atomic rearrangement pathways are known to be affected by the presence of defects, such as vacancies, we considered both pristine and vacancy-containing systems, the latter containing a single vacancy located either at the Cu–Ni interface or inside the Ni or Cu layer.
Figure 1 depicts the Cu–Ni multilayers examined in this work. In addition to the pristine structure, we examined the following vacancy-containing structures: a single vacancy in either the Ni or Cu layer, located at various distances from the interface [VNi(N1), VNi(N3), VCu(C1), and VCu(C3)] and a pair of vacancies located in the (001) plane near the interface in either the Ni or Cu layer [2VNi(N1) and 2VCu(C1)].
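As an illustration of the model geometry and its vacancy-containing variants, the following is a minimal sketch of how such a multilayer supercell could be assembled with the Atomic Simulation Environment (ASE). The lattice parameter and the particular vacancy site removed are illustrative assumptions, not values taken from this work.

```python
# Minimal sketch of a Cu/Ni multilayer supercell with a single vacancy (ASE).
from ase.build import bulk

a_fcc = 3.56  # placeholder lattice parameter in Angstrom (assumed value)
# 5 x 5 x 5 conventional FCC supercell (4 atoms per cubic cell -> 500 atoms).
atoms = bulk("Cu", "fcc", a=a_fcc, cubic=True) * (5, 5, 5)

# Assign the upper five (001) atomic planes to Ni to form the Cu/Ni multilayer.
interface_z = 0.5 * atoms.cell[2, 2]
symbols = ["Ni" if z >= interface_z - 1e-3 else "Cu" for z in atoms.positions[:, 2]]
atoms.set_chemical_symbols(symbols)

# Single vacancy in the first Ni plane above the interface [a V_Ni(N1)-like model;
# the exact site removed here is an arbitrary illustrative choice].
first_ni_plane = [i for i, at in enumerate(atoms)
                  if at.symbol == "Ni" and abs(at.position[2] - interface_z) < 0.1]
del atoms[first_ni_plane[0]]
```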
DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP).50–53 The exchange-correlation potential was described by the generalized gradient approximation (GGA) with the Perdew–Burke–Ernzerhof (PBE) functional.54,55 The projector augmented-wave (PAW) method56 and a plane-wave basis set with an energy cutoff of 295.4 eV were used. The calculations were conducted at the Γ point only, and spin polarization was neglected (see the supplementary material for validation of these settings). For static energy minimization, the convergence criterion was set to 10−4 eV.
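For readers wishing to reproduce comparable settings, the parameters above can be expressed through ASE's VASP interface as sketched below. This is only an illustrative mapping of the stated values; the original calculations were presumably set up with native VASP input files, and the relaxation algorithm and the assignment of the 10−4 eV criterion to a specific VASP tag are assumptions.

```python
# Sketch of the reported DFT settings via ASE's VASP interface (assumed mapping).
from ase.calculators.vasp import Vasp

calc_static = Vasp(
    xc="pbe",        # GGA-PBE exchange-correlation functional
    encut=295.4,     # plane-wave cutoff (eV)
    kpts=(1, 1, 1),  # Gamma-point-only sampling
    gamma=True,
    ispin=1,         # spin polarization neglected
    ediff=1e-4,      # 1e-4 eV convergence criterion (assignment to EDIFF is an assumption)
    ibrion=2,        # conjugate-gradient ionic relaxation (algorithm choice assumed)
)
```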
We apply shear in the [011] direction through repeated cell deformation and static energy minimization, to obtain shear trajectories (Fig. S1). For the training set, shear strain was applied to pristine and vacancy-containing Cu–Ni multilayers VNi(N1), VNi(N3), VCu(C1), and VCu(C3). The application of shear deformation results in an initial force on the atoms, which is minimized in the optimization process. The largest magnitude of the atomic forces at the beginning of each optimization is 0.7 eV/Å. The static energy minimization of shear deformation, in which the crystal structure remained FCC, was typically completed in less than 100 steps, leading to a drop in energy of less than 2 eV. In contrast, optimizations leading to changes in the crystal structure (such as from FCC to HCP) could take up to 600 steps to converge, resulting in a decrease in energy of up to 69 eV.
Each optimization step obtained when computing the shear trajectories represents a distinct sample when training the NNP. Each sample includes the chemical identity of each atom k (k = 1, …, Na), represented by atomic number Zk, coordinates rk of all atoms in the structure, forces Fk exerted on each atom, and total energy E generated in the DFT optimizations.
Ab initio molecular dynamics (AIMD) probes the PES at finite temperatures, which incorporates the effect of thermal fluctuations. In this sense, these calculations provide a more realistic picture of the atomic rearrangements, even though the timescale captured by these simulations is limited by their high computational cost. To include the effects of thermal fluctuations in the training set, AIMD calculations were performed at the same level of theory as was used for structure optimization, using a canonical (NVT) ensemble at temperatures of 1000 K [pristine, VNi(N1), and VNi(N2)] and 1500 K [VNi(N1) and VNi(N2)], with 400 timesteps (1 fs per timestep) for each shear strain. In this case, the potential energy E, rather than the total energy, was used for the NNP training set.
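A similarly hedged sketch of the stated AIMD parameters, again via ASE's VASP interface, is shown below; the thermostat choice (Nosé, via SMASS with IBRION = 0) is an assumption, as the text does not specify it.

```python
# Illustrative NVT AIMD settings: 400 steps of 1 fs; thermostat choice assumed.
from ase.calculators.vasp import Vasp

calc_aimd = Vasp(
    xc="pbe",
    encut=295.4,
    kpts=(1, 1, 1),
    gamma=True,
    ispin=1,
    ibrion=0,    # molecular dynamics
    nsw=400,     # 400 timesteps per shear strain
    potim=1.0,   # 1 fs timestep
    tebeg=1000,  # target temperature in K (1500 K runs analogous)
    smass=0,     # Nose thermostat (assumed choice)
    isif=2,      # fixed cell during MD
)
```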
Neural network potential
In constructing the neural network potential (NNP), we employed the SchNet architecture, which is designed to predict atomic or molecular properties based on atomic numbers Z and positions r by establishing a relationship between a property of interest, either measured or calculated, over a finite region of chemical space.37,38 These relationships consist of a set of parameters (known as the learned network weights) that allow mapping of the set of Zk and rk to the property of interest via supervised training.
An NNP is generated by fitting SchNet to the potential energies obtained from a dataset pre-computed at a specific level of theory. SchNet's functional representation of the potential energy is infinitely differentiable, which avoids discontinuities in derivatives of the energy as a function of rk and allows atomic forces to be derived from the NNP gradient. In this way, a trained (i.e., well-fitted) NNP can be used to predict the potential energy Ê and atomic forces F̂k. The accuracy of the NNP can be improved by optimizing toward the reproduction of both E and F through minimization of the loss function57

$$\mathcal{L} = \rho\,\bigl\|E - \hat{E}\bigr\|^{2} + \frac{1-\rho}{3N_{a}}\sum_{k=1}^{N_{a}}\Bigl\|\mathbf{F}_{k} - \Bigl(-\frac{\partial \hat{E}}{\partial \mathbf{r}_{k}}\Bigr)\Bigr\|^{2},$$

where, for each sample, E is the reference energy, Fk are the reference forces on atom k located at rk, and ρ establishes the trade-off between the accuracy of fitting energies and forces; we set ρ = 0.01.
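As an illustration, the loss above can be written as a short PyTorch function. The tensor shapes and the 1/(3Na) normalization follow the reconstruction given here and may differ in detail from the released training code.

```python
import torch

def energy_force_loss(E_pred, E_ref, F_pred, F_ref, rho=0.01):
    """Combined energy/force loss; rho weights the energy term (rho = 0.01 here).

    Assumed shapes: E_* -> (batch,), F_* -> (batch, n_atoms, 3).
    """
    n_atoms = F_ref.shape[1]
    energy_term = torch.mean((E_ref - E_pred) ** 2)
    force_term = torch.mean(torch.sum((F_ref - F_pred) ** 2, dim=(1, 2))) / (3 * n_atoms)
    return rho * energy_term + (1.0 - rho) * force_term
```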
A schematic of the NNP architecture is shown in Fig. 2. Each atom is given an initial representation based on its atomic number Z. These representations are passed into a series of “interaction” layers, where they are updated based on the representations of, and distances to, neighboring atoms n within a given cutoff radius. To maintain differentiability, the distances to neighboring atoms are described by Gaussian radial basis functions, reminiscent of the two-body terms in Behler’s atom-centered symmetry functions.58 The updated representations are passed through an “atom-wise” multilayer perceptron (MLP) to produce the contribution of each atom to the overall energy. The individual energies are then summed to obtain the total potential energy Ê. The atomic forces are obtained as the negative gradients of the potential energy with respect to the atomic coordinates. See the work of Schütt et al. for a further description of the SchNet architecture.37–39
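The force evaluation described above amounts to a single automatic-differentiation pass through the energy model. A minimal PyTorch sketch is shown below, where energy_model is a placeholder for any differentiable energy predictor, such as a trained SchNet-style NNP.

```python
import torch

def predict_energy_and_forces(energy_model, Z, positions):
    """Return the predicted energy and the forces as its negative gradient.

    `energy_model` is a hypothetical callable mapping (Z, positions) to a scalar
    potential energy; positions has shape (n_atoms, 3).
    """
    positions = positions.clone().requires_grad_(True)
    energy = energy_model(Z, positions)
    # Forces are the negative gradient of the energy w.r.t. atomic coordinates;
    # create_graph=True lets force errors be backpropagated during training.
    forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
    return energy, forces
```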
In this work, the SchNet NNP was composed of two interaction layers with feature vectors of length 128 and 50 Gaussian radial basis functions. The chemical environment of each atom was represented within a cutoff distance of 6.0 Å, which leads to ∼80 neighbors per atom. The batch size was set to 32. The learning rate was initialized to 0.0001 with a decay rate of 0.8, applied after no decrease in validation loss was observed for 60 epochs. This learning-rate decay schedule extended over the full training sequence and was not reset between active learning iterations.
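The decay schedule described above corresponds to PyTorch's ReduceLROnPlateau scheduler; the sketch below shows this mapping, with a stand-in module in place of the NNP.

```python
import torch

model = torch.nn.Linear(3, 1)  # stand-in for the SchNet NNP
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Decay the learning rate by a factor of 0.8 after 60 epochs without
# improvement in the validation loss.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.8, patience=60)

# Per epoch, after computing the validation loss:
# scheduler.step(val_loss)
```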
Active learning scheme
The accuracy of an NNP is determined by the quality of the dataset it is based on. Thus, training datasets should fairly represent the complexity of the configuration space and the PES one seeks to model. Typically, a dataset is split into training, validation, and testing subsets in an 8:1:1 sample ratio; the samples are selected uniformly at random and, consequently, preserve any bias that might be present in the dataset.
Typical materials systems include regions with high structural and chemical complexity where the physical phenomena of interest take place, as well as regions of relatively low complexity that remain largely unchanged throughout the simulation. For example, ∼17% of the reference data used in this work contained structures in which every atom has an FCC lattice environment. Because the NNP loss function is defined as the mean squared error over the dataset, regions of high and low complexity should be adequately represented to avoid sampling bias. The question then becomes: how does one select a subset of samples for training that eliminates bias within the dataset? To this end, we develop an active learning scheme that assesses the NNP errors during training and iteratively constructs a training set that incorporates only samples that are predicted with high errors.
This approach, composed of four elementary steps, is illustrated in Fig. 2. We begin training with a small training set (step 1: Train). After a set number of training epochs, we assess the accuracy of the NNP approximation on samples from a reserve set (step 2: Evaluate). At step 3 (Filter), the NNP accuracy is analyzed. If a particular sample is found to have a high error, then it is likely either outside the range of represented data or underrepresented within the dataset, indicating bias in the dataset. Poorly approximated samples are added to the training set (step 4: Include). These four steps are repeated for a set number of iterations to complete the full NNP training cycle.
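A schematic of this four-step cycle is sketched below. The callables are placeholders for the training, error-evaluation, and filtering logic; a possible implementation of the filter step is sketched after the next paragraph.

```python
import random

def active_learning_loop(train_fn, error_fn, filter_fn, train_set, reserve_set,
                         n_iterations=50, n_eval=5000):
    """Schematic Train -> Evaluate -> Filter -> Include cycle.

    train_set and reserve_set are lists of samples. train_fn(train_set) runs a
    fixed number of training epochs, error_fn(samples) returns per-sample force
    errors, and filter_fn(samples, errors) returns the poorly approximated
    samples to be added to the training set.
    """
    for _ in range(n_iterations):
        train_fn(train_set)                                                # step 1: Train
        batch = random.sample(reserve_set, min(n_eval, len(reserve_set)))  # step 2: Evaluate
        errors = error_fn(batch)
        selected = filter_fn(batch, errors)                                # step 3: Filter
        train_set.extend(selected)                                         # step 4: Include
        for sample in selected:
            reserve_set.remove(sample)
    return train_set
```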
In this work, the initial training set contained 3000 samples (5% of the total data); the validation set contained 5000 samples (8.5%), and the reserve set contained the remaining nearly 51 000 samples (86.5%). The evaluation method for selecting high-error samples determines which samples are moved from the reserve set to the training set and, therefore, the size of the final training set. We examined 5000 samples from the reserve set at each iteration and identified the 30% of those samples with the highest mean error in atomic forces. We then filtered these samples to include only those with likelihood less than ɛtol, assuming that the error follows a Gaussian distribution with mean μtrain and standard deviation σtrain. This technique quantifies the magnitude of errors relative to the bias and variance of the NNP, such that only samples with higher errors than those in the current training set are incorporated into the training set in the next active learning iteration. The value of ɛtol can range from 0 to 1, with higher values indicating a stricter cutoff. Here, we apply ɛtol = 0.15 throughout training. We found that training for 80 epochs per iteration allowed the NNP weights to converge within 50 active learning iterations.
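One possible implementation of the filter step is sketched below. Here the "likelihood" of a sample is interpreted as the upper-tail probability of its mean force error under a Gaussian fitted to the training-set errors; this reading is an assumption, and the released code linked above is authoritative.

```python
import numpy as np
from scipy.stats import norm

def select_samples(reserve_errors, train_errors, eps_tol=0.15, top_fraction=0.30):
    """Return indices of reserve-set samples to move into the training set.

    reserve_errors / train_errors are per-sample mean force errors. The
    tail-probability reading of the likelihood criterion is an assumption.
    """
    reserve_errors = np.asarray(reserve_errors)
    # Keep the top_fraction of evaluated reserve samples with the largest errors.
    n_keep = int(top_fraction * len(reserve_errors))
    worst = np.argsort(reserve_errors)[::-1][:n_keep]
    # Retain only samples whose errors are improbable under N(mu_train, sigma_train).
    mu, sigma = np.mean(train_errors), np.std(train_errors)
    tail_prob = norm.sf(reserve_errors[worst], loc=mu, scale=sigma)
    return worst[tail_prob < eps_tol]
```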
SchNetPack amendments
The generation of NNPs can require long training times and copious computational resources, especially if the training set or atomic systems are so structurally and compositionally complex as to require a large number of configurations to be probed. Frey et al. introduced a PyTorch-based library to scale deep learning models for molecular graphs, including SchNet, to hundreds of graphics processing units (GPUs).59 In contrast, in this work, we aim to generate NNPs with limited GPU resources.
The relatively large number of atoms in the Cu–Ni system, compared to commonly used small-molecule datasets such as QM9, initially led to memory problems when attempting to train an NNP using SchNetPack as described. Therefore, we made several amendments to the SchNetPack codebase38 to reduce the GPU memory footprint, such that training can support structures with hundreds of atoms, and to improve computational efficiency and reduce training times. The key amendments are summarized as follows (see the supplementary material for details). First, we incorporated Automatic Mixed Precision (AMP) into the training code, which allows training on NVIDIA GPUs with half precision while maintaining the network accuracy achieved with single precision. Second, we redefined the shifted softplus activation function to be fully implemented as a PyTorch tensor operation, to reduce I/O overhead. Third, we used the minimum image convention (MIC)60 to compute distances across periodic boundary conditions on-the-fly during NNP training, rather than precomputing and storing cell offset vectors for each atom. Fourth, we moved nearest-neighbor computations to a preprocessing step, storing only those neighbors within the cutoff radius. With these amendments, we were able to reach a performance of ∼1 min/epoch with a training set of 39 214 structures and a validation set of 5000 structures, scaled over 6 NVIDIA A100 GPUs.
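As an illustration of the third amendment, minimum-image distances for an orthorhombic (here cubic) cell can be computed on-the-fly as in the sketch below; this is a generic example rather than the actual SchNetPack patch.

```python
import torch

def mic_distances(positions, cell_lengths):
    """Pairwise distances under the minimum image convention (orthorhombic cell).

    positions: (n_atoms, 3) Cartesian coordinates; cell_lengths: (3,) box lengths.
    """
    diff = positions[:, None, :] - positions[None, :, :]           # (n, n, 3) displacement vectors
    diff = diff - cell_lengths * torch.round(diff / cell_lengths)  # wrap to nearest periodic image
    return torch.linalg.norm(diff, dim=-1)                         # (n, n) distance matrix
```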
RESULTS AND DISCUSSION
To compare our active learning strategy to a typical static learning process, we trained three NNPs on the same network architecture, but with different training set sampling schemes: (1) the active learning proposed in this work, (2) static learning with an approximate random, uniform 8:1:1 train:validation:test split, where all samples were equally likely to be assigned to each set, and (3) static learning that uses the final training and validation sets of (1).
For our NNP trained via active learning, the full dataset was split into training, validation, and reserve sets, as described in the “Methods” section. Fifty active learning iterations were applied, drawing samples from the reserve set at each iteration (see Fig. 2). At the end of this process, the training set consisted of 39 214 samples (66.6% of the total samples), while the reserve set had decreased to 14 659 samples (24.9%). However, by the 25th iteration, 95% of the samples in the final training set had already been collected, indicating that nearly all the samples required for active learning were gathered in the first half of the process. For the statically trained NNP with random sampling, the full dataset was split into training, validation, and test sets: 47 098 samples (80.0%) went to training, 5000 (8.5%) to validation, and 6775 (10%) to the test set. The statically trained NNP using the splits produced during active learning had the same 66.6%–8.5%–24.9% split as in the case of the final active learning iteration, the difference being that the training set composition remained fixed throughout training.
Figure 3 shows key differences between the training sets obtained by uniform sampling and by active learning. Samples with a high fraction of atoms in a local FCC lattice environment are greatly over-represented in the uniformly sampled training set, but not in the active learning training set. Samples with a predominantly FCC lattice structure tend to be of lower energy. Therefore, the majority of samples added to the training set during active learning correspond to higher-energy configurations.
As shown in Table I, the “best” and final training losses obtained for each model are of the order of 10⁻³, while the validation losses vary from ∼10⁻³ to ∼10⁰. The orders-of-magnitude higher validation losses for the networks trained via static learning (2 and 3) indicate that overfitting has occurred: the network performs well on the training set (low training loss) but poorly on the validation set (high validation loss), meaning that it does not generalize to samples outside the training set. A well-fitted network has comparable training and validation losses. Thus, NNPs derived through static learning strategies cannot reproduce the full range of energy–structure relationships represented in the validation set.
TABLE I. Best and final training and validation losses for the three learning strategies.

| Learning strategy | Best train loss | Best val loss | Final train loss | Final val loss |
|---|---|---|---|---|
| Active learning (1) | 3.5 × 10⁻³ | 8.9 × 10⁻³ | 3.5 × 10⁻³ | 9.1 × 10⁻³ |
| Uniform train set (2) | 5.3 × 10⁻³ | 8.1 × 10⁻² | 5.3 × 10⁻³ | 5.8 × 10⁻¹ |
| Active train set (3) | 9.2 × 10⁻³ | 8.8 × 10⁻² | 9.5 × 10⁻³ | 2.1 × 10⁰ |
The model trained using our active learning strategy has the best (i.e., lowest) validation loss at 8.9 × 10−3 – roughly nine times lower than the best validation loss of the statically trained model with uniform sampling and roughly ten times lower than that of the statically trained model using the final training set obtained from active learning. Active learning can often produce lower error when applied to subsets of the dataset compared to training over the full dataset.61–64 Mussmann and Liang showed that for logistic loss functions, specifically for the task of binary classification, active learning iterations based on uncertainty sampling act as preconditioned stochastic gradient descent steps on the zero-one loss of the training set, which is simply the number of misclassifications made by the network.65 A full theoretical explanation of the improved performance of active learning strategies, in particular, for regression tasks performed in this work, remains an open topic of research.
Though the NNP is trained to reproduce the energy and forces acting on each atom for a set of independent structures, our intent is to use it for structure optimization. However, evaluating an NNP on an independent set of structures does not imply that the NNP is applicable for static energy minimization.66 Indeed, we observed that structure optimizations driven by the statically trained NNPs often suffered from instability, resulting in a failure to obtain a minimum-energy configuration. Moreover, the optimizations that did converge produced nonphysical structural configurations.
In contrast, the NNP trained via active learning reliably produced minimum-energy configurations during structure optimizations that were in line with those observed in DFT simulations. We then examined the performance of this NNP by simulating the structural evolution of Cu–Ni multilayers under shear. Atomic positions rk (k = 1, …, N), where N is the total number of atoms, of the initial structure were defined by the FCC lattice and the parameters of a cubic supercell C = lI, where I is the 3 × 3 identity matrix and l is the cubic supercell lattice parameter. We begin the simulation by displacing the atomic coordinates rk by small random perturbations Δrk, where the elements of Δrk are sampled from a normal distribution centered at zero (μ = 0) with standard deviation σ = 0.05. These random perturbations allow the structure to evolve along different pathways under applied shear and, therefore, provide richer insight into the nature of the PES. The value of σ is selected to be significantly smaller than the interatomic distance, yet large enough to induce consequential variations in the structural evolution of the system across different simulations. For Cu and Ni, perturbations of σ = 0.05 are well below the generalized Lindemann melting criterion.67 Hence, the perturbations do not change the lattice structure as such, but rather correspond to thermally induced fluctuations that are expected to arise due to shear-induced friction.
After the random perturbation is applied to the atomic coordinates, shear deformation is induced by transforming the simulation cell as C ↦ C + δ, where

$$\delta = \begin{pmatrix} 0 & 0 & 0 \\ \delta_{21} & 0 & 0 \\ \delta_{31} & 0 & 0 \end{pmatrix}$$
and δ21 = δ31, which corresponds to the shear applied along the [011] crystallographic lattice direction, with a shear strain magnitude γ of

$$\gamma = \frac{\sqrt{\delta_{21}^{2} + \delta_{31}^{2}}}{l}.$$
Then, the energy of each structure, as determined using the trained NNP, is minimized with respect to the internal atomic coordinates using the Atomic Simulation Environment (ASE) implementation of the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.68 This sequence of randomized perturbations, followed by incremental shear deformation and energy minimization, is repeated until the desired strain γmax is reached. In our case, γmax was set to ∼0.33, which is sufficient to simulate at least two plastic deformation events (see Figs. 4 and 5). We find that by applying this procedure, higher-energy configurations that were not represented in the training sets are accessed.
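A condensed sketch of one such stochastic shear trajectory, driven by the NNP through ASE, is given below. The calculator wrapper, the cell increment per cycle, and the force-convergence threshold are illustrative assumptions, and the strain definition follows the reconstruction given above.

```python
import numpy as np
from ase.optimize import BFGS

def stochastic_shear_run(atoms, calculator, delta_step=0.05, gamma_max=0.33,
                         sigma=0.05, fmax=0.05, seed=None):
    """One stochastic shear trajectory: perturb -> shear -> minimize, repeated.

    `calculator` is assumed to be the trained NNP wrapped as an ASE calculator;
    `delta_step` (cell increment per cycle, in Angstrom) and `fmax` (force
    convergence criterion) are illustrative values.
    """
    rng = np.random.default_rng(seed)
    atoms = atoms.copy()
    atoms.calc = calculator
    l = atoms.cell[0, 0]                  # cubic supercell lattice parameter
    gamma, energies = 0.0, []
    while gamma < gamma_max:
        # Random perturbation of atomic coordinates (normal, mean 0, std sigma).
        atoms.positions += rng.normal(0.0, sigma, atoms.positions.shape)
        # Incremental shear with delta_21 = delta_31, i.e., along [011].
        cell = np.array(atoms.get_cell())
        cell[0, 1] += delta_step
        cell[0, 2] += delta_step
        atoms.set_cell(cell, scale_atoms=False)
        # Static energy minimization of the internal coordinates with BFGS.
        BFGS(atoms, logfile=None).run(fmax=fmax)
        energies.append(atoms.get_potential_energy())
        gamma = np.sqrt(cell[0, 1] ** 2 + cell[0, 2] ** 2) / l  # assumed strain definition
    return energies
```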
To test the ability of the NNP to accurately reproduce structure–energy relationships in static energy minimization calculations, we compare the energies and structures of a model system containing a single vacancy VCu(C3), calculated using the NNP and DFT (Fig. 4). We consider two pathways, both of which evolve as a series of plastic deformation events. Figures 4(a) and 4(c) show the potential energies and structural deviations, respectively, for the case where the initial FCC lattice is maintained, with partial Cu–Ni intermixing (γ = 0.14), followed by a transition to an FCC-HCP mixed phase at γ = 0.26. In contrast, Figs. 4(b) and 4(d) show a case with the opposite sequence of transformations: a mixed FCC–HCP phase is formed at γ = 0.14, followed by the transition to an FCC intermixed phase at γ = 0.20.
The first set of PESs [Figs. 4(a) and 4(c)] have nearly identical energies and structures up to γ = 0.26, where there is a small, persistent deviation in both structure and energy between the DFT and NNP predictions. The second set of PESs [Figs. 4(b) and 4(d)] show nearly identical energies and structures throughout the examined range of γ, save for a transient deviation near γ = 0.18, where the DFT simulations predict the HCP-to-FCC transformation at the Cu–Ni interface to occur at smaller γ than the NNP. After this transformation occurs, the DFT and NNP potential energy profiles and corresponding structures realign. Importantly, these structural pathways were not included in the NNP’s training set. Hence, the agreement between the DFT- and NNP-derived PESs showcases the NNP’s ability to provide accurate structure–energy relationships in systems subjected to shear deformation.
Next, we applied the shear deformation procedure to simulate the effect of shear in a pristine Cu–Ni slab and in slabs containing defects (see Fig. 1 for notations). We performed 100 stochastic shear simulations for each system to generate a collection of possible shear-induced transformation pathways. We found that at low shear (γ < 0.1), all stochastically generated paths collapse into a single PES profile. Multiple profiles arise with the onset of plastic deformation, and the diversity of possible metastable configurations further increases with increasing shear. These NNP-derived PES profiles capture the key features obtained using DFT simulations.
Examining the potential energy profiles shown in Fig. 5 and the corresponding structural changes, we identify three types of plastic transformations induced by shear: defect/HCP type (purple), FCC type (yellow), and defect/HCP-to-FCC type (purple-yellow). The defect/HCP-type rearrangement occurs when {111} or {101} slip planes form, causing a large number of atoms throughout the structure to break from FCC lattice symmetry, forming defects such as stacking faults. This change is accompanied by a relatively small decrease in energy, resulting from the release of shear strain. These defects remain in the system for a large range of shear strain. FCC-type rearrangement preserves the FCC symmetry and leads to a much larger decrease in energy. In this type of rearrangement, defects formed during the intermixing process spontaneously restore the FCC lattice. The defect/HCP-to-FCC type rearrangement shows a unique behavior, in that the FCC symmetry is initially broken by the formation of defects, leading to a small decrease in energy, followed by a return to the FCC symmetry and gradual decrease in energy. In these pathways, slip occurs, causing the crystal to split into planes such that atoms near the interface of the planes deviate from the FCC lattice. With further application of shear, the slip is resolved, and the structure regains FCC symmetry. Notably, the defect/HCP-to-FCC type rearrangement was not observed in any of the simulations in which the vacancy was located in the Ni phase. This can be attributed to the larger cohesive energy of Ni compared to that of Cu. Therefore, once relatively stable defects form during plastic deformation events, it is costly to restore the FCC lattice.
The simulations of potential energy profiles with respect to the shear magnitude show that our coordinate perturbation approach allows us to access multiple distinct metastable configurations. Therefore, as a second form of validation, we evaluated the NNP’s ability to accurately capture the relationships between the structure and potential energies of these metastable configurations.
To this end, we selected 21 higher-energy configurations obtained by static energy minimization of the sheared (γ = 0.33) Cu–Ni systems using the NNP (from Fig. 5) and calculated the potential energies for these configurations using DFT. In all cases, the NNP and DFT energies agree within 0.005 eV per atom, as shown in Fig. 6. The structures were then minimized at the DFT level, and the resulting energies were compared with those obtained using the NNP. With the exception of five configurations, the effect of this optimization is negligible, and the DFT energies agree with the NNP energies within 0.004 eV per atom. In other words, in most considered cases, the NNP is able to identify the local energy minima and assign energy values with an accuracy nearly identical to that of DFT.
The five spurious NNP-derived configurations reached an ∼10 eV lower-energy configuration when optimized at the DFT level (blue → orange). This suggests that the NNP PES tends to be more corrugated than the DFT PES, i.e., the NNP predicts the existence of energy minima where DFT does not. To examine whether NNP reproduced the energy minima found using DFT, we further relaxed the DFT-optimized structures (orange) using the NNP and found that each of the optimized energy minima was within 0.006 eV/atom of the corresponding DFT energy minimum (see Fig. S2). This validation indicates that some of the NNP-derived structures may be artificial, i.e., they do not have corresponding minima on the DFT PES, while the DFT-optimized structures do have corresponding minima on the NNP PES. In this case, spurious minima were observed only for high-energy FCC–HCP, mixed, metastable configurations. Accordingly, the subsequent DFT optimization leads to a significant reduction or elimination of the HCP regions.
Figure 7 further demonstrates the utility of the NNP through examination of repeated simulations of the shear-induced transformation of the 2VNi(N1) system. The presence of vacancies disrupts the local FCC lattice symmetry, allowing the assignment of a unique graph representation for the di-vacancy.69 We can then examine vacancy transformations in terms of transformations of this graph. The di-vacancy begins in the Ni plane of the Cu–Ni interface, and the di-vacancy graph is composed of 12 Ni atoms and 6 Cu atoms. As the shear magnitude reaches γ = 0.119, the accumulated strain relaxes, and the number of Ni (Cu) atoms in the di-vacancy graph decreases (increases) to 9, which is reflective of the di-vacancy migration pathway. The accumulated strain relaxes again when the shear magnitude reaches γ = 0.270. We identified three distinct outcomes of this relaxation that lead to different di-vacancy migration trajectories. In one scenario, shown in red in Fig. 7, Ni atoms diffuse into the vacant lattice sites, i.e., the di-vacancy migrates into the Ni part of the system. In this process, two vacant sites migrate off-plane relative to one another and perpendicular to the shear direction. Notably, the configuration is readily identifiable based on the structure (number and arrangement of vertices) of the di-vacancy graph. In the second scenario, shown in blue, Cu atoms diffuse into the vacant lattice sites in a chain-like fashion, causing the di-vacancy to split into two mono-vacancies. Sections of the interface shift in the [010] direction, and the mono-vacancies are again oriented off-plane, perpendicular to the shear direction, but no longer share nearest neighbors. Finally, in the third scenario, shown in green, the di-vacancy retains its local structure and relative orientation, but migrates into the Cu plane at the Cu–Ni interface. This configuration was found to be the most thermodynamically preferred.
SUMMARY
We demonstrated an active sampling scheme to train a neural network potential (NNP) based on a DFT dataset that was designed to investigate shear-induced intermixing and vacancy migration in Cu–Ni multilayers. The NNP training begins with a small subset of the dataset, which is autonomously expanded as training progresses. Sample inclusion is determined by assessing the error of the NNP prediction on a subset of reserved data, to avoid overfitting the NNP to well-learned regions of the structural space. The method successfully filters over-represented examples from the dataset, producing a better fit compared to traditional training methods. We found that simultaneously assembling the training set and training the model is imperative to the performance of our active learning scheme, as training on the complete training set generated during active learning did not improve, but rather worsened, the NNP fit.
We then demonstrated that the NNP is well-suited for structure optimizations of the Cu–Ni system under shear. In particular, we showed that a well-trained NNP can be used to discover numerous pathways on the PES, by applying a stochastic simulation scheme in which small perturbations were applied to the atomic positions prior to structure optimization using the NNP. Using the NNP for atomistic shear simulations yielded significant speedups over DFT, while preserving DFT-level accuracy of the structure–energy relationship: each NNP-driven shear simulation took ∼20 min, while a single DFT shear simulation typically required multiple days. In addition, each simulation is independent, allowing parallel generation of shear deformation pathways. The ability of NNPs to perform an increased number of simulations on larger systems and for longer time periods, while maintaining the level of accuracy at which they were trained, moves atomic-scale modeling one step closer to correspondence with experiment.
Our active learning strategy can be adopted for other material and chemical systems and other neural network architectures. Moreover, because a trained NNP can be readily integrated into a number of optimization and dynamics algorithms, our NNP-driven stochastic simulation method can be applied to other stimuli of interest to examine a range of structure–property relationships. The codebase is openly available at https://github.com/pnnl/Active-Sampling-for-Atomistic-Potentials.
SUPPLEMENTARY MATERIAL
See the supplementary material for the DFT-simulated shear deformation pathways, DFT benchmarking summary, and full description of SchNet code expansions.
ACKNOWLEDGMENTS
This research was supported by the Laboratory Directed Research and Development program under the Solid Phase Processing (SPP) science initiative at Pacific Northwest National Laboratory (PNNL). PNNL is operated for the United States Department of Energy by Battelle Memorial Institute, under Contract No. DE-AC05-76RL01830. The research was performed using resources available through Research Computing at PNNL.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Henry W. Sprueill: Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Jenna A. Bilbrey: Conceptualization (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Qin Pang: Data curation (equal); Investigation (equal); Validation (equal); Writing – original draft (equal). Peter V. Sushko: Conceptualization (equal); Writing – original draft (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available in PNNL’s DataHub repository at data.pnnl.gov, reference numbers 33219 and 33220. DFT simulation results in the form of VASP OUTCAR and XDATCAR files can be obtained at https://data.pnnl.gov/group/nodes/dataset/33219. Repeated NNP simulations in the form of extxyz files can be obtained at https://data.pnnl.gov/group/nodes/dataset/33220.