Transition metal nitride alloys possess exceptional properties, making them suitable for cutting applications due to their inherent hardness or as protective coatings due to their corrosion resistance. However, the computational demands associated with predicting these properties using ab initio methods are often prohibitively high at the conditions relevant for their operation on cutting tools, that is, at high temperatures and stresses. Machine learning approaches have been introduced into the field of materials modeling to address this challenge. In this paper, we present an active learning workflow to model the properties of our benchmark alloy system, cubic B1 Ti Al N, at temperatures up to 1500 K. Requiring only minimal prior knowledge about the alloy system, our workflow trains a moment tensor potential (MTP) that accurately models the material's behavior over the entire temperature range and from which elastic and vibrational properties are extracted. The outstanding accuracy of MTPs obtained with relatively little training data demonstrates that the presented approach is highly efficient and requires about two orders of magnitude less computational resources than state-of-the-art ab initio molecular dynamics.
I. INTRODUCTION
High hardness1–4 and corrosion resistance5 are among the key properties of transition metal nitride alloys, leading to high industrial interest. The demand to accurately model these alloy systems for coating applications originates in the need to understand their performance during operation and to accelerate knowledge-based design of new alloys. For hard ceramic coatings, solving this task at high temperature and stress conditions, i.e., imitating a cutting operation, can be essential both to estimate cutting tool lifetime and to find new potential material systems for niche applications. Operational temperatures for TiN based hard coatings can reach up to 1300 K,6,7 and a good theoretical understanding of material parameters at elevated temperatures is of great importance to the cutting tool industry. Ti Al N alloy systems are commonly known to exhibit age hardening due to spinodal decomposition at elevated temperature,8,9 but increasing the temperature beyond a certain point causes the material to break down and significantly decreases its hardness and fracture toughness.1,10
While simpler ab initio calculations can provide insight into the stability or elastic properties of these alloy materials,11 calculating these properties at elevated temperatures often comes at a significantly higher computational demand.12,13 Furthermore, simulations carried out to gain understanding of, e.g., dislocations or crack propagation in hard coating systems usually require large simulation cells, often combined with long time scales, to extract useful results.14,15 While sophisticated ab initio molecular dynamics (AIMD) approaches have been successfully used to model certain properties of industrially relevant material systems,16 the excessive computational demand limits both size and time scale and often requires simplifying the simulations to make them viable, compromising the accuracy of the results. Some methods developed in recent years, such as the temperature dependent effective potential (TDEP) approach,17 have been successful in efficiently predicting vibrational properties, thermal conductivity, and free energies at elevated temperatures.18 In large scale simulations, classical (empirical) interatomic potentials have been used to model crack propagation and fracture toughness in Ti Al N alloys to shed some light on the behavior of ceramic systems under stress at operational temperatures.14,19,20 However, while most available approaches are reliable within their respective fields, they generally come with one or more drawbacks.
Machine learning interatomic potentials (MLIPs) are becoming popular because a well-trained potential can be used to accurately model a range of different material properties at exceedingly low computational demand.21–24 With a multitude of available material databases25–30 full of ab initio data or even experimental measurements, machine learning (ML) approaches are well on their way to becoming a state-of-the-art tool in materials science. In particular, neural network based approaches, such as crystal graph convolutional neural networks (CGCNNs), have shown great predictive capabilities: CGCNNs trained on alloy data from the Materials Project database were shown to predict elastic properties of new alloy systems rather well,31 and re-training the CGCNN with additional ab initio data greatly reduced the prediction error.32 Machine learning force field approaches, such as Gaussian approximation potentials (GAPs) or moment tensor potentials (MTPs), can substantially accelerate large scale simulations when trained on ab initio data.33,34
While ML approaches with MTPs have been shown to have excellent predictive capabilities,35 training an MTP requires a diverse training set (TS), which should include data points from throughout most of the phase space of a material system. Browsing through the large material databases25–30 of today, it is easy to assume that the large number of materials studied at varying conditions constitutes a diverse enough data set to train on, but when focusing on a single material system with a given composition, the available data often turns out to be too homogeneous to directly train an accurate MTP. On top of that, training a potential on data obtained from these databases requires full trust in the integrity of the data. Constructing a TS by mixing results obtained through different approaches and/or with varying accuracy is a recipe for disaster for any ML method.
Learning on the fly, or active learning (AL), has, therefore, been suggested as a scheme to circumvent the lack of data when using ML approaches.36,37 Using an AL setup, we can iteratively improve a pretrained potential and extend its predictive capabilities. The initial data used to pretrain the MTP does not need to be of high accuracy, quite the opposite: the lower the cost to generate an initial TS, the better. However, for an efficient AL workflow, even the initial TS should already be diverse enough to cover a large part of the phase space, even if at questionable accuracy.
In this study, we present an AL workflow and apply it to the prototypical hard coating alloy Ti Al N. Using the MLIP-2 package38 interfaced with the Large-scale Atomic/Molecular Massively Parallel Simulator39 (LAMMPS), we present a method for investigating a material system with a minimal requirement of knowledge about the studied system. We have trained an MTP with exceptional predictive capabilities: the calculated elastic and vibrational properties compare well with published data from AIMD, TDEP, and other approaches. The computational demand of the presented AL workflow is orders of magnitude lower than that of conventional AIMD while maintaining higher accuracy, and the approach is more versatile in its use than methods of similar computational demand, such as TDEP.
II. METHODOLOGY
In the investigation of Ti Al N presented in this paper, simulations were run in parallel at five different temperatures ranging from 300 to 1500 K, but data from all temperatures were combined and used to train a single MTP, which could accurately model Ti Al N throughout the entire temperature range.
A. Moment tensor potentials
When fitting the energies of a TS containing K atomic configurations, each configuration provides a linear equation of the form of Eq. (3). If the number of basis functions is m, and K > m, the overdetermined set of equations is represented by a K × m matrix B. The upper m × m square matrix A is constructed to contain the m most linearly independent equations, which correspond to the most diverse atomic configurations, called the active set.
Equation (4) presents a simple way to calculate the respective change in determinant when exchanging each of the configurations in the active set with the new configuration. The highest of the obtained values is then taken as the extrapolation grade of the new configuration, as defined by Eq. (5). For the study presented in this paper, we have used the so-called 10g MTP with 232 basis functions to start our AL workflow and later switched to the 16g MTP with 380 basis functions for higher accuracy.
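To make the selection criterion concrete, the following minimal numpy sketch evaluates the extrapolation grade of a candidate configuration against an active set. The function and variable names (A, b) are ours and do not correspond to identifiers in the MLIP-2 code; the sketch only illustrates the MaxVol-type criterion described above.

```python
import numpy as np

def extrapolation_grade(A, b):
    """MaxVol-style extrapolation grade of a candidate configuration.

    A : (m, m) array of basis-function values, one row per active-set configuration
    b : (m,)   array of basis-function values of the candidate configuration

    Replacing row j of A by b scales |det A| by |c_j|, where c = b A^{-1}
    [cf. Eq. (4)]; the grade is the largest of these factors [Eq. (5)].
    """
    c = np.linalg.solve(A.T, b)      # c = b @ inverse(A) without forming the inverse explicitly
    return float(np.max(np.abs(c)))
```

A grade close to 1 means the candidate is already well represented by the active set, whereas larger grades indicate extrapolation; in the workflow below, grades above a selection threshold mark configurations worth adding to the TS, and grades above a breaking threshold stop the MD run.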
B. Generating low-accuracy data
To generate the initial TS used to train a low-accuracy MTP for Ti Al N, the phase space was crudely sampled by running AIMD simulations with the sampling of the Brillouin zone reduced to only the Γ-point to equilibrate the system at the desired temperatures and extract training data. This approach significantly reduced the computational demand of obtaining initial training data but yielded results of questionable accuracy. Assuming minimal prior knowledge about the Ti Al N alloy system, we did not make use of known or published Ti Al N data for our initial guess. As shown in Fig. 1, we instead relied on data from related or similar systems, in this case cubic B1 AlN and cubic B1 TiN. Both of these simple systems have well known, published values for both the equilibrium lattice parameter and the coefficient of linear thermal expansion, from which we obtained initial guesses for Ti Al N by linear interpolation. The validity of such an approximation, along with whether information about similar systems is available at all, affects the overall computational demand of this step, as a poor initial structure will require a longer AIMD simulation to equilibrate the system.
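As an illustration of this interpolation step, the sketch below builds an initial lattice parameter guess at each target temperature from reference values of the boundary systems. The numerical values, the composition x = 0.5, and the evenly spaced temperatures are illustrative placeholders only, not the values used in this work.

```python
# Illustrative placeholder reference data; actual numbers should come from
# literature or cheap 0 K calculations for the boundary systems.
A0    = {"TiN": 4.25, "AlN": 4.07}        # equilibrium lattice parameters (Angstrom), illustrative
ALPHA = {"TiN": 9.0e-6, "AlN": 7.0e-6}    # linear thermal expansion coefficients (1/K), illustrative

def initial_lattice_parameter(x_al, T, T_ref=300.0):
    """Linear (Vegard-like) mix of cubic B1 TiN and AlN, expanded to temperature T
    with a constant, linearly mixed expansion coefficient."""
    a_mix     = (1.0 - x_al) * A0["TiN"] + x_al * A0["AlN"]
    alpha_mix = (1.0 - x_al) * ALPHA["TiN"] + x_al * ALPHA["AlN"]
    return a_mix * (1.0 + alpha_mix * (T - T_ref))

if __name__ == "__main__":
    for T in (300, 600, 900, 1200, 1500):          # assumed evenly spaced temperatures
        print(f"{T:5d} K  a0 = {initial_lattice_parameter(0.5, T):.4f} A")
```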
With the initial structures for all five temperatures determined, a Γ-point NVT AIMD simulation is run for 0.5 ps, mainly to let the atoms relax from their initial positions and obtain reasonable atomic velocities corresponding to the temperature of the simulation. The obtained atomic configurations are then used as input for a Γ-point NpT AIMD simulation, which is run long enough to allow the structures to reach equilibrium at their respective temperatures. In this paper, the initial guess was good enough that only roughly 1000 time steps were required to fully equilibrate the systems, but the AIMD simulations were allowed to run for up to 7500 time steps to sample the phase space of Ti Al N.
When constructing the TS and validation set (VS) from AIMD data, care has to be taken to remove all data recorded before the system has reached equilibrium; the required equilibration time may vary depending on the material system in question and the starting guess of the structure. In our approach, we discarded the first 2000 time steps of each simulation. The next 2000 time steps from each temperature were then combined into a single TS, referred to as the full TS, before another 1000 time steps from every simulation were removed. These served as a buffer between training and validation data, ensuring that the two data sets were not directly correlated. The remaining data points acted as the VS for the initial potential.
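The splitting described above can be written down compactly. The sketch below uses our own function names and assumes one list of saved time steps per temperature.

```python
def split_trajectory(frames, n_discard=2000, n_train=2000, n_buffer=1000):
    """Split one equilibrated AIMD trajectory into training and validation data.

    frames    : time-ordered list of configurations (with energies, forces, stresses)
    n_discard : initial time steps dropped as equilibration
    n_train   : time steps contributed to the combined training set
    n_buffer  : time steps dropped between training and validation data
                so that the two sets are not directly correlated
    """
    train = frames[n_discard : n_discard + n_train]
    valid = frames[n_discard + n_train + n_buffer :]
    return train, valid

def build_full_ts_and_vs(trajectories):
    """Combine the per-temperature splits into one full TS and one VS."""
    full_ts, vs = [], []
    for frames in trajectories:          # one trajectory per temperature
        train, valid = split_trajectory(frames)
        full_ts += train
        vs += valid
    return full_ts, vs
```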
C. Initial training and active selection
Initial training of the first MTP was done with the 10g MTP and 100 atomic configurations (cfgs), randomly selected from the TS. Since MTPs are randomly initialized when training an empty potential from scratch, three potentials were trained on the selected data and evaluated based on their ability to predict the data in the VS. After this, the best potential was chosen to carry out an active selection of new data points from the TS, as visualized in Fig. 2, before training a new set of IPs from scratch and repeating the process to iteratively increase the amount of data the IPs are trained on. In this paper, two iterations of active selection were carried out, with 50 new cfgs added in each, so that the initial low-accuracy MTP was trained on 200 time steps from Γ-point AIMD trajectories. Extending the training beyond this point is not advised, as the gain in stability of the MTP comes at the expense of introducing more inaccuracies into the TS, which need to be removed at some point during the AL process.
D. Active learning
For the AL process, we used LAMMPS interfaced with the MLIP-2 code package to run MD simulations with the trained IPs. The process of a single iteration is schematically visualized in Fig. 3. Each iteration started by running 16 differently seeded MD simulations at every temperature, i.e., 80 MD runs in parallel, with extrapolation grades being calculated for the atomic configurations after every time step. Atomic configurations with extrapolation grades [see Eq. (5)] above the selection threshold were added to a data set of potential new data to learn from, while encountering a configuration above the breaking threshold would stop the MD simulation. Initially, the selection threshold was set to 2.0 and the breaking threshold to 5.0, but as the potential's predictive capabilities improved, we opted to lower the selection threshold to 1.1. Changing the breaking threshold has no effect on the convergence of the potential once it is trained well enough to run full length MD simulations without crashing, but a sufficiently low value should be chosen to prevent the initial potential from selecting and training on unphysical atomic configurations.
Using the same active selection process as in the initial training, the ten most beneficial data points were extracted from the set of potential new training data, and ab initio calculations were carried out to obtain accurate energies, forces, and stresses. The ten accurately labeled data points were then added to the TS of the potential, which was retrained before starting the next iteration. Note that retraining a potential from a previous iteration does not benefit from training and comparing multiple potentials, since no random initialization takes place.
In the first couple of iterations, the MTP performed poorly, and MD simulations only ran for a few time steps before breaking, but as the TS was expanded with more high-accuracy data, the MD runs eventually no longer broke. For MD simulations in this AL process, we set the maximum number of time steps to 20 000. Iterating the process described in Fig. 3 improved the potential, but there are two important things to keep in mind: first, the TS still contained low-accuracy data on which the MTP may rely to make its predictions, and second, at some point, increasing the amount of high-accuracy data will no longer have an effect on the performance of the IP unless we switch to an MTP of higher complexity.
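A single iteration of this loop can be summarized in the following schematic sketch. The callables run_md, select_by_maxvol, label_with_dft, and train_mtp stand for wrappers around LAMMPS/MLIP-2 and VASP that are not shown here; their names and signatures are ours. The thresholds, the 16 seeds per temperature, the ten selected configurations, and the 20 000-step limit are taken from the workflow described above, while everything else is an assumption for illustration.

```python
def al_iteration(mtp, training_set, temperatures, *,
                 run_md, select_by_maxvol, label_with_dft, train_mtp,
                 n_seeds=16, max_steps=20_000,
                 select_thr=2.0, break_thr=5.0, n_select=10):
    """One active-learning iteration (schematic, cf. Fig. 3).

    The four callables wrap the external codes:
      run_md           : LAMMPS+MLIP-2 MD run returning configurations whose
                         extrapolation grade exceeded select_thr (the run stops
                         early if a grade exceeds break_thr)
      select_by_maxvol : picks the n_select most beneficial candidates
      label_with_dft   : recomputes energies, forces, and stresses with VASP
      train_mtp        : retrains the potential on the extended training set
    Returns the updated potential, training set, and whether anything was added.
    """
    candidates = []
    for T in temperatures:
        for seed in range(n_seeds):                    # 16 seeds x 5 temperatures = 80 MD runs
            candidates += run_md(mtp, T, seed, max_steps, select_thr, break_thr)

    if not candidates:                                 # nothing above the selection threshold:
        return mtp, training_set, False                # the potential has converged

    new_cfgs = select_by_maxvol(mtp, candidates, n_select)
    training_set = training_set + label_with_dft(new_cfgs)
    mtp = train_mtp(mtp, training_set)                 # retrain from the previous potential
    return mtp, training_set, True
```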
E. Strategies for low-accuracy data removal and converging the potential
The switch to a higher complexity MTP, in this case the 16g MTP, was done after 20 iterations of the AL workflow, as visualized by the black dashed-dotted line in Fig. 4(a). At this point, the 10g MTP was sufficiently trained to no longer find any new atomic configurations above the selection threshold. Three new IPs were trained using the empty 16g MTP, and the best one was chosen based on the prediction error on the training data. The small spread of the lattice parameter in the newly selected data points in Fig. 4(a) shows that MD simulations with the new MTP initially broke early, but the active selection of new training data quickly mitigated this behavior. At this point, however, the low-accuracy data obtained from Γ-point AIMD was still part of the TS used to train the 16g potential.
We chose to remove the 200 low-accuracy data points in two steps, first removing 100 and allowing the MTP to recover before removing the remaining Γ-point AIMD data. The need for the potential to recover, seen in the data selection after the dashed lines in Fig. 4(a), indicates that the potential still relied heavily on the low-accuracy data even after many AL iterations. The same behavior can be seen in Fig. 4(b), where the three spikes in the average extrapolation grade correspond to the switch of potential and the two steps of discarding low-accuracy data from the TS. Removing these points destabilized the potential, and MD simulations would once again break after only a few time steps while the potential selected new high-accuracy data points to replace the discarded low-accuracy configurations.
The AL workflow was run until the potential had recovered and was once again no longer selecting new configurations to add to the TS, after which the selection threshold was decreased to further improve the accuracy of the IP's predictions. We chose to continue running the AL process with the lower selection threshold of 1.1 until the MTP no longer selected new atomic configurations. The final MTP was trained on a TS containing a total of 600 data points labeled by high-accuracy ab initio calculations.
III. CALCULATIONAL DETAILS
We model the Ti Al N system with a supercell of the primitive cell of the cubic B1 crystal structure containing 128 atoms. Ti and Al atoms were distributed on the metal sites of the supercell according to the special quasi-random structure42 approach as implemented in the Alloy Theoretic Automated Toolkit.43,44
A. Ab initio calculations
All ab initio calculations were carried out using the Vienna Ab initio Simulation Package45 (VASP) with the projector augmented wave46 (PAW) scheme. The plane-wave energy cutoff was set to 600 eV in accordance with published convergence tests for the Ti Al N alloy system. The exchange-correlation functional was approximated using the generalized gradient approximation (GGA) as described by Perdew, Burke, and Ernzerhof47 (PBE).
For the AIMD simulations run in the initial data generation scheme, the Brillouin zone was sampled only at the Γ point, and the time step was set to 0.5 fs. High-accuracy VASP calculations for labeling newly selected training data used a k-point mesh generated with the Monkhorst–Pack scheme.48
B. Molecular dynamics simulations
Material properties were obtained from LAMMPS MD simulations run with the fully trained potential and a time step of 1.0 fs. For elastic constants, we follow the method of finite cell distortions described by Tidholm et al.,16 but account for thermal expansion by first running LAMMPS NpT MD simulations at all temperatures. Energies were then extracted from LAMMPS NVT MD simulations, discarding the first 5000 time steps of each simulation to allow for equilibration of the system. The stress tensor of the system at different temperatures and with different distortions was then extracted by averaging over the remaining 15 000 time steps.
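The step from time-averaged stresses to elastic constants can be illustrated with the following sketch for a cubic material. It is a simplified version of a finite-distortion scheme, not the exact symmetry treatment used for the SQS cell in this work, and it assumes the arrays contain stresses (if LAMMPS pressure tensors are used instead, the sign flips).

```python
import numpy as np

def cubic_elastic_constants(sigma0, sigma_uni, sigma_shear, delta):
    """Estimate C11, C12, C44 (in the stress units used) of a cubic cell
    from time-averaged stress tensors of distorted NVT runs.

    sigma0      : 3x3 averaged stress of the undistorted equilibrium cell
    sigma_uni   : 3x3 averaged stress with a uniaxial strain eps_xx = delta
    sigma_shear : 3x3 averaged stress with an engineering shear gamma_yz = delta
    delta       : magnitude of the applied distortion (e.g., 0.01)

    Subtracting sigma0 removes residual stresses of the reference cell.
    """
    d_uni = np.asarray(sigma_uni) - np.asarray(sigma0)
    d_sh  = np.asarray(sigma_shear) - np.asarray(sigma0)
    C11 = d_uni[0, 0] / delta                          # sigma_xx = C11 * eps_xx
    C12 = 0.5 * (d_uni[1, 1] + d_uni[2, 2]) / delta    # sigma_yy = sigma_zz = C12 * eps_xx
    C44 = d_sh[1, 2] / delta                           # sigma_yz = C44 * gamma_yz
    return C11, C12, C44
```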
Phonon dispersions and vibrational free energies were instead extracted from lengthy LAMMPS NVT simulations run for at least 100 000 time steps, of which the first 20 000 time steps were disregarded as equilibration. Atomic velocities from each remaining time step were saved and analyzed using the implementation of the velocity autocorrelation function in the DynaPhoPy49 python package. The process is visualized in Fig. 5.
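As a rough illustration of this analysis step, the sketch below estimates a vibrational spectrum directly from the stored velocities via the Wiener–Khinchin relation (the power spectrum of the velocities equals the Fourier transform of the velocity autocorrelation function). It omits the mass weighting, smoothing, and mode projections that DynaPhoPy performs, and the function name is ours.

```python
import numpy as np

def velocity_power_spectrum(velocities, dt_fs, n_skip=20_000):
    """Crude phonon-DOS-like spectrum from MD velocities.

    velocities : array (n_steps, n_atoms, 3) of atomic velocities, one row per MD step
    dt_fs      : MD time step in fs
    n_skip     : initial steps discarded as equilibration
    """
    v = np.asarray(velocities)[n_skip:]
    n_steps = v.shape[0]
    # Wiener-Khinchin: |FFT(v)|^2 equals the Fourier transform of the VACF.
    power = np.abs(np.fft.rfft(v, axis=0)) ** 2
    dos = power.sum(axis=(1, 2))                          # sum over atoms and Cartesian axes
    freq_THz = np.fft.rfftfreq(n_steps, d=dt_fs * 1e-3)   # fs -> ps, so frequencies come out in THz
    return freq_THz, dos / dos.max()
```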
IV. RESULTS
Throughout Secs. IV A–IV D, the error bars for the AL approach presented in the figures were calculated as the standard deviation over five different IPs, each a 16g MTP trained from scratch on the final TS containing only high-accuracy data.
A. Thermal expansion
By extracting the average volume from MD simulations at the five different temperatures, the thermal expansion of the Ti Al N system was calculated. Figure 6(a) shows the results obtained with the MTP trained with the AL workflow compared to values obtained from experiment,50 TDEP,12 and machine learning with an MTP trained on the full high-accuracy AIMD simulations of Ref. 16, which we will hereafter refer to as passive learning (PL).35 In Fig. 6(b), the relative change of the lattice parameter is shown instead, with the reference value corresponding to the lattice parameter at 300 K.
Figure 6(a) shows that the GGA based theoretical approaches overestimate the lattice parameter by . Both MTP based methods deviate from the TDEP results at higher temperatures but agree well with each other, even though the PL MTPs were trained on data taken from AIMD NVT simulations with the lattice parameter set to the TDEP value. The relative change in lattice parameter in Fig. 6(b) shows that the values from all theoretical methods mostly lie within the error bars of the experimental measurements, though the TDEP results seem to diverge more rapidly from the experimental values with increasing temperature. The calculations carried out for labeling training data in our AL workflow used a denser mesh for Brillouin zone integration than TDEP, and the results with the AL MTP are, therefore, considered to be of higher accuracy.
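For completeness, a minimal sketch of how a(T) and a linear expansion coefficient can be obtained from the averaged NpT volumes is given below; the variable names are ours, the conversion assumes 8 atoms per conventional B1 cell, and the linear fit is an illustrative simplification.

```python
import numpy as np

def lattice_parameter_and_cte(temps_K, volumes_A3, n_atoms=128, atoms_per_conv_cell=8):
    """Average lattice parameter a(T) and a linear-fit thermal expansion coefficient.

    temps_K    : temperatures of the NpT runs (assumed sorted in ascending order)
    volumes_A3 : time-averaged supercell volumes at those temperatures
    """
    n_cells = n_atoms / atoms_per_conv_cell            # 128-atom cell corresponds to 16 conventional B1 cells
    a = (np.asarray(volumes_A3, dtype=float) / n_cells) ** (1.0 / 3.0)
    slope, _ = np.polyfit(temps_K, a, 1)               # da/dT from a linear fit over all temperatures
    alpha = slope / a[0]                               # linear CTE referenced to the lowest temperature
    return a, alpha
```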
B. Elastic constants
Elastic constants, calculated for the B1 Ti Al N system equilibrated at each respective temperature, are shown in Fig. 7 along with results from AIMD NVT and PL approaches.
For the most part, the agreement between all three methods is remarkable, and the values of both the AIMD and MLIP elastic constants are within or just slightly outside the error bars of the AL approach. For the elastic constant, a larger discrepancy can be seen with increasing temperature, likely due to the differences in thermal expansion seen in Fig. 6. To investigate this deviation, our AL trained MTP was used in combination with the TDEP lattice parameter to calculate the elastic constant at 1500 K, with the results shown in Table I. While the PL and AL approaches differ from the NVT AIMD result by 9 and 16 GPa, respectively, the calculation with the AL MTP at the TDEP lattice parameter yields a value closer to the NVT AIMD value. However, due to the presence of residual stresses in the NVT AIMD simulations, the value obtained from the AL MTP with the relaxed structure is considered to be the most accurate.
C. Phonons and vibrational free energy
Vibrational properties of B1 Ti Al N were extracted via the velocity autocorrelation function from long MD simulations. Figure 8 compares the phonon density of states (DOS) at 300 and 1500 K. The phonon DOS at 300 K is in agreement with the TDEP data published in the supplemental material of Ref. 12. Similarly, we observe a shift of the phonon DOS to lower frequencies with increasing temperature, which is the dominant contribution to the change in free energy at high temperature.12
From the phonon dispersion obtained with the Ti Al N supercell, the unit cell dispersion was calculated by unfolding as implemented in the UPHO package,52 and is shown in Fig. 9 for 300 K.
To further test the reliability of the MTP trained via the AL workflow, we also extract vibrational free energies at all five temperatures, see Fig. 10. Here, results from MD simulations with our MTP are compared to state-of-the-art methods for modeling anharmonic effects, such as Thermodynamic Integration (TI) and the TDEP method with the Symmetry-Imposed Force Constant (SIFC) approximation.12 For the PL MTP approach, vibrational free energy was only calculated for an MTP trained at 1500 K.
Figure 10 shows that the results of the AL approach closely follow the same trend as the SIFC-TDEP method but start to deviate more at higher temperatures. At 1500 K, both MTP based methods differ from the SIFC-TDEP and TI results by about 0.1 eV/f.u. However, this is due to the differences in lattice parameter between our calculations and the TDEP/TI calculations. Indeed, using our AL trained MTP with the TDEP lattice parameter yields a value much closer to the published results, as seen in the inset of Fig. 10. Note that, when discussing the difference between the TDEP lattice parameters and those predicted by the AL MTPs at high temperature, we observed that the latter seem to better follow the experimental trend. Thus, the temperature dependence of the vibrational free energy from the AL MTPs should also be more accurate.
D. Computational demand
The obvious advantage of the AL MTP is its high efficiency. The total number of core-hours spent on the AL approach presented in this paper and on other methods of similar accuracy is given in Table II. At the top, unrivaled in its vast demand for computational resources, is the AIMD study by Tidholm et al.16 The study with the SIFC-TDEP approach12 requires an order of magnitude less computational resources to obtain phonons and vibrational free energies of Ti Al N, with an accuracy close to that of thermodynamic integration, but does not include any elastic properties of the system. It should be noted that the AIMD study did not account for thermal expansion by equilibrating the systems in an NpT simulation but rather used the lattice parameters found with the TDEP approach. With a proper equilibration of the system in an NpT AIMD simulation at the same level of accuracy, the computational demand would increase at least three- to fourfold.
| Method | Computational demand (core-hours) |
| --- | --- |
| VASP NVT AIMDᵃ | 6 348 000 |
| SIFC-TDEP | 320 000 |
| Active learning MTP | |
| Initial training | 20 000 |
| Active learning 10g | 25 000 |
| Active learning 16g | 60 000 |
| Final MD simulations and result extraction | 15 000 |
| Total | 120 000 |
ᵃLattice parameters at elevated temperature taken from TDEP.
In a recent paper, we investigated the machine learning approach of MTPs as implemented in the MLIP-2 code. While the trained potentials have good predictive capabilities for both elastic and vibrational properties, the small resource usage of 5000 core-hours does not account for the full computational cost, given that the MTP was trained on data obtained from the resource-intensive AIMD approach.
For the full AL approach, from start to finish, no more than 120 000 core-hours were used. The majority of the resource usage lies in the high-accuracy VASP calculations during the AL iterations, with the initial data generation and training requiring about the same amount of computational resources as the final simulations. It should be noted that for the final MD simulations, the majority of the computational demand stems from the extraction of results, mainly the analysis of the lengthy MD runs to obtain vibrational properties via the velocity autocorrelation function. Simulations with the final AL potential are light enough to run on modern PCs or laptops for system sizes up to 1000 atoms and simulation times up to nanoseconds, making them comparable to novel methods using empirical potentials for materials modeling.53 However, the effort required to design these empirical potentials is hard to quantify, making a direct comparison to the computational demand of the AL approach difficult.
V. CONCLUSIONS
We have presented an active learning workflow and have used it to calculate elastic and vibrational properties of cubic B1 Ti Al N with minimal requirements of prior knowledge about the material system. Comparison of the thermal expansion and elastic tensor to studies using NVT AIMD or passive learning of MTPs shows that the active learning approach can reproduce earlier results exceptionally well. Similarly, vibrational free energies and phonons calculated with the MTP obtained from our workflow are in agreement with values obtained from TDEP and thermodynamic integration. Differences between our results and data from the literature have been attributed to TDEP results overestimating thermal expansion at 1500 K by roughly compared to active learning MTPs, with our results being in better agreement with experiment. Besides exceptional accuracy at a low computational demand, the versatility of the trained MTP makes the active learning approach a highly attractive tool for simulations of material properties.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the financial support from the VINNOVA Excellence Center Functional Nanoscale Materials (FunMat-II) Grant No. 2022-03071 and from the Knut and Alice Wallenberg Foundation (Wallenberg Scholar Grant No. KAW-2018.0194). F.T. and I.A.A. are grateful to support from the Swedish Government Strategic Research Area in Materials Science on Functional Materials at Linköping University (Faculty Grant SFO-Mat-Liu No. 2009 00971) and the Swedish Government Strategic Research Area in the Swedish e-Science Research Centre (SeRC). Computational resources were provided by both the Swedish National Infrastructure for Supercomputing (SNIC) and the National Academic Infrastructure for Supercomputing in Sweden (NAISS) through access to the National Supercomputing Centre (NSC) and Paralleldatorcentrum (PDC) in Sweden and the CSC IT Center for Science in Finland.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
F. Bock: Conceptualization (equal); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (equal); Software (supporting); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). F. Tasnádi: Conceptualization (equal); Data curation (supporting); Formal analysis (supporting); Investigation (supporting); Methodology (equal); Resources (equal); Software (lead); Supervision (equal); Validation (equal); Visualization (supporting); Writing – review & editing (equal). I. A. Abrikosov: Funding acquisition (lead); Project administration (lead); Resources (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings included in this study are available from the corresponding author upon reasonable request.