We propose a scheme for ab initio configurational sampling in multicomponent crystalline solids using Behler–Parrinello-type neural network potentials (NNPs) in an unconventional way: the NNPs are trained to predict the energies of relaxed structures from the perfect lattice with configurational disorder instead of the usual way of training to predict energies as functions of continuous atom coordinates. An active learning scheme is employed to obtain a training set containing configurations of thermodynamic relevance. This enables bypassing of the structural relaxation procedure that is necessary when applying conventional NNP approaches to the lattice configuration problem. The idea is demonstrated on the calculation of the temperature dependence of the degree of A/B site inversion in three spinel oxides, MgAl2O4, ZnAl2O4, and MgGa2O4. The present scheme may serve as an alternative to cluster expansion for “difficult” systems, e.g., complex bulk or interface systems with many components and sublattices that are relevant to many technological applications today.

Configurational order/disorder determines many properties of functional materials including mechanical strength,1 catalytic activity,2,3 ion/electron/phonon conductivity,4–6 and so on. Configurational disorder can be predicted using Monte Carlo (MC) methods based on statistical mechanics, but MC methods usually require a huge number of energy evaluations. Because of this, most works in the past have relied on effective low-cost models that are fitted to density functional theory (DFT) calculations.7–9 For solid-state systems that can be mapped onto a lattice, the cluster expansion method10–17 is often considered the de facto standard for obtaining such effective models. This method describes configuration energies using an expansion based on effective cluster interactions (ECIs),

$E(\sigma) = \sum_{\alpha} m_{\alpha} V_{\alpha} \Phi_{\alpha}(\sigma)$,
(1)

where α denotes the empty, single-site, and multi-site clusters, m_α is the cluster multiplicity, V_α is the ECI, and σ is the occupation vector. Φ_α is the cluster basis function, which for a binary system takes the simple form of

$\Phi_{\alpha}(\sigma) = \prod_{p \in \alpha} \sigma_{p}$,
(2)

where p are site indices and σ_p = ±1 depending on which component occupies site p. Appropriate basis functions for systems with more components can also be generated according to, e.g., Ref. 14. We note that the validity and formal approximations involved in mapping a realistic alloy model with vibrations and intra-atomic excitations to an on-lattice model have been explored rather thoroughly by Ceder.11
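To make Eqs. (1) and (2) concrete, here is a minimal sketch of a binary cluster expansion energy evaluation. The orbit lists and ECI values are hypothetical, and the normalization chosen here (multiplicity taken as the number of equivalent clusters in the supercell, with the correlation averaged over each orbit) is one common convention among several.

```python
import numpy as np

def cluster_expansion_energy(sigma, orbits, ecis):
    """Evaluate Eq. (1) for a binary on-lattice model.

    sigma  : array of +/-1 site occupations [Eq. (2)]
    orbits : for each symmetry-distinct cluster alpha, the list of
             equivalent site-index tuples; [()] is the empty cluster
    ecis   : effective cluster interaction V_alpha for each orbit
    """
    energy = 0.0
    for orbit, v_alpha in zip(orbits, ecis):
        # correlation: Phi_alpha of Eq. (2) averaged over the orbit;
        # len(orbit) plays the role of the multiplicity m_alpha
        corr = np.mean([np.prod(sigma[list(c)]) for c in orbit])
        energy += len(orbit) * v_alpha * corr
    return energy

# Hypothetical 4-site ring with empty, point, and nearest-neighbor pairs
sigma = np.array([1, -1, 1, 1])
orbits = [[()], [(0,), (1,), (2,), (3,)], [(0, 1), (1, 2), (2, 3), (3, 0)]]
print(cluster_expansion_energy(sigma, orbits, [-1.0, 0.02, 0.1]))
```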

The ECIs are usually fitted to energies of relaxed structures from small-scale DFT calculations and are then used to calculate energies of a much larger supercell in the course of MC sampling calculations. Although this cluster expansion approach has seen much success, especially for metallic alloys, certain limitations and difficulties have also been pointed out over the years. For example, it is known that the accuracy of the cluster expansion energy prediction degrades when there is significant lattice relaxation.18 In fact, cluster expansion with constant ECIs is, in a way, a zeroth-order approximation and formally cannot provide an exact fit when relaxation effects are included.15 Another difficulty is in choosing the finite number of clusters that will give the best predictions. For example, Seko and Tanaka reported a spurious prediction of a discontinuous order–disorder transition in an oxide system with long-range interactions when choosing clusters that perform best within limited supercell sizes.19 Still another issue is how to avoid undesirable training set bias. These issues are, in fact, highly nontrivial, and developing robust methods for cluster and training set selection is still an area of active research.13,17,19–21 Also, cluster expansion becomes computationally demanding in a combinatorial way as the number of components and sublattices increases, and this limits the number of clusters that can be included in the expansion.16

In view of these issues, we as well as some other workers have opted to bypass fitted models and sample directly on DFT energies.22–25 Using more sophisticated MC schemes that are suited for parallel computation, such as Wang–Landau26 or replica exchange Monte Carlo (RXMC)27 sampling, sufficient statistical sampling has been achieved on calculation models with up to a few hundred atoms. However, our works24,25 required weeks of calculations on up to 100 supercomputer nodes, and those by others are likely to be just as costly, although wall times are seldom reported in such papers. Thus, much acceleration is needed if this type of approach is to be used in a high-throughput setting for materials design. Toward this end, we consider, in this work, the so-called “active learning” approach, which has been developed to accelerate first-principles molecular dynamics simulations or MC calculations.28–36 The basic idea is to use machine-learning potentials (MLPs) that have been fitted to DFT results to accelerate the calculations. Since MLPs are usually good at interpolating but not at extrapolating, the model is retrained when the system wanders into a previously unlearned region of structure space, and the simulation is then restarted with the newly tuned potential. Here, we apply this idea to the lattice configuration problem.

Many forms of MLPs have been proposed in the literature;37 in this work, we employ the high-dimensional neural network potential (NNP) scheme proposed by Behler and Parrinello (BPNNP).38–41 This scheme assumes that the total energy can be decomposed into atomic energies determined by the environment around each atom and uses neural networks to fit these atomic energies. That is, the total energy is expressed as

$E(\sigma) = \sum_{i}^{\mathrm{atoms}} \mathrm{NNP}_{t_i}\left(f\left[\sigma_i^{R_c}\right]\right)$,
(3)

where the configuration vector σ is represented by continuous atom coordinates, in contrast to the cluster expansion [Eq. (1)], where σ represents occupations on a discrete lattice. t_i represents the atom type, meaning that a unique NNP is fitted for each atom type. σ_i^{R_c} ⊂ σ represents the coordinates of atoms within a cutoff radius R_c of atom i, and f is a fingerprint function that transforms σ_i^{R_c} into a multidimensional descriptor of the atomic environment. To be physically meaningful and transferable, f must be chosen so that the resulting descriptor is invariant with respect to translation and rotation of the system. To this end, several fingerprinting schemes have been devised, including symmetry functions,38–40 smooth overlap of atomic positions (SOAP),42 and Deep Potential.43,44 In this work, we employ the Chebyshev polynomial-based fingerprint proposed by Artrith et al.;45 the effectiveness of this fingerprint function has been demonstrated, especially for multicomponent systems. There are many other local fingerprints and MLP forms, as well as the more recently proposed graph network representations that show potential for enabling construction of universal models for energies, forces, and/or other physical properties covering the entire Periodic Table.46–50 The effectiveness of such alternatives for our purposes is certainly of interest and may be considered in future works.
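To illustrate the structure of Eq. (3), a minimal sketch follows; the `networks` mapping and `fingerprint` function are placeholders for this illustration and do not correspond to the actual ænet API.

```python
def bpnnp_total_energy(coords, types, networks, fingerprint, r_cut):
    """Minimal sketch of Eq. (3): the total energy as a sum of atomic
    energies, each predicted by a per-species network from an invariant
    descriptor of the environment within r_cut (placeholder interfaces)."""
    e_total = 0.0
    for i, t_i in enumerate(types):
        g_i = fingerprint(coords, i, r_cut)    # f[sigma_i^Rc] for atom i
        e_total += networks[t_i].predict(g_i)  # NNP_{t_i}: atomic energy
    return e_total
```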

The main distinguishing point of this work compared to the literature on BPNNPs is that we train the NNP model to predict the total energies of relaxed structures from unrelaxed ideal lattice structures with configurational disorder. This is quite different from the usual approach of learning the total energies as a function of continuous atom coordinates (the only exception we are aware of is Ref. 51, which used the NNP scheme to describe the relaxation energy when a single Cu interstitial is dissolved in amorphous Ta2O5). The obvious merit compared to conventional NNP approaches is that we can bypass structural relaxation, while the demerits are that we lose access to ion dynamics and that we cannot expect transferability to other lattice structures. Since relaxation calculations in a moderately sized supercell can sometimes take hundreds of steps, the speedup attained in this manner can be significant. On the other hand, the above-mentioned demerits are not an issue when using the NNP model for MC sampling on a lattice (it should be noted that MC sampling in continuous coordinate space using random trial displacements is quite inefficient because atoms are more or less close-packed and such trial steps will almost never lead to exchange of atom sites24).

The task of predicting properties from unrelaxed structures is also of importance in a slightly different context of high-throughput materials search using universal graph network models. For example, the Open Catalyst Project identifies “Initial Structure to Relaxed Energy (IS2RE)” as a critical task for catalyst design.52,53 Traditional graph networks were ill-suited to predictions based on unrelaxed structures because they were trained on relaxed structures to predict, e.g., formation energies or adsorption energies.46,48 However, the issue is being tackled through various approaches, including Bayesian optimization utilizing crystal symmetry54 or augmentation of the training data with unrelaxed structures.55 We tentatively suggest that the on-lattice mapping approach presented in this work may also be useful in this context. Alternatively, one can utilize more recent graph network models that can predict interatomic forces47,49,50 to perform the relaxation procedure, but this can be more than two orders of magnitude slower than a direct approach as already mentioned.

In a sense, our approach is a “lazy” alternative to the cluster expansion method without the need to explicitly consider optimized bases, i.e., clusters; there is also no need to perform any explicit basis reduction based on lattice symmetry. This circumvents various issues associated with choosing the finite number of clusters to be used for prediction that were mentioned above. Also, promising atom–environment descriptors have been proposed for treating multicomponent and multi-sublattice systems without combinatorial explosion,45 which is an unsolved issue for cluster expansion (we note that graph networks are also promising in this regard50). We can also use the same fingerprint functions for bulk, surface, and interface systems, while cluster expansion requires more clusters for non-bulk systems.56 Another merit is that the nonlinearity of the NNP model can lead to better convergence, in practice, compared to linear models (such as the cluster expansion).45 This may lead to more accurate description of systems with significant lattice relaxation where cluster expansion exhibits some difficulties.18 We will explore this aspect in the following sections. A remaining issue is in the choice of input structures for training, which is just as problematic as it is for other approaches. In fact, we demonstrate a rather spectacular failure when training only on randomly generated configurations and then show how well the active learning approach can solve this issue.

As a side note, we point out that the NNP can be directly trained to predict the energy as a function of the configuration vector σ, as was done in Ref. 57. However, the transferability of the NNP to larger supercells in such a scheme is nontrivial, and thus we advocate the use of atom-centric fingerprint functions. It is also possible to perform nonlinear cluster expansion, i.e., fitting the total energy as a function of cluster correlations using a neural network.58 Yet another successful model is the low rank potential, which expresses the interatomic potential as a low rank decomposition of a tensor whose indices specify the local atomic environment.33,59 We do not claim that the approach presented in this work is an overall better method, as it is difficult to compare overall performance, including calculation speed, accuracy, and ease-of-use on equal footing. However, as noted above, cluster selection or explicit mapping to a tensor-like expression required in other schemes can become intractable with increasing complexity of the system. Thus, we believe that the current approach will enable configurational sampling in complex bulk or interface systems with many components and sublattices, i.e., those systems where it is technically difficult to employ previously proposed approaches.

We demonstrate the above idea on the calculation of the degree of A/B-site inversion in a few oxides with spinel-type structures. Spinel oxides share the general formula AB2O4 with A (B) representing a divalent (trivalent) cation and are of interest in mineralogy as well as technological applications, including magnets, photocatalysis, and battery materials.60–62 In an “ordered” or “normal” spinel, all divalent cations occupy the tetrahedrally coordinated sites, and all trivalent cations occupy the octahedral sites. However, it is known that spinels with various divalent/trivalent cation combinations can exhibit significant deviation from the “normal” occupation, and this is often quantified by the degree of inversion (DOI) defined as the fraction of A-sites (tetrahedral) occupied by B (trivalent) cations. The DOI corresponding to a completely random occupation is 2/3, while a nearly “inverse” occupation with DOI of nearly unity is also known for some compounds. Prediction of the DOI as a function of temperature is of interest because of its potential impact on the magnetic, optical, thermal, and electrochemical properties of the material.
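Given a list of tetrahedral-site occupants, the DOI defined above is a simple counting operation; a minimal sketch with a hypothetical eight-site example:

```python
import numpy as np

def degree_of_inversion(tet_occupants, b_species):
    """Fraction of tetrahedral A-sites occupied by trivalent B cations,
    e.g., Al on the Mg sublattice of MgAl2O4."""
    occ = np.asarray(tet_occupants)
    return float(np.mean(occ == b_species))

# 2 of 8 tetrahedral sites hold Al -> DOI = 0.25
print(degree_of_inversion(["Mg", "Al", "Mg", "Mg", "Al", "Mg", "Mg", "Mg"], "Al"))
```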

In this work, we calculate the degree of inversion in MgAl2O4, ZnAl2O4, and MgGa2O4 and compare to a series of cluster expansion works by Seko et al.13,19,20,63 The former two are known to be nearly normal, while significant inversion has been reported for MgGa2O4. Spinels constitute a prototypical multivalent system (i.e., A2+ and B3+ share lattice sites) where a naive cluster expansion can lead to large qualitative errors due to overfitting in a small calculation cell; augmentation with a screened point charge model (CE-SPCM) was found to be necessary to obtain accurate predictions without considering rather long-range pair clusters in Ref. 19. Also, the feasibility of avoiding cluster expansion fitting and sampling directly on DFT-relaxed energies was demonstrated in combination with RXMC sampling in Ref. 24 with a 48-cation model. Here, we consider a 192-cation model, which is completely beyond the reach of direct DFT sampling for two reasons: (1) the longer time required for each DFT calculation and (2) the much larger configurational space that has to be sampled. Regarding (2), the number of configurations for the 48-cation model is C(48, 16) ∼ 2 × 10^12, while that for the 192-cation model is C(192, 64) ∼ 7 × 10^51. This explosion in the configuration space should attest to the curse of dimensionality that we are fighting against, although there is actually no need to perform calculations on all of these configurations if good sampling schemes (such as RXMC) are available.
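The two configuration counts quoted above follow directly from the binomial coefficient and can be checked with the Python standard library:

```python
from math import comb

# 48-cation cell: 16 divalent cations distributed over 48 cation sites
print(f"{comb(48, 16):.1e}")   # ~2e+12, as quoted above
# 192-cation cell: 64 divalent cations over 192 cation sites
print(f"{comb(192, 64):.1e}")  # ~7e+51
```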

The reference DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP) code.64,65 We employed the projector augmented wave method66 to describe electron–ion interactions. A plane wave basis set with an energy cutoff of 400 eV was used to expand the wave functions. The Brillouin zone was sampled only at the Γ point. The Perdew–Burke–Ernzerhof generalized gradient approximation (GGA-PBE) functional67 was used to approximate the exchange-correlation energy. In this work, we performed relaxation of the lattice vectors as well as the internal coordinates to within 0.04 eV/Å. It should be noted that lattice vector relaxation leads to changes in the calculation mesh density, which means that the effective cutoff energy deviates from the preset value. To alleviate this issue, the relaxations were repeated several times so that the final structures and energies are consistent with the cutoff energy of 400 eV.
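For orientation, the stated settings roughly correspond to the VASP input sketched below. This is our reconstruction, not the authors' published input: the relaxation tags ISIF and IBRION are assumptions (the text gives only the cutoff, functional, and force threshold), Γ-only sampling would be specified separately in KPOINTS, and pymatgen is used here merely as convenient tooling.

```python
# Hedged sketch of an INCAR consistent with the stated settings;
# tags marked "assumed" are not given in the text.
from pymatgen.io.vasp.inputs import Incar

incar = Incar({
    "ENCUT": 400,     # plane-wave cutoff (eV)
    "GGA": "PE",      # PBE exchange-correlation functional
    "EDIFFG": -0.04,  # stop when all forces fall below 0.04 eV/A
    "ISIF": 3,        # relax ions + cell shape + volume (assumed)
    "IBRION": 2,      # conjugate-gradient relaxation (assumed)
})
incar.write_file("INCAR")
```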

The NNP training and subsequent evaluation were performed using the Atomic Energy NETwork (ænet) code.45,68 As noted above, the NNP model is trained to predict relaxed energies. This may be represented using a slightly modified version of Eq. (3),

$E^{\mathrm{rel}}(\sigma) = \sum_{i}^{\mathrm{atoms}} \mathrm{NNP}^{\mathrm{rel}}_{t_i}\left(f\left[\sigma_i^{R_c}\right]\right) \quad \text{for } \sigma \in \{\sigma_{\mathrm{lattice}}\}$,
(4)

where E^rel(σ) represents the energy after structural relaxation from a starting structure σ, f[σ_i^{R_c}] represents the atomic environment descriptor for the starting structure, and NNP^rel_{t_i} represents the atomic energy contributions to the relaxed total energy. The training and use of the NNP are restricted to σ ∈ {σ_lattice}, where {σ_lattice} represents the set of ideal lattice structures with configurational disorder. Equation (4) states that the relaxed energies can be calculated as a sum of atomic contributions, which, in turn, can be calculated from coordinates corresponding to the ideal lattice. This is an ansatz that we are making. In general, it is impossible to prove the exactness of a certain machine learning model in reproducing a physics model, and our approach is no exception. This is in contrast to cluster expansion, for which a rigorous proof exists for mapping fully relaxed energies to a lattice-based Hamiltonian.11,15 The ansatz will need to be verified for each system under study, and active learning is an ideal approach in this regard.

We used the fingerprinting scheme by Artrith et al.,45 where the radial and angular distribution functions (RDF and ADF) with an appropriately chosen cutoff are expanded in an orthogonal basis set based on Chebyshev polynomials, and the expansion coefficients are fed into the neural network as the input descriptors. A key point in their scheme is that the structural and compositional descriptors are given separately; the structural descriptor is given by the usual RDF and ADF, and the compositional descriptor is also given by the RDF and ADF but with atomic contributions weighted differently for each chemical species. In this work, we expand the RDF with a cutoff of 8.0 Å and expansion order 16 and the ADF with a cutoff of 6.5 Å and expansion order 4. The employed neural network thus has an input layer with 44 nodes,69 and we chose to use two hidden layers with 15 nodes each and the tanh activation (there is also an additional bias node to accommodate arbitrary references for the energy68). The neural network weights were optimized using the bounded limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm.70,71 The training epochs were monitored and terminated when the error on a separate validation set began to increase. We note that it is necessary to evaluate the performance of the model on an additional dataset that is separate from the validation data used for early stopping of the training epochs. This is done, in a sense, through the active learning procedure detailed below.
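The sketch below illustrates the radial part of such a descriptor: the RDF around one atom is expanded in Chebyshev polynomials with a smooth cosine cutoff, and species-dependent `weights` give the compositional variant (uniform weights give the structural one). The exact normalization and weighting conventions of ænet differ in detail, so treat this as illustrative only.

```python
import numpy as np

def chebyshev_radial_descriptor(distances, weights=None, r_cut=8.0, order=16):
    """Expansion coefficients of a (weighted) RDF in Chebyshev
    polynomials T_0..T_order with a smooth cosine cutoff."""
    d = np.asarray(distances, dtype=float)
    mask = d < r_cut
    d = d[mask]
    w = np.ones_like(d) if weights is None else np.asarray(weights, float)[mask]
    x = 2.0 * d / r_cut - 1.0                      # map [0, r_cut] -> [-1, 1]
    fc = 0.5 * (np.cos(np.pi * d / r_cut) + 1.0)   # smooth cutoff function
    coeffs = []
    t_prev, t_curr = np.ones_like(x), x            # T_0 and T_1
    coeffs.append(np.sum(w * t_prev * fc))
    coeffs.append(np.sum(w * t_curr * fc))
    for _ in range(2, order + 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev  # recurrence
        coeffs.append(np.sum(w * t_curr * fc))
    return np.array(coeffs)
```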

The configurational sampling was performed using the replica exchange Monte Carlo (RXMC) method combined directly with NNP evaluation using ænet. The RXMC method employs several copies, or replicas, of the system under study. Each replica is sampled using the usual Metropolis MC algorithm at a different temperature. At preset intervals, temperatures are swapped between replicas according to the Metropolis criterion with the following probability:

$P = \min\left\{1, \exp\left[\left(\beta_i - \beta_j\right)\left(E_i - E_j\right)\right]\right\}$.
(5)

Here, i, j are the replica indices, β is the inverse temperature 1/(k_B T), and E is the energy of the replica. Higher temperature replicas are given the task of global surveillance of the energy landscape, while lower temperature replicas basically perform local optimization. The swapping of temperatures occurs when a higher temperature replica finds a new energy minimum; local optimization is then performed at the lower temperature. The detailed theory behind RXMC is given in Ref. 27. Our previous studies24,25 may also be helpful for understanding the basic concept. A parallelized code for RXMC calculations in combination with DFT codes as well as ænet is available as a part of the ab initio Configuration Sampling Toolkit (abICS) at https://github.com/issp-center-dev/abICS. In this work, 15 replicas of the system were sampled in parallel with temperatures ranging from 400 to 1900 K, and temperature exchange was attempted every four steps to speed up the global sampling. Equilibration was performed for 80 000 steps, and then the DOI, which is calculated as the ratio of Al ions on Mg sites, was averaged over 400 000 steps to obtain the temperature dependence.
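A minimal sketch of the temperature-swap step in Eq. (5) follows; the replica bookkeeping around it is omitted, and the 400/500 K example values are hypothetical.

```python
import numpy as np

K_B = 8.617333e-5  # Boltzmann constant (eV/K)

def attempt_temperature_swap(t_i, t_j, e_i, e_j, rng):
    """Eq. (5): swap the temperatures of replicas i and j with
    probability min{1, exp[(beta_i - beta_j)(E_i - E_j)]}."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    p = min(1.0, np.exp((beta_i - beta_j) * (e_i - e_j)))
    return rng.random() < p

# Always accepted here: the hotter (500 K) replica holds the lower energy
rng = np.random.default_rng(0)
print(attempt_temperature_swap(400.0, 500.0, e_i=-1.20, e_j=-1.23, rng=rng))
```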

The NNPs used in the RXMC sampling were refined iteratively using the active learning scheme shown in Fig. 1. We note that a very similar process using an effective pair interaction model was reported recently.72 First, we train an NNP set with randomly generated samples and then use those NNPs to perform RXMC sampling. In subsequent iterations, we sample a preset number of configurations from the previous RXMC run and perform DFT relaxations on those configurations. Comparison of the DFT energies with the NNP predictions serves as the validation procedure for the machine-learning model, although the validation dataset will be biased somewhat as we discuss below. Discrepancies between DFT energies and NNP predictions signify that the simulation has wandered into previously unlearned regions of configuration space. We then add those DFT results to the training set, retrain the NNPs, and run RXMC sampling again. This process is repeated until convergence is achieved in the physical quantities of interest. In this work, we sampled 300 structures from the first 6400 RXMC steps in each active learning iteration. The final DOIs were calculated from a separate longer RXMC run with 480 000 RXMC steps (80 000 for equilibration and 400 000 for thermodynamic averaging) as mentioned above.
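In outline, the loop reads as follows. This is a sketch only: the four callables are user-supplied placeholders standing in for the VASP relaxations, ænet training, and abICS RXMC runs described above, not real API calls.

```python
def active_learning(initial_configs, dft_relax_energy, train_nnp,
                    run_rxmc, subsample, n_iter=3, n_sample=300):
    """Sketch of the loop in Fig. 1 with user-supplied callables."""
    # initial training set: DFT-relaxed energies of random configurations
    data = [(s, dft_relax_energy(s)) for s in initial_configs]
    for _ in range(n_iter):
        nnp = train_nnp(data)            # fit relaxed energies, Eq. (4)
        trajectory = run_rxmc(nnp)       # cheap sampling with the NNP
        picked = subsample(trajectory, n_sample)
        # DFT on the sampled configurations validates the NNP; large
        # discrepancies flag unlearned regions, and the new data are added
        data += [(s, dft_relax_energy(s)) for s in picked]
    return train_nnp(data)               # model for the production RXMC run
```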

FIG. 1. A flowchart of our configurational sampling scheme using active learning.

We note that the above procedure biases the training dataset toward thermodynamically relevant configurations predicted by the NNP model in that active learning step. The thermodynamic biasing, in itself, is usually not a bad thing since we are often only concerned with thermodynamically relevant, i.e., realistic configurations. However, the resulting NNPs cannot be expected to perform well when used to predict energies of configurations with both low entropy and high energy, since such configurations are thermodynamically unstable over all temperatures and rarely visited in the RXMC procedure. Such an example in the case of AB2O4 spinels considered in this work would be a phase-separated configuration with “A” cations in one-half of the cell and “B” cations in the other half. If, for some reason, we want to correctly reproduce the energy of such situations, we would have to add such configurations to the dataset, or we may accumulate training data using active learning with varying compositions. The performance of the on-lattice NNP model for varying compositions is tested below.

On the other hand, the fact that we are relying on an inaccurate NNP model to generate the training set during the active learning iterations may pose issues with uncontrolled biasing of the training data. In the case where the model underestimates the energy of certain configurations, the RXMC algorithm at lower temperatures will bias the sampling toward those configurations so that the underestimation is quickly corrected in the subsequent iteration. The biggest foreseeable problem is when the model overestimates the energy. This is covered to some extent by setting a high enough temperature for the RXMC sampling so that configurations with higher predicted energies will still be sampled. It is also noted that RXMC in each iteration is initialized with random configurations, i.e., with an effective temperature of infinity. Ultimately, it is necessary to evaluate if such issues are severe through computational experiments, which we present in the following sections.

We should also mention that we are using the term “active learning” in the rather broad sense that the training set configurations are not given a priori but are chosen according to the given calculation procedure. In our case, this is done through thermodynamic importance sampling as explained above. On the other hand, many works in the active learning MLP literature employ various uncertainty estimates to determine if a structure encountered during simulation is in an extrapolation region and should be added to the training set.28,36 The efficacy of using such uncertainty indicators to further limit the number of expensive ab initio calculations may be explored in the near future.

We started the active learning process with 300 randomly chosen configurations. DFT relaxation and energy calculations were performed on these configurations, and then an NNP model was trained to predict relaxed energies from the input configurations. The oxygen sublattice is neglected in the training process since the input structures all have identical oxygen coordinates and do not provide any useful information for the prediction. We used 90% of input structures for training and 10% for validation-based early stopping. For the spinel systems studied in this work, we typically obtained mean absolute errors (MAEs) and root mean squared errors (RMSEs) of less than a few meV/cation in the training and validation sets, which is close to the accuracy limitations of the DFT calculations themselves. However, such low fitting error does not guarantee high accuracy in predicting the energies of structures obtained in the RXMC sampling. This is checked by performing DFT relaxations on the structures encountered during RXMC sampling using the NNP. As shown in Fig. 2, the NNP trained on random samples (indicated by blue circles in Fig. 2) shows poor accuracy, especially in predicting the energies of the lower energy configurations. This is because the model was trained on random configurations with higher energies, and the training set did not include lower energy structures with less disorder (please note that the splitting of samples between higher and lower reference DFT energies seen here and in subsequent figures originates from our sampling procedure and is not an inherent property of the system.73). Increasing the number of random samples does not improve the results as we show later. The importance of having configurations with varying degrees of order in the training set has already been pointed out in the cluster expansion literature. A suggested solution was to perform a grouping of input structures based on correlation functions (which roughly corresponds to the degree of disorder), and to perform cross validation on each group during the fitting process.20 Here, we take an alternate approach, i.e., active learning, as was detailed in Sec. II D. The predictions at lower energies progressively improve with the number of active learning iterations because of the thermodynamic biasing mentioned above; the prediction RMSE for the three spinel systems studied here converged within 0.5 meV after the third iteration. This is clearly smaller than ∼10 meV/cation reported in Ref. 19 using cluster expansion, although it may not be a fair comparison because of the difference in the dataset used to calculate prediction errors.

FIG. 2. Improvement in NNP predictions vs DFT reference energies with the number of active learning iterations in the three spinel oxides: (a) MgAl2O4, (b) ZnAl2O4, and (c) MgGa2O4. RXMC runs are performed using the NNP, and then DFT calculations are performed on structures from the same RXMC run to evaluate the prediction accuracy. The zero reference for the energy is taken to be the DFT energy for the ordered spinel configuration. Data on the diagonal line correspond to high prediction accuracy. The dashed circles indicate energy regions of randomly generated configurations in the first steps of each active learning RXMC iteration.73

Convergence is also attained in the physical quantity of interest, i.e., the DOI, as shown in Fig. 3. For the three spinel oxides treated here, 2 or 3 active learning iterations with 300 DFT relaxation calculations each were sufficient to obtain a converged DOI plot (Fig. 3, purple diamonds), which looks completely different from that predicted using NNPs trained on randomly generated configurations (Fig. 3, blue circles). The final DOI plots are in good agreement with most of the cluster expansion literature as well as our previous work employing direct RXMC sampling on DFT energies (Fig. 4). There is, however, a sizable deviation for MgGa2O4, and there is one cluster expansion work for MgAl2O4 that deviates from the others and shows a discontinuous transition around 600 K13 (denoted by “Seko2009” in Fig. 4). As for MgGa2O4, the current work predicts a DOI that is about 0.1 smaller than cluster expansion in Ref. 63. Our result is within the range of most of the available experimental data that report 0.75,74 0.67,75 0.84–0.90,76 and 0.81–0.84,77 although the first work that put forth the idea of an “inverse spinel” reported unity, i.e., complete inversion.78 The discrepancy may come from two sources apart from the difference in the employed models (i.e., cluster expansion and NNP): the supercell size used in the training stage and that used in the MC sampling stage. The cluster expansion literature has performed ECI fitting in supercells with 24 cation sites or smaller, while MC simulations were performed on supercells with as many as 48 000 cations. Here, both the training and sampling were performed on 192-cation supercells. The impact of these supercell sizes on the results is discussed in more detail in Sec. III B. As for “Seko2009,”13 the discontinuous transition was reported in a later work to be a spurious result due to truncation of the range of pair clusters in the cross validation process.19

FIG. 3. The convergence of the calculated temperature dependence of the degree of inversion vs active learning iterations in the three spinel oxides: (a) MgAl2O4, (b) ZnAl2O4, and (c) MgGa2O4.
FIG. 4. The degree of inversion calculated in the present work for the three spinel oxides MgGa2O4 (green), MgAl2O4 (blue), and ZnAl2O4 (orange) in the 192-cation cell whose neural network model was trained in the 192-cation cell (•). The results are also compared to the cluster expansion (CE) literature13,19,63 and our previous work using DFT calculations directly in combination with RXMC24 in a 48-cation model (lines).

To confirm that the active learning procedure is indeed more efficient than random sampling, we performed DFT relaxation on 1200 randomly generated configurations of MgAl2O4, trained the NNP on those configurations, and compared the results to those from active learning with the same number of DFT calculations [1 random + 3 active learning iterations, which was enough to converge the DOI results as shown in Fig. 3(a)]. The training on the random dataset was performed independently twice with different training/validation data separation and initial neural network weights. As shown in Fig. 5(a), the energy predictions from NNPs trained on random configurations deviate from the DFT values for configurations with lower energies, and they are also clearly less accurate than the NNP trained through the active learning procedure. Again, this is not surprising because lower energy configurations are seldom included in the randomly generated training set. Moreover, the deviation behavior is rather uncontrolled, as indicated by the different predictions of the energies and the DOIs [Fig. 5(b)] for the two independent training runs; thus, it is virtually impossible to obtain the correct temperature-dependent results with only random generation of the training dataset.

FIG. 5. (a) Comparison of NNP predictions vs DFT reference energies when the NNP was trained on 1200 configurations generated by the active learning procedure (blue circles) or random sampling (green squares and orange crosses corresponding to two independent training runs on the same randomly generated dataset). The comparison is performed on the dataset taken from the third active learning iteration [same data as Fig. 2(a), red +]. The inset shows all data points, while the main figure shows a magnified version for clarity. (b) The degree of inversion calculated for MgAl2O4 using the three NNPs compared in (a).

Savings in computer time achieved by the current approach turned out to be quite significant compared to sampling directly on DFT energies. On the Xeon E5-2680v3 central processing unit (CPU), the NNP prediction using 1 CPU core is a few hundred to a few thousand times faster than DFT relaxation using 216 cores, depending on the number of relaxation steps necessary for convergence. We also note that the number of necessary DFT calculations turned out to be orders of magnitude smaller than the number of NNP evaluations, i.e., RXMC steps, necessary for converging the DOI. A converged DOI was obtained for ZnAl2O4 and MgGa2O4 at the first active learning iteration, while the DOI for MgAl2O4 was converged at the third active learning iteration (Fig. 3), which correspond to 600 and 900 DFT relaxations, respectively. On the other hand, 7 200 000 NNP evaluations (=480 000 MC steps/replica × 15 replicas) were necessary to obtain a converged DOI vs temperature. The ∼1000 DFT calculations can be completed within a few hours on a modern supercomputer system since the calculations can be performed completely in parallel. The RXMC calculations took roughly one day using 15 CPU cores, which would fit in one node of a modern workstation. We also note that the RXMC simulations were initialized with random configurations, and yet they still found the single ground state ordered spinel configuration out of ∼10^51 possible configurations for the normal spinels MgAl2O4 and ZnAl2O4. This attests to the effectiveness of modern sampling methods in battling the curse of dimensionality.

We also tested active learning cycles with smaller numbers of DFT calculations to examine if we can attain similar accuracy with less data. The prediction RMSEs for MgAl2O4 were found to degrade slightly; after the first active learning cycle, the RMSEs were 0.9, 1.3, and 1.7 meV/cation for 300, 150, and 75 DFT calculations per cycle (the total number of DFT data points is 600, 300, and 150, respectively, including the initial random configurations). The values improved to 0.7, 0.8, and 1.2 meV after the second cycle. This means that we cannot attain RMSEs of less than 1 meV with fewer than about 600 DFT calculations for this system. Nonetheless, the necessary amount of DFT training data should depend on the complexity of the system under study and will need to be tested for each case.

The fact that fewer than 1000 reference samples were necessary to obtain an accurate NNP model might be surprising, as previous studies on NNPs usually report at least one order of magnitude larger reference sets (see, e.g., Refs. 44 and 68). This is most likely because the input configuration space is restricted to σ ∈ {σ_lattice}, and this is a clear merit of our approach. It should be pointed out, however, that each of our reference samples is the result of a DFT relaxation calculation, which comes at an additional cost compared to conventional NNP approaches that can be trained on structures without relaxation. Still, it should be much easier to attain sufficient sample coverage in this restricted configuration space. The restriction also makes the problem easier for the neural network, as it needs to distinguish between far fewer structures. These facts, combined with the ability to bypass lattice relaxation, make the present approach more suitable for the lattice configuration problem in comparison to conventional continuous-coordinate NNP methods. That is not to say that those methods cannot be applied to systems with configurational disorder. In fact, the Deep Potential approach has been applied to randomly generated high entropy alloys44,79 with promising results, and it can provide vibrational properties while the present approach cannot. However, the computational expense for NNP training and simulation of configurational order/disorder behavior will be much larger than for our lattice-restricted approach, and such methods are yet to be applied to this type of calculation.

The usual approach in the NNP and cluster expansion literature is to use a set of many small-scale DFT calculations to fit the model and use that model to accelerate calculations in an expanded supercell. In doing so, one must be careful that (1) the fitting procedure does not lead to overfitting in the small cells at the cost of transferability to larger cells not contained in the training set, and (2) thermodynamically relevant correlations are contained within the training set supercell size. These points are especially important here since spinel oxides are multivalent ionic oxides where significant long-range interactions are expected.

In the case of cluster expansion, cross validation is usually performed within the training set to truncate the number of clusters to a computationally tractable amount, but for systems such as MgAl2O4, this procedure can lead to large prediction errors for structures with longer periods than those in the training set.19 This led to a spurious prediction of a discontinuous order–disorder transition in MgAl2O4 at ∼600 K in Ref. 13 (denoted by “Seko 2009” in Fig. 4); it was found to be necessary to either ignore the truncation procedure and use as many long-distance pair clusters as possible, or to use fewer pair clusters and augment the cluster expansion Hamiltonian with a screened Coulomb term, which led to the prediction of a more continuous transition in Ref. 19 (“Seko 2014” in Fig. 4). It is difficult to see if similar issues may arise in our approach. Although there is no explicit truncation procedure, neural networks are prone to overfitting to the available data; the network might decide that interactions beyond the training set supercell dimensions are unimportant and effectively truncate the range of interactions. Even if it does not, we would have to see if the cutoff radii of 8 Å for the radial descriptor and 6.5 Å for the angular descriptor are long enough for sufficient precision.

To check for such issues, we examined the convergence behavior against the supercell size used for training and MC sampling as follows: First, we performed active learning of the NNP model in a conventional cell of cubic spinel having 24 cation sites in addition to the 192-cation model examined above. Two active learning iterations were performed to converge the results for the 24-cation model. Then, we performed RXMC sampling calculations on the 24-cation, 192-cation, and 648-cation models using those NNP models. This means that we performed six RXMC calculations resulting in DOI data that can be indexed by the number of cation sites in the training and sampling steps as DOI(Ntrain, Nsample). Figure 6 shows the results for MgAl2O4 and MgGa2O4, where the left-hand side plot shows the dependence vs Nsample at Ntrain = 192 and the right-hand side plot shows that against Ntrain with Nsample = 648. It is clear that we have attained convergence against RXMC sampling cell size, as the results for Nsample = 192 and 648 are virtually identical, and Nsample = 24 also results in a decent prediction in the case of MgAl2O4. This is surprisingly small compared to the supercell sizes employed for MC calculations in the cluster expansion literature (48 000 as mentioned above), but we have obtained essentially identical results for MgAl2O4 and ZnAl2O4 as seen in Fig. 4. On the other hand, the convergence with respect to the training supercell size Ntrain is more difficult to confirm. Sizable differences are seen for Ntrain = 24 and 192, especially for MgGa2O4. This shows that the NNP trained in the 24-cation supercell of MgGa2O4 is not capable of reproducing the correct energetics in larger supercells. Now, the ideal way to check whether we have achieved convergence at Ntrain = 192 would be to perform active learning on a larger supercell, but that would be prohibitively expensive in terms of computer resources. As a compromise, we decided to test the accuracy of the Ntrain = 192 NNP model by performing DFT relaxations on a small subset of the 648-cation configurations from the (Ntrain, Nsample) = (192, 648) calculation. As shown in Fig. 7, the Ntrain = 192 NNP turns out to be quite accurate in predicting the energies in the 648-cation cell. Thus, we can be fairly confident that the results are converged with respect to the training set supercell size at 192 cations. The discrepancy of the MgGa2O4 DOI with Ref. 63 is at least partly due to the fact that they employed only up to 24-cation supercells in the fitting of the ECIs.

FIG. 6. The dependence of the calculated degrees of inversion (DOIs) in (a) MgAl2O4 and (b) MgGa2O4 vs training cell size (denoted by the number of cations in the training cell Ntrain) and MC sampling cell size (denoted by the number of cations in the sampling cell Nsample).
FIG. 7. Correspondence between NNP prediction and DFT results for 648-cation cell MgGa2O4 configurations when the NNP was trained in the 192-cation cell.

These results show that regardless of the model used (NNP or cluster expansion), the training set needs to include structures with periods that are long enough for the system under study. Also, if the model includes hyperparameters pertaining to the range of interactions, then those must be set long enough as well. For the spinel oxide compounds examined here, the periods were found to be rather long but still tractable with the help of modern-day supercomputers. This may be surprising because our NNP model (or cluster expansion, for that matter) does not consider truly long-range Coulomb interactions along the lines of, e.g., Ref. 80, as all explicit interactions are truncated by the cutoff in the descriptors. However, our results suggest that the cutoff is long enough, especially for the lower energy or higher entropy structures encountered during the Monte Carlo procedure that should have relatively small polarization. The structural relaxation should also reduce the required cutoff due to dielectric screening.

One point worth stressing is that the success of the present approach in predicting the relaxed energies from ideal lattice structures is not a trivial consequence of negligibly small relaxation energies in this system. As shown in Fig. 8 for MgAl2O4, the relaxation energies are roughly one order of magnitude larger than energy differences between relaxed configurations (notice the difference in scales of the horizontal and vertical axes). Also, although the relaxation energies are correlated with energies before relaxation, the correspondence is not one-to-one and the scatter is as large as 50 meV/cation. This is clearly significant when comparing against relaxed energy differences between distinct configurations.

FIG. 8. Energies before relaxation vs fully relaxed energies for samples taken from the active learning procedure on MgAl2O4. The solid line is a guide for the eye and corresponds to zero relaxation energy.

We also examine the magnitude of relaxation, as it has been shown to be correlated with prediction errors in the cluster expansion literature.18 To quantify the amount of relaxation in the three spinel systems on an equal footing, we use the normalized displacement, which we define as

$\mathrm{ND} = \Delta d \, / \, V_0^{1/3}$,
(6)

where Δd is the atom displacement upon relaxation and V_0 is the volume per atom after volume relaxation; V_0^{1/3} can be interpreted as an average lattice parameter, so the normalized displacement defined in this manner is an indicator of local strain. As shown in Fig. 9, the relaxation in the three spinel oxides is far from negligible. A significant number of atoms show NDs larger than 5%, and the maximum ND reaches ∼19% for ZnAl2O4. The correlation between relaxation and prediction errors is shown in Fig. 10, where the prediction RMSEs for the three systems are plotted vs the mean normalized displacement (MND), which is calculated by averaging the normalized displacements over all atoms from all configurations in the dataset (Fig. 9). Active learning calculations without relaxation were also performed on MgAl2O4 to examine the prediction errors not attributed to relaxation. As a result, we see an increasing trend in the errors vs MND, suggesting that relaxation does lead to higher errors. However, the RMSEs are all less than 0.5 meV per atom, which is as good as we can reasonably expect when considering numerical noise in the relaxation process.
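Equation (6) and the MND reduce to a few lines of array arithmetic; in the sketch below, periodic minimum-image wrapping of the displacements is omitted for brevity, and the two-atom example values are hypothetical.

```python
import numpy as np

def normalized_displacements(ideal, relaxed, v0):
    """Eq. (6): per-atom displacement upon relaxation divided by
    V0^(1/3), the average lattice parameter per atom."""
    delta_d = np.linalg.norm(np.asarray(relaxed) - np.asarray(ideal), axis=1)
    return delta_d / v0 ** (1.0 / 3.0)

# Hypothetical two-atom cell with V0 = 10 A^3/atom
nd = normalized_displacements([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
                              [[0.1, 0.0, 0.0], [2.0, 0.05, 0.0]], 10.0)
print(nd, nd.mean())  # per-atom NDs and their mean (the MND)
```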

FIG. 9. Histograms of normalized displacements of atoms upon relaxation in the spinel oxides under study. The displacements are calculated by performing DFT relaxation on 300 structures from the fourth (final) active learning iteration, and the histograms are generated from all atoms in those structures.
FIG. 10. Prediction RMSEs vs mean normalized displacements due to relaxation in the spinel oxides calculated for the final active learning iteration.

We also note that the normalized mean squared displacement (NMSD) was suggested in Ref. 18 as a relaxation metric that correlates somewhat with prediction errors, and they heuristically categorized NMSD values into “Good,” “Maybe,” and “Bad” regions in terms of expected reliability and quality of cluster expansion fitting. The NMSDs for MgAl2O4, ZnAl2O4, and MgGa2O4 calculated from the active learning procedure are 0.158%, 0.156%, and 0.147%, respectively, and they would all fall into the “Maybe” category. The fact that we were able to obtain RMSEs of less than 0.5 meV/atom for those systems indicates that our neural network scheme is at least as robust as cluster expansion against relaxation. Ultimately, we would need to accumulate experience in applying this approach to a wider variety of systems, but that is beyond the scope of this work.

As noted in Sec. II C, we designed our NNP model to be trained and used only on coordinates of the ideal lattice σ ∈ {σ_lattice}. However, there is no way to specify the exact lattice coordinates due to the nature of the floating point numbers used to represent non-integer values in computers. Thus, to see how robust our method is against deviations in the input coordinates, we examined the predictions of our final NNP model for randomly perturbed lattices (Fig. 11). We find that the predictions are virtually indistinguishable from predictions on the corresponding perfect lattice structures (that is, perfect within the numerical precision of our software stack) when each atom is perturbed by 0.01 Å. The NNP looks usable up to a perturbation of 0.05 Å, while at 0.1 Å, we start to observe sizable deviations. 0.01 Å is not that small, and it should not be too difficult to make sure that the errors in the input coordinates fall within this range. The robustness may originate from the fact that the NNP maps similar inputs to similar energies and because there are no other similar structures to choose from in the training set space except the corresponding ideal lattice structure.
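The perturbation test can be reproduced along the following lines; `nnp.predict` in the trailing comment is a placeholder for whatever energy evaluator is used.

```python
import numpy as np

def perturb(coords, magnitude, rng):
    """Displace every atom by `magnitude` (in Angstroms) along an
    independent, uniformly random direction, as in the test above."""
    coords = np.asarray(coords, dtype=float)
    vecs = rng.normal(size=coords.shape)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit directions
    return coords + magnitude * vecs

rng = np.random.default_rng(42)
# e.g., compare nnp.predict(perturb(ideal_coords, 0.01, rng))
# against nnp.predict(ideal_coords) for the magnitudes in Fig. 11
```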

FIG. 11. Predicted NNP energies when the input coordinates of all atoms are perturbed in random directions with a magnitude of 0.1 Å (blue circles), 0.05 Å (orange crosses), and 0.01 Å (green triangles). The results are plotted against the predictions using the unperturbed ideal lattice coordinates as input.

Being able to predict energies at varying compositions with a single NNP model is necessary if the model is used for grand canonical simulations or if phase separation occurs in the system of interest, as mentioned in Sec. II D. Thus, we test the capability of our model to reproduce energies of varying compositions, and we also examine the transferability to compositions that are not in the training set by varying the composition during active learning cycles.81 First, we trained the NNP model for stoichiometric MgAl2O4 (i.e., Mg64Al128O256) in the same manner as in Sec. III A with one random cycle and two active learning cycles (Fig. 12 AL0–AL2). The model was then used for RXMC sampling of Mg1.125Al1.875O4 (Mg72Al120O256), and a subset of the samples was recalculated using DFT and compared with the NNP predictions as shown in Fig. 12 AL3. There is a systematic error in the total energy prediction, but the error is corrected after these configurations are added to the training dataset (Fig. 12 AL4). The stoichiometry was then changed to Mg1.0625Al1.9375O4 (Mg68Al124O256), which lies in the middle of the previous compositions, and the prediction accuracy is quite high even before this composition is added to the training set (Fig. 12 AL5). This means that our model is capable of interpolation between compositions. Finally, we checked that the resulting model can reproduce the energies of all three compositions, as shown in Fig. 12(b).

FIG. 12. (a) NNP predictions compared to DFT energies as compositions are changed between active learning cycles and (b) NNP predictions compared to DFT energies for training and validation data with varying compositions. The initial random cycle and two subsequent active learning cycles were performed for the stoichiometric Mg64Al128O256 (MgAl2O4) composition (AL0–AL2), then the composition was switched to Mg72Al120O256 (Mg1.125Al1.875O4) for two cycles (AL3–AL4), after which the composition was changed to Mg68Al124O256 (Mg1.0625Al1.9375O4) for the final RXMC sampling and model testing (AL5). Raw total energies are plotted unlike previous figures because there is no common ordered configuration among the varying compositions.

Finally, we note that for truly complicated systems, local frustrations may lead to several local minima when starting the relaxation from ideal lattice structures. This means that the implicitly assumed one-to-one correspondence between lattice configuration and relaxed energy is broken. In such cases, it may be advisable to perform the active learning scheme for a few iterations and then use the final NNP in nested Monte Carlo82–87 or upsampling schemes,88 where MC or MD steps are carried out with low-cost potentials and then augmented with DFT calculations at preset intervals. These schemes are guaranteed to converge to the same result as sampling based only on DFT energies, and the speedup will depend on the quality of the low-cost potential. The frustration problem mentioned above can be solved by nested DFT relaxations during the MC sampling: by randomly perturbing the structure slightly before relaxation, the system has a chance of visiting any of those local minima. This is in the spirit of basin-hopping MC.89 We plan on implementing and testing such nesting schemes in the near future.
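For concreteness, here is a sketch of the outer acceptance step of such a nested scheme. The bias-canceling form of the acceptance probability is the standard one for nested Markov chain MC; the function itself is our illustration, not an existing abICS feature.

```python
import numpy as np

def nested_accept(beta, e_dft_old, e_dft_new, e_nnp_old, e_nnp_new, rng):
    """Accept or reject a block of NNP-generated moves against DFT.
    The correction term cancels the NNP bias so that the outer chain
    converges to the DFT ensemble."""
    mismatch = (e_dft_new - e_dft_old) - (e_nnp_new - e_nnp_old)
    return rng.random() < min(1.0, np.exp(-beta * mismatch))
```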

In this work, we proposed a new method to obtain lightweight models that can be used to sample millions of configurations in the ab initio lattice configuration problem: Behler–Parrinello NNPs are trained to predict relaxed energies from perfect-lattice structures with configurational disorder, and the NNPs are iteratively refined using an active learning scheme involving several RXMC runs. The idea was tested successfully in calculating the degree of A/B site inversion in three spinel oxides: MgAl2O4, ZnAl2O4, and MgGa2O4. Merits compared to conventional continuous-coordinate NNP models include the ability to bypass structural relaxation, as well as the restriction on the input coordinate space that makes the machine learning much easier. This approach may also be considered a “lazy” alternative to cluster expansion; no explicit cluster selection is necessary, and it also avoids the combinatorial explosion in the amount of computation that makes cluster expansion practically unusable for many-component systems with multiple sublattices. We also demonstrated that the scheme is at least as robust as cluster expansion with respect to the magnitude of local relaxation, and also in treating systems with long-range interactions. Due to the merits outlined above, we believe that this approach will enable configurational sampling in complex bulk or interface systems where it is technically difficult, if not impossible, to apply previously proposed approaches.

The software used in this work (abICS) was developed, in part, by the “Project for advancement of software usability in materials science” by the Institute for Solid State Physics, the University of Tokyo, and the calculations were performed on the joint-use supercomputer system at the same Institute. This research was supported by the “Program for Promoting Research on the Supercomputer Fugaku” (Fugaku Battery & Fuel Cell Project) and the JST CREST Program (Grant No. JPMJCR18J3, Japan). S.K. acknowledges funding from the Japan Society for the Promotion of Science (JSPS) KAKENHI, Grant No. 20H05284 (Grant-in-Aid for Scientific Research on Innovative Areas “Interface IONICS”), and support from the JST FOREST Program (Grant No. JPMJFR2037, Japan). T.O. was funded by JSPS KAKENHI Grant No. 19H05792 (Grant-in-Aid for Scientific Research on Innovative Areas “Crystal Defect Core”) and JSPS KAKENHI Grant No. 21K04648 [Grant-in-Aid for Scientific Research (C)]. Plots of the data were generated using Matplotlib90 and seaborn.91

The authors have no conflicts to disclose.

Shusuke Kasamatsu: Conceptualization (equal); Data curation (lead); Formal analysis (lead); Funding acquisition (lead); Investigation (lead); Methodology (equal); Project administration (lead); Resources (lead); Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Yuichi Motoyama: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Kazuyoshi Yoshimi: Software (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Ushio Matsumoto: Investigation (equal); Methodology (equal); Validation (equal); Writing – original draft (equal). Akihide Kuwabara: Funding acquisition (equal); Supervision (equal); Writing – original draft (equal). Takafumi Ogawa: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The dataset of configurations and corresponding energies for MgAl2O4, ZnAl2O4, and MgGa2O4, as well as trained models that can be used for energy evaluation with the aenet code, are available at the ISSP Data Repository, https://isspns-gitlab.issp.u-tokyo.ac.jp/abics-dev/abics-gallery. Other data that support the findings of this study are available from the corresponding author upon reasonable request.

1. R. Zhang, S. Zhao, J. Ding, Y. Chong, T. Jia, C. Ophus, M. Asta, R. O. Ritchie, and A. M. Minor, “Short-range order and its impact on the CrCoNi medium-entropy alloy,” Nature 581, 283–287 (2020).
2. V. Ponec, “Selectivity in catalysis by alloys,” Catal. Rev.: Sci. Eng. 11, 41–70 (1975).
3. L. Xiong and A. Manthiram, “Effect of atomic ordering on the catalytic activity of carbon supported PtM (M = Fe, Co, Ni, and Cu) alloys for oxygen reduction in PEMFCs,” J. Electrochem. Soc. 152, A697 (2005).
4. F. Takeiri, A. Watanabe, K. Okamoto, D. Bresser, S. Lyonnard, B. Frick, A. Ali, Y. Imai, M. Nishikawa, M. Yonemura, T. Saito, K. Ikeda, T. Otomo, T. Kamiyama, R. Kanno, and G. Kobayashi, “Hydride-ion-conducting K2NiF4-type Ba–Li oxyhydride solid electrolyte,” Nat. Mater. 21, 325–330 (2022).
5. R. J. Clément, Z. Lun, and G. Ceder, “Cation-disordered rocksalt transition metal oxides and oxyfluorides for high energy lithium-ion cathodes,” Energy Environ. Sci. 13, 345–373 (2020).
6. R. Hu, S. Iwamoto, L. Feng, S. Ju, S. Hu, M. Ohnishi, N. Nagai, K. Hirakawa, and J. Shiomi, “Machine-learning-optimized aperiodic superlattice minimizes coherent phonon heat conduction,” Phys. Rev. X 10, 021050 (2020).
7. J. Hafner, C. Wolverton, and G. Ceder, “Toward computational materials design: The impact of density functional theory on materials research,” MRS Bull. 31, 659–668 (2006).
8. G. Ceder, “Opportunities and challenges for first-principles materials design and applications to Li battery materials,” MRS Bull. 35, 693–701 (2010).
9. J. Neugebauer and T. Hickel, “Density functional theory in materials science,” Wiley Interdiscip. Rev.: Comput. Mol. Sci. 3, 438–448 (2013).
10. J. M. Sanchez, F. Ducastelle, and D. Gratias, “Generalized cluster description of multicomponent systems,” Physica A 128, 334–350 (1984).
11. G. Ceder, “A derivation of the Ising model for the computation of phase diagrams,” Comput. Mater. Sci. 1, 144–150 (1993).
12. A. van de Walle, M. Asta, and G. Ceder, “The alloy theoretic automated toolkit: A user guide,” Calphad 26, 539–553 (2002).
13. A. Seko, Y. Koyama, and I. Tanaka, “Cluster expansion method for multicomponent systems based on optimal selection of structures for density-functional theory calculations,” Phys. Rev. B 80, 165122 (2009).
14. J. M. Sanchez, “Cluster expansion and the configurational theory of alloys,” Phys. Rev. B 81, 224202 (2010).
15. J. M. Sanchez, “Foundations and practical implementations of the cluster expansion,” J. Phase Equilib. Diffus. 38, 238–251 (2017).
16. Q. Wu, B. He, T. Song, J. Gao, and S. Shi, “Cluster expansion method and its application in computational materials science,” Comput. Mater. Sci. 125, 243–254 (2016).
17. J. H. Chang, D. Kleiven, M. Melander, J. Akola, J. M. Garcia-Lastra, and T. Vegge, “CLEASE: A versatile and user-friendly implementation of cluster expansion method,” J. Phys.: Condens. Matter 31, 325901 (2019).
18. A. H. Nguyen, C. W. Rosenbrock, C. S. Reese, and G. L. W. Hart, “Robustness of the cluster expansion: Assessing the roles of relaxation and numerical error,” Phys. Rev. B 96, 014107 (2017).
19. A. Seko and I. Tanaka, “Cluster expansion of multicomponent ionic systems with controlled accuracy: Importance of long-range interactions in heterovalent ionic systems,” J. Phys.: Condens. Matter 26, 115403 (2014).
20. A. Seko and I. Tanaka, “Grouping of structures for cluster expansion of multicomponent systems with controlled accuracy,” Phys. Rev. B 83, 224111 (2011).
21. Z. Leong and T. L. Tan, “Robust cluster expansion of multicomponent systems using structured sparsity,” Phys. Rev. B 100, 134108 (2019).
22. S. N. Khan and M. Eisenbach, “Density-functional Monte-Carlo simulation of CuZn order-disorder transition,” Phys. Rev. B 93, 024203 (2016).
23. R. B. Wexler, T. Qiu, and A. M. Rappe, “Automatic prediction of surface phase diagrams using ab initio grand canonical Monte Carlo,” J. Phys. Chem. C 123, 2321–2328 (2019).
24. S. Kasamatsu and O. Sugino, “Direct coupling of first-principles calculations with replica exchange Monte Carlo sampling of ion disorder in solids,” J. Phys.: Condens. Matter 31, 085901 (2019).
25. S. Kasamatsu, O. Sugino, T. Ogawa, and A. Kuwabara, “Dopant arrangements in Y-doped BaZrO3 under processing conditions and their impact on proton conduction: A large-scale first-principles thermodynamics study,” J. Mater. Chem. A 8, 12674–12686 (2020).
26. F. Wang and D. P. Landau, “Efficient, multiple-range random walk algorithm to calculate the density of states,” Phys. Rev. Lett. 86, 2050–2053 (2001).
27. K. Hukushima and K. Nemoto, “Exchange Monte Carlo method and application to spin glass simulations,” J. Phys. Soc. Jpn. 65, 1604–1608 (1996).
28. E. V. Podryabinkin and A. V. Shapeev, “Active learning of linearly parametrized interatomic potentials,” Comput. Mater. Sci. 140, 171–180 (2017).
29. E. V. Podryabinkin, E. V. Tikhonov, A. V. Shapeev, and A. R. Oganov, “Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning,” Phys. Rev. B 99, 064114 (2019).
30. K. Gubaev, E. V. Podryabinkin, G. L. W. Hart, and A. V. Shapeev, “Accelerating high-throughput searches for new alloys with active learning of interatomic potentials,” Comput. Mater. Sci. 156, 148–156 (2019).
31. J. S. Smith, B. Nebgen, N. Lubbers, O. Isayev, and A. E. Roitberg, “Less is more: Sampling chemical space with active learning,” J. Chem. Phys. 148, 241733 (2018).
32. L. Zhang, D.-Y. Lin, H. Wang, R. Car, and W. E, “Active learning of uniformly accurate interatomic potentials for materials simulation,” Phys. Rev. Mater. 3, 023804 (2019).
33. T. Kostiuchenko, F. Körmann, J. Neugebauer, and A. Shapeev, “Impact of lattice relaxations on phase transitions in a high-entropy alloy studied by machine-learning potentials,” npj Comput. Mater. 5, 55 (2019).
34. T. D. Loeffler, T. K. Patra, H. Chan, M. Cherukara, and S. K. R. S. Sankaranarayanan, “Active learning the potential energy landscape for water clusters from sparse training data,” J. Phys. Chem. C 124, 4907–4916 (2020).
35. J. Vandermause, S. B. Torrisi, S. Batzner, Y. Xie, L. Sun, A. M. Kolpak, and B. Kozinsky, “On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events,” npj Comput. Mater. 6, 20 (2020).
36. R. Jinnouchi, K. Miwa, F. Karsai, G. Kresse, and R. Asahi, “On-the-fly active learning of interatomic potentials for large-scale atomistic simulations,” J. Phys. Chem. Lett. 11, 6946–6955 (2020).
37. Y. Zuo, C. Chen, X. Li, Z. Deng, Y. Chen, J. Behler, G. Csányi, A. V. Shapeev, A. P. Thompson, M. A. Wood, and S. P. Ong, “Performance and cost assessment of machine learning interatomic potentials,” J. Phys. Chem. A 124, 731–745 (2020).
38. J. Behler and M. Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Phys. Rev. Lett. 98, 146401 (2007).
39. J. Behler, “Atom-centered symmetry functions for constructing high-dimensional neural network potentials,” J. Chem. Phys. 134, 074106 (2011).
40. J. Behler, “Constructing high-dimensional neural network potentials: A tutorial review,” Int. J. Quantum Chem. 115, 1032–1050 (2015).
41. S. Watanabe, W. Li, W. Jeong, D. Lee, K. Shimizu, E. Minamitani, Y. Ando, and S. Han, “High-dimensional neural network atomic potentials for examining energy materials: Some recent simulations,” J. Phys.: Energy 3, 012003 (2020).
42. A. P. Bartók, R. Kondor, and G. Csányi, “On representing chemical environments,” Phys. Rev. B 87, 184115 (2013).
43. L. Zhang, J. Han, H. Wang, R. Car, and W. E, “Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics,” Phys. Rev. Lett. 120, 143001 (2018).
44. L. Zhang, J. Han, H. Wang, W. Saidi, R. Car, and W. E, “End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems,” Adv. Neural Inf. Process. Syst. 31, 4436–4446 (2018); available at https://papers.nips.cc/paper/2018/hash/e2ad76f2326fbc6b56a45a56c59fafdb-Abstract.html.
45. N. Artrith, A. Urban, and G. Ceder, “Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species,” Phys. Rev. B 96, 014112 (2017).
46. T. Xie and J. C. Grossman, “Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties,” Phys. Rev. Lett. 120, 145301 (2018).
47. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, “SchNet—A deep learning architecture for molecules and materials,” J. Chem. Phys. 148, 241722 (2018).
48. C. Chen, W. Ye, Y. Zuo, C. Zheng, and S. P. Ong, “Graph networks as a universal machine learning framework for molecules and crystals,” Chem. Mater. 31, 3564–3572 (2019).
49. S. Takamoto, S. Izumi, and J. Li, “TeaNet: Universal neural network interatomic potential inspired by iterative electronic relaxations,” Comput. Mater. Sci. 207, 111280 (2022).
50. S. Takamoto, C. Shinagawa, D. Motoki, K. Nakago, W. Li, I. Kurata, T. Watanabe, Y. Yayama, H. Iriguchi, Y. Asano, T. Onodera, T. Ishii, T. Kudo, H. Ono, R. Sawada, R. Ishitani, M. Ong, T. Yamaguchi, T. Kataoka, A. Hayashi, N. Charoenphakdee, and T. Ibuka, “Towards universal neural network potential for material discovery applicable to arbitrary combination of 45 elements,” Nat. Commun. 13, 2991 (2022).
51. W. Li, Y. Ando, and S. Watanabe, “Cu diffusion in amorphous Ta2O5 studied with a simplified neural network potential,” J. Phys. Soc. Jpn. 86, 104004 (2017).
52. C. L. Zitnick, L. Chanussot, A. Das, S. Goyal, J. Heras-Domingo, C. Ho, W. Hu, T. Lavril, A. Palizhati, M. Riviere, M. Shuaibi, A. Sriram, K. Tran, B. Wood, J. Yoon, D. Parikh, and Z. Ulissi, “An introduction to electrocatalyst design using machine learning for renewable energy storage,” arXiv:2010.09435 [cond-mat] (2020).
53. L. Chanussot, A. Das, S. Goyal, T. Lavril, M. Shuaibi, M. Riviere, K. Tran, J. Heras-Domingo, C. Ho, W. Hu, A. Palizhati, A. Sriram, B. Wood, J. Yoon, D. Parikh, C. L. Zitnick, and Z. Ulissi, “Open catalyst 2020 (OC20) dataset and community challenges,” ACS Catal. 11, 6059–6072 (2021).
54. Y. Zuo, M. Qin, C. Chen, W. Ye, X. Li, J. Luo, and S. P. Ong, “Accelerating materials discovery with Bayesian optimization and graph deep learning,” Mater. Today 51, 126–135 (2021).
55. J. B. Gibson, A. C. Hire, and R. G. Hennig, “Data-augmentation for graph neural network learning of the relaxed energies of unrelaxed structures,” arXiv:2202.13947 [physics] (2022).
56. K. Yuge, A. Seko, A. Kuwabara, F. Oba, and I. Tanaka, “Ordering and segregation of a Cu75Pt25 (111) surface: A first-principles cluster expansion study,” Phys. Rev. B 76, 045407 (2007).
57. H. Ji and Y. Jung, “Artificial neural network for the configuration problem in solids,” J. Chem. Phys. 146, 064103 (2017).
58. A. R. Natarajan and A. Van der Ven, “Machine-learning the configurational energy of multicomponent crystalline solids,” npj Comput. Mater. 4, 56 (2018).
59. A. Shapeev, “Accurate representation of formation energies of crystalline alloys with many components,” Comput. Mater. Sci. 139, 26–30 (2017).
60. C. Biagioni and M. Pasero, “The systematics of the spinel-type minerals: An overview,” Am. Mineral. 99, 1254–1264 (2014).
61. M. M. Thackeray, “Exploiting the spinel structure for Li-ion battery applications: A tribute to John B. Goodenough,” Adv. Energy Mater. 11, 2001117 (2021).
62. S. B. Narang and K. Pubby, “Nickel spinel ferrites: A review,” J. Magn. Magn. Mater. 519, 167163 (2021).
63. A. Seko, K. Yuge, F. Oba, A. Kuwabara, and I. Tanaka, “Prediction of ground-state structures and order-disorder phase transitions in II-III spinel oxides: A combined cluster-expansion method and first-principles study,” Phys. Rev. B 73, 184117 (2006).
64. G. Kresse and J. Furthmüller, “Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set,” Phys. Rev. B 54, 11169–11186 (1996).
65. G. Kresse and J. Furthmüller, Comput. Mater. Sci. 6, 15 (1996).
66. P. E. Blöchl, “Projector augmented-wave method,” Phys. Rev. B 50, 17953–17979 (1994).
67. J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized gradient approximation made simple,” Phys. Rev. Lett. 77, 3865–3868 (1996).
68. N. Artrith and A. Urban, “An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for TiO2,” Comput. Mater. Sci. 114, 135–150 (2016).
69. We have 16 coefficients for the RDF and 4 for the ADF, resulting in 20 descriptors; this is multiplied by 2 since we have separate RDFs/ADFs for composition and structure.
70. R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, “A limited memory algorithm for bound constrained optimization,” SIAM J. Sci. Comput. 16, 1190–1208 (1995).
71. C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, “Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization,” ACM Trans. Math. Software 23, 550–560 (1997).
72. X. Liu, J. Zhang, J. Yin, S. Bi, M. Eisenbach, and Y. Wang, “Monte Carlo simulation of order-disorder transition in refractory high entropy alloys: A data-driven approach,” Comput. Mater. Sci. 187, 110135 (2021).
73. The splitting of samples between higher and lower energies, with few samples in the interval of, e.g., ∼50–75 meV/cation for MgAl2O4, originates from our sampling procedure: we always sampled the first step in the RXMC calculations, which is generated randomly, and most of the replicas relax quickly to lower-energy configurations before the next sampling step.
74. F. Machatschki, “Der Magnesium-Gallium-Spinell” [The magnesium-gallium spinel], Z. Kristallogr. - Cryst. Mater. 82, 348–354 (1932).
75. M. Huber, C. R. Acad. Sci. Paris 244, 2524 (1957).
76. H. Schmalzried, “Röntgenographische Untersuchung der Kationenverteilung in Spinellphasen” [X-ray investigation of the cation distribution in spinel phases], Z. Phys. Chem. 28, 203–219 (1961).
77. J. E. Weidenborner, N. R. Stemple, and Y. Okaya, “Cation distribution and oxygen parameter in magnesium gallate, MgGa2O4,” Acta Crystallogr. 20, 761–764 (1966).
78. T. F. W. Barth and E. Posnjak, “Spinel structures: With and without variate atom equipoints,” Z. Kristallogr. - Cryst. Mater. 82, 325–341 (1932).
79. F.-Z. Dai, B. Wen, Y. Sun, H. Xiang, and Y. Zhou, “Theoretical prediction on thermal and mechanical properties of high entropy (Zr0.2Hf0.2Ti0.2Nb0.2Ta0.2)C by deep learning potential,” J. Mater. Sci. Technol. 43, 168–174 (2020).
80. N. Artrith, T. Morawietz, and J. Behler, “High-dimensional neural-network potentials for multicomponent systems: Applications to zinc oxide,” Phys. Rev. B 83, 153101 (2011).
81. Note that limited composition variation is expected for spinel oxides under normal experimental conditions, so MgAl2O4 is used here just as a proof of concept.
82. R. Iftimie, D. Salahub, D. Wei, and J. Schofield, “Using a classical potential as an efficient importance function for sampling from an ab initio potential,” J. Chem. Phys. 113, 4852 (2000).
83. B. Hetényi, K. Bernacki, and B. J. Berne, “Multiple ‘time step’ Monte Carlo,” J. Chem. Phys. 117, 8203–8207 (2002).
84. L. D. Gelb, “Monte Carlo simulations using sampling from an approximate potential,” J. Chem. Phys. 118, 7747–7750 (2003).
85. J. Leiding and J. D. Coe, “Reactive Monte Carlo sampling with an ab initio potential,” J. Chem. Phys. 144, 174109 (2016).
86. Y. Nagai, M. Okumura, and A. Tanaka, “Self-learning Monte Carlo method with Behler-Parrinello neural networks,” Phys. Rev. B 101, 115111 (2020).
87. Y. Nagai, M. Okumura, K. Kobayashi, and M. Shiga, “Self-learning hybrid Monte Carlo: A first-principles approach,” Phys. Rev. B 102, 041124(R) (2020).
88. X. Zhang, B. Grabowski, T. Hickel, and J. Neugebauer, “Calculating free energies of point defects from ab initio,” Comput. Mater. Sci. 148, 249–259 (2018).
89. D. J. Wales and J. P. Doye, “Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms,” J. Phys. Chem. A 101, 5111–5116 (1997).
90. J. D. Hunter, “Matplotlib: A 2D graphics environment,” Comput. Sci. Eng. 9, 90–95 (2007).
91. M. L. Waskom, “Seaborn: Statistical data visualization,” J. Open Source Software 6, 3021 (2021).