Thermal properties of materials are essential to many applications of thermal electronic devices. Density functional theory (DFT) has shown the capability to calculate these properties accurately. However, its expensive computational cost limits the application of the DFT method to high-throughput screening of materials. Recently, machine learning models, especially graph neural networks (GNNs), have demonstrated high accuracy in the prediction of many material properties, such as bandgap and formation energy, but they fail to accurately predict heat capacity (Cv) owing to their limitations in capturing crystallographic features. In this study, we implement the material informatics transformer (MatInFormer) framework, which has been pretrained on lattice reconstruction tasks. This approach has shown proficiency in capturing essential crystallographic features. By concatenating these features with human-designed descriptors, we achieve mean absolute errors of 4.893 and 4.505 J/(mol K) in our predictions. Our findings underscore the efficacy of the MatInFormer framework in leveraging crystallography, augmented with additional information processing capabilities.
INTRODUCTION
Thermal properties such as thermal conductivity and phonon heat capacity are fundamental properties of materials and relate to various applications such as thermal electronic devices.1 In past decades, accurate results have been achieved by implementing density functional theory (DFT) to calculate phonon frequencies and scattering rates,2,3 but such calculations are computationally expensive at scale, making them infeasible for high-throughput screening of new materials.4 Recently, machine learning (ML) methods have been widely used in scientific research on materials, molecular property prediction, and simulation.5–15
Machine learning for thermal properties has previously been investigated. Seko et al.16 applied a kernel ridge regression model to 110 compounds with lattice thermal conductivity (KL) computed by DFT. Chen et al.17 built an ML model based on a Gaussian process regression (GPR) algorithm and 100 inorganic materials with experimentally measured KL. Xingming et al.18 constructed descriptors from crystal structural and compositional information and used XGBoost to predict KL, the Debye temperature, heat capacity, and thermal expansion.
Graph neural networks (GNNs) achieve remarkable performance in predicting material properties such as formation energy, elastic moduli, and bandgap. Many excellent regression models, such as CGCNN,5 ALIGNN,13 PotNet,19 and coNGN,20 have been proposed with accuracies approaching those of DFT calculations. However, GNN models still have limitations. Sheng et al.21 evaluated the limitations of GNNs for crystalline materials across multiple regression tasks, showing that GNNs struggle to capture certain knowledge of the crystal structure, such as the lattice parameters, crystal system, and periodicity. This limits the potential of GNNs to predict some material properties, including phonon heat capacity: compared to a descriptor-only model, the mean absolute errors (MAEs) of CGCNN and ALIGNN are almost 6 and 5 times larger, respectively. Similarly, in the large-scale JARVIS-Leaderboard benchmark,22 which comprises 44 property prediction tasks, ALIGNN outperforms descriptor-based models on most tasks, yet its performance on phonon heat capacity is much worse than that of descriptor-based machine learning models.
Most previous GNNs employ message-passing networks to update atomic features based on neighboring atoms and bonds, followed by an average-pooling readout layer to obtain a feature for the global crystal structure. This allows GNNs to effectively capture the local atomic environment but diminishes their ability to capture global information. As a consequence, this limitation leads to poorer performance in predicting extensive properties, such as heat capacity. The limitation comes from two sides: (1) the range of the receptive field is determined by the neighbor cutoff and the number of convolution layers, so GNN models cannot capture periodicity at length scales larger than the receptive field; however, simply increasing the cutoff distance and the number of convolution layers does not lead to better performance.21,23 (2) Average pooling in the GNN architecture also restricts the prediction of extensive properties; this can be improved by using sum pooling, but sum pooling cannot be used for intensive property prediction because the size invariance of the crystal material must be maintained5 (illustrated in the sketch below). Therefore, a better representation of crystalline materials and a better model architecture are necessary and essential for the prediction of extensive properties and for materials discovery.
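To make the pooling argument concrete, the following minimal PyTorch sketch (illustrative shapes and names, not any specific GNN implementation) shows why a mean-pooled readout is size invariant while a sum-pooled readout scales with the cell:

```python
# Minimal sketch contrasting the two readout choices discussed above;
# tensor shapes are illustrative only.
import torch

def readout(atom_features: torch.Tensor, mode: str = "mean") -> torch.Tensor:
    """Pool per-atom features (n_atoms x d) into one crystal embedding (d)."""
    if mode == "mean":
        # Size invariant: a 2x supercell yields the same embedding, which
        # suits intensive properties but washes out extensive ones.
        return atom_features.mean(dim=0)
    if mode == "sum":
        # Scales with cell size: better for extensive properties such as
        # heat capacity, but breaks size invariance for intensive ones.
        return atom_features.sum(dim=0)
    raise ValueError(mode)

unit_cell = torch.randn(4, 64)       # 4 atoms, 64-d features
supercell = unit_cell.repeat(2, 1)   # same material, 2x1x1 supercell
print(torch.allclose(readout(unit_cell, "mean"), readout(supercell, "mean")))  # True
print(torch.allclose(readout(unit_cell, "sum"), readout(supercell, "sum")))    # False
```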
Transformer-based language models24 have been on the rise in the areas of molecular and material properties. The key mechanism of the transformer, self-attention, has been used not only in transformer models but also in graph transformer models for the prediction of material and molecular properties,10,25–31 allowing models to capture and process long-range dependencies within the input data or graphs. Moreover, transformers are adept at handling sequential data, making them particularly suitable for analyzing one-dimensional representations such as amino acid sequences (FASTA)32 or chemical strings such as SMILES.33
We adapt our previous work, the materials informatics transformer (MatInFormer),34 which is designed for crystalline materials and takes the formula and crystal structure symmetry as input. In our framework, lattice parameter pretraining is designed to inject crystallographic information into the transformer encoder by reconstructing the lattice parameters. The primary objective of this pretraining is to establish a more informed set of initial weights for the transformer, which is expected to significantly improve its performance on downstream tasks. We further advance our approach by integrating human-designed descriptors into a comprehensive multimodal framework, as shown in Fig. 1. This integration enhances the accuracy of our machine learning model by leveraging the crystallographic insights obtained from the transformer encoder and synergizing them with human-designed descriptors, particularly in the prediction of heat capacity. This advancement not only demonstrates the potential of MatInFormer to accurately understand and correlate properties across various crystal systems but also highlights the model's flexibility and adaptability. Our approach underscores the efficacy of combining deep learning techniques with domain-specific knowledge, paving the way for more sophisticated and precise models in the field of materials science.
FIG. 1. The multimodal framework of de-MatInFormer. This framework processes crystal information through two distinct input channels. The first channel uses human-designed descriptors35 to compute specific features, while the second employs tokenized embeddings of the formula and space group for MatInFormer.34 The transformer is pretrained via lattice parameter prediction. The embeddings of both channels are concatenated after projection layers, and a final output layer is used to predict the heat capacity.
METHOD
Dataset and evaluation metrics
FIG. 2. Data distribution of the JARVIS heat capacity dataset. Allocation of 12 054 entries across training (9644), validation (1205), and testing (1205) sets for model benchmarking. All models are benchmarked using the identical data split.
Material informatics transformer
Tokenization and transformer encoder
The material informatics transformer (MatInFormer) is the model used to predict material properties. It is based on a RoBERTa37 encoder with 8 layers, 12 attention heads, and a hidden size of 768. The model's initial step is the tokenization of crystal features, which includes the embedding of compositional elements. The stoichiometric formula is used to generate MatScholar embeddings, each with a dimensionality of 200. The embedding of each element in the composition is weighted by its respective elemental fraction, and these weighted embeddings are concatenated. Furthermore, crystallography tokens, encompassing the crystal system, point group, space group, and other symmetry-related information, are aggregated to form the initial embedding input of the MatInFormer.
In the last layer, the [CLS] token embedding is passed to the prediction head to produce the final prediction.
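As an illustration of this tokenization, here is a hedged sketch: the embedding tables are random stand-ins, the token vocabulary is hypothetical, and the 200-dimensional MatScholar vectors are assumed; only the fraction-weighted element rows plus crystallography rows mirror the procedure described above.

```python
# Illustrative tokenization sketch; not the released MatInFormer code.
import torch

D = 200  # assumed MatScholar embedding dimensionality
# Pretrained per-element vectors (random stand-ins here).
matscholar = {"Ga": torch.randn(D), "As": torch.randn(D)}
# Crystallography tokens get their own learned embedding table.
crys_vocab = {"[CLS]": 0, "cubic": 1, "F-43m": 2}
crys_embed = torch.nn.Embedding(len(crys_vocab), D)

def build_input(composition: dict, crys_tokens: list) -> torch.Tensor:
    """Stack [CLS] + crystallography tokens + fraction-weighted element tokens."""
    total = sum(composition.values())
    elem_rows = [matscholar[el] * (n / total) for el, n in composition.items()]
    crys_rows = [crys_embed(torch.tensor(crys_vocab[t]))
                 for t in ["[CLS]"] + crys_tokens]
    return torch.stack(crys_rows + elem_rows)  # (seq_len, D) transformer input

tokens = build_input({"Ga": 1, "As": 1}, ["cubic", "F-43m"])
print(tokens.shape)  # torch.Size([5, 200]); row 0 is [CLS], used by the head
```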
Lattice reconstruction pretraining
Pretraining in NLP fundamentally enables models to understand and manipulate human language and enhances their performance on downstream tasks. This process typically involves training on a large text dataset, from which the model learns the grammar of the language. Models such as BERT38 or GPT39 are exposed to a wide range of text sources, from literature to web content, allowing them to develop a broad understanding of language patterns and usage. This pretraining equips the models with a comprehensive linguistic foundation, enabling them to perform a wide range of language-based tasks.
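By analogy, the lattice reconstruction objective regresses the six lattice parameters from the [CLS] embedding. Below is a minimal sketch assuming a two-layer MLP head and an MSE loss; the actual head architecture is not specified in this paper, so the layer sizes are illustrative.

```python
# Hedged sketch of lattice-reconstruction pretraining: regress (a, b, c,
# alpha, beta, gamma) from the [CLS] embedding with an MSE loss.
import torch
import torch.nn as nn

class LatticeHead(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                  nn.Linear(hidden, 6))  # a, b, c, alpha, beta, gamma

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        return self.proj(cls_embedding)

head = LatticeHead()
cls = torch.randn(8, 768)   # batch of [CLS] embeddings from the encoder
target = torch.rand(8, 6)   # reference lattice parameters for the batch
loss = nn.functional.mse_loss(head(cls), target)
loss.backward()             # pretrain encoder + head end to end
```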
Machine learning descriptor
To obtain a vector representation of the crystal structure, human-designed descriptors are used to generate an embedding. The Matminer35 featurizers are used to compute all features: 128 structural features from the SiteStatsFingerprint, StructuralHeterogeneity, MaximumPackingEfficiency, and ChemicalOrdering featurizers, and 145 compositional features from the Stoichiometry, ElementProperty, ValenceOrbital, and IonProperty featurizers.
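A hedged example of assembling this descriptor vector with matminer35 and pymatgen follows; the featurizer presets ("magpie" and the Ward-2017 coordination-number preset) are assumptions, so the exact feature counts may differ from the 128 + 145 used here, and the POSCAR input file is hypothetical.

```python
# Sketch of the descriptor channel using matminer featurizers.
from matminer.featurizers.base import MultipleFeaturizer
from matminer.featurizers.composition import (Stoichiometry, ElementProperty,
                                              ValenceOrbital, IonProperty)
from matminer.featurizers.structure import (SiteStatsFingerprint,
                                            StructuralHeterogeneity,
                                            MaximumPackingEfficiency,
                                            ChemicalOrdering)
from pymatgen.core import Structure

comp_feats = MultipleFeaturizer([Stoichiometry(),
                                 ElementProperty.from_preset("magpie"),
                                 ValenceOrbital(), IonProperty()])
struct_feats = MultipleFeaturizer([
    SiteStatsFingerprint.from_preset("CoordinationNumber_ward-prb-2017"),
    StructuralHeterogeneity(), MaximumPackingEfficiency(), ChemicalOrdering()])

structure = Structure.from_file("POSCAR")  # hypothetical input file
descriptor = (struct_feats.featurize(structure)
              + comp_feats.featurize(structure.composition))
print(len(descriptor))  # concatenated structural + compositional vector
```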
Descriptor-hybridized multi-model
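The fusion itself follows Fig. 1: each channel passes through its own projection layer, the projections are concatenated, and a final output layer predicts the heat capacity. The following is a minimal PyTorch-style sketch; the layer sizes and the use of single linear projections are assumptions rather than the released implementation (the descriptor width of 273 = 128 + 145 comes from the featurization above).

```python
# Hedged sketch of the descriptor-hybridized two-channel head in Fig. 1.
import torch
import torch.nn as nn

class DescriptorHybridHead(nn.Module):
    def __init__(self, hidden: int = 768, n_desc: int = 273, proj: int = 128):
        super().__init__()
        self.proj_cls = nn.Linear(hidden, proj)   # transformer [CLS] channel
        self.proj_desc = nn.Linear(n_desc, proj)  # matminer descriptor channel
        self.out = nn.Linear(2 * proj, 1)         # heat capacity prediction

    def forward(self, cls_emb: torch.Tensor, desc: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.proj_cls(cls_emb), self.proj_desc(desc)], dim=-1)
        return self.out(fused)

head = DescriptorHybridHead()
pred = head(torch.randn(8, 768), torch.randn(8, 273))
print(pred.shape)  # torch.Size([8, 1])
```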
RESULTS AND DISCUSSION
The results of the heat capacity prediction are shown in Table I, where we compare the performance of the descriptor-based random forest model, ALIGNN, and coNGN. ALIGNN uses a line graph and is able to distinguish bond angles, while coNGN uses a nested line graph and captures dihedral angles.20
TABLE I. Results of heat capacity prediction. Bold values indicate the best performance and italic values the second best. For all metrics, (↓) indicates that lower is better and (↑) indicates that higher is better; MAE is reported in J/(mol K). SchNet,7 CGCNN,5 ALIGNN,13 coNGN,20 and Matminer35 results are obtained from the JARVIS-Leaderboard.22
| Model | MAE (↓) | MSE (↓) | R² (↑) | MAE/MAD (↓) |
|---|---|---|---|---|
| SchNet7 | 17.771 | 1745.587 | 0.381 | 0.4408 |
| CGCNN5 | 12.936 | 1181.627 | 0.549 | 0.3208 |
| ALIGNN13 | 9.606 | 849.155 | 0.739 | 0.2383 |
| coNGN20 | 7.813 | 1086.716 | 0.666 | 0.1938 |
| Matminer + RF35 | 5.276 | 322.793 | 0.901 | 0.1309 |
| MatInFormer34 | *4.893* | *307.325* | *0.907* | *0.1214* |
| de-MatInFormer | **4.505** | **251.998** | **0.923** | **0.1118** |
It is observed that the Matminer descriptors with a random forest model perform excellently compared to GNNs. The Matminer + RF model demonstrates commendable accuracy, with an MAE of 5.276 J/(mol K) and an R² value of 0.901. This outcome indicates that conventional machine learning approaches, especially when augmented with meticulously extracted features, can achieve greater accuracy than sophisticated GNN models such as ALIGNN and coNGN.
The MatInFormer model achieves state-of-the-art (SOTA) performance in this benchmark, with an MAE of 4.893 J/(mol K). This marks a significant enhancement, outperforming ALIGNN by 49.1% and the coNGN model by 37.4%. The MatInFormer model attains an R² value of 0.907, slightly exceeding that of the Matminer + RF model, which stands at 0.901, compared to R² values of 0.739 and 0.666 for the ALIGNN and coNGN models, respectively. A broader metric, the MAE/MAD ratio, which is unaffected by scaling and was utilized previously,13,21 supports this assessment. A prediction with a low MAE/MAD ratio is considered to have high accuracy. As shown in Table I, the low MAE/MAD ratio of MatInFormer reflects its excellent performance in predicting heat capacity.
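For reference, the scale-free ratio in Table I compares the model error with the intrinsic spread of the targets, following the standard definitions below (with yᵢ the DFT target, ŷᵢ the prediction, and ȳ the target mean). As a consistency check, every row of Table I implies a dataset MAD of roughly 40.3 J/(mol K) (e.g., 4.893/0.1214 ≈ 40.3 for MatInFormer).

```latex
\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|,\qquad
\mathrm{MAD}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\bar{y}\right|,\qquad
\frac{\mathrm{MAE}}{\mathrm{MAD}}
  =\frac{\sum_{i}\left|y_i-\hat{y}_i\right|}{\sum_{i}\left|y_i-\bar{y}\right|}.
```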
The enhanced multimodal framework, de-MatInFormer, shows further improvement, achieving an MAE of 4.505 J/(mol K) and an R² value of 0.923. This surpasses the performance enhancements observed in previous work that utilized several hybrid GNN models.21 Whereas the cited hybrid GNN models exhibited improvements of up to 90%, incorporating the descriptors into our framework leads to a relatively modest enhancement of 7.93%. This difference can be viewed in two ways: on the one hand, the GNN models' relatively inferior baseline results leave more room for improvement; on the other hand, it suggests that the MatInFormer model has already successfully grasped the comprehensive crystallographic information and its periodic features.
TABLE II. Crystal systems, lattice parameters, and unit cell volume. The crystal system is one of the tokens used in MatInFormer to generate the material representation, and the values of the lattice parameters are the targets of the MatInFormer pretraining stage.
| Crystal system | Lattice parameters | Volume |
|---|---|---|
| Triclinic | a, b, c, α, β, γ | abc√(1 − cos²α − cos²β − cos²γ + 2 cosα cosβ cosγ) |
| Monoclinic | a, b, c, β | abc sinβ |
| Hexagonal | a, a, c | (√3/2) a²c |
| Rhombohedral | a, a, a, α, α, α | a³√(1 − 3cos²α + 2cos³α) |
| Orthorhombic | a, b, c | a × b × c |
| Tetragonal | a, a, c | a² × c |
| Cubic | a, a, a | a³ |
The previous work on MLatticeABC42 and CRYSPNet43 demonstrated that machine learning models, when provided with compositional data, are capable of learning the lattice parameters of high-symmetry materials. However, current GNN models fail at this task even though the whole crystal structure is used to generate the graph, especially for low-symmetry materials.21 The failure of GNNs to capture periodicity, the crystal system, and the lattice parameters results in poor performance when predicting material properties that are highly related to these features. Capturing the lattice parameters is crucial for the accuracy of the machine learning model's predictions. Based on the previous analysis of de-CGCNN feature importance,21 it is evident that the importance of the lattice parameters exceeds 0.98, significantly influencing the model's performance, while that of the GNN-learned features is below 0.02. The unit cell volume calculation in Table II shows a strong correlation with the lattice parameters, especially in high-symmetry systems such as cubic structures, where the volume equals a³. For a simple CGCNN that only generates crystal graphs with atomic distances, capturing the angle information of α, β, and γ is a challenge. This clarifies how a structure-agnostic model can grasp crystal structures and underscores why the lattice parameters hold a high importance score in de-CGCNN for predicting heat capacity.
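As a worked check of Table II, the general triclinic volume formula reduces to the simpler expressions for higher-symmetry systems; the short sketch below is purely illustrative.

```python
# Unit cell volume from lattice parameters (angles in degrees).
from math import cos, radians, sqrt

def cell_volume(a, b, c, alpha, beta, gamma):
    """V = abc*sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
                    + 2*cos(alpha)*cos(beta)*cos(gamma))."""
    ca, cb, cg = (cos(radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

print(cell_volume(4.0, 4.0, 4.0, 90, 90, 90))   # cubic: 4^3 = 64.0
print(cell_volume(3.0, 3.0, 5.0, 90, 90, 120))  # hexagonal: (sqrt(3)/2)*9*5 = 38.97
```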
MatInFormer is a coordinate-free44 model: it does not take the whole crystal structure as input but rather the crystal system and further symmetry information. Our transformer model learns the "grammar" of crystals instead of that of human language. The unique tokenization process utilizes crystallographic information: unlike a GNN crystal graph, the input of MatInFormer avoids the limitations of a small receptive field and the difficulty of identifying different crystal systems. Additionally, the pretraining phase focuses on predicting lattice parameters, enabling the model to link the chemical formula and crystal system with the corresponding lattice parameters. As a result, MatInFormer captures global features better than GNNs. This finding underscores the critical importance of crystallographic information in predicting material properties. It also implies that physics-informed models can provide valuable insights into the inner workings of machine learning models, moving beyond the traditional "black box" approach to deepen our comprehension.
CONCLUSION
In this study, we adapt the MatInFormer framework for heat capacity prediction. MatInFormer uses a composition and crystal symmetry approach without detailed coordinate and structure information such as bond distances or angles. This transformer-based framework is pretrained with the lattice parameter pretraining method. Through this pretraining on crystal lattice parameters, our MatInFormer manages to identify crystal characteristics, comprehend the nuances of each token, and attain SOTA performance on the JARVIS heat capacity benchmark.
We also advance the development of a descriptor-hybridized multimodal de-MatInFormer, combining human-designed descriptors with transformer embeddings to further improve accuracy. We demonstrate the versatility of our MatInFormer and highlight the significance of incorporating crystal space group information for predicting heat capacity using machine learning. This multimodal architecture has the potential for further development and application in various fields. Domain knowledge can be leveraged in a machine learning model similar to ours in three different ways: first, we introduce direct input tokens specific to crystallography; next, we employ lattice parameter prediction to reconstruct the crystal structure; and finally, we utilize a fusion of human-designed descriptors and features learned by the transformer encoder.
However, since transformers have limitations in the tokenization of numerical values, the de-MatInFormer framework might not be able to incorporate more geometric details, such as bond lengths, bond angles, and dihedral angles, as GNNs do. The progression of machine learning models in materials science is likely to shift further toward creating material structures with specific desired properties. Although our current model demonstrates superior accuracy in predicting heat capacity (Cv), it may exhibit limitations in addressing quantum properties that are closely linked to the local atomic environment. To surmount these challenges, a hybrid approach that integrates the strengths of transformers, GNNs, and human-designed descriptors may offer a promising pathway.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Hongshuo Huang: Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Amir Barati Farimani: Project administration (lead); Supervision (lead).
DATA AVAILABILITY
The data that support the findings of this study are openly available in JARVIS Databases. MatInFormer can be obtained at https://github.com/hongshuh/MatInFormer, Ref. 34.