Thermal properties of materials are essential to many applications of thermal electronic devices. Density functional theory (DFT) has shown the capability to calculate these properties accurately. However, the expensive computational cost limits the application of the DFT method to high-throughput screening of materials. Recently, machine learning models, especially graph neural networks (GNNs), have demonstrated high accuracy in the prediction of many material properties, such as bandgap and formation energy, but fail to accurately predict the heat capacity ($C_V$) due to their limitations in capturing crystallographic features. In our study, we have implemented the materials informatics transformer (MatInFormer) framework, which is pretrained on a lattice reconstruction task. This approach has shown proficiency in capturing essential crystallographic features. The pretrained MatInFormer reaches a mean absolute error of 4.893 J/(mol K), and concatenating its features with human-designed descriptors further reduces the error to 4.505 J/(mol K). Our findings underscore the efficacy of the MatInFormer framework in leveraging crystallography, augmented with additional information processing capabilities.

Thermal properties such as thermal conductivity and phonon heat capacity are fundamental properties of materials and relate to various applications such as thermal electronic devices.1 In the past decades, accurate results have been achieved by using density functional theory (DFT) to calculate phonon frequencies and scattering rates,2,3 but such calculations are computationally expensive for large systems, making them infeasible for high-throughput screening of new materials.4 Recently, machine learning (ML) methods have been widely used in scientific research on materials, molecular property prediction, and simulation.5–15

Machine learning for thermal properties has previously been investigated. Seko et al.16 applied a kernel ridge regression model to 110 compounds with lattice thermal conductivity ($K_L$) computed by the DFT method. Chen et al.17 built an ML model based on a Gaussian process regression (GPR) algorithm and 100 inorganic materials with experimentally measured $K_L$. Wang et al.18 constructed crystal structural and compositional descriptors and used XGBoost to predict $K_L$, the Debye temperature, heat capacity, and thermal expansion.

Graph neural networks (GNNs) achieve remarkable performance in predicting material properties such as formation energy, elastic moduli, and bandgap. Many excellent regression models, such as CGCNN,5 ALIGNN,13 PotNet,19 and coNGN,20 have been proposed with accuracies approaching those of DFT calculations. However, GNN models still have limitations. Gong et al.21 evaluated the limitations of GNNs for crystalline materials across multiple regression tasks and showed that GNNs struggle to capture certain knowledge of the crystal structure, such as lattice parameters, crystal system, and periodicity. This limits the potential of GNNs to predict some material properties, including phonon heat capacity: compared to a descriptor-only model, the mean absolute errors (MAEs) of CGCNN and ALIGNN were almost 6 and 5 times larger. On the large-scale JARVIS-Leaderboard benchmark,22 which contains 44 property prediction tasks, ALIGNN performs better than descriptor-based models on most tasks, but its performance on phonon heat capacity is much worse than that of descriptor-based machine learning models.

For most previous GNNs, message-passing networks are employed to update the atomic features based on neighboring atoms and bonds, and an average pooling readout layer produces the feature of the global crystal structure. This allows GNNs to effectively capture the local atomic environment but diminishes their ability to capture global information. As a consequence, this limitation leads to poorer performance in predicting extensive properties, such as heat capacity. The limitation arises from two sides: (1) The range of the receptive field is determined by the neighbor cutoff and the number of convolution layers, so GNN models cannot capture periodicity on length scales larger than the receptive field; simply increasing the cutoff distance and the number of convolution layers does not lead to better performance.21,23 (2) Average pooling in the GNN architecture also restricts the prediction of extensive properties; this can be improved by using sum pooling. However, sum pooling cannot be used for intensive property prediction, because the size invariance of the crystal material must be maintained.5 Therefore, a better representation of the crystal material and model architecture is necessary and essential for extensive property prediction and material discovery.
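
To make the pooling distinction concrete, the following minimal PyTorch sketch (the function and tensor names are ours, not taken from any particular GNN library) contrasts mean and sum readout on a cell and its supercell:

```python
import torch

def readout(node_features: torch.Tensor, mode: str = "mean") -> torch.Tensor:
    """Aggregate per-atom features of one crystal graph into a single crystal vector.

    Mean pooling is invariant to supercell size (suited to intensive properties);
    sum pooling scales with the number of atoms (suited to extensive properties
    such as heat capacity).
    """
    if mode == "mean":
        return node_features.mean(dim=0)
    if mode == "sum":
        return node_features.sum(dim=0)
    raise ValueError(f"unknown pooling mode: {mode}")

# A cell with 4 atoms and the corresponding 2x1x1 supercell with 8 atoms:
cell = torch.randn(4, 64)
supercell = torch.cat([cell, cell], dim=0)
print(torch.allclose(readout(cell, "mean"), readout(supercell, "mean")))  # True: size invariant
print(torch.allclose(readout(cell, "sum"), readout(supercell, "sum")))    # False: scales with size
```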

Transformer-based language models24 have been rising in the areas of molecular and materials property prediction. The key mechanism of the transformer, self-attention, has been used not only in transformer models but also in graph transformer models for the prediction of material and molecular properties,10,25–31 allowing the model to capture and process long-range dependencies within the input data or graphs. In addition, transformers are adept at handling sequential data, making them particularly suitable for analyzing one-dimensional representations such as amino acid sequences (FASTA)32 or chemical strings such as SMILES.33

We adapt our previous work, the materials informatics transformer (MatInFormer),34 which is designed for crystalline materials and takes the formula and crystal structure symmetry as input. In our framework, a lattice parameter pretraining task is designed to inject crystallographic information into the transformer encoder by reconstructing the lattice parameters. The primary objective of this pretraining is to establish a more informed set of initial weights for the transformer, which is expected to significantly improve its performance on downstream tasks. We have advanced our approach by integrating human-designed descriptors into a comprehensive multimodal framework, as shown in Fig. 1. This integration has been instrumental in enhancing the accuracy of our machine learning model by leveraging the crystallographic insights obtained from the transformer encoder and synergizing them with human-designed descriptors, particularly in the prediction of heat capacity. This advancement not only demonstrates the potential of our MatInFormer to accurately understand and correlate properties within various crystal systems but also highlights the model's flexibility and adaptability. Our approach underscores the efficacy of combining deep learning techniques with domain-specific knowledge, paving the way for more sophisticated and precise models in the field of materials science.

FIG. 1.

The multimodal framework of de-MatInFormer. This framework processes crystal information through two distinct input channels. The first channel uses human-designed descriptors35 to compute specific features, while the second channel employs tokenized embeddings of the formula and space group for MatInFormer.34 The transformer is pretrained on lattice parameter prediction. The embeddings of both channels are concatenated after a projection layer, and a final output layer is used for the prediction of heat capacity.

The dataset used in this work is from the JARVIS-DFT dataset.36 The version of the JARVIS heat capacity dataset used for this work consists of 12 054 crystal structures, each associated with its phonon heat capacity ($C_V$) as the target label. To compare model performance, the dataset is split 0.8/0.1/0.1 into train/validation/test sets; the data distribution is shown in Fig. 2. Evaluation metrics on the test set are used to compare performance. In this work, we mainly compare the MAE, mean square error (MSE), $R^2$ value, and MAE/MAD ratio on the test set, defined as follows:
$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_{i}-\hat{y}_{i}\right|,$ (1)
$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2},$ (2)
$R^{2}=1-\frac{\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2},}$ (3)
$\frac{\mathrm{MAE}}{\mathrm{MAD}}=\frac{\sum_{i=1}^{N}\left|y_{i}-\hat{y}_{i}\right|}{\sum_{i=1}^{N}\left|y_{i}-\bar{y}\right|},$ (4)
where $N$ is the total number of samples in the test set, $y_i$ is the true value for the $i$th sample, $\hat{y}_i$ is the predicted value from the model, and $\bar{y}$ is the mean of the true values.
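
For reference, a minimal NumPy sketch of the metrics in Eqs. (1)–(4) is given below; the function name is illustrative:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Test-set metrics used in this work: MAE, MSE, R^2, and MAE/MAD."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                                            # Eq. (1)
    mse = np.mean(err ** 2)                                               # Eq. (2)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)   # Eq. (3)
    mad = np.mean(np.abs(y_true - y_true.mean()))                         # mean absolute deviation
    return {"MAE": mae, "MSE": mse, "R2": r2, "MAE/MAD": mae / mad}       # Eq. (4)
```
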
FIG. 2.

Data distribution of the JARVIS heat capacity dataset. Allocation of 12 054 entries across training (9644), validation (1205), and testing (1205) sets for model benchmarking. All models are benchmarked using the identical data split.


Tokenization and transformer encoder

The materials informatics transformer (MatInFormer) is the model used to predict material properties; it is based on a RoBERTa37 encoder with 8 layers, 12 attention heads, and a hidden size of 768. The model's initial step involves the tokenization of crystal features, which includes the embedding of compositional elements. The stoichiometric formula is used to generate MatScholar embeddings, where each elemental embedding has a dimensionality of $d_{el} = 200$. The embedding of each element in the composition is weighted by its respective elemental fraction, and the weighted embeddings are then concatenated. Furthermore, crystallography tokens, encompassing the crystal system, point group, space group, and other symmetry-related information, are aggregated to form the initial embedding input for the MatInFormer.
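
The sketch below outlines, under our own simplifying assumptions, how such an input sequence could be assembled; the embedding table, token vocabulary, and helper names are illustrative placeholders rather than the exact MatInFormer implementation, which loads pretrained MatScholar vectors and a full crystallography vocabulary:

```python
import torch

D_EL = 200  # dimensionality of each elemental (MatScholar-style) embedding

# Illustrative stand-ins for the real embedding table and crystallography vocabulary
# (crystal system, point group, space group, and other symmetry tokens).
element_embedding = {"Ga": torch.randn(D_EL), "As": torch.randn(D_EL)}
crystallography_vocab = {"[CLS]": 0, "cubic": 1, "F-43m": 2}
token_embedding = torch.nn.Embedding(len(crystallography_vocab), D_EL)

def build_input(composition: dict, crystallography_tokens: list) -> torch.Tensor:
    """Stack fraction-weighted element embeddings with crystallography token embeddings."""
    total = sum(composition.values())
    comp_rows = [element_embedding[el] * (amt / total) for el, amt in composition.items()]
    sym_ids = torch.tensor([crystallography_vocab[t] for t in crystallography_tokens])
    sym_rows = token_embedding(sym_ids)
    return torch.cat([sym_rows, torch.stack(comp_rows)], dim=0)  # (n_tokens, D_EL)

x = build_input({"Ga": 1, "As": 1}, ["[CLS]", "cubic", "F-43m"])
print(x.shape)  # torch.Size([5, 200])
```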

At the core of the MatInFormer operational framework is the self-attention mechanism, specifically employing the “Scaled Dot-Product Attention” methodology. This approach involves the transformation of input data into three distinct matrices: queries (Q), keys (K), and values (V). These matrices are integral to the computation of attention,
$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V,$ (5)
where $d_k$ is the dimension of $K$.
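
A minimal PyTorch sketch of Eq. (5) is shown below; the encoder applies this operation in each of its attention heads with learned query, key, and value projections:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Eq. (5)."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)          # (..., n_q, n_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # ignore padded tokens
    weights = torch.softmax(scores, dim=-1)
    return weights @ V
```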

In the last layer, the [CLS] token embedding is passed to the prediction head to produce the final prediction.

Lattice reconstruction pretraining

Pretraining in NLP fundamentally enables models to understand and manipulate human language and enhances their performance on downstream tasks. This process typically involves training on a large text dataset, where the model learns the grammar of the language. Models such as BERT38 or GPT39 are exposed to a wide range of text sources, from literature to web content, allowing them to develop a broad understanding of language patterns and usage. This pretraining equips the models with a comprehensive linguistic foundation, enabling them to perform a wide range of language-based tasks.

In contrast, pretraining a transformer for materials has a different focus. The objective is for the model to learn the properties and features of materials, which are more structured and require domain knowledge, unlike the fluidity of natural language. While NLP models learn to predict the next word or a masked token,37 the MatInFormer is designed to predict the lattice parameters, a fundamental aspect of crystallography. Through this process, the model learns from a comprehensive dataset rich in structural information about various materials. The pretraining enables the model to understand and encode the geometric configurations and periodicities of atoms in a crystal lattice. The objective of this stage is to train the model to minimize the normalized MSE of the lattice parameters $a, b, c$ and $\alpha, \beta, \gamma$. The pretraining dataset contains 642 459 crystal structures obtained from JARVIS-tools,
$\mathcal{L}_{\mathrm{lattice}}=\frac{1}{6}\sum_{i=1}^{6}\left(L_{i}-\hat{L}_{i}\right)^{2},$ (6)
where $L_i$ is the normalized lattice parameter and $\hat{L}_i$ is the model prediction from the pretraining regression head on the [CLS] token.
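
A schematic of this pretraining objective is sketched below; the regression-head architecture is an illustrative assumption, while the loss follows Eq. (6):

```python
import torch
import torch.nn as nn

class LatticeRegressionHead(nn.Module):
    """Illustrative pretraining head: map the [CLS] embedding to the six
    normalized lattice parameters (a, b, c, alpha, beta, gamma)."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 6)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        return self.proj(cls_embedding)  # (batch, 6)

def lattice_pretrain_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE over the six normalized lattice parameters, Eq. (6)."""
    return torch.mean((pred - target) ** 2)
```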

Machine learning descriptor

In order to obtain a vector representation of the crystal structure, human-designed descriptors are used to generate an embedding. The Matminer35 featurizers are used to compute all features: 128 structural features from the SiteStatsFingerprint, StructuralHeterogeneity, MaximumPackingEfficiency, and ChemicalOrdering featurizers and 145 compositional features from the Stoichiometry, ElementProperty, ValenceOrbital, and IonProperty featurizers.
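
A sketch of this featurization with the Matminer API is shown below; the specific presets (e.g., "magpie" for ElementProperty and "CrystalNNFingerprint_ops" for SiteStatsFingerprint) are assumptions and may differ from the exact configuration used in this work:

```python
from matminer.featurizers.base import MultipleFeaturizer
from matminer.featurizers.composition import ElementProperty, IonProperty, Stoichiometry, ValenceOrbital
from matminer.featurizers.structure import (
    ChemicalOrdering,
    MaximumPackingEfficiency,
    SiteStatsFingerprint,
    StructuralHeterogeneity,
)

# Structural descriptors computed from the full crystal structure.
structure_featurizer = MultipleFeaturizer([
    SiteStatsFingerprint.from_preset("CrystalNNFingerprint_ops"),  # preset is an assumption
    StructuralHeterogeneity(),
    MaximumPackingEfficiency(),
    ChemicalOrdering(),
])

# Compositional descriptors computed from the stoichiometric formula only.
composition_featurizer = MultipleFeaturizer([
    Stoichiometry(),
    ElementProperty.from_preset("magpie"),  # preset is an assumption
    ValenceOrbital(),
    IonProperty(),
])

def featurize(structure):
    """Concatenate structural and compositional descriptors for one pymatgen Structure."""
    return (structure_featurizer.featurize(structure)
            + composition_featurizer.featurize(structure.composition))
```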

Descriptor-hybridized multimodal model

Two projectors are used to reshape the dimensions and generate a concatenated embedding $Z_C$,
$Z_{DP}=P_{D}(Z_{D}),\quad Z_{TP}=P_{T}(Z_{T}),$ (7)
$Z_{C}=\mathrm{Concat}(Z_{DP},Z_{TP})+Z_{T},$ (8)
where $Z_{DP}$ and $Z_{TP}$ are the projected embeddings from the descriptors and the transformer, respectively, $Z_D$ is the descriptor vector, and $Z_T$ is the output embedding of the transformer encoder. Here, we use a residual connection40 that first concatenates the descriptor embedding $Z_{DP}$ and the transformer embedding $Z_{TP}$ and then adds a skip connection with $Z_T$.
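
A minimal PyTorch sketch of this descriptor-hybridized head, following Eqs. (7) and (8), is given below; the layer sizes and module names are illustrative (273 = 128 structural + 145 compositional descriptor features):

```python
import torch
import torch.nn as nn

class DescriptorHybridHead(nn.Module):
    """Sketch of the de-MatInFormer fusion head: project both embeddings,
    concatenate them, and add a skip connection from the transformer output Z_T."""

    def __init__(self, d_desc: int = 273, d_trans: int = 768):
        super().__init__()
        self.proj_d = nn.Linear(d_desc, d_trans)   # Z_DP = P_D(Z_D)
        self.proj_t = nn.Linear(d_trans, d_trans)  # Z_TP = P_T(Z_T)
        self.fuse = nn.Linear(2 * d_trans, d_trans)
        self.out = nn.Linear(d_trans, 1)           # heat-capacity regression head

    def forward(self, z_d: torch.Tensor, z_t: torch.Tensor) -> torch.Tensor:
        z_dp, z_tp = self.proj_d(z_d), self.proj_t(z_t)
        z_c = self.fuse(torch.cat([z_dp, z_tp], dim=-1)) + z_t  # residual skip with Z_T
        return self.out(z_c)
```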

The results of the heat capacity prediction are shown in Table I, where we compare the performance against the descriptor-based random forest model, ALIGNN, and coNGN. ALIGNN uses a line graph and is able to distinguish bond angles, while coNGN uses nested line graphs and captures dihedral angles.20

TABLE I.

Results for heat capacity. The bold value indicates the best performance and the underlined value indicates the second best. For all metrics, (↓) indicates that lower is better and (↑) indicates that higher is better. The SchNet,7 CGCNN,5 ALIGNN,13 coNGN,20 and Matminer35 results are obtained from the JARVIS-Leaderboard.

Model | MAE (↓) | MSE (↓) | $R^2$ (↑) | MAE/MAD (↓)
SchNet7 | 17.771 | 1745.587 | 0.381 | 0.4408
CGCNN5 | 12.936 | 1181.627 | 0.549 | 0.3208
ALIGNN13 | 9.606 | 849.155 | 0.739 | 0.2383
coNGN20 | 7.813 | 1086.716 | 0.666 | 0.1938
Matminer + RF35 | 5.276 | 322.793 | 0.901 | 0.1309
MatInFormer34 | 4.893 | 307.325 | 0.907 | 0.1214
de-MatInFormer | 4.505 | 251.998 | 0.923 | 0.1118

It is observed that the Matminer descriptors with a random forest model perform excellently compared to GNNs. The Matminer + RF model demonstrates commendable accuracy, with an MAE of 5.276 J/(mol K) and an $R^2$ value of 0.901. This outcome indicates that conventional machine learning approaches, especially when augmented with meticulously extracted features, can achieve greater accuracy than sophisticated GNN models such as ALIGNN and coNGN.

The MatInFormer model achieves state-of-the-art (SOTA) performance in the benchmark, with an MAE of 4.893 J/(mol K). This marks a significant enhancement, outperforming ALIGNN by 49.03% and coNGN by 37.37%. The MatInFormer model also attains an $R^2$ value of 0.907, slightly exceeding that of the Matminer + RF model at 0.901 and well above the ALIGNN and coNGN models, whose $R^2$ values are 0.739 and 0.666, respectively. A broader metric, the MAE/MAD ratio, which is unaffected by scaling and has been utilized previously,13,21 supports this assessment. A prediction with MAE/MAD < 0.2 is considered highly accurate. As shown in Table I, the low MAE/MAD of 0.1214 for MatInFormer reflects its excellent performance in predicting heat capacity.

The enhanced multimodal framework, de-MatInFormer, shows further improvement, achieving an MAE of 4.505 J/(mol K) and an $R^2$ value of 0.923. This also compares favorably with the improvements observed in previous work that hybridized several GNN models with descriptors.21 Although the cited GNN models exhibited up to a 90% improvement from descriptor hybridization, incorporating the descriptors into our framework led to a relatively modest enhancement of 7.93%. This difference can be viewed in two ways. On one hand, the GNN models' relatively inferior baseline results leave more room for improvement. On the other hand, it suggests that the MatInFormer model has already captured comprehensive crystallographic information and the periodic features of the material.

As discussed by Gong et al.,21 $C_V$ can be approximated by the Debye model of the density of states when only the acoustic phonons are considered:41
$C_{V}=9Nk_{B}\left(\frac{T}{\theta}\right)^{3}\int_{0}^{\theta/T}\frac{x^{4}e^{x}}{\left(e^{x}-1\right)^{2}}\,dx,$ (9)
$\theta=\frac{\hbar\omega_{D}}{k_{B}},$ (10)
$\omega_{D}=v\left(\frac{6\pi^{2}N}{V}\right)^{1/3},$ (11)
$v=d\sqrt{\frac{C}{m}},$ (12)
where $\theta$ is the Debye temperature, $\omega_D$ is the Debye frequency, $\hbar$ is the reduced Planck constant, $k_B$ is the Boltzmann constant, $N$ is the number of atoms in the primitive cell, $V$ is the volume of the primitive cell, and $v$ is the velocity of sound. $C$ is the effective spring constant, $m$ is the mass of the atoms in the primitive cell, and $d$ is the effective distance between the atomic planes along the direction of vibration. At a given temperature $T$, an approximate value of $C_V$ can thus be calculated from $C$, $V$, $m$, and $d$. For a given crystal structure, the unit cell volume can be calculated from the lattice parameters according to the crystal system, as listed in Table II.
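
Before turning to Table II, a short numerical sketch of this textbook Debye estimate (Kittel41), not the learned model, is given here; it combines the Debye temperature and heat capacity relations above and assumes SI units:

```python
import numpy as np
from scipy.integrate import quad

K_B = 1.380649e-23       # Boltzmann constant (J/K)
H_BAR = 1.054571817e-34  # reduced Planck constant (J s)

def debye_temperature(v_sound: float, n_atoms: int, volume: float) -> float:
    """theta = (hbar * v / k_B) * (6 pi^2 N / V)^(1/3), combining Eqs. (10) and (11)."""
    return H_BAR * v_sound / K_B * (6.0 * np.pi**2 * n_atoms / volume) ** (1.0 / 3.0)

def debye_heat_capacity(temperature: float, theta: float, n_atoms: int) -> float:
    """Acoustic-phonon (Debye) estimate of C_V for one cell, Eq. (9), in J/K."""
    integrand = lambda x: x**4 * np.exp(x) / (np.exp(x) - 1.0) ** 2
    integral, _ = quad(integrand, 0.0, theta / temperature)
    return 9.0 * n_atoms * K_B * (temperature / theta) ** 3 * integral
```
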
TABLE II.

Crystal system and unit cell volume calculation. The crystal system is one of the tokens used in MatInFormer to generate the material representation. The values of the lattice parameters are targets of the MatInFormer pretraining stage.

Crystal system | Lattice parameters $(a,b,c,\alpha,\beta,\gamma)$ | Volume
Triclinic | $a, b, c, \alpha, \beta, \gamma$ | $abc\sqrt{1-\cos^{2}\alpha-\cos^{2}\beta-\cos^{2}\gamma+2\cos\alpha\cos\beta\cos\gamma}$
Monoclinic | $a, b, c, \frac{\pi}{2}, \beta, \frac{\pi}{2}$ | $abc\sin\beta$
Hexagonal | $a, a, c, \frac{\pi}{2}, \frac{\pi}{2}, \frac{2\pi}{3}$ | $a^{2}c\sin\frac{2\pi}{3}$
Rhombohedral | $a, a, a, \alpha, \alpha, \alpha$ | $a^{3}\sqrt{1-3\cos^{2}\alpha+2\cos^{3}\alpha}$
Orthorhombic | $a, b, c, \frac{\pi}{2}, \frac{\pi}{2}, \frac{\pi}{2}$ | $abc$
Tetragonal | $a, a, c, \frac{\pi}{2}, \frac{\pi}{2}, \frac{\pi}{2}$ | $a^{2}c$
Cubic | $a, a, a, \frac{\pi}{2}, \frac{\pi}{2}, \frac{\pi}{2}$ | $a^{3}$

Previous work on MLatticeABC42 and CRYSPNet43 has demonstrated that machine learning models, when provided with compositional data, are capable of learning the lattice parameters of high-symmetry materials. Current GNN models, however, fail at this task even though the whole crystal structure is used to generate the graph, especially for low-symmetry materials.21 The failure of GNNs to capture periodicity, crystal system, and lattice parameters results in poor performance when predicting material properties that depend strongly on these features. Capturing the lattice parameters is therefore crucial for the accuracy of the machine learning model's predictions. The previous feature-importance analysis of de-CGCNN21 shows that the importance of the lattice parameter $a$ exceeds 0.98, dominating the model's performance, while the importance of the GNN-learned feature is below 0.02. The unit cell volume expressions in Table II show a strong dependence on the lattice parameter $a$, especially in high-symmetry systems such as cubic structures, where $V = a^3$. For a simple CGCNN that only builds crystal graphs from interatomic distances, capturing the angular information of $\alpha, \beta, \gamma$ is a challenge. This clarifies how a structure-agnostic model can grasp crystal structures and underscores why the lattice parameter holds a high importance score in de-CGCNN for predicting $C_V$.

MatInFormer is a coordinate-free44 model: it does not take the whole crystal structure as input, but rather the crystal system and other symmetry information. Our transformer model is thus able to learn the “grammar” of the crystal instead of that of human language. The unique tokenization process utilizes crystallographic information; unlike a GNN crystal graph, the input to MatInFormer avoids the limitations of a small receptive field and the difficulty of identifying different crystal systems. Additionally, the pretraining phase focuses on predicting lattice parameters, enabling the model to link the chemical formula and crystal system with the corresponding lattice parameters. As a result, MatInFormer captures global features better than GNNs. This finding underscores the critical importance of crystallographic information in predicting material properties. It also implies that physics-informed analyses can provide valuable insights into the inner workings of machine learning models, moving beyond the traditional “black box” approach to deepen our comprehension.

In this study, we adapt the MatInFormer framework for heat capacity prediction. MatInFormer uses a composition and crystal symmetry approach without relying on detailed coordinate and structure information such as bond distances or angles. This transformer-based framework is pretrained with the lattice parameter reconstruction task. Through this pretraining on the crystal lattice parameters, our MatInFormer manages to identify crystal characteristics, comprehend the nuances of each token, and attain SOTA performance on the JARVIS heat capacity benchmark.

We also advanced the development of a descriptor-hybridized multimodal de-MatInFormer, combining human-designed descriptors with transformer embeddings to further improve accuracy. We demonstrate the versatility of our MatInFormer and highlight the significance of incorporating crystal space group information for predicting heat capacity with machine learning. This multimodal architecture has the potential for further development and application in various fields. Domain knowledge can be leveraged in a machine learning model such as ours in three different ways: first, we introduce input tokens specific to crystallography; second, we employ lattice parameter prediction to reconstruct the crystal structure; and finally, we fuse human-designed descriptors with the features learned by the transformer encoder.

However, since transformers have limitations in the tokenization of numerical values, the de-MatInFormer framework may not be able to incorporate finer geometric details, such as bond lengths, bond angles, and dihedral angles, in the way GNNs do. The progression of machine learning models in materials science is likely to shift further toward creating material structures with specific desired properties. Although our current model demonstrates superior accuracy in predicting heat capacity ($C_V$), it may exhibit limitations for quantum properties that are closely linked to the local atomic environment. To surmount these challenges, a hybrid approach that integrates the strengths of transformers, GNNs, and human-designed descriptors may offer a promising pathway.

The authors have no conflicts to disclose.

Hongshuo Huang: Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Amir Barati Farimani: Project administration (lead); Supervision (lead).

The data that support the findings of this study are openly available in JARVIS Databases. MatInFormer can be obtained at https://github.com/hongshuh/MatInFormer, Ref. 34.

1. J.-C. Zheng, "Recent advances on thermoelectric materials," Front. Phys. China 3, 269–279 (2008).
2. L. Lindsay, D. Broido, and T. Reinecke, "Ab initio thermal transport in compound semiconductors," Phys. Rev. B 87, 165201 (2013).
3. A. Ward and D. Broido, "Intrinsic phonon relaxation times from first-principles studies of the thermal conductivities of Si and Ge," Phys. Rev. B 81, 085205 (2010).
4. A. Seko, A. Togo, H. Hayashi, K. Tsuda, L. Chaput, and I. Tanaka, "Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and Bayesian optimization," Phys. Rev. Lett. 115, 205901 (2015).
5. T. Xie and J. C. Grossman, "Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties," Phys. Rev. Lett. 120, 145301 (2018).
6. M. Karamad, R. Magar, Y. Shi, S. Siahrostami, I. D. Gates, and A. B. Farimani, "Orbital graph convolutional neural network for material property prediction," Phys. Rev. Mater. 4, 093801 (2020).
7. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, "SchNet–A deep learning architecture for molecules and materials," J. Chem. Phys. 148, 241722 (2018).
8. R. Magar, Y. Wang, C. Lorsung, C. Liang, H. Ramasubramanian, P. Li, and A. B. Farimani, "AugLiChem: Data augmentation library of chemical structures for machine learning," arXiv:2111.15112 (2021).
9. R. Magar, Y. Wang, and A. Farimani, "Crystal twins: Self-supervised learning for crystalline material property prediction," npj Comput. Mater. 8, 231 (2022).
10. Z. Cao, R. Magar, Y. Wang, and A. Barati Farimani, "MOFormer: Self-supervised transformer model for metal–organic framework property prediction," J. Am. Chem. Soc. 145, 2958–2967 (2023).
11. S.-Y. Louis, Y. Zhao, A. Nasiri, X. Wang, Y. Song, F. Liu, and J. Hu, "Graph convolutional neural networks with global attention for improved materials property prediction," Phys. Chem. Chem. Phys. 22, 18141–18148 (2020).
12. C. Chen, W. Ye, Y. Zuo, C. Zheng, and S. P. Ong, "Graph networks as a universal machine learning framework for molecules and crystals," Chem. Mater. 31, 3564–3572 (2019).
13. K. Choudhary and B. DeCost, "Atomistic line graph neural network for improved materials property predictions," npj Comput. Mater. 7, 1–8 (2021).
14. J. Ock, T. Tian, J. Kitchin, and Z. Ulissi, "Beyond independent error assumptions in large GNN atomistic models," J. Chem. Phys. 158, 214702 (2023).
15. N. Kazeev, A. R. Al-Maeeni, I. Romanov, M. Faleev, R. Lukin, A. Tormasov, A. Castro Neto, K. S. Novoselov, P. Huang, and A. Ustyuzhanin, "Sparse representation for machine learning the properties of defects in 2D materials," npj Comput. Mater. 9, 113 (2023).
16. A. Seko, H. Hayashi, K. Nakayama, A. Takahashi, and I. Tanaka, "Representation of compounds for machine-learning prediction of physical properties," Phys. Rev. B 95, 144110 (2017).
17. L. Chen, H. Tran, R. Batra, C. Kim, and R. Ramprasad, "Machine learning models for the lattice thermal conductivity prediction of inorganic materials," Comput. Mater. Sci. 170, 109155 (2019).
18. X. Wang, S. Zeng, Z. Wang, and J. Ni, "Identification of crystalline materials with ultra-low thermal conductivity based on machine learning study," J. Phys. Chem. C 124, 8488–8495 (2020).
19. Y. Lin, K. Yan, Y. Luo, Y. Liu, X. Qian, and S. Ji, "Efficient approximations of complete interatomic potentials for crystal property prediction," arXiv:2306.10045 (2023).
20. R. Ruff, P. Reiser, J. Stühmer, and P. Friederich, "Connectivity optimized nested graph networks for crystal structures," arXiv:2302.14102 (2023).
21. S. Gong, K. Yan, T. Xie, Y. Shao-Horn, R. Gomez-Bombarelli, S. Ji, and J. C. Grossman, "Examining graph neural networks for crystal structures: Limitations and opportunities for capturing periodicity," Sci. Adv. 9, eadi3245 (2023).
22. K. Choudhary, D. Wines, K. Li, K. F. Garrity, V. Gupta, A. H. Romero, J. T. Krogel, K. Saritas, A. Fuhr, P. Ganesh et al., "Large scale benchmark of materials design methods," arXiv:2306.11688 (2023).
23. Y. Li, Y. Wang, L. Huang, H. Yang, X. Wei, J. Zhang, T. Wang, Z. Wang, B. Shao, and T.-Y. Liu, "Long-short-range message-passing: A physics-informed framework to capture non-local interaction for scalable molecular dynamics simulation," arXiv:2304.13542 (2023).
24. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (2017), pp. 5998–6008.
25. S. Chithrananda, G. Grand, and B. Ramsundar, "ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction," arXiv:2010.09885 (2020).
26. C. Xu, Y. Wang, and A. B. Farimani, "TransPolymer: A transformer-based language model for polymer property predictions," arXiv:2209.01307 (2022).
27. A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, D. Bhowmik, and B. Rost, "ProtTrans: Towards cracking the language of life's code through self-supervised deep learning and high performance computing," in IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE, 2021).
28. Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, M. Fazel-Zarandi, T. Sercu, S. Candido, and A. Rives, "Language models of protein sequences at the scale of evolution enable accurate structure prediction," bioRxiv (2022).
29. R. M. Rao, J. Liu, R. Verkuil, J. Meier, J. Canny, P. Abbeel, T. Sercu, and A. Rives, "MSA transformer," in International Conference on Machine Learning (2021), pp. 8844–8856.
30. A. Yüksel, E. Ulusoy, A. Ünlü, and T. Doǧan, "SELFormer: Molecular representation learning via SELFIES language models," Mach. Learn.: Sci. Technol. 4, 025035 (2023).
31. E. Nijkamp, J. Ruffolo, E. N. Weinstein, N. Naik, and A. Madani, "ProGen2: Exploring the boundaries of protein language models," Cell Syst. 14, 968–978 (2023).
32. D. J. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity searches," Science 227, 1435–1441 (1985).
33. D. Weininger, "SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules," J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
34. H. Huang, R. Magar, C. Xu, and A. B. Farimani, "Materials informatics transformer: A language model for interpretable materials properties prediction," arXiv:2308.16259 (2023).
35. L. Ward, A. Dunn, A. Faghaninia, N. E. Zimmermann, S. Bajaj, Q. Wang, J. Montoya, J. Chen, K. Bystrom, M. Dylla et al., "Matminer: An open source toolkit for materials data mining," Comput. Mater. Sci. 152, 60–69 (2018).
36. K. Choudhary, K. F. Garrity, A. C. Reid, B. DeCost, A. J. Biacchi, A. R. Hight Walker, Z. Trautt, J. Hattrick-Simpers, A. G. Kusne, A. Centrone et al., "The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design," npj Comput. Mater. 6, 173 (2020).
37. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv:1907.11692 (2019).
38. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv:1810.04805 (2018).
39. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020); available at https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
40. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.
41. C. Kittel, Introduction to Solid State Physics (John Wiley & Sons, Inc., 2005).
42. Y. Li, W. Yang, R. Dong, and J. Hu, "MLatticeABC: Generic lattice constant prediction of crystal materials using machine learning," ACS Omega 6, 11585–11594 (2021).
43. H. Liang, V. Stanev, A. G. Kusne, and I. Takeuchi, "CRYSPNet: Crystal structure predictions via neural networks," Phys. Rev. Mater. 4, 123802 (2020).
44. R. E. Goodall, A. S. Parackal, F. A. Faber, R. Armiento, and A. A. Lee, "Rapid discovery of stable materials by coordinate-free coarse graining," Sci. Adv. 8, eabn4117 (2022).