The joint automated repository for various integrated simulations (JARVIS) infrastructure at the National Institute of Standards and Technology is a large-scale collection of curated datasets and tools with more than 80 000 materials and millions of properties. JARVIS uses a combination of electronic structure, artificial intelligence, advanced computation, and experimental methods to accelerate materials design. Here, we report some of the new features that were recently included in the infrastructure, such as (1) doubling the number of materials in the database since its first release, (2) including more accurate electronic structure methods such as quantum Monte Carlo, (3) including graph neural network-based materials design, (4) development of a unified force-field, (5) development of a universal tight-binding model, (6) addition of computer-vision tools for advanced microscopy applications, (7) development of a natural language processing tool for text-generation and analysis, (8) debuting a large-scale benchmarking endeavor, (9) including quantum computing algorithms for solids, (10) integrating several experimental datasets, and (11) staging several community engagement and outreach events. New classes of materials, properties, and workflows added to the database include superconductors, two-dimensional (2D) magnets, magnetic topological materials, metal-organic frameworks, defects, and interface systems. The rich and reliable datasets, tools, documentation, and tutorials make JARVIS a unique platform for modern materials design. JARVIS ensures the openness of data and tools to enhance reproducibility and transparency and to promote a healthy and collaborative scientific environment.

The joint automated repository for various integrated simulations (JARVIS)1 is an integrated infrastructure to accelerate materials discovery and design. The JARVIS infrastructure can be separated into electronic structure methods [density functional theory (DFT),2 tight binding,3 dynamical mean field theory (DMFT),4 many-body perturbation theory (PT) (GW),5 and quantum Monte Carlo (QMC)],6,7 classical force-fields (FF),8 machine learning (ML) techniques,9 quantum computation algorithms,10 and experiments.11 JARVIS is motivated by the materials genome initiative (MGI)12 principles of developing open-access databases and tools to reduce the cost and development time of materials discovery, optimization, and deployment. The major areas of ongoing research within the JARVIS infrastructure are depicted in Fig. 1, and the publicly available JARVIS tools are listed in Table I.

FIG. 1.

Major areas of ongoing research as part of the JARVIS infrastructure.

TABLE I.

A summary of the publicly available JARVIS tools.

The main components of the JARVIS infrastructure (databases, user-friendly web applications, tools) are centrally located at https://jarvis.nist.gov/. The code behind the JARVIS infrastructure is located in a collection of separate repositories [e.g., JARVIS tools, atomistic line graph neural network (ALIGNN), and ChemNLP; see Table I] that are hosted under the main NIST GitHub organization (https://github.com/usnistgov/). Each of these repositories has its own installation instructions (e.g., using conda environments). For example, JARVIS tools is a software package that contains Python functions and classes (tens of thousands of lines of code) used for automated materials simulations, post-processing of calculated data, and dissemination of results. Instructions for basic applications (e.g., example Python code needed to screen materials in JARVIS or set up a basic DFT calculation) can be found in the JARVIS documentation (https://pages.nist.gov/jarvis/) or in the example Python notebooks (see Sec. VII).

In the first three years since its creation in 2017 (see Fig. 2), JARVIS-DFT grew to include standard material properties1 such as formation energies, band gaps, elastic constants, piezoelectric constants, dielectric constants, and magnetic moments, as well as more exotic properties such as exfoliation energies for van der Waals (vdW) bonded materials,13 spin–orbit coupling (SOC) spillage,14–16 improved meta-GGA band gaps,17 frequency-dependent dielectric functions,17 solar cell efficiency,18 thermoelectric properties,19 and Wannier tight-binding Hamiltonians (WTBH).20,21 Protocols such as automatic k-point convergence22 were developed to improve data reliability. JARVIS force field (JARVIS-FF)23 offers a framework to use classical force fields to compute material properties such as defect formation energies, bulk modulus, and phonon spectra that can be utilized for molecular dynamics runs. Classical force-field inspired descriptors (CFID)24 were introduced in 2018 as a part of JARVIS-ML. CFIDs represent the relation between the chemistry, structure, and charge of a given material. By training CFIDs on JARVIS-DFT data, several classification and regression models have been developed. These include models to predict properties such as band gaps, formation energies, exfoliation energies, magnetic moments, thermoelectric properties, and several other properties.1 

FIG. 2.

An overview of the history of JARVIS-related projects since its creation in 2017 until present.

In this review article, we give an overview of several major updates that have been made to the JARVIS infrastructure (see Fig. 2). Recent updates to JARVIS-DFT, which now contains over 80 000 materials, include identifying the anomalous quantum confinement effect (AQCE) in materials,31 screening bulk magnetic topological materials,16 and screening bulk and two-dimensional (2D) superconducting materials.32,33 With regard to other electronic structure methods, tight binding models20,21 and QMC methods6,34,35 have recently been added to the JARVIS infrastructure. JARVIS-ML has been expanded to include the atomistic line graph neural network (ALIGNN)25 model that has been utilized for fast and accurate property and spectra prediction of formation energies, band gaps, electron and phonon density-of-states (DOS),36,37 properties of metal-organic frameworks for carbon capture,38 defect properties,39 and properties of superconductors.32 The ALIGNN model has also been recently used to develop universal force fields for the periodic table (ALIGNN-FF).40 The AtomVision26 model has been added to JARVIS-ML, with the intention of generating and analyzing scanning tunneling microscope (STM) and high angle annular dark field (HAADF) scanning transmission electron microscope (STEM) images to accelerate the interpretation of experimental images. A natural language processing-based library for materials chemistry text data (ChemNLP)27 and tools to perform quantum computation algorithms28 such as the variational quantum eigensolver (VQE)41 and variational quantum deflation (VQD)42 have also been added to the JARVIS infrastructure. Finally, several experimental measurements have been performed to validate our computational predictions. In addition to the major recent updates of JARVIS, we will detail large-scale data efforts, educational notebooks, leaderboard, and external outreach.

1. Magnetic topological materials screening

Few high-quality magnetic topological insulator and semimetal candidates have been identified in the literature, even though such materials have potential applications in spintronics and quantum computation. We used a screening criterion based on spin–orbit spillage (SOS), which quantifies spin–orbit-induced band inversion (a property of topological materials) by comparing the wave functions with and without spin–orbit coupling (SOC).14,15,45 This study is an extension of the previous work, which used SOS to screen for bulk nonmagnetic materials and magnetic and nonmagnetic 2D materials.14,15
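For orientation, the spillage is commonly written in the following form. The notation is ours and is meant only as a sketch; the definition actually used in the screening is given in Refs. 14 and 15.

```latex
\eta(\mathbf{k}) = n_{\mathrm{occ}}(\mathbf{k})
  - \mathrm{Tr}\!\left[ P(\mathbf{k})\, \tilde{P}(\mathbf{k}) \right],
\qquad
P(\mathbf{k}) = \sum_{n=1}^{n_{\mathrm{occ}}(\mathbf{k})}
  \lvert \psi_{n\mathbf{k}} \rangle \langle \psi_{n\mathbf{k}} \rvert .
```

Here P is the projector onto the occupied bands computed without SOC, the tilde marks the analogous projector computed with SOC, and n_occ(k) is the number of occupied bands at k. If SOC leaves the occupied subspace unchanged, the trace equals n_occ and the spillage vanishes; a band inversion swaps states in and out of the occupied subspace and raises the spillage at that k-point.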

We used systematic high-throughput DFT calculations to identify magnetic topological materials from the over 40 000 bulk materials in the JARVIS-DFT database.16 First, we screened for materials with a net magnetic moment > 0.5 μB and SOS > 0.25, which yielded 25 insulating and 564 metallic candidates. We then applied Wannier tight-binding Hamiltonian (WTBH)-based techniques to calculate Wannier charge centers, Chern numbers, anomalous Hall conductivities (AHC), surface band structures, and Fermi surfaces to determine interesting topological characteristics of the screened compounds. After narrowing down the search, we experimentally synthesized and characterized a few candidate materials such as CoNb3S6 and Mn3Ge.
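The first screening step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow code: the record keys (`magmom`, `spillage`, `bandgap`), the sample entries, and the use of a zero band gap to separate metals from insulators are all assumptions made for the example. Only the two thresholds (0.5 μB and 0.25) come from the text.

```python
def screen_magnetic_topological(records, min_moment=0.5, min_spillage=0.25):
    """Split records passing both thresholds into insulators and metals.

    The dictionary keys are hypothetical names chosen for this sketch.
    """
    insulators, metals = [], []
    for rec in records:
        passes = rec["magmom"] > min_moment and rec["spillage"] > min_spillage
        if not passes:
            continue
        # Assumption: a nonzero band gap marks an insulating candidate.
        if rec["bandgap"] > 0.0:
            insulators.append(rec["id"])
        else:
            metals.append(rec["id"])
    return insulators, metals


# Placeholder records with made-up values, for illustration only.
sample = [
    {"id": "example-A", "magmom": 2.0, "spillage": 0.6, "bandgap": 0.3},
    {"id": "example-B", "magmom": 1.1, "spillage": 1.0, "bandgap": 0.0},
    {"id": "example-C", "magmom": 0.0, "spillage": 0.9, "bandgap": 0.0},
    {"id": "example-D", "magmom": 3.0, "spillage": 0.1, "bandgap": 0.5},
]

insulating, metallic = screen_magnetic_topological(sample)
print(insulating, metallic)
```

In practice the records would presumably be loaded from the JARVIS-DFT dataset rather than typed in by hand, and the field names would have to be matched to those of the dataset.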

The full workflow is given in Fig. 3(a), while a full analysis of the data trends for the materials is given in Figs. 3(b)–3(e). A summary of candidate materials with high values of SOS is given in Table II. Further analysis of the electronic band structure (with and without SOC) and k-dependent spin–orbit spillage was conducted. Strong focus was placed on Y3Sn (JVASP-37701), which is a candidate semimetal, and further analysis of the Fermi surface, (001) surface band structure, nodal points and lines, and AHC was performed. In addition, strong focus was placed on NaRuO2 (JVASP-8122), which is a candidate Chern insulator, and further analysis of the Wannier charge center and AHC was performed. Further details of computational screening, methodologies, and specific calculated results can be found in Ref. 16.

FIG. 3.

(a) Flowchart depicting the screening process for high-spillage materials, (b) distribution of the spillage for all materials, (c) pie chart displaying high-spillage metals and insulators, (d) distribution of the magnetic moment for high-spillage structures, (e) band gaps computed with Perdew–Burke–Ernzerhof (PBE)43 vs strongly constrained and appropriately normed (SCAN) functionals.44 Reproduced with permission from Choudhary et al., Phys. Rev. B 103, 155131 (2021). Copyright 2021 American Physical Society.

TABLE II.

A summary of magnetic topological materials: chemical formula (Form.), spacegroup number (Spg), JARVIS-DFT ID (JID), and maximum spillage values. Reproduced with permission from Choudhary et al., Phys. Rev. B 103, 155131 (2021). Copyright 2021 American Physical Society.

Form.      Spg        JID     Spillage
Mn2Sb      P6_3/mmc   15693   0.5
NaMnTe2    P-3m1      16806   1.04
Rb3Ga      Fm-3m      38248   0.47
CoSI       F-43m      78508   0.69
Mn3Sn      P6_3/mmc   18209   0.79
Sc3In      P6_3/mmc   17478   1.01
Sr3Cr      Pm-3m      37600   1.01
Mn3Ge      Fm-3m      78840   3.01
NaRuO2     R-3m       8122    0.5
CoNb3S6    P6_322     21459   1.03
Y3Sn       P6_3/mmc   37701   0.29
CaMn2Bi2   P-3m1      18532   1.17

2. Anomalous quantum confinement effect

Quantum confinement effects, where the electronic bandgap of a bulk material is lower in magnitude than the bandgap of its 2D counterpart, are prevalent for vdW bonded materials. In contrast, it is possible that this bandgap trend is reversed, resulting in an anomalous quantum confinement effect (AQCE). We calculated the band gaps for bulk and corresponding 2D counterparts using DFT, starting from structures in the JARVIS-DFT database. We used semilocal functionals (OptB88vdW46) for about 1000 materials and hybrid functionals (HSE0647 and PBE048) for about 50 materials. We identified 65 AQCE candidates with OptB88vdW, but confirmed this peculiar effect with hybrid functionals for only 14 materials. Depending on the material system, the bandgap differences (between bulk and monolayer) can range from less than 0.5 to 2 eV. Figure 4 depicts these computed results. A large portion of the AQCE candidates are hydroxides and oxide hydroxides [AlOH2, Mg(OH)2, Mg2H2O3, Ni(OH)2, SrH2O3], alkali-chalcogenides (RbLiS and RbLiSe), and Sb-halogen-chalcogenides (SbSBr, SbSeI).
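The two-stage logic described above (flag candidates with a semilocal functional, then keep only those that survive a hybrid-functional check) can be sketched as follows. The record layout, key names, and numerical values are hypothetical and serve only to illustrate the comparison; the criterion itself, a bulk gap larger than the monolayer gap, is the one stated in the text.

```python
def find_aqce(records, bulk_key, mono_key):
    """Return ids whose bulk gap exceeds the monolayer gap for the given keys."""
    hits = []
    for rec in records:
        bulk, mono = rec.get(bulk_key), rec.get(mono_key)
        # Skip entries for which one of the two gaps was not computed.
        if bulk is None or mono is None:
            continue
        if bulk > mono:
            hits.append(rec["id"])
    return hits


# Placeholder band gaps in eV, invented for illustration.
sample = [
    {"id": "example-A", "opt_bulk": 2.0, "opt_mono": 1.5,
     "hyb_bulk": 3.0, "hyb_mono": 2.6},
    {"id": "example-B", "opt_bulk": 1.2, "opt_mono": 1.0,
     "hyb_bulk": 2.0, "hyb_mono": 2.3},
    {"id": "example-C", "opt_bulk": 1.0, "opt_mono": 1.8},
]

# Stage 1: semilocal screening over all materials.
candidates = find_aqce(sample, "opt_bulk", "opt_mono")
# Stage 2: hybrid-functional confirmation, restricted to stage-1 candidates.
shortlist = [rec for rec in sample if rec["id"] in candidates]
confirmed = find_aqce(shortlist, "hyb_bulk", "hyb_mono")
print(candidates, confirmed)
```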

FIG. 4.

Bulk vs monolayer band gaps using OptB88vdW (OPT), HSE06 and PBE0, demonstrating the AQCE. Reproduced with permission from Choudhary and Tavazza, Phys. Rev. Mater. 5, 054602 (2021). Copyright 2021 American Physical Society.

Strikingly, we found examples of 0D and 1D structures included in the 14 AQCE candidates. To quantify the effect of SOC on the band structure predictions and determine the SOS (similar procedure to Sec. II A 1) to screen for topological properties, we performed Perdew–Burke–Ernzerhof (PBE)-based SOC calculations. From these results, we found that SOC does not significantly alter the bandgap, and none of the 14 materials have topological properties. We further investigated the change in electronic structure and bond distances with the goal of understanding the AQCE. We found that in AQCE materials, there is a lowering of the conduction band in the 2D structures with changes in the contribution of the pz orbitals (z is the non-periodic direction). We also found that, for structures that contain OH, there are significant changes in the H–H bond distances, which can be responsible for the AQCE. More details on this work and a full list of AQCE materials can be found in Ref. 31.

3. Bulk and 2D BCS superconductors

The search for superconducting materials with high transition temperatures (TC) has been a goal of condensed matter physicists50,51 since the discovery of superconductivity in 1911.52 The search for novel superconductors can be expedited with more data-driven and systematic approaches. In order to identify high-TC conventional Bardeen–Cooper–Schrieffer (BCS) superconductors,53,54 a curated database of materials that can assist in screening candidates and an efficient high-throughput workflow to perform electron–phonon coupling (EPC) calculations are both required. The EPC that can be used to reliably predict TC can be obtained from the density functional theory perturbation theory (DFT-PT) calculations.54,55

We combined several approaches, each with different levels of computational expense, to design a high-throughput workflow to discover new BCS superconductors. This workflow, depicted in Fig. 5, begins with a prescreening step to identify materials in the JARVIS-DFT database with a high electron density of states (DOS) at Fermi-level [N(0)] and high Debye temperature (θD). Next, we developed and applied a DFT-PT workflow to obtain the EPC properties and calculated TC using the McMillan–Allen–Dynes formula56 (with initially low k-point and q-point convergence settings). Before applying this DFT-PT workflow to materials from the prescreening step, we validated our methods by benchmarking the workflow for several well-known superconductors. We performed additional k-point and q-point convergence for the top candidates from our prescreening step. As discussed in Sec. III A 3, we used our EPC computed data to develop a deep-learning property prediction model for superconducting properties using the atomistic line-graph graph neural network (ALIGNN).
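As an illustration of the last step of this workflow, the McMillan–Allen–Dynes estimate of TC from the EPC parameters can be written as a short function. This is a sketch under stated assumptions: the function name is ours, ω_log is taken in kelvin, and the default Coulomb pseudopotential μ* = 0.1 is a commonly used value rather than a setting taken from the text.

```python
import math


def allen_dynes_tc(lam, omega_log_K, mu_star=0.1):
    """McMillan-Allen-Dynes critical temperature in kelvin.

    lam         : electron-phonon coupling strength (dimensionless)
    omega_log_K : logarithmic average phonon frequency, expressed in K
    mu_star     : Coulomb pseudopotential (assumed default, not from the text)
    """
    denominator = lam - mu_star * (1.0 + 0.62 * lam)
    # The formula is only meaningful for a positive denominator;
    # otherwise no superconducting solution is predicted.
    if denominator <= 0.0:
        return 0.0
    return (omega_log_K / 1.2) * math.exp(-1.04 * (1.0 + lam) / denominator)


# Example call with illustrative inputs (not values from the database).
print(round(allen_dynes_tc(lam=1.0, omega_log_K=300.0), 2))
```

The exponential dependence on λ is the reason the k-point and q-point convergence of the top candidates matters: small changes in λ can shift the predicted TC noticeably.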

FIG. 5.

The main workflow used to identify bulk BCS superconductors: (a) statistical distribution of the Debye temperature (in K) and (b) statistical distribution of the electronic density of states (DOS) (in states per eV per number of electrons) at the Fermi level, (c) the likelihood that a material contains a given element for θD greater than 300 K. Reproduced with permission from Choudhary and Garrity, npj Comput. Mater. 8, 244 (2022). Copyright 2022 Nature Publications.

From our prescreening step, we identified 1736 candidates with a high electronic density of states at the Fermi level and a high Debye temperature, and we performed DFT-PT for 1058 of them to obtain EPC properties and TC. From this, we found 105 stable structures with a TC above 5 K (top candidates are shown in Table III). The superconductors with the highest TC include MoN, VC, Mn, MnN, LaN2, KB6, and TaC. Most notably, we discovered a new hexagonal form of MoN, which has not been experimentally observed (in contrast to the superconducting rock salt phase, which has a TC of 30 K57,58). Further details of this work on 3D BCS superconductors can be found in Ref. 32.

TABLE III.

JARVIS screening workflow for some of the potential candidate superconductors: (Tc), chemical formula (Form.), spacegroup number (Spg), JARVIS ID (JID), inorganic crystal structure database ID (ICSD)49 wherever available, JARVIS-DFT based formation energy [Eform (eV/atom)] and energy above convex hull [Ehull (eV)]. Reproduced with permission from Choudhary and Garrity, npj Comput. Mater. 8, 244 (2022). Copyright 2022 Nature Publications.

Form.     Spg   JID      ICSD     Eform (eV/atom)   Ehull (eV)   TC (K)
MoN       187   16897    187185   −0.47             0.09         33.4
CaB2      191   36379    237011   −0.25             0.09         31.0
ZrN       194   13861    161885   −1.76             0.18         30.0
VC        225   19657    619079   −0.48             0.06         28.1
V2CN      123   105356   ⋯        −0.82             0.11         26.2
Mn        225   25344    41509    0.08              0.08         23.0
NbFeB     187   4546     ⋯        −0.15             0.39         22.1
NbVC2           102190   ⋯        −0.46             0.08         21.9
ScN       225   15086    290470   −2.15             0.0          20.8
LaN2            118592   ⋯        −1.05             0.0          20.4
VRu       221   19694    106010   −0.22             0.01         20.3
TiReN3    161   36745    ⋯        −0.68             0.10         20.0
B2CN      51    91700    183794   −0.53             0.19         19.4
KB6       221   20067    98987    −0.09             0.0          19.0
ZrMoC2    166   99893    ⋯        −0.49             0.08         17.9
TaB2      191   20082    30420    −0.60             0.0          17.2
NbS       194   18923    44992    −0.98             0.05         17.0
TaVC2     166   101106   ⋯        −0.54             0.05         16.3
TaC       187   36405    ⋯        −0.24             0.40         16.1
MgBH      11    120827   ⋯        −0.03             0.11         15.5
CoN       216   14724    236792   −0.02             0.0          15.0
NbRu3     221   8528     77216    −0.02             0.19         15.0

Superconductivity in 2D has attracted attention59–61 due to the potential applications in quantum interferometers, superconducting transistors, and superconducting qubits.62–66 Since very few high TC 2D materials have been computationally or experimentally identified, we decided to extend our high-throughput workflow to 2D superconductors. First, we prescreened over 1000 2D materials in the JARVIS-DFT database on the basis of DOS at the Fermi level, electronic bandgap, and the total magnetic moment. This screening criterion is modified from our workflow on bulk superconductors because the elastic tensor is available only for a limited number of monolayers in JARVIS (it is more computationally expensive to calculate for 2D structures). This modified screening procedure is based on the fact that a candidate 2D superconductor will have a high density of states at the Fermi level (metallic) and zero magnetic moment per unit cell. We additionally found 24 monolayers based on a literature search of bulk and monolayer superconductors. A full depiction of this workflow for 2D materials and a summary of the results are given in Fig. 6.

FIG. 6.

(a) High-throughput workflow used to screen for 2D superconductors with high Tc and (b) and (c) the relationship between λ and ωlog (electron–phonon coupling) for the materials in this study. Reproduced with permission from Wines et al., Nano Lett. 23, 969–978 (2023). Copyright 2023 American Chemical Society.

Several nitrides, borides, and carbides are among the 2D materials found to have a high TC. Also, many oxide and niobium-based structures and transition metal dichalcogenides (such as NbS2 and NbSe2) are found to be good candidate superconductors. Similar to a recent computational study,59 we find W2N3 to possess a significantly high TC of 18.7 K. We observe the highest TC of 21.8 K for 2D Mg2B4N2, which has not previously been reported in either 2D or 3D form. In addition, we studied 2D analogs of non-layered materials such as ScC, NbC, B2N, and MgB2 and oxide-based materials such as TiClO, ZrBrO, and NbO2, all of which are superconducting. Further information on 2D superconductors can be found in Ref. 33.

There are two types of tight-binding projects available in JARVIS: (1) Wannier tight-binding Hamiltonians (WTBH)20 and (2) a parametrized universal tight-binding model fit to first principles calculations (ThreeBodyTB.jl).21 The WTBH database provides a computationally efficient way to interpolate and understand the electronic properties of a set of 1771 preselected materials, based on a DFT calculation for each of those materials. The quality of the WTBH is evaluated by comparing the Wannier band structures to directly calculated DFT band structures, including SOC. The WTBH database is used for predicting the AHC, surface band structures, and various topological indexes.

In contrast to the WTBH database, the goal of the ThreeBodyTB.jl parametrized tight-binding model is to produce a tight-binding Hamiltonian and total energy without doing a computationally expensive DFT calculation first. Because tight-binding uses a minimal basis set of atomic orbitals, the calculations are up to three orders of magnitude faster than comparable plane wave DFT calculations, enabling computationally efficient materials prediction. Despite their simplicity, tight-binding approaches incorporate single-particle quantum mechanics as well as electrostatics as a self-consistency step.67–69 This built-in physics can enable improved predictions outside the set of training data, relative to classical force-fields or pure machine-learning approaches.

Unlike typical parametrized tight-binding models that consider only interactions between pairs of atoms when generating the tight-binding Hamiltonian,69,70 our model includes three-body contributions that modify the two-body contributions as well. These extra terms allow for improved transferability as compared to simpler models, at the cost of needing to fit more parameters.

Our fitting procedure is summarized in Fig. 7. For a given elemental or binary system, we first generate a set of standard crystal structures, perform DFT calculations, and fit an initial parameter set to reproduce the band structures and total energies. Then, we employ an active learning strategy to test and improve the model by using the current model to relax randomly generated crystal structures71 and test our tight-binding results vs new DFT calculations. If the results are poor, we add these new structures to our fitting database and repeat the process until the results improve.
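The fit, test, and augment cycle described above can be illustrated with a deliberately simplified toy loop. Everything in it is a stand-in: a one-dimensional analytic function plays the role of the DFT reference, piecewise-linear interpolation plays the role of the tight-binding fit, and random numbers play the role of randomly generated crystal structures. The relaxation step of the actual procedure is omitted, and all names and tolerances are hypothetical.

```python
import random


def reference_energy(x):
    """Stand-in for an expensive reference (DFT) calculation."""
    return (x - 1.0) ** 2 * (x + 0.5)


def fit_model(training):
    """Stand-in for the model fit: linear interpolation between known points."""
    points = sorted(training.items())

    def model(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return model


def active_learning(initial_points, tol=0.05, n_candidates=20,
                    max_rounds=50, seed=0):
    """Refit until a round of random test cases contains no poor prediction."""
    rng = random.Random(seed)
    training = {x: reference_energy(x) for x in initial_points}
    for round_index in range(max_rounds):
        model = fit_model(training)
        candidates = [rng.uniform(0.0, 2.0) for _ in range(n_candidates)]
        # Compare the current model with new reference calculations.
        failures = [x for x in candidates
                    if abs(model(x) - reference_energy(x)) > tol]
        if not failures:
            return training, round_index
        # Poorly predicted cases are added to the fitting database.
        for x in failures:
            training[x] = reference_energy(x)
    return training, max_rounds


training_set, rounds_used = active_learning([0.0, 1.0, 2.0])
final_model = fit_model(training_set)
print(len(training_set), rounds_used)
```

The design point carried over from the text is that new reference data are added only where the current model fails, so the fitting database grows in the regions where the model is least reliable.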

FIG. 7.

Overview of the three-body tight binding (TB) model fitting workflow. Reproduced with permission from Garrity and Choudhary, Phys. Rev. Mater. 7, 044603 (2023). Copyright 2023 American Physical Society.

Our current parameter set can predict total energies, volumes, and band gaps with comparable accuracy to machine learning approaches, as well as produce band structures. Importantly, the results generalize to surfaces and vacancy calculations that are completely outside the fitting dataset, as shown in Fig. 8. For testing results and details, see Ref. 21. The Julia code with a Python interface is available, and an underlying DFT database with over 1 × 10^6 materials is available in JARVIS-QETB.

FIG. 8.

DFT results vs three-body tight-binding results for unrelaxed (a) point vacancy formation energy (in eV) and (b) (111) surface energies (in J m^-2) of various elemental solids. The results from tight-binding are out-of-sample. Reproduced with permission from Garrity and Choudhary, Phys. Rev. Mater. 7, 044603 (2023). Copyright 2023 American Physical Society.

A recent effort of the JARVIS infrastructure has been to incorporate many-body methods that go beyond the standard accuracy of DFT for selected materials that have a complicated or correlated electronic structure. Diffusion Monte Carlo (DMC)6 is a many-body correlated electronic structure method that has been applied successfully to the calculation of electronic and magnetic properties of a variety of periodic systems. It involves solving the imaginary-time Schrödinger equation for the near-exact ground state wavefunction using projector techniques (more details can be found in Ref. 6). Although it is a more computationally expensive method, DMC has a weaker dependence on the starting density functional and Hubbard U parameters,72 scales similarly to DFT with respect to the number of electrons N in the simulation (N^3–N^4),6 and can achieve results that are more accurate than DFT.6
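The projection idea can be stated compactly. The following is the standard textbook form, written in our own notation as a sketch rather than as the specific formulation of Ref. 6:

```latex
-\frac{\partial \Psi(\mathbf{R},\tau)}{\partial \tau}
  = \left( \hat{H} - E_T \right) \Psi(\mathbf{R},\tau),
\qquad
\Psi(\mathbf{R},\tau)
  = \sum_i c_i \, e^{-\tau (E_i - E_T)} \, \phi_i(\mathbf{R}) .
```

Here R collects all electron coordinates, τ is imaginary time, E_T is a trial energy offset, and φ_i and E_i are the eigenstates and eigenvalues of the Hamiltonian. With E_T chosen close to the ground-state energy E_0, every excited-state component decays exponentially in τ, so that the long-time limit retains only the ground state, provided the starting wavefunction has a nonzero overlap c_0 with it.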

1. Systematic benchmark of 2D CrX3

We designed a workflow that applied a combination of DFT+U and DMC techniques to compute accurate magnetic properties for 2D CrX3 materials (X = I, Br, Cl, F).34 We chose these materials [depicted in Fig. 9(b)] as a case study since they have been experimentally synthesized,73–75 have a nonzero critical temperature,73,76 and have been studied with DFT extensively.76 Our first-principles data can be mapped to a 2D model spin Hamiltonian to extract useful observable quantities such as Tc.77,78 In our case, this Hamiltonian was a function of Heisenberg isotropic exchange (J), easy axis single ion anisotropy (D), and anisotropic exchange (λ). Note, this λ is different than the λ that represents the electron–phonon coupling strength in superconductors (previously mentioned). To obtain J, λ, and D, we performed spin–orbit (noncollinear) DFT+U calculations by rotating the easy axis by 90° and calculating the energy difference between the rotated and non-rotated configurations for ferromagnetic (FM) and antiferromagnetic (AFM) separately. We automated these four calculations using the JARVIS workflow, where four distinct total energy values were obtained for each structure. We benchmarked this for 2D CrI3 (JVASP-76195), CrBr3 (JVASP-6088), CrCl3 (JVASP-76498), and CrF3 (JVASP-153105) using multiple DFT functionals and values of U. The influence of the geometric structure on the magnetic properties was also assessed (see Ref. 34 for more details).

FIG. 9.

(a) High throughput workflow used to calculate the magnetic properties using DFT+U in conjunction with QMC for a 2D material and (b) side and top views of the structure of 2D CrX3 (X = I, Br, Cl, F). Reproduced with permission from Wines et al., J. Phys. Chem. C 127, 1176–1188 (2023). Copyright 2023 American Chemical Society.

It is possible to systematically improve these results with QMC. This can be accomplished by variationally determining the optimal U value with DMC and computing a statistical bound for the J parameter, which involves DMC calculations for the FM and AFM states separately. In contrast to the previous noncollinear (spin–orbit) DFT calculations, the energies in QMC are from collinear (spin-polarized) calculations. Because spin–orbit calculations are limited in DMC at the moment, we are forced to neglect the λ contribution (spin–orbit dependent term) when computing J with DMC. However, since J ≫ λ, this has negligible impact on the end result for J. Figure 9(a) displays the full QMC and DFT+U high-throughput workflow that allows us to accurately estimate the 2D critical temperature77 with the extracted J from DMC and the anisotropy parameters (D, λ) from DFT+U (at the optimal U value determined from DMC). We estimated a maximum value of 43.56 K for the Tc of CrI3 and 20.78 K for the Tc of CrBr3. Additionally, we present a comparison between the magnetic moments and spin-density computed with DFT+U and DMC. For more details of this work, see Ref. 34.

2. Structure and phase stability of 2D 1T- and 2H-VSe2

Throughout the theoretical and experimental landscape, there have been controversies involving 2D VSe2 such as reports of near-room temperature ferromagnetism (Curie temperature ranging from 291 to 470 K).79–82 A coupling of structural parameters to magnetic properties is a likely cause for the discrepancies in experimental and calculated results79–83 for monolayer VSe2 in the T phase [octahedral (1T), centered honeycombs] and the H phase [trigonal prismatic (2H), hexagonal honeycombs]. These structures are shown in the insets of Fig. 10. Both the T- and H-phases have a close lattice match and similar total energies, which makes it a challenge to distinguish which phase is being experimentally observed.79,80,84,85 In order to resolve the discrepancies in geometric properties and relative phase stability of 2D VSe2 (T- and H-phase), we used a combination of DFT, DMC, and a newly developed surrogate Hessian line-search geometric optimization tool.86

FIG. 10.

Deviation of the structural properties [lattice constant (a) and V–Se distance (d_V–Se)] compared to the DMC computed structural properties for (a) T–VSe2 and (b) H–VSe2 and (c) the deviation of T–H energy compared to the DMC computed T–H energy (ET−H) for different DFT functionals (U = 2 eV), where the DMC error bar (standard error about the mean) is indicated by red bars. The side and top view of the geometric structure is shown in the insets. Reproduced with permission from Wines et al., J. Phys. Chem. Lett. 14, 3553–3560 (2023). Copyright 2023 American Chemical Society.

In Fig. 10, DFT benchmarking results are shown for multiple DFT functionals (with and without U correction) for structural parameters such as lattice constant (a) and V–Se distance (d_V–Se) and the relative energy between the T- and H-phase (ET−H). We observe a large deviation between DFT methods, which indicates the need to incorporate more accurate theories such as DMC. Using DMC, we computed the lattice constants to be 3.414(12) and 3.335(8) Å for T–VSe2 and H–VSe2, respectively. We also computed the V–Se distance to be 2.505(7) and 2.503(5) Å for T–VSe2 and H–VSe2. We find the DMC relative energy to be 0.06(2) eV per formula unit, which indicates that the H-phase is energetically more favorable than the T-phase in freestanding form. Using the DMC potential energy surface, we estimated a phase diagram between the phases and found that applying small amounts of strain can induce a phase transition. We also computed the magnetic moments and spin densities with DMC and find substantial differences between DMC and DFT+U. More detailed information on this study can be found in Ref. 35.

Non-Euclidean graphs are increasingly being used to represent crystal structures in deep-learning models. In comparison to composition-only based descriptors, the graph representation preserves the bond connectivity of atoms. Graph neural networks (GNN) are deep-learning frameworks that perform inference on graph data structures, and several high-performing GNN models have been proposed for the prediction of material properties, including but not limited to: SchNet,87 crystal graph convolutional neural networks (CGCNN),88 improved Crystal Graph Convolutional Neural Networks (iCGCNN),89 materials graph network (MEGNet),90 and OrbNet.91 In these frameworks, the graph nodes represent atoms and encode for elemental features, while the edges represent bonds and encode for bond distances. Therefore, only pairwise interactions are explicitly encoded in the materials representation. Through the use of multiple graph convolutional layers in the neural network, nodes (atoms) are updated based on neighboring states, allowing for an implicit handling of many-body interactions.25 

However, several material properties, for example, those related to wave-like electronic and phononic states, are influenced by local geometric distortions captured by changes in bond angles. In order to explicitly encode bond angles and three-body configurations of atoms, we introduced the atomistic line graph neural network (ALIGNN)25 into the suite of JARVIS tools. Following the original work on line graph neural networks by Chen et al.,92 the ALIGNN framework sequentially updates two graph representations: (1) the crystal graph with nodes representing atoms and edges representing bonds, and (2) the line graph built from the crystal graph with nodes representing bonds and edges representing bond pairs sharing a common atom or triplets of atoms. Note that the edges of the crystal graph and nodes of the line graph share the same latent representation. Figure 11 depicts how the compositional and structural features of a material are encoded in both graph representations. The atomistic feature set (νi for the ith atom) describing a crystal graph node includes the following elemental descriptors: electronegativity, covalent radius, group number, block, valence electron count, atomic volume, first ionization energy, and electron affinity.25 The bond features (εij for pairs of atoms i and j) are the bond distances, represented using a radial basis function (RBF) expansion. Finally, the triplet features (tijk for set of atoms i, j and k) are an RBF expansion of the bond angle cosines.
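The two graph representations described above can be illustrated with a minimal sketch. It is not the ALIGNN implementation: it ignores periodic boundary conditions, uses a toy three-atom cluster, and the function names (`rbf_expand`, `build_graphs`), the cutoff, and the number of basis functions are hypothetical choices made only for illustration.

```python
import numpy as np


def rbf_expand(values, vmin, vmax, bins=8):
    """Expand scalars on Gaussian basis functions with evenly spaced centers."""
    centers = np.linspace(vmin, vmax, bins)
    gamma = 1.0 / (centers[1] - centers[0]) ** 2  # width tied to the center spacing
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    return np.exp(-gamma * (values - centers) ** 2)


def build_graphs(positions, cutoff):
    """Return directed bonds, bond lengths, line-graph edges, and bond-angle cosines."""
    positions = np.asarray(positions, dtype=float)
    n_atoms = len(positions)
    # Crystal graph: one directed edge i -> j for every pair within the cutoff.
    bonds = [(i, j) for i in range(n_atoms) for j in range(n_atoms)
             if i != j and np.linalg.norm(positions[j] - positions[i]) <= cutoff]
    vectors = np.array([positions[j] - positions[i] for i, j in bonds])
    lengths = np.linalg.norm(vectors, axis=1)
    # Line graph: bond a = (i -> j) connects to bond b = (j -> k) with k != i,
    # i.e. the two bonds share atom j and define the angle i-j-k.
    pairs, cosines = [], []
    for a, (i, j) in enumerate(bonds):
        for b, (j2, k) in enumerate(bonds):
            if j2 == j and k != i:
                cos = np.dot(-vectors[a], vectors[b]) / (lengths[a] * lengths[b])
                pairs.append((a, b))
                cosines.append(cos)
    return bonds, lengths, pairs, np.array(cosines)


# Toy cluster: a central atom with two neighbors at a right angle.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bonds, lengths, bond_pairs, cosines = build_graphs(positions, cutoff=1.1)
edge_features = rbf_expand(lengths, 0.0, 1.1)      # bond features (RBF of distances)
angle_features = rbf_expand(cosines, -1.0, 1.0)    # triplet features (RBF of cosines)
```

In this sketch, the bond list plays a double role: its entries are the edges of the crystal graph and the nodes of the line graph, which mirrors the shared latent representation noted above.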

FIG. 11.

(a) A schematic of the encoding of the crystal and line graphs for Mg2Si. (b) Depicts the crystal graph, where the nodes represent the atomic sites and include an atomic feature set [ionization energy (E), volume per atom (V_0^i), electronegativity (χ)]. Bonds are represented by edges in the crystal graph, and bond distances (rij) are represented by the edge features, which use a radial basis function (RBF) to encode rij in the model. (c) Depicts the line graph (constructed from the previous crystal graph), where the bonds of the structure are now represented by the nodes. Pairs of bonds with an atom in common (“triplets”) featurized by the bond angles are represented by the edges, which are also encoded using an RBF. Reproduced with permission from Choudhary and DeCost, npj Comput. Mater. 7, 185 (2021). Copyright 2021 Nature Publications.

The ALIGNN model was first applied to predict 52 solid-state and molecular properties, including formation energy, elastic constants, electronic band structure attributes, dielectric constants, and thermoelectric coefficients.25 In almost every task, the ALIGNN model outperformed classical force-field inspired descriptors (CFID)24 and the original CGCNN model88 by yielding a lower mean absolute error (MAE) with comparable or improved training speed. In comparison to 18 other machine learning algorithms, the ALIGNN model also yields the lowest prediction MAE for several tasks, including bandgap and formation energy prediction, as documented on the matbench website.93 Since then, the ALIGNN model has also been used to guide new materials searches in the realm of metal-organic frameworks for carbon capture,38 defect properties,39 and high-Tc conventional superconductors.32

Since the initial presentation of the ALIGNN model in 2021,25 additional models built from ALIGNN have been introduced. ALIGNN-d is an extension of the ALIGNN representation to explicitly include four-body dihedral angles, which was successfully used to predict the peak location and intensity in the optical spectra of Cu(II)-aqua complexes.94 The de-ALIGNN model introduced by Gong et al.95 concatenates global descriptors of the material, such as average bond length or lattice parameters, to the learned features in the ALIGNN representation. Over 13 property prediction tasks, only two phonon-related tasks, phonon internal energy and heat capacity, showed large (>10%) prediction improvement in the de-ALIGNN model vs the original ALIGNN model. In Sec. III A 1, however, we show that phonon properties can be quickly and accurately predicted using ALIGNN through a direct prediction of the phonon density-of-states.37 

In Secs. III A 1–III A 3, we will discuss specialized uses of ALIGNN to predict defect properties, spectral properties, and forces.

1. ALIGNN-spectra

Thus far, the performance of ALIGNN has mainly been discussed in terms of scalar material property predictions, but the model has also been applied to predict spectral or frequency-dependent properties. The latter task requires multi-output predictions, which are comparatively less developed for machine learning algorithms. Kaundinya et al. first extended the ALIGNN model to enable multiple output features for the prediction of the electronic density-of-states (DOS).36 Two ALIGNN models trained using different representations of the DOS were compared: (1) a discretized electronic DOS with 300 evenly spaced frequency bins (D-ALIGNN), and (2) a low-dimensional representation of the electronic DOS generated using an autoencoder network with a latent dimensionality of 8, 12, 16, or 20 features (AE-ALIGNN). The D-ALIGNN model slightly outperformed the AE-ALIGNN model, but both yielded good prediction accuracy, with over 80% of test samples showing an MAE of less than 0.2 states per eV per electron.

The phonon DOS is the subject of a later work, which further emphasizes DOS-derived properties obtained using a weighted integration of the phonon DOS.37 These thermal and thermodynamic properties include the heat capacity (C_V), vibrational entropy (S_vib), and the phonon-isotope scattering rate (τ_i^{-1}). The phonon DOS ALIGNN model was trained on a database of 14 000 DFT-computed phonon spectra calculated using the finite-difference method.1 As shown in the histogram and example spectra of Fig. 12(a), the spectra in the test set are concentrated at low prediction error levels, and the ALIGNN model does a good job of capturing the location of peaks and general distribution of phonon modes, although the shape of the peaks is often altered. The ALIGNN phonon DOS predictions yield highly accurate estimates of the DOS-derived properties with correlation coefficients between ALIGNN and DFT values greater than 0.97 for all properties of interest. A general conclusion shown in this work is that a DOS-mediated approach outperforms a direct deep-learning approach for phononic properties. In other words, calculating properties like C_V and S_vib from the ALIGNN-predicted phonon DOS yields higher accuracy than training an ALIGNN model to predict C_V or S_vib directly. More details can be found in Ref. 37.
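To make the idea of a weighted integration of the phonon DOS concrete, the following sketch evaluates the harmonic heat capacity from a DOS sampled on an energy grid. It is an illustration under stated assumptions rather than the workflow of Ref. 37: the DOS is a synthetic Gaussian peak normalized to three modes, the result is reported in units of the Boltzmann constant, and the helper names (`trapezoid`, `heat_capacity`) are hypothetical.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def trapezoid(y, x):
    """Trapezoidal rule written out explicitly to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def heat_capacity(energies, dos, temperature):
    """C_V in units of k_B: integral of g(E) x^2 e^x / (e^x - 1)^2 with x = E / (k_B T)."""
    x = energies / (K_B * temperature)
    weight = np.ones_like(x)                  # the x -> 0 limit of the weight is 1
    mask = x > 1e-8
    em = np.exp(-x[mask])                     # overflow-safe form of the same weight
    weight[mask] = x[mask] ** 2 * em / (1.0 - em) ** 2
    return trapezoid(dos * weight, energies)


# Synthetic phonon DOS: one narrow peak at 30 meV, normalized to 3 modes.
energies = np.linspace(1e-3, 0.06, 2000)      # phonon energies in eV
dos = np.exp(-0.5 * ((energies - 0.03) / 0.001) ** 2)
dos *= 3.0 / trapezoid(dos, energies)

cv_low = heat_capacity(energies, dos, 10.0)     # modes frozen out
cv_room = heat_capacity(energies, dos, 300.0)   # partially excited
cv_high = heat_capacity(energies, dos, 5000.0)  # approaches the classical limit of 3
```

Because the property is obtained by integrating the whole spectrum, moderate errors in peak shape tend to average out, which is consistent with the observation that the DOS-mediated route is accurate even when individual peaks are altered.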

FIG. 12.

Performance of the ALIGNN phonon density-of-states (DOS) model. Panel (a) shows the distribution of ALIGNN-predicted phonon DOS with respect to mean absolute error (MAE), indicating that 78% of samples show an MAE of less than 0.086. The example spectra below show the ALIGNN prediction (colored) against the DFT spectrum (black) to highlight the types of prediction errors that occur at each MAE level. In panels (b) and (c), we show that the room temperature heat capacity (C_V) and vibrational entropy (S_vib) derived from the ALIGNN phonon DOS closely correspond to the target DFT-derived values. Reproduced with permission from Gurunathan et al., Phys. Rev. Mater. 7, 023803 (2023). Copyright 2023 American Physical Society.

2. ALIGNN-FF

In Secs. III A and III A 1, we described the application of the ALIGNN model for scalar and vector data, which are graph-level outputs. Node-level outputs such as forces, charges, and magnetic moments motivated the development of the ALIGNN-atomwise model. Specifically, atomwise properties such as forces should be derivatives of the energy and should be equivariant96 as the material system is rotated. For this task, we developed the ALIGNN force field (ALIGNN-FF)40 to treat chemically and structurally diverse crystals with a combination of 89 periodic table elements. ALIGNN-FF was trained on the JARVIS-DFT dataset, which consists of 4 × 10^6 energy-force entries (of which 307 113 are used).
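The two requirements on forces, that they are derivatives of the energy and that they rotate with the system, can be checked numerically. The sketch below uses a toy harmonic pair potential and central finite differences; both are stand-ins chosen for illustration and are not part of ALIGNN-FF, where the gradient would typically come from automatic differentiation of the learned energy.

```python
import numpy as np


def energy(positions, k=1.0, r0=1.0):
    """Toy energy: harmonic springs between every pair of atoms (hypothetical model)."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = np.linalg.norm(positions[i] - positions[j])
            total += 0.5 * k * (r - r0) ** 2
    return total


def forces(positions, h=1e-5):
    """Forces as the negative energy gradient, from central finite differences."""
    result = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        plus, minus = positions.copy(), positions.copy()
        plus[idx] += h
        minus[idx] -= h
        result[idx] = -(energy(plus) - energy(minus)) / (2.0 * h)
    return result


def rotation_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


positions = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.3, 0.9, 0.4]])
rot = rotation_z(0.7)
rotated = positions @ rot.T                      # rotate every atom: r' = R r

# The energy is invariant, while the forces rotate with the structure (F' = R F).
energy_change = abs(energy(rotated) - energy(positions))
equivariance_error = float(np.max(np.abs(forces(rotated) - forces(positions) @ rot.T)))
net_force = float(np.max(np.abs(forces(positions).sum(axis=0))))
```

Deriving forces from a rotationally invariant energy, as done here, yields equivariance by construction, which is one common design choice for machine learning force fields.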

Machine learning force fields (MLFFs) are a useful tool for the simulation of solids at a large scale. Previously, MLFFs were designed for specific chemical environments and usually could not be transferred to chemistries that differ from the training set. In recent years, efforts to develop a universal interatomic potential that is generalizable to diverse chemistries have been successful [e.g., M3GNet (MatGL)97 and GemNet-OC98]. We demonstrate the applicability of ALIGNN-FF beyond specific system types, as it is built to predict atomistic properties for solids made of any combination of 89 periodic table elements. It was validated on predictions of properties such as lattice constants and energy-volume curves. As an example, Fig. 13 shows the ability of ALIGNN-FF to distinguish the polymorphs of various compounds (Si, SiO2, Ni3Al, and vdW-bonded MoS2).

FIG. 13.

ALIGNN-FF computed energy-volume curves for (a) Si, (b) SiO2, (c) Ni3Al, and (d) MoS2, with the ultimate goal of distinguishing polymorphs. Reproduced with permission from Choudhary et al., Digital Discovery 2, 346–355 (2023). Copyright 2023 Royal Society of Chemistry.

ALIGNN-FF can also be used for fast optimization of atomic structures and for structure prediction using evolutionary algorithms such as genetic algorithms. When compared to DFT and embedded-atom method (EAM) force fields, ALIGNN-FF produced very similar equation of state curves. Moreover, ALIGNN-FF was used to optimize crystal structures in the Crystallography Open Database (COD) as well as in the JARVIS database. For additional testing, ALIGNN-FF was used along with a genetic algorithm to predict the convex hull of a Ni–Al alloy system. Promisingly, the resulting convex hull reproduced the expected low energy structures without generating any unphysical low energy structures for the Ni–Al phase diagram. As timing analysis shows that ALIGNN-FF is over 100 times faster than DFT methods, it can be used as a structure pre-optimizer before carrying out DFT calculations. More details on ALIGNN-FF can be found in Ref. 40.

3. ALIGNN-superconducting

In order to accelerate the initial BCS-inspired screening and direct computation of the electron–phonon coupling (EPC) parameters (see Sec. II A 3), we developed deep learning tools (trained on JARVIS-DFT data) for direct property prediction from an arbitrary crystal structure. The BCS prescreening step is far less computationally expensive than a full EPC calculation, but it still requires a DFT calculation for the DOS at the Fermi level and θD, which can carry significant computational expense. The results of these deep learning models for DOS and θD (on 5% held-out test sets) are depicted in Figs. 14(a) and 14(b). Additionally, we developed machine-learning models to directly predict EPC properties, trained exclusively on our DFT-PT calculations in Ref. 32. We used two methods (CFID and ALIGNN) and trained models for TC directly, in addition to training models for the EPC parameters (ωlog and λ). It is important to note that deep learning models often require much larger amounts of data. Nevertheless, our results show preliminary success with a smaller dataset, which is continually growing.

FIG. 14.

ALIGNN performance on a 5% test set for (a) Debye temperature and (b) electronic DOS. CFID [(c)–(e)] and ALIGNN [(f)–(h)] performance on a 5% test set for DFT computed TC, ωlog, and λ. Performance of a direct TC prediction (red), TC prediction using the Eliashberg function (black), and TC prediction utilizing the direct prediction of ωlog and λ and then using McMillan–Allen–Dynes formula56 (green) are shown in (f). Reproduced with permission from Choudhary and Garrity, npj Comput. Mater. 8, 244 (2022). Copyright 2022 Nature Publications.

The performance of the CFID-based predictions is shown in Figs. 14(c)–14(e) for TC, ωlog, and λ, and the performance of the ALIGNN-based predictions is shown in Figs. 14(f)–14(h). It is clear that ALIGNN outperforms CFID in terms of computing ωlog, while the MAEs for TC and λ are quite similar, indicating that it is easier to learn ωlog. By computing ωlog and λ with ALIGNN and plugging the quantities into the McMillan–Allen–Dynes equation,56 TC is predicted with an MAE of 1.77 K. Alternatively, we attempted to compute TC from the ALIGNN predicted Eliashberg function. We observed that the ALIGNN model can capture the peaks of the Eliashberg function. From this alternative method, we predict TC with an MAE of 1.39 K, which is an improvement of over 24% compared with the direct ALIGNN prediction. This implies that in comparison to direct ML predictions of properties, learning more fundamental physics-based quantities (such as the Eliashberg function) can be helpful for deep learning approaches with smaller amounts of data. Additionally, we used this superconducting ALIGNN model in conjunction with a generative diffusion model (crystal diffusion variational autoencoder99) to inversely design new superconducting materials.100 More information on the ALIGNN superconducting model can be found in Ref. 32.
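The step of plugging ωlog and λ into the McMillan–Allen–Dynes equation can be sketched as follows. The functional form is the standard one, TC = (ωlog/1.2) exp[−1.04(1 + λ)/(λ − μ*(1 + 0.62λ))]; the Coulomb pseudopotential value μ* = 0.1 and the input numbers are assumptions for illustration only, and ωlog is taken in kelvin so that TC is also in kelvin.

```python
import math


def allen_dynes_tc(omega_log, lam, mu_star=0.1):
    """McMillan-Allen-Dynes estimate of T_c (same units as omega_log, here K).

    mu_star is the Coulomb pseudopotential; 0.1 is an assumed, commonly used value.
    """
    denominator = lam - mu_star * (1.0 + 0.62 * lam)
    if denominator <= 0.0:
        return 0.0  # coupling too weak to overcome the Coulomb repulsion
    return (omega_log / 1.2) * math.exp(-1.04 * (1.0 + lam) / denominator)


# Hypothetical predicted parameters for one material.
tc_example = allen_dynes_tc(omega_log=300.0, lam=1.0)
tc_weak = allen_dynes_tc(omega_log=300.0, lam=0.5)
tc_none = allen_dynes_tc(omega_log=300.0, lam=0.1)
```

Because λ enters through an exponential, small errors in the predicted coupling strength translate into large relative errors in TC, which is one reason why the choice of the learned quantity matters for the final accuracy.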

FIG. 15.

Schematic of select capabilities in the AtomVision package. (a) An example 2D crystal structure is sampled from the JARVIS-2D DFT database1 and a Rutherford scattering contrast model is applied to produce a synthetic HAADF-STEM image (b). Next, image analysis tasks implemented in the package are demonstrated, including: (c) localizing atom positions, (d) generating a non-Euclidean graph over the image, and (e) reconstructing the image from a low-dimensional representation using an autoencoder. Reproduced with permission from Choudhary et al., J. Chem. Inf. Model. 63(6), 1708–1722 (2023). Copyright 2023 American Chemical Society.

The AtomVision library is designed to be a general toolkit for both generating and analyzing image databases.26 Currently, the library implements contrast models that can be used to simulate scanning tunneling microscope (STM) and high angle annular dark field (HAADF) scanning transmission electron microscope (STEM) images given the crystal structure, and easily generate databases of simulated atomistic images. The STM images are computed using the Tersoff–Hamann formalism, which models the STM tip as an s-wave spherical state.101 The HAADF STEM images are simulated using a convolution approximation often applied to thin film samples. This method convolves a point-spread function centered around the probe with a transmission function that considers the atomic number of the imaged specimen.102 The atomic number (Z) dependence of the intensity of the imaged atom is roughly proportional to Z^2, as predicted by Rutherford scattering.103 Relevant images can also be curated from the literature through integration with the ChemNLP natural language processing package27 (more information in Sec. III C). In contrast to other image datasets, which focus on a specific chemistry, the AtomVision package prioritizes chemical and structural diversity.
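A minimal sketch of the convolution approximation is given below. It does not use the AtomVision API: the function name `simulate_haadf`, the Gaussian point-spread function, its width, and the exact exponent of 2 are assumptions for illustration. Convolving Z^2-weighted point scatterers with a Gaussian probe is equivalent to summing one weighted Gaussian per atom, which is what the code does.

```python
import numpy as np


def simulate_haadf(xy, atomic_numbers, size=101, extent=10.0, sigma=0.4, power=2.0):
    """Sum of Gaussian probe profiles, each weighted by Z**power, on a square grid."""
    axis = np.linspace(0.0, extent, size)
    grid_x, grid_y = np.meshgrid(axis, axis, indexing="ij")
    image = np.zeros((size, size))
    for (x, y), z in zip(xy, atomic_numbers):
        distance_sq = (grid_x - x) ** 2 + (grid_y - y) ** 2
        image += z ** power * np.exp(-distance_sq / (2.0 * sigma ** 2))
    return image


# Two well-separated columns: Mo (Z = 42) and S (Z = 16), positions in angstrom.
image = simulate_haadf([(3.0, 5.0), (7.0, 5.0)], [42, 16])
peak_mo = float(image[30, 50])      # grid spacing is 0.1, so x = 3.0 is index 30
peak_s = float(image[70, 50])
contrast_ratio = peak_mo / peak_s   # close to (42 / 16) ** 2 for isolated atoms
```

In a real projected structure, overlapping atomic columns add their contributions, so the simple Z^2 ratio holds only for isolated peaks.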

Numerous analysis tools are also provided, primarily based on machine learning methods. Although the datasets published with the package are focused on STM and STEM images, the analysis scripts can be used to easily train deep learning methods on any user-provided image data by simply providing directory paths for the training and test set of images.

First, the t-distributed stochastic neighbor embedding (t-SNE)104 is implemented, which performs a dimensionality reduction in the high-dimensional image data, allowing the spread of samples (images) to be visualized in a two- or three-dimensional plot. The Euclidean distance between data points in a t-SNE plot relates to their similarity; however, the distances can only be interpreted qualitatively. Images that cluster together in the plot will tend to be more similar in their featurization, be it pixel intensity, red, green, blue triplets, or graph representations of images, which will be described in the next paragraph.

One critical image analysis task that is implemented in AtomVision, known as segmentation, consists of classifying pixels based on whether they compose the background or an object of interest. We utilize the U-Net pre-trained model105 to distinguish atoms from background in the atomistic images. After this pixelwise classification, we can identify atom positions as well as characteristics of their intensity peak (e.g., peak width, maximum intensity) using a blob detection method implemented in the scikit-learn package.106 With the atom positions identified, it is then possible to construct a non-Euclidean graph representation of the atomistic image. The atom peaks in the image become the nodes, which are featurized using the blob characteristics described above, and edges (representing bond vectors) are formed between atoms using the k–d tree nearest neighbor search algorithm. An additional line graph can then be constructed, as in the ALIGNN model, where bond vectors are the graph nodes and bond angles are the graph edges.

There are then two main representations of images in the AtomVision package: 2D arrays of pixel intensities, and the graph representation. The AtomVision package allows the user to train neural networks based on either image representation to perform tasks such as image classification. The pixel-based data can be used to train convolutional neural networks (CNN) based on popular frameworks that include VGG,107 ResNet,108 and DenseNet109 amongst others that are included in the AtomVision package. Similarly, the graph-based data can be used to train graph neural networks like the ALIGNN method described in Sec. III A. We demonstrate both model types in an image classification task performed on the HAADF STEM image dataset in which we train a model to classify images into the 2D Bravais lattice type of the material in the image. In this particular test, the best performing CNN was the DenseNet with a classification accuracy of 83%, while the ALIGNN classifier had an accuracy of 78%.

The pixelated images in the atomistic image datasets are described by 50 176 pixel intensities, and therefore exist in a very high-dimensional feature space. The manifold hypothesis suggests that the underlying structure of the data can be described by relatively fewer dimensions using feature extraction methods. The AtomVision package facilitates training and usage of an autoencoder network, which can create a low-dimensional representation of the image and then reconstruct the pixelated image from that latent representation.

The final functionality currently implemented in the AtomVision package is a super-resolution generative adversarial network (SRGAN) model,110 which can upsample a low-resolution image to produce a high-resolution image. The SRGAN uses two separate deep learning models that compete with each other during optimization: (1) the generator produces a super-resolution image by first performing a feature extraction step, akin to the autoencoder model, and then interpolating within the learned latent representation; and (2) the discriminator should classify the image as either real or generated. As a result of the competition loop, the generator learns to “trick” the discriminator model with increasingly realistic super-resolution images. We apply this model to the atomistic images and demonstrate a successful conversion of a low resolution image (64 × 64 pixels) to a high resolution image (256 × 256 pixels), showing the same image window. Additional information about the AtomVision package can be found in Ref. 26 (Fig. 15).

Much of the data on materials science is available in text format in the form of articles that are not easily amenable to standard automated analysis. To address this barrier, we developed ChemNLP,27 a library that utilizes natural language processing (NLP) for chemistry and materials science data. Currently, ChemNLP is based on publicly available platforms such as the arXiv (https://arxiv.org/) and PubChem (https://pubchem.ncbi.nlm.nih.gov/) datasets and the Huggingface (https://huggingface.co/)111 libraries.

ChemNLP organizes the NLP data and tools for materials chemistry application in a format suitable for model training. In addition to data curation, it allows the integration of useful analyses, such as (1) classifying and clustering texts based on their categories, (2) named entity recognition for large-scale text-mining, (3) abstractive summarization for generating titles of articles from abstracts, (4) text generation for suggesting abstracts from titles, (5) integration with the density functional theory datasets for identification of potential candidate materials, and (6) web-interface development for text and reference query. A schematic of the ChemNLP library is given in Fig. 16.

FIG. 16.

A schematic overview of ChemNLP. The goal of ChemNLP is to provide a software toolkit with integrated dataset and comprehensive AI/ML tools for expanding the natural language processing technique applications for tasks such as text classification, clustering, named entity recognition, abstractive summarization, and text generation.

ChemNLP uses several conventional machine learning algorithms as well as state of the art transformer models for comparison and validation. Some of the algorithms in ChemNLP include support vector machines, random forest, graph neural networks, Google's T5,112 OpenAI's GPT-2 (Ref. 113), and Meta AI's OPT114 transformer models, all of which are fine-tuned on materials chemistry text data. The web-app of ChemNLP allows for the searching of various text information (such as material properties, synthesis procedure, etc.) given the chemistry information (stoichiometry). As an application of transformer models, ChemNLP showed that fine-tuning general large language models (LLMs) for abstract to title and vice versa can result in an improvement in performance compared to the original pre-trained model.

Specifically, we applied text classification models to the arXiv and PubChem datasets, where we used titles, abstracts, and titles along with abstracts to classify the articles. We used ML algorithms such as random forest, support vector machine, logistic regression, and graph neural network and found that the highest classification accuracy (91.3% for arXiv and 97.6% for PubChem) was achieved with a linear support vector machine. In order to mine text and extract meaningful information, named entity recognition or token classification can be used. Information such as material name, sample descriptor, symmetry label, synthesis/characterization method, property, and application can be extracted and utilized. We used the MatScholar115 dataset to train a transformer model with XLNet116 and applied it to the arXiv titles, abstracts, and full texts, where we found the F1 score to be 87%. With regard to text-to-text generation models, we focused on abstractive summarization (creating a title from the abstract) and text generation (generating an abstract from the title). We used a pre-trained T5112 model for abstractive summarization and fine-tuned it for the arXiv dataset. As a metric of success, we used the ROUGE (recall-oriented understudy for gisting evaluation) score. For the fine-tuned T5 model, we obtained a ROUGE score of 46.5% (as opposed to the 30.8% of the pre-trained model). To generate the abstract from the title, we fine-tuned a pre-trained OPT114 model and obtained a ROUGE score of 37%. This demonstrates that ChemNLP provides a flexible framework for fine-tuning existing text generation models as well as those that may be developed in the future.
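For readers unfamiliar with the recall-oriented understudy for gisting evaluation metric, the sketch below computes a unigram-overlap variant (ROUGE-1 F-measure) between a reference title and a generated title. The text above does not state which variant was used, so the choice of ROUGE-1, the whitespace tokenization, and the example titles are assumptions made for illustration.

```python
from collections import Counter


def rouge1_f(reference, candidate):
    """Unigram-overlap F-measure between a reference and a candidate text."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped shared-token count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical reference title and model-generated title.
reference_title = "graph neural networks for materials"
generated_title = "neural networks for crystal materials"
score = rouge1_f(reference_title, generated_title)  # 4 shared tokens out of 5 each
```

Production evaluations usually rely on an established implementation with stemming and additional variants such as ROUGE-2 and ROUGE-L; the sketch only conveys the overlap idea behind the reported percentages.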

Finally, ChemNLP allows a seamless integration of arXiv and DFT databases (e.g., JARVIS-DFT). In our work, we demonstrated how ChemNLP can aid in the discovery of new superconductors by simultaneously searching the JARVIS Superconductor dataset32 and the arXiv dataset. With regard to materials with a Tc above 1 K, we found 635 in the DFT dataset and 1071 chemical formulas in the arXiv dataset, with only 43 common materials. This integration of the literature and calculated results can motivate further screening of potential candidate superconductors. In addition to aiding the search for materials for specific applications, ChemNLP can be used along with DFT databases to generate formatted descriptions of atomic structure information (e.g., JSON-formatted text) that can be used for training future large language models. Further details of ChemNLP and the success metrics used (classification accuracy, F1 score, ROUGE score) can be found in Ref. 27.

Uncertainty quantification is important for assessing the accuracy and reliability of ML-based material property predictions.117,118 For example, if the uncertainty in a prediction is unknown or too large, the prediction can be challenged. For this reason, uncertainty quantification for materials AI/ML-based predictions is a field that could benefit from advancements.117,118 Confidence intervals are widely reported for ML predictions, but the evaluation of individual uncertainties on each prediction (prediction intervals) is less commonly reported. JARVIS has therefore focused on individual property uncertainty predictions for ML models.119 To compute individual uncertainties, we used three approaches: directly learning the prediction intervals with ML, the quantile loss function, and Gaussian processes. These uncertainty prediction methods were tested and compared for 12 ML-computed properties. The JARVIS-DFT dataset was used for all training and testing.

In summary, we found that direct modeling of the individual uncertainty is favored because it minimizes the overestimation and underestimation of the errors in most cases. In addition, it is the easiest method to fit and implement.119 The quantile method requires the fitting of three different models. Gaussian processes give a reasonable estimate of the prediction intervals, but the intervals are overestimated and the models are more time consuming to fit than the other methods. Additionally, direct prediction of individual uncertainty has an advantage because it allows the use of any loss function. One caveat is that it requires splitting the data into three parts, which can be a potential issue if the dataset is too small. The codes developed for evaluating the prediction intervals are publicly available within JARVIS-tools. More details of this work can be found in Ref. 119.
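The three-way data split required by the direct approach can be sketched as follows. This is a toy example under stated assumptions: a one-dimensional synthetic dataset, ordinary least-squares fits standing in for the ML models, and an interval half-width of twice the predicted error chosen purely for illustration. The actual models, features, and calibration used in Ref. 119 may differ.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic data whose noise grows with x (illustrative, not JARVIS data).
rng = random.Random(0)
xs = [rng.uniform(0.0, 4.0) for _ in range(600)]
ys = [2.0 * x + rng.gauss(0.0, 0.1 + 0.5 * x) for x in xs]

# Part 1 trains the property model, part 2 trains the error model,
# and part 3 is held out for testing.
x_fit, y_fit = xs[:200], ys[:200]
x_err, y_err = xs[200:400], ys[200:400]
x_test, y_test = xs[400:], ys[400:]

# Property model: predicts the target value.
a, b = fit_line(x_fit, y_fit)

# Error model: predicts the absolute residual of the property model.
abs_res = [abs(y - (a + b * x)) for x, y in zip(x_err, y_err)]
c, d = fit_line(x_err, abs_res)


def predict_with_uncertainty(x):
    """Return (predicted value, predicted absolute error) for input x."""
    return a + b * x, max(c + d * x, 0.0)


# Fraction of held-out points inside prediction +/- 2 * predicted error.
inside = 0
for x, y in zip(x_test, y_test):
    pred, err = predict_with_uncertainty(x)
    if abs(y - pred) <= 2.0 * err:
        inside += 1
coverage = inside / len(x_test)
print(f"error-model slope: {d:.3f}, held-out coverage: {coverage:.2f}")
```

Because the error model is just another regression on the residuals, any model class and any loss function can be used for it, at the cost of setting aside a second training subset.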

The ability to solve quantum chemistry problems is one of the most promising near-term applications of a quantum computer.10 The variational quantum eigensolver (VQE)41 and variational quantum deflation (VQD)42 are quantum algorithms that have been applied to molecules.41 There is a strong desire to implement these quantum algorithms for crystals. To this end, we have developed the AtomQC28 package, which adds quantum computation tools to the JARVIS infrastructure.

WTBH approaches were utilized to demonstrate the application of VQE and VQD to compute electronic and phonon properties of various materials, including elemental solids and multi-component systems. We applied the VQE and VQD algorithms to 307 spin–orbit-based electronic WTBHs and 933 finite-difference-based phonon WTBHs. We only deal with the single-particle picture in this study, but we strongly believe our work can set the course for the solution of interacting Hamiltonians. Such interacting Hamiltonians can be obtained from methods such as the dynamical mean-field theory (DMFT) and Green's function with the screened Coulomb potential theory (GW), and they can be much more suitable problems to simulate on a quantum computer. A preliminary workflow that combines the VQD algorithm with DMFT-based solving of the lattice Green's function is provided in this work. Figure 17 shows the entire quantum computation workflow. These WTBH solvers can be used to test various other quantum algorithms and are publicly available. More information on AtomQC can be found in Ref. 28.
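The variational idea behind VQE and VQD can be emulated classically for a toy two-band Hamiltonian, as in the sketch below. It is not the AtomQC interface and involves no quantum hardware or circuit simulator: the 2 x 2 matrix, the single-parameter real ansatz, the grid-scan optimizer, and the penalty weight beta are all assumptions made for illustration.

```python
import math


def energy(theta, h):
    """Expectation value <psi|H|psi> for |psi> = (cos(theta/2), sin(theta/2))."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return h[0][0] * c * c + h[1][1] * s * s + 2.0 * h[0][1] * c * s


def overlap_sq(theta, theta_ref):
    """Squared overlap between two ansatz states with angles theta and theta_ref."""
    return math.cos((theta - theta_ref) / 2.0) ** 2


def scan_minimum(cost, n_grid=3600):
    """Crude classical optimizer: pick the grid angle with the lowest cost."""
    thetas = [2.0 * math.pi * i / n_grid for i in range(n_grid)]
    return min(thetas, key=cost)


# Toy real-symmetric two-band Hamiltonian at one k-point (arbitrary units).
H = [[0.0, -0.5], [-0.5, 1.0]]

# VQE step: minimize the energy to approximate the lowest eigenvalue.
theta0 = scan_minimum(lambda t: energy(t, H))
e0 = energy(theta0, H)

# VQD step: add a penalty on the overlap with the state found above,
# so that the minimum shifts to the next eigenstate.
beta = 3.0  # must exceed the gap between the two eigenvalues
theta1 = scan_minimum(lambda t: energy(t, H) + beta * overlap_sq(t, theta0))
e1 = energy(theta1, H)

# Closed-form eigenvalues of a real-symmetric 2x2 matrix for comparison.
mean = 0.5 * (H[0][0] + H[1][1])
half_gap = math.hypot(0.5 * (H[0][0] - H[1][1]), H[0][1])
exact0, exact1 = mean - half_gap, mean + half_gap

print(f"VQE-like: {e0:.6f} (exact {exact0:.6f})")
print(f"VQD-like: {e1:.6f} (exact {exact1:.6f})")
```

On a quantum computer, the energy and overlap evaluations would be replaced by measurements on a parameterized circuit, while the outer minimization remains a classical optimization loop.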

FIG. 17.

The steps used in predicting phonon and electron properties of a material on a quantum computer. Reproduced with permission from Choudhary, J. Phys.: Condens. Matter 33, 385501 (2021). Copyright 2021 IOP Science Publishing.

Although the majority of the data in JARVIS originates from computation, we validate and benchmark against experiments whenever applicable. In some cases, we obtain experimental data from the literature or from the in-house standard reference material (SRM) data at NIST. In other cases, we perform our own experiments along with our computational efforts. These experimental data include XRD and neutron diffraction patterns, CO2 adsorption isotherms,120 magnetic susceptibility measurements,33 spectroscopic ellipsometry dielectric functions, Raman spectra, STM/STEM images, and transport measurements.

Most recently and notably, we have conducted our own experiments for magnetic topological materials and 2D superconductors. With regard to magnetic topological materials, we measured the anomalous Hall effect of CoNb3S6 and conducted inverse spin-Hall signal measurements for Mn3Ge, two materials theoretically predicted to be topological in Ref. 16 (see Sec. II A 1). We performed zero-field-cooled magnetometry experiments to determine the critical temperature of selected 2D superconductors to verify the theoretical predictions in Ref. 33 (see Sec. II A 3). We conducted these experiments for layered 2H-NbSe2, 2H-NbS2, FeSe, and ZrSiS. Figure 18 depicts the measured magnetic susceptibility (using a magnetic field strength of 0.01 T) as a function of temperature. We observe that, among these layered materials, 2H-NbSe2 has a Tc of 8.3 K, 2H-NbS2 has a Tc of 7.1 K, and FeSe has a Tc of 7.5 K, whereas ZrSiS shows no superconducting transition, as its measured magnetic susceptibility decreases with increasing temperature. We discuss these measurements within the context of our DFT-computed results in Ref. 33.

FIG. 18.

Experimental DC magnetic susceptibility as a function of temperature (used to determine Tc) for layered (a) 2H-NbSe2, (b) 2H-NbS2, (c) FeSe, and (d) ZrSiS. Reproduced with permission from Wines et al., Nano Lett. 23, 969–978 (2023). Copyright 2023 American Chemical Society.

These experimental datasets are now being integrated in the JARVIS-Leaderboard for benchmarking and validation purposes (see Sec. VIII). Some simulated experimental data, such as XRD patterns, can be computed with JARVIS-Tools. Because the experimental datasets in JARVIS are not yet very large, we can currently apply machine learning algorithms to the computational data and, in the future, apply the same pipeline to experimental data.

Data in JARVIS are being integrated within large-scale data efforts such as NOMAD121 and OPTIMADE122 for sustainability and interoperability.

The Open Databases Integration for Materials Design (OPTIMADE) consortium has designed a universal application programming interface (API) to make materials databases accessible and interoperable. The OPTIMADE API has a set of well-defined key-value pairs, such as the chemical formula and the number of elements, for each atomic structure, which allows a single universal API query to be sent to multiple data efforts. The implementation in JARVIS required a Django-rest-framework integration with specific data-models, pagination, and other specifications as detailed in OPTIMADE to be compatible with other infrastructures.
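To make the idea of a universal query concrete, the sketch below assembles an OPTIMADE structures request using only the Python standard library. The `filter`, `page_limit`, and `response_fields` query parameters and the `elements`, `nelements`, and `chemical_formula_reduced` properties come from the OPTIMADE specification; the base URL and the helper name are assumptions and should be checked against the current OPTIMADE provider list before use. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: base URL of a JARVIS-DFT OPTIMADE endpoint. Verify the current
# address in the OPTIMADE provider list before sending requests.
BASE_URL = "https://jarvis.nist.gov/optimade/jarvisdft"


def build_structures_query(elements, nelements, page_limit=10):
    """Build a URL for the OPTIMADE /v1/structures endpoint (hypothetical helper)."""
    quoted = ", ".join(f'"{el}"' for el in elements)
    # OPTIMADE filter grammar: required elements plus an exact element count.
    filter_expr = f"elements HAS ALL {quoted} AND nelements={nelements}"
    params = {
        "filter": filter_expr,
        "page_limit": page_limit,
        "response_fields": "chemical_formula_reduced,nelements",
    }
    return f"{BASE_URL}/v1/structures?{urlencode(params)}"


# Example: binary compounds containing both Nb and Se.
url = build_structures_query(["Nb", "Se"], 2)
print(url)
```

Because the filter grammar and endpoint layout are shared across providers, the same query string can in principle be sent to any other OPTIMADE-compliant database by swapping the base URL.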

Similarly, the NOMAD project allows storage of raw data files and provides several interactive GUI tools. Although JARVIS has its own storage mechanism, having data distributed on platforms such as NOMAD and OPTIMADE allows enhanced transparency, which is essential for large-scale data-driven materials design.

A collection of interactive Python notebooks is hosted in the Jarvis-tools-notebooks GitHub repository29; the notebooks can be run on a user's local computer or through a cloud-based Python development environment. All package installation steps are included in the notebooks so that they can be executed and edited in a standalone fashion, allowing users to easily make use of JARVIS models. These interactive notebooks are meant to supply an example calculation that will execute quickly and reproduce a portion of the results in a JARVIS-associated publication. The collection includes machine learning notebooks that allow users to train and utilize ALIGNN or AtomVision models for material property prediction and image classification. There are also several electronic structure and atomistic calculation notebooks that analyze DFT, tight-binding, or MD outputs to calculate material properties, including elastic properties, spin–orbit spillage, dielectric functions, and thermoelectric and photovoltaic properties. Finally, a notebook based on the AtomQC and qiskit123 packages for quantum computation is provided.

The JARVIS-Leaderboard30 is a large-scale benchmarking effort for various computational and experimental methodologies for materials science applications. The main goal of the JARVIS-Leaderboard is to enhance reproducibility and transparency for various methodologies within the materials science field. Although leaderboard efforts have previously been developed for specific applications (e.g., MatBench,93 OpenCatalystProject,124 etc.), a benchmarking platform that covers multiple data modalities for perfect and defective materials and makes it easy to add new benchmarks has been lacking. The JARVIS-Leaderboard (https://pages.nist.gov/jarvis_leaderboard/) attempts to bridge the gap between different methods and material classes by allowing users to set up benchmarks and make contributions in the form of datasets, codes, and meta-data submissions (through GitHub actions). These contributions are compared with experimental data where applicable, and the accuracy of each contribution is assigned a “score” (MAE with respect to the ground truth). Some of the categories are: Artificial Intelligence (AI), Electronic Structure (ES), Quantum Computation (QC), and Experiments (EXP).
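The MAE-based score can be illustrated with a short sketch that ranks two hypothetical contributions against a common ground truth. The material labels, method names, and numerical values are invented for illustration, and the snippet does not reproduce the submission format or scoring code of the JARVIS-Leaderboard.

```python
def mean_absolute_error(reference, predicted):
    """Mean absolute error between two equally long sequences of numbers."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical ground-truth values (e.g., band gaps in eV) keyed by material label.
reference = {"A": 1.0, "B": 2.5, "C": 0.0}

# Hypothetical contributions from two methods for the same benchmark.
contributions = {
    "method_1": {"A": 1.2, "B": 2.0, "C": 0.1},
    "method_2": {"A": 0.9, "B": 2.4, "C": 0.3},
}

labels = sorted(reference)
scores = {
    name: mean_absolute_error(
        [reference[k] for k in labels],
        [pred[k] for k in labels],
    )
    for name, pred in contributions.items()
}

# A lower MAE means closer agreement with the ground truth.
ranking = sorted(scores, key=scores.get)
for name in ranking:
    print(f"{name}: MAE = {scores[name]:.4f}")
```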

For AI, various methods (descriptor-based, neural network-based) and data (atomic structure, atomistic images, spectra, text) are benchmarked. For ES, multiple approaches (DFT, QMC, Tight binding, GW), software packages, and pseudopotentials are considered, and the results are compared to experiments whenever applicable. Multiple FF approaches for material property predictions are compared (classical FF, MLFF). For QC, we compare the performance of various quantum algorithms and circuits for Hamiltonian simulations. For experiments, inter-laboratory (round robin) approaches are used. In addition to prediction results, we attempt to capture the underlying software, hardware, and instrumental frameworks to enhance reproducibility and method validation, which can aid in developing new and more reliable techniques. Currently, there are over 1400 user contributions using over 150 different methods, and these numbers are growing rapidly.

In addition to the databases, tools and applications that are part of the JARVIS infrastructure, there has been substantial effort devoted to outreach in the materials science research community. The JARVIS team has annually hosted the Artificial Intelligence for Materials Science (AIMS) and Quantum Matters in Materials Science (QMMS) workshops, where speakers have been invited from academia, government, and industry to discuss key achievements and challenges in the respective fields. Topics presented at AIMS include: dataset and tools for employing AI for materials, integrating experiments with AI techniques, graph neural networks, comparison of AI techniques for materials, the challenges of applying AI to materials, uncertainty quantification and building trust in AI predictions, generative modeling, using AI to develop classical force-fields, natural language processing, and AI-guided autonomous experimentation. Topics presented at QMMS include: discovery and characterization of new materials, optimization of known quantum materials, investigation of defect induced behavior and transitions, electronics, spintronics, quantum memory applications, challenges in applying quantum information systems technologies at industrial scale, and accurate many-body computational methods to treat quantum materials.

The JARVIS team has also organized a series of hands-on workshops at different academic and government institutions, known as JARVIS-Schools. JARVIS-Schools consist of a tutorial and hands-on session to introduce open-access databases and tools for materials design. These sessions are accompanied by a series of PowerPoint presentations on the core topics, Google Colab/Jupyter notebook examples, and discussion. The hands-on session/discussion topics include: electronic structure calculations (DFT, tight-binding, etc.), density functional theory for predicting properties (e.g., of solid-state materials), machine learning (for atomistic, image, and text data), quantum computation and its applications to materials, and classical force-field calculations for large-scale properties. An updated calendar of JARVIS events can be found here: https://jarvis.nist.gov/events/.

There are various mechanisms to collect external usage data for JARVIS inside and outside the materials science community. These sources include the number of users registered for the JARVIS API, Google analytics results for viewers, number of citations for papers, downloads of software tools on GitHub, number of views and downloads from the Figshare repository, number of attendees in the AIMS/QMMS workshops and JARVIS-Schools, and number of collaborators developed inside and outside NIST from academia, national labs, and industry. The wide usage of JARVIS resulted in JARVIS being highlighted as a standard platform for materials design in the NIST US CHIPS Act strategic plan (https://www.nist.gov/chips/implementation-strategy).

All authors thank the National Institute of Standards and Technology for funding, computational, and data-management resources. K.C. thanks the computational support from XSEDE (Extreme Science and Engineering Discovery Environment) computational resources under allocation No. TG-DMR 190095. Contributions from K.C. were supported by the financial assistance Award No. 70NANB19H117 from the U.S. Department of Commerce, National Institute of Standards and Technology.

Certain commercial equipment or materials are identified in this paper to adequately specify the experimental procedures. In no case does the identification imply recommendation or endorsement by NIST, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose. The authors declare no competing interests. Please note that the use of commercial software (VASP) does not imply recommendation by the National Institute of Standards and Technology.

The authors have no conflicts to disclose.

Daniel Wines: Investigation (equal); Methodology (equal); Project administration (equal); Writing – original draft (equal); Writing – review & editing (equal). Ramya Gurunathan: Investigation (equal); Methodology (equal); Project administration (equal); Writing – original draft (equal); Writing – review & editing (equal). Kevin F. Garrity: Investigation (equal); Methodology (equal); Project administration (equal); Writing – original draft (equal); Writing – review & editing (equal). Brian DeCost: Investigation (equal); Methodology (equal); Project administration (equal); Writing – review & editing (equal). Adam J. Biacchi: Investigation (equal); Methodology (equal); Project administration (equal); Writing – review & editing (equal). Francesca Tavazza: Investigation (equal); Methodology (equal); Project administration (equal); Writing – original draft (equal); Writing – review & editing (equal). Kamal Choudhary: Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data and software mentioned in this review article are available at https://jarvis.nist.gov/ and https://github.com/usnistgov/jarvis. JARVIS-Tools is an open-access software package for atomistic data-driven materials design. JARVIS-Tools can be used for (a) setting up calculations, (b) analysis and informatics, (c) plotting, (d) database development, and (e) web-page development. Software used in workflow tasks for pre-processing, executing, and post-processing includes VASP,125,126 Quantum Espresso,127,128 Wien2k,129 BoltzTrap,130 Wannier90,131 QMCPACK,132,133 LAMMPS,134 scikit-learn,106 TensorFlow,135 LightGBM,136 Qiskit,123 Tequila,137 Pennylane,138,139 Deep Graph Library,140 and PyTorch.141 JARVIS databases such as JARVIS-DFT, FF, ML, WannierTB, Solar, and STM can be downloaded. Raw input and output files can be accessed from JARVIS databases to enhance reproducibility in calculations. Different descriptors, graphs, and datasets for training machine learning models are also included in JARVIS-Tools. Capabilities can easily be extended to HPC systems (Torque/PBS and SLURM). Documentation for JARVIS-Tools, including installation instructions, can be found here: https://pages.nist.gov/jarvis/.

1. K. Choudhary, K. F. Garrity, A. C. Reid, B. DeCost, A. J. Biacchi, A. R. Hight Walker, Z. Trautt, J. Hattrick-Simpers, A. G. Kusne, A. Centrone et al., “The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design,” npj Comput. Mater. 6, 1–13 (2020).
2. P. Hohenberg and W. Kohn, “Inhomogeneous electron gas,” Phys. Rev. 136, B864–B871 (1964).
3. N. Ashcroft and N. Mermin, Solid State Physics (Saunders College Publishing, Fort Worth, 1976).
4. D. Vollhardt, K. Byczuk, and M. Kollar, “Dynamical mean-field theory,” in Strongly Correlated Systems: Theoretical Methods, edited by A. Avella and F. Mancini (Springer, Berlin, Heidelberg, 2012), pp. 203–236.
5. G. Onida, L. Reining, and A. Rubio, “Electronic excitations: Density-functional versus many-body Green's-function approaches,” Rev. Mod. Phys. 74, 601–659 (2002).
6. W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, “Quantum Monte Carlo simulations of solids,” Rev. Mod. Phys. 73, 33–83 (2001).
7. R. M. Martin, Electronic Structure: Basic Theory and Practical Methods (Cambridge University Press, 2020).
8. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids (Oxford University Press, 2017).
9. T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009), Vol. 2.
10. M. A. Nielsen and I. L. Chuang, “Quantum computation and quantum information,” Phys. Today 54(11), 60 (2001).
11. Y. Leng, Materials Characterization: Introduction to Microscopic and Spectroscopic Methods (John Wiley and Sons, 2009).
12. J. A. Warren, “The materials genome initiative and artificial intelligence,” MRS Bull. 43, 452–457 (2018).
13. K. Choudhary, I. Kalish, R. Beams, and F. Tavazza, “High-throughput identification and characterization of two-dimensional materials using density functional theory,” Sci. Rep. 7, 1–16 (2017).
14. K. Choudhary, K. F. Garrity, and F. Tavazza, “High-throughput discovery of topologically non-trivial materials using spin-orbit spillage,” Sci. Rep. 9, 8534 (2019).
15. K. Choudhary, K. F. Garrity, J. Jiang, R. Pachter, and F. Tavazza, “Computational search for magnetic and non-magnetic 2d topological materials using unified spin–orbit spillage screening,” npj Comput. Mater. 6, 49 (2020).
16. K. Choudhary, K. F. Garrity, N. J. Ghimire, N. Anand, and F. Tavazza, “High-throughput search for magnetic topological materials using spin-orbit spillage, machine learning, and experiments,” Phys. Rev. B 103, 155131 (2021).
17. K. Choudhary, Q. Zhang, A. C. Reid, S. Chowdhury, N. Van Nguyen, Z. Trautt, M. W. Newrock, F. Y. Congo, and F. Tavazza, “Computational screening of high-performance optoelectronic materials using optb88vdw and TB-mBJ formalisms,” Sci. Data 5, 1–12 (2018).
18. K. Choudhary, M. Bercx, J. Jiang, R. Pachter, D. Lamoen, and F. Tavazza, “Accelerated discovery of efficient solar cell materials using quantum and machine-learning methods,” Chem. Mater. 31, 5900–5908 (2019).
19. K. Choudhary, K. F. Garrity, and F. Tavazza, “Data-driven discovery of 3d and 2d thermoelectric materials,” J. Phys.: Condens. Matter 32, 475501 (2020).
20. K. F. Garrity and K. Choudhary, “Database of Wannier tight-binding Hamiltonians using high-throughput density functional theory,” Sci. Data 8, 106 (2021).
21. K. F. Garrity and K. Choudhary, “Fast and accurate prediction of material properties with three-body tight-binding model for the periodic table,” Phys. Rev. Mater. 7, 044603 (2023).
22. K. Choudhary and F. Tavazza, “Convergence and machine learning predictions of Monkhorst-Pack k-points and plane-wave cut-off in high-throughput DFT calculations,” Comput. Mater. Sci. 161, 300–308 (2019).
23. K. Choudhary, A. J. Biacchi, S. Ghosh, L. Hale, A. R. H. Walker, and F. Tavazza, “High-throughput assessment of vacancy formation and surface energies of materials using classical force-fields,” J. Phys.: Condens. Matter 30, 395901 (2018).
24. K. Choudhary, B. DeCost, and F. Tavazza, “Machine learning with force-field-inspired descriptors for materials: Fast screening and mapping energy landscape,” Phys. Rev. Mater. 2, 083801 (2018).
25. K. Choudhary and B. DeCost, “Atomistic line graph neural network for improved materials property predictions,” npj Comput. Mater. 7, 185 (2021).
26. K. Choudhary, R. Gurunathan, B. DeCost, and A. Biacchi, “AtomVision: A machine vision library for atomistic images,” J. Chem. Inf. Model. 63, 1708–1722 (2023).
27. K. Choudhary and M. L. Kelley, “ChemNLP: A natural language-processing-based library for materials chemistry text data,” J. Phys. Chem. C 127, 17545–17555 (2023).
28. K. Choudhary, “Quantum computation for predicting electron and phonon properties of solids,” J. Phys.: Condens. Matter 33, 385501 (2021).
29. See https://github.com/usnistgov/alignn for “Jarvis-Tools-Notebooks GitHub Repository;” accessed 23 February 2023.
30. K. Choudhary, D. Wines, K. Li, K. F. Garrity, V. Gupta, A. H. Romero, J. T. Krogel, K. Saritas, A. Fuhr, P. Ganesh, P. R. C. Kent, K. Yan, Y. Lin, S. Ji, B. Blaiszik, P. Reiser, P. Friederich, A. Agrawal, P. Tiwary, E. Beyerle, P. Minch, T. D. Rhone, I. Takeuchi, R. B. Wexler, A. Mannodi-Kanakkithodi, E. Ertekin, A. Mishra, N. Mathew, S. G. Baird, M. Wood, A. D. Rohskopf, J. Hattrick-Simpers, S.-H. Wang, L. E. K. Achenie, H. Xin, M. Williams, A. J. Biacchi, and F. Tavazza, “Large scale benchmark of materials design methods,” arXiv:2306.11688 [cond-mat.mtrl-sci] (2023).
31. K. Choudhary and F. Tavazza, “Predicting anomalous quantum confinement effect in van der Waals materials,” Phys. Rev. Mater. 5, 054602 (2021).
32. K. Choudhary and K. Garrity, “Designing high-Tc superconductors with BCS-inspired screening, density functional theory, and deep-learning,” npj Comput. Mater. 8, 244 (2022).
33. D. Wines, K. Choudhary, A. J. Biacchi, K. F. Garrity, and F. Tavazza, “High-throughput DFT-based discovery of next generation two-dimensional (2d) superconductors,” Nano Lett. 23, 969–978 (2023).
34. D. Wines, K. Choudhary, and F. Tavazza, “Systematic DFT+U and quantum Monte Carlo benchmark of magnetic two-dimensional (2D) CrX3 (X = I, Br, Cl, F),” J. Phys. Chem. C 127, 1176–1188 (2023).
35. D. Wines, J. Tiihonen, K. Saritas, J. T. Krogel, and C. Ataca, “A quantum Monte Carlo study of the structural, energetic, and magnetic properties of two-dimensional H and T phase VSe2,” J. Phys. Chem. Lett. 14, 3553–3560 (2023).
36. P. R. Kaundinya, K. Choudhary, and S. R. Kalidindi, “Prediction of the electron density of states for crystalline compounds with atomistic line graph neural networks (ALIGNN),” JOM 74, 1395–1405 (2022).
37. R. Gurunathan, K. Choudhary, and F. Tavazza, “Rapid prediction of phonon structure and properties using the atomistic line graph neural network (ALIGNN),” Phys. Rev. Mater. 7, 023803 (2023).
38. K. Choudhary, T. Yildirim, D. W. Siderius, A. G. Kusne, A. McDannald, and D. L. Ortiz-Montalvo, “Graph neural network predictions of metal organic framework CO2 adsorption properties,” Comput. Mater. Sci. 210, 111388 (2022).
39. K. Choudhary and B. G. Sumpter, “Can a deep-learning model make fast predictions of vacancy formation in diverse materials?,” AIP Adv. 13, 095109 (2023).
40. K. Choudhary, B. DeCost, L. Major, K. Butler, J. Thiyagalingam, and F. Tavazza, “Unified graph neural network force-field for the periodic table: Solid state applications,” Digital Discovery 2, 346–355 (2023).
41. A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nat. Commun. 5, 4213 (2014).
42. O. Higgott, D. Wang, and S. Brierley, “Variational quantum computation of excited states,” Quantum 3, 156 (2019).
43. J. P. Perdew, K. Burke, and M. Ernzerhof, “Generalized gradient approximation made simple,” Phys. Rev. Lett. 77, 3865–3868 (1996).
44. J. Sun, A. Ruzsinszky, and J. P. Perdew, “Strongly constrained and appropriately normed semilocal density functional,” Phys. Rev. Lett. 115, 036402 (2015).
45. J. Liu and D. Vanderbilt, “Spin-orbit spillage as a measure of band inversion in insulators,” Phys. Rev. B 90, 125133 (2014).
46. J. Klimeš, D. R. Bowler, and A. Michaelides, “Chemical accuracy for the van der Waals density functional,” J. Phys.: Condens. Matter 22, 022201 (2009).
47. J. Heyd, G. E. Scuseria, and M. Ernzerhof, “Hybrid functionals based on a screened coulomb potential,” J. Chem. Phys. 118, 8207–8215 (2003).
48. C. Adamo and V. Barone, “Toward reliable density functional methods without adjustable parameters: The PBE0 model,” J. Chem. Phys. 110, 6158–6170 (1999).
49. A. Belsky, M. Hellenbrandt, V. L. Karen, and P. Luksch, “New developments in the inorganic crystal structure database (ICSD): Accessibility in support of materials research and design,” Acta Crystallogr., Sect. B: Struct. Sci. 58, 364–369 (2002).
50. C. P. Poole, H. A. Farach, and R. J. Creswick, Superconductivity (Academic Press, 2013).
51. H. Rogalla and P. H. Kes, 100 Years of Superconductivity (Taylor and Francis, 2011).
52. H. K. Onnes, The Resistance of Pure Mercury at Helium Temperatures (Communications from the Physical Laboratory of the University of Leiden, 1911), p. 120.
53. L. N. Cooper and D. Feldman, BCS: 50 Years (World Scientific, 2010).
54. F. Giustino, “Electron-phonon interactions from first principles,” Rev. Mod. Phys. 89, 015003 (2017).
55. M. Kawamura, Y. Hizume, and T. Ozaki, “Benchmark of density functional theory for superconductors in elemental materials,” Phys. Rev. B 101, 134511 (2020).
56. W. McMillan, “Transition temperature of strong-coupled superconductors,” Phys. Rev. 167, 331 (1968).
57. K. Inumaru, T. Nishikawa, K. Nakamura, and S. Yamanaka, “High-pressure synthesis of superconducting molybdenum nitride δ-mon by in situ nitridation,” Chem. Mater. 20, 4756–4761 (2008).
58. S. Wang, D. Antonio, X. Yu, J. Zhang, A. L. Cornelius, D. He, and Y. Zhao, “The hardest superconducting metal nitride,” Sci. Rep. 5, 1–8 (2015).
59. D. Campi, S. Kumari, and N. Marzari, “Prediction of phonon-mediated superconductivity with high critical temperature in the two-dimensional topological semimetal W2N3,” Nano Lett. 21, 3435–3442 (2021).
60. J. Bekaert, A. Aperis, B. Partoens, P. M. Oppeneer, and M. V. Milošević, “Evolution of multigap superconductivity in the atomically thin limit: Strain-enhanced three-gap superconductivity in monolayer MgB2,” Phys. Rev. B 96, 094510 (2017).
61. S. Singh, A. H. Romero, J. Mella, V. Eremeev, E. Muñoz, A. N. Alexandrova, K. M. Rabe, D. Vanderbilt, and F. Muñoz, “High-temperature phonon-mediated superconductivity in monolayer Mg2B4C2,” npj Quantum Mater. 7, 37 (2022).
62. S. De Franceschi, L. Kouwenhoven, C. Schönenberger, and W. Wernsdorfer, “Hybrid superconductor–quantum dot devices,” Nat. Nanotechnol. 5, 703–711 (2010).
63. M. Huefner, C. May, S. Kicin, K. Ensslin, T. Ihn, M. Hilke, K. Suter, N. F. de Rooij, and U. Staufer, “Scanning gate microscopy measurements on a superconducting single-electron transistor,” Phys. Rev. B 79, 134530 (2009).
64. J. Delahaye, J. Hassel, R. Lindell, M. Sillanpää, M. Paalanen, H. Seppä, and P. Hakonen, “Low-noise current amplifier based on mesoscopic Josephson junction,” Science 299, 1045–1048 (2003).
65. E. J. Romans, E. J. Osley, L. Young, P. A. Warburton, and W. Li, “Three-dimensional nanoscale superconducting quantum interference device pickup loops,” Appl. Phys. Lett. 97, 222506 (2010).
66. X. Liu and M. C. Hersam, “2d materials for quantum information science,” Nat. Rev. Mater. 4, 669–684 (2019).
67. M. Elstner, D. Porezag, G. Jungnickel, J. Elsner, M. Haugk, T. Frauenheim, S. Suhai, and G. Seifert, “Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties,” Phys. Rev. B 58, 7260 (1998).
68. T. Frauenheim, G. Seifert, M. Elsterner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, and R. Scholz, “A self-consistent charge density-functional based tight-binding method for predictive materials simulations in physics, chemistry and biology,” Phys. Status Solidi B 217, 41–62 (2000).
69. P. Koskinen and V. Mäkinen, “Density-functional tight-binding for beginners,” Comput. Mater. Sci. 47, 237–253 (2009).
70. B. Hourahine, B. Aradi, V. Blum, F. Bonafé, A. Buccheri, C. Camacho, C. Cevallos, M. Deshaye, T. Dumitrică, A. Dominguez et al., “DFTB+, a software package for efficient approximate density functional theory based atomistic simulations,” J. Chem. Phys. 152, 124101 (2020).
71. C. J. Pickard and R. J. Needs, “Ab initio random structure searching,” J. Phys.: Condens. Matter 23, 053201 (2011).
72. S. L. Dudarev, G. A. Botton, S. Y. Savrasov, C. J. Humphreys, and A. P. Sutton, “Electron-energy-loss spectra and the structural stability of nickel oxide: An LSDA+U study,” Phys. Rev. B 57, 1505–1509 (1998).
73. B. Huang, G. Clark, E. Navarro-Moratalla, D. R. Klein, R. Cheng, K. L. Seyler, D. Zhong, E. Schmidgall, M. A. McGuire, D. H. Cobden, W. Yao, D. Xiao, P. Jarillo-Herrero, and X. Xu, “Layer-dependent ferromagnetism in a van der Waals crystal down to the monolayer limit,” Nature 546, 270–273 (2017).
74. Z. Zhang, J. Shang, C. Jiang, A. Rasmita, W. Gao, and T. Yu, “Direct photoluminescence probing of ferromagnetism in monolayer two-dimensional CrBr3,” Nano Lett. 19, 3138–3142 (2019).
75. X. Cai, T. Song, N. P. Wilson, G. Clark, M. He, X. Zhang, T. Taniguchi, K. Watanabe, W. Yao, D. Xiao, M. A. McGuire, D. H. Cobden, and X. Xu, “Atomically thin CrCl3: An in-plane layered antiferromagnetic insulator,” Nano Lett. 19, 3993–3998 (2019).
76. D. Torelli, H. Moustafa, K. W. Jacobsen, and T. Olsen, “High-throughput computational screening for two-dimensional magnetic materials based on experimental databases of three-dimensional compounds,” npj Comput. Mater. 6, 158 (2020).
77. D. Torelli and T. Olsen, “Calculating critical temperatures for ferromagnetic order in two-dimensional materials,” 2D Mater. 6, 015028 (2018).
78. J. L. Lado and J. Fernández-Rossier, “On the origin of magnetic anisotropy in two dimensional CrI3,” 2D Mater. 4, 035002 (2017).
79. M. Bonilla, S. Kolekar, Y. Ma, H. C. Diaz, V. Kalappattil, R. Das, T. Eggers, H. R. Gutierrez, M.-H. Phan, and M. Batzill, “Strong room-temperature ferromagnetism in VSe2 monolayers on van der Waals substrates,” Nat. Nanotechnol. 13, 289–293 (2018).
80. W. Yu, J. Li, T. S. Herng, Z. Wang, X. Zhao, X. Chi, W. Fu, I. Abdelwahab, J. Zhou, J. Dan, Z. Chen, Z. Chen, Z. Li, J. Lu, S. J. Pennycook, Y. P. Feng, J. Ding, and K. P. Loh, “Chemically exfoliated VSe2 monolayers with room-temperature ferromagnetism,” Adv. Mater. 31, 1903779 (2019).
81.
X.
Wang
,
D.
Li
,
Z.
Li
,
C.
Wu
,
C.-M.
Che
,
G.
Chen
, and
X.
Cui
, “
Ferromagnetism in 2d vanadium diselenide
,”
ACS Nano
15
,
16236
16241
(
2021
).
82. H.-R. Fuh, C.-R. Chang, Y.-K. Wang, R. F. L. Evans, R. W. Chantrell, and H.-T. Jeng, “Newtype single-layer magnetic semiconductor in transition-metal dichalcogenides VX2 (X = S, Se and Te),” Sci. Rep. 6, 32625 (2016).
83. G. Duvjir, B. K. Choi, I. Jang, S. Ulstrup, S. Kang, T. Thi Ly, S. Kim, Y. H. Choi, C. Jozwiak, A. Bostwick, E. Rotenberg, J.-G. Park, R. Sankar, K.-S. Kim, J. Kim, and Y. J. Chang, “Emergence of a metal–insulator transition and high-temperature charge-density waves in VSe2 at the monolayer limit,” Nano Lett. 18, 5432–5438 (2018).
84. D. Li, X. Wang, C.-M. Kan, D. He, Z. Li, Q. Hao, H. Zhao, C. Wu, C. Jin, and X. Cui, “Structural phase transition of multilayer VSe2,” ACS Appl. Mater. Interfaces 12, 25143–25149 (2020).
85. G. V. Pushkarev, V. G. Mazurenko, V. V. Mazurenko, and D. W. Boukhvalov, “Structural phase transitions in VSe2: Energetics, electronic structure and magnetism,” Phys. Chem. Chem. Phys. 21, 22647–22653 (2019).
86. J. Tiihonen, P. R. C. Kent, and J. T. Krogel, “Surrogate Hessian accelerated structural optimization for stochastic electronic structure theories,” J. Chem. Phys. 156, 054104 (2022).
87. K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, “SchNet – A deep learning architecture for molecules and materials,” J. Chem. Phys. 148, 241722 (2018).
88. T. Xie and J. C. Grossman, “Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties,” Phys. Rev. Lett. 120, 145301 (2018).
89. C. W. Park and C. Wolverton, “Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery,” Phys. Rev. Mater. 4, 063801 (2020).
90. C. Chen, W. Ye, Y. Zuo, C. Zheng, and S. P. Ong, “Graph networks as a universal machine learning framework for molecules and crystals,” Chem. Mater. 31, 3564–3572 (2019).
91. Z. Qiao, M. Welborn, A. Anandkumar, F. R. Manby, and T. F. Miller, “OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features,” J. Chem. Phys. 153, 124111 (2020).
92. Z. Chen, L. Li, and J. Bruna, “Supervised community detection with line graph neural networks,” in Proceedings of the International Conference on Learning Representations (2019).
93. A. Dunn, Q. Wang, A. Ganose, D. Dopp, and A. Jain, “Benchmarking materials property prediction methods: The Matbench test set and Automatminer reference algorithm,” npj Comput. Mater. 6, 138 (2020).
94. T. Hsu, T. A. Pham, N. Keilbart, S. Weitzner, J. Chapman, P. Xiao, S. R. Qiu, X. Chen, and B. C. Wood, “Efficient and interpretable graph network representation for angle-dependent properties applied to optical spectroscopy,” npj Comput. Mater. 8, 151 (2022).
95. S. Gong, T. Xie, Y. Shao-Horn, R. Gomez-Bombarelli, and J. C. Grossman, “Examining graph neural networks for crystal structures: Limitations and opportunities for capturing periodicity,” arXiv:2208.05039 (2022).
96. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials,” Nat. Commun. 13, 2453 (2022).
97. C. Chen and S. P. Ong, “A universal graph deep learning interatomic potential for the periodic table,” Nat. Comput. Sci. 2, 718–728 (2022).
98. J. Gasteiger, M. Shuaibi, A. Sriram, S. Günnemann, Z. Ulissi, C. L. Zitnick, and A. Das, “GemNet-OC: Developing graph neural networks for large and diverse molecular simulation datasets,” arXiv:2204.02782 [cs.LG] (2022).
99. T. Xie, X. Fu, O.-E. Ganea, R. Barzilay, and T. Jaakkola, “Crystal diffusion variational autoencoder for periodic material generation,” arXiv:2110.06197 (2021).
100. D. Wines, T. Xie, and K. Choudhary, “Inverse design of next-generation superconductors using data-driven deep generative models,” J. Phys. Chem. Lett. 14, 6630–6638 (2023).
101. J. Tersoff and D. R. Hamann, “Theory and application for the scanning tunneling microscope,” Phys. Rev. Lett. 50, 1998–2001 (1983).
102. A. H. Combs, J. J. Maldonis, J. Feng, Z. Xu, P. M. Voyles, and D. Morgan, “Fast approximate STEM image simulations from a machine learning model,” Adv. Struct. Chem. Imaging 5, 2 (2019).
103. S. Yamashita, J. Kikkawa, K. Yanagisawa, T. Nagai, K. Ishizuka, and K. Kimoto, “Atomic number dependence of Z contrast in scanning transmission electron microscopy,” Sci. Rep. 8, 12325 (2018).
104. L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008).
105. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.
106. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res. 12, 2825–2830 (2011).
107. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).
108. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385 (2015).
109. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” arXiv:1608.06993 (2016).
110. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 4681–4690.
111. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “HuggingFace's Transformers: State-of-the-art natural language processing,” arXiv:1910.03771 [cs.CL] (2020).
112. C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv:1910.10683 [cs.LG] (2020).
113. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
114. S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin, T. Mihaylov, M. Ott, S. Shleifer, K. Shuster, D. Simig, P. S. Koura, A. Sridhar, T. Wang, and L. Zettlemoyer, “OPT: Open pre-trained transformer language models,” arXiv:2205.01068 [cs.CL] (2022).
115. L. Weston, V. Tshitoyan, J. Dagdelen, O. Kononova, A. Trewartha, K. A. Persson, G. Ceder, and A. Jain, “Named entity recognition and normalization applied to large-scale information extraction from the materials science literature,” J. Chem. Inf. Model. 59, 3692–3702 (2019).
116. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” arXiv:1906.08237 [cs.CL] (2020).
117. D. Morgan and R. Jacobs, “Opportunities and challenges for machine learning in materials science,” Annu. Rev. Mater. Res. 50, 71–103 (2020).
118. Z. Wang, Z. Sun, H. Yin, X. Liu, J. Wang, H. Zhao, C. H. Pang, T. Wu, S. Li, Z. Yin, and X.-F. Yu, “Data-driven materials innovation and applications,” Adv. Mater. 34, 2104113 (2022).
119. F. Tavazza, B. DeCost, and K. Choudhary, “Uncertainty prediction for machine learning models of material properties,” ACS Omega 6, 32431–32440 (2021).
120. H. G. T. Nguyen, L. Espinal, R. D. van Zee, M. Thommes, B. Toman, M. S. L. Hudson, E. Mangano, S. Brandani, D. P. Broom, M. J. Benham, K. Cychosz, P. Bertier, F. Yang, B. M. Krooss, R. L. Siegelman, M. Hakuman, K. Nakai, A. D. Ebner, L. Erden, J. A. Ritter, A. Moran, O. Talu, Y. Huang, K. S. Walton, P. Billemont, and G. De Weireld, “A reference high-pressure CO2 adsorption isotherm for ammonium ZSM-5 zeolite: Results of an interlaboratory study,” Adsorption 24, 531–539 (2018).
121. C. Draxl and M. Scheffler, “NOMAD: The FAIR concept for big data-driven materials science,” MRS Bull. 43, 676–682 (2018).
122. C. W. Andersen, R. Armiento, E. Blokhin, G. J. Conduit, S. Dwaraknath, M. L. Evans, Á. Fekete, A. Gopakumar, S. Gražulis, A. Merkys et al., “OPTIMADE, an API for exchanging materials data,” Sci. Data 8, 217 (2021).
123. See https://quantum-computing.ibm.com for “IBM Quantum, 2021.”
124. L. Chanussot, A. Das, S. Goyal, T. Lavril, M. Shuaibi, M. Riviere, K. Tran, J. Heras-Domingo, C. Ho, W. Hu et al., “Open Catalyst 2020 (OC20) dataset and community challenges,” ACS Catal. 11, 6059–6072 (2021).
125. G. Kresse and J. Furthmüller, “Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set,” Phys. Rev. B 54, 11169 (1996).
126. G. Kresse and J. Furthmüller, “Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set,” Comput. Mater. Sci. 6, 15–50 (1996).
127. P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo et al., “QUANTUM ESPRESSO: A modular and open-source software project for quantum simulations of materials,” J. Phys.: Condens. Matter 21, 395502 (2009).
128. P. Giannozzi, O. Baseggio, P. Bonfà, D. Brunato, R. Car, I. Carnimeo, C. Cavazzoni, S. De Gironcoli, P. Delugas, F. Ferrari Ruffino et al., “QUANTUM ESPRESSO toward the exascale,” J. Chem. Phys. 152, 154105 (2020).
129. P. Blaha, K. Schwarz, F. Tran, R. Laskowski, G. K. H. Madsen, and L. D. Marks, “WIEN2K: An APW+lo program for calculating the properties of solids,” J. Chem. Phys. 152, 074101 (2020).
130. G. K. Madsen and D. J. Singh, “BoltzTraP. A code for calculating band-structure dependent quantities,” Comput. Phys. Commun. 175, 67–71 (2006).
131. A. A. Mostofi, J. R. Yates, G. Pizzi, Y.-S. Lee, I. Souza, D. Vanderbilt, and N. Marzari, “An updated version of wannier90: A tool for obtaining maximally-localised Wannier functions,” Comput. Phys. Commun. 185, 2309–2310 (2014).
132. J. Kim, A. D. Baczewski, T. D. Beaudet, A. Benali, M. C. Bennett, M. A. Berrill, N. S. Blunt, E. J. L. Borda, M. Casula, D. M. Ceperley, S. Chiesa, B. K. Clark, R. C. Clay, K. T. Delaney, M. Dewing, K. P. Esler, H. Hao, O. Heinonen, P. R. C. Kent, J. T. Krogel, I. Kylänpää, Y. W. Li, M. G. Lopez, Y. Luo, F. D. Malone, R. M. Martin, A. Mathuriya, J. McMinis, C. A. Melton, L. Mitas, M. A. Morales, E. Neuscamman, W. D. Parker, S. D. P. Flores, N. A. Romero, B. M. Rubenstein, J. A. R. Shea, H. Shin, L. Shulenburger, A. F. Tillack, J. P. Townsend, N. M. Tubman, B. V. D. Goetz, J. E. Vincent, D. C. Yang, Y. Yang, S. Zhang, and L. Zhao, “QMCPACK: An open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids,” J. Phys.: Condens. Matter 30, 195901 (2018).
133. P. R. C. Kent, A. Annaberdiyev, A. Benali, M. C. Bennett, E. J. Landinez Borda, P. Doak, H. Hao, K. D. Jordan, J. T. Krogel, I. Kylänpää, J. Lee, Y. Luo, F. D. Malone, C. A. Melton, L. Mitas, M. A. Morales, E. Neuscamman, F. A. Reboredo, B. Rubenstein, K. Saritas, S. Upadhyay, G. Wang, S. Zhang, and L. Zhao, “QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion quantum Monte Carlo,” J. Chem. Phys. 152, 174105 (2020).
134. A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in 't Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton, “LAMMPS – A flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales,” Comput. Phys. Commun. 271, 108171 (2022).
135. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, see tensorflow.org for “TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.”
136. G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: A highly efficient gradient boosting decision tree,” Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
137. J. S. Kottmann, S. Alperin-Lea, T. Tamayo-Mendoza, A. Cervera-Lierta, C. Lavigne, T.-C. Yen, V. Verteletskyi, P. Schleich, A. Anand, M. Degroote, S. Chaney, M. Kesibi, N. G. Curnow, B. Solo, G. Tsilimigkounakis, C. Zendejas-Morales, A. F. Izmaylov, and A. Aspuru-Guzik, “TEQUILA: A platform for rapid development of quantum algorithms,” Quantum Sci. Technol. 6, 024009 (2021).
138. V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, J. M. Arrazola, U. Azad, S. Banning, C. Blank, T. R. Bromley, B. A. Cordier, J. Ceroni, A. Delgado, O. D. Matteo, A. Dusko, T. Garg, D. Guala, A. Hayes, R. Hill, A. Ijaz, T. Isacsson, D. Ittah, S. Jahangiri, P. Jain, E. Jiang, A. Khandelwal, K. Kottmann, R. A. Lang, C. Lee, T. Loke, A. Lowe, K. McKiernan, J. J. Meyer, J. A. Montañez-Barrera, R. Moyard, Z. Niu, L. J. O'Riordan, S. Oud, A. Panigrahi, C.-Y. Park, D. Polatajko, N. Quesada, C. Roberts, N., I. Schoch, B. Shi, S. Shu, S. Sim, A. Singh, I. Strandberg, J. Soni, A. Száva, S. Thabet, R. A. Vargas-Hernández, T. Vincent, N. Vitucci, M. Weber, D. Wierichs, R. Wiersema, M. Willmann, V. Wong, S. Zhang, and N. Killoran, “PennyLane: Automatic differentiation of hybrid quantum-classical computations,” arXiv:1811.04968 [quant-ph] (2022).
139. J. M. Arrazola, S. Jahangiri, A. Delgado, J. Ceroni, J. Izaac, A. Száva, U. Azad, R. A. Lang, Z. Niu, O. D. Matteo, R. Moyard, J. Soni, M. Schuld, R. A. Vargas-Hernández, T. Tamayo-Mendoza, C. Y.-Y. Lin, A. Aspuru-Guzik, and N. Killoran, “Differentiable quantum computational chemistry with PennyLane,” arXiv:2111.09967 [quant-ph] (2023).
140. M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, and Z. Zhang, “Deep Graph Library: A graph-centric, highly-performant package for graph neural networks,” arXiv:1909.01315 [cs.LG] (2020).
141. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” arXiv:1912.01703 [cs.LG] (2019).