Recent machine learning models for bandgap prediction that explicitly encode structural information in their feature sets are substantially more accurate than both traditional machine learning and non-graph-based deep learning methods. The rapid, ongoing growth of open-access bandgap databases can benefit such models, both by expanding their domain of applicability and by driving continual updates of the models. Here, we build a new state-of-the-art multi-fidelity graph network model for predicting the bandgaps of crystalline compounds, trained on a large database of experimental and density functional theory (DFT)-computed bandgaps with over 806 600 entries (1500 experimental, 775 700 low-fidelity DFT, and 29 400 high-fidelity DFT). In cross-validation, the model predicts high-fidelity bandgaps with a mean absolute error of 0.23 eV, and training on the mixed data from all fidelities improves the predictions for the high-fidelity data. The prediction error is smaller for high-symmetry crystals than for low-symmetry crystals. Our data are published through a new cloud-based computing environment called the “Foundry,” which supports easy creation and revision of standardized data structures and will enable cloud-accessible containerized models, allowing for continuous model development and data accumulation in the future.
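The evaluation protocol summarized above can be sketched in plain Python. The protocol trains on all fidelities and scores only the high-fidelity bandgaps by mean absolute error in k-fold cross-validation. This is a minimal illustration under stated assumptions, not the authors' implementation. The fidelity labels (`experimental`, `dft_low`, `dft_high`), the five-fold default, averaging the per-fold MAEs, and all helper names are hypothetical. `fit_fidelity_mean` is a toy stand-in for the multi-fidelity graph network that predicts only the mean bandgap of each fidelity. A real model would also take the crystal graph as input, with the fidelity supplied as an additional feature.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical labels for the three data sources named in the abstract.
FIDELITIES = ("experimental", "dft_low", "dft_high")


@dataclass
class Entry:
    structure_id: str   # placeholder for the crystal structure / graph
    fidelity: str       # one of FIDELITIES
    bandgap_ev: float   # target bandgap in eV

    def __post_init__(self) -> None:
        if self.fidelity not in FIDELITIES:
            raise ValueError(f"unknown fidelity {self.fidelity!r}")


Predictor = Callable[[Entry], float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae_for_fidelity(entries: List[Entry], predictions: List[float], fidelity: str) -> float:
    """Mean absolute error (eV) over the entries of a single fidelity."""
    errors = [abs(p - e.bandgap_ev)
              for e, p in zip(entries, predictions) if e.fidelity == fidelity]
    if not errors:
        raise ValueError(f"no entries with fidelity {fidelity!r}")
    return sum(errors) / len(errors)


def fit_fidelity_mean(train: List[Entry]) -> Predictor:
    """Toy stand-in for the graph network: predict each fidelity's mean gap.

    A real multi-fidelity model would also read the crystal graph; here the
    fidelity label is the only input feature.
    """
    sums, counts = {}, {}
    for e in train:
        sums[e.fidelity] = sums.get(e.fidelity, 0.0) + e.bandgap_ev
        counts[e.fidelity] = counts.get(e.fidelity, 0) + 1
    means = {f: sums[f] / counts[f] for f in sums}
    overall = sum(sums.values()) / sum(counts.values())
    return lambda e: means.get(e.fidelity, overall)


def cross_validate(entries: List[Entry], fit: Callable[[List[Entry]], Predictor],
                   k: int = 5, target: str = "dft_high", seed: int = 0) -> float:
    """Train on every fidelity in each fold; return the mean of the per-fold
    MAEs computed on the target fidelity only."""
    fold_maes = []
    for test_idx in k_fold_indices(len(entries), k, seed):
        held_out = set(test_idx)
        train = [e for i, e in enumerate(entries) if i not in held_out]
        test = [entries[i] for i in test_idx]
        if not any(e.fidelity == target for e in test):
            continue  # this fold has nothing to score
        predict = fit(train)
        fold_maes.append(mae_for_fidelity(test, [predict(e) for e in test], target))
    if not fold_maes:
        raise ValueError(f"no fold contains fidelity {target!r}")
    return sum(fold_maes) / len(fold_maes)
```

Swapping `fit_fidelity_mean` for a function that trains a graph network leaves the protocol unchanged. Lower-fidelity entries still contribute to training, while the score with `target="dft_high"` matches how the abstract reports its 0.23 eV error, for high-fidelity data only.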
