Graph neural networks trained on experimental or calculated data are becoming an increasingly important tool in computational materials science. Once trained, these networks can make highly accurate predictions at a fraction of the cost of experiments or first-principles calculations of comparable accuracy. However, they typically rely on large databases of labeled examples for training, which can be prohibitive in scenarios where data are scarce or expensive to obtain. By building a neural network that provides confidence intervals on its predicted properties, we develop an active learning scheme that reduces the amount of labeled data required by identifying the regions of chemical space where the model is most uncertain. We present a scheme that couples a graph neural network with a Gaussian process to featurize solid-state materials and predict properties together with a measure of confidence in each prediction. We then demonstrate that this scheme can be used in an active learning context to speed up model training by selecting the optimal next experiment for obtaining a data label. Our active learning scheme can double the rate at which the model's performance on a test dataset improves with additional data, compared to choosing the next sample at random. This type of uncertainty quantification and active learning has the potential to open up new areas of materials science, where data are scarce and expensive to obtain, to the transformative power of graph neural networks.
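The core loop described above — fit a Gaussian process surrogate, query its predictive uncertainty over an unlabeled pool, and label the point where the model is least certain — can be sketched as follows. This is a minimal illustration on a toy 1-D target, not the GP-net implementation: the kernel, its length scale, the noise level, and all function names here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Standard GP regression: posterior mean and variance at query points.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    K_ss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov).copy()

def active_learning(f, pool, n_init=3, n_rounds=10, seed=0):
    # Start from a few random labels, then greedily label the pool
    # point with the largest predictive variance each round.
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(pool), size=n_init, replace=False))
    for _ in range(n_rounds):
        x_train = pool[idx]
        y_train = f(x_train)
        _, var = gp_posterior(x_train, y_train, pool)
        var[idx] = -np.inf  # never re-select already-labeled points
        idx.append(int(np.argmax(var)))
    return idx

pool = np.linspace(0.0, 1.0, 200)
chosen = active_learning(np.sin, pool)
```

In the paper's setting the pool entries would be graph-network embeddings of candidate materials rather than 1-D points, and the acquisition step would trigger a real experiment or calculation instead of evaluating a known function; the selection logic is otherwise the same.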
