Recent advances in Graph Neural Networks (GNNs) have transformed the field of molecular and catalyst discovery. Although the underlying physics is the same across these domains, most prior work has built domain-specific models for either small molecules or materials. Building large datasets for every domain is computationally expensive, however, so transfer learning (TL) to generalize across domains is a promising but under-explored approach to this problem. To evaluate this hypothesis, we take a model pretrained on the Open Catalyst Dataset (OC20) and study its behavior when fine-tuned on a range of datasets and tasks: MD17, the *CO adsorbate dataset, and OC20 across different tasks. Through extensive TL experiments, we demonstrate that the initial layers of GNNs learn a basic representation that is consistent across domains, whereas the final layers learn more task-specific features. These well-known TL strategies yield significant improvements over non-pretrained models on in-domain tasks, with gains of 53% on the *CO dataset and 17% on the Open Catalyst Project (OCP) task, and they speed up model training by up to 4× depending on the target data and task. However, they perform poorly on the MD17 dataset, doing worse than a non-pretrained model for a few molecules. Based on these observations, we propose transfer learning using attentions across atomic systems with graph neural networks (TAAG), an attention-based approach that learns to prioritize and transfer important features from the interaction layers of GNNs. The proposed method outperforms the best TL approach on out-of-domain datasets, such as MD17, and gives a mean improvement of 6% over a model trained from scratch.
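As a concrete illustration of the layer-freezing strategy evaluated above, the sketch below shows the standard fine-tuning recipe in PyTorch: load pretrained weights, freeze the early interaction blocks (which carry the domain-general representation), and train only the later, task-specific blocks and the output head. The `PretrainedGNN` class, the checkpoint path, and the two-layer split are hypothetical placeholders, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PretrainedGNN(nn.Module):
    """Stand-in for a message-passing model such as those trained on OC20."""

    def __init__(self, hidden_dim=128, num_interactions=4):
        super().__init__()
        self.embedding = nn.Embedding(100, hidden_dim)   # atomic-number embedding
        self.interactions = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_interactions)]
        )
        self.output_head = nn.Linear(hidden_dim, 1)      # per-structure energy

model = PretrainedGNN()
# state = torch.load("oc20_checkpoint.pt")   # hypothetical OC20 checkpoint
# model.load_state_dict(state)

# Freeze the early interaction blocks (domain-general features) and
# fine-tune only the later, task-specific blocks and the output head.
for block in model.interactions[:2]:
    for p in block.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```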

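The attention mechanism at the heart of TAAG can be sketched in a similar spirit: learnable weights over the outputs of the pretrained interaction blocks, letting the model prioritize which layers' features to transfer. The module below is an illustrative assumption on our part (the mean pooling, scalar scoring, and softmax-weighted sum are ours), not the published formulation.

```python
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    """Attention over per-layer atomic features, in the spirit of TAAG."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.scores = nn.Linear(hidden_dim, 1)  # per-layer attention score

    def forward(self, layer_outputs):
        # layer_outputs: list of [num_atoms, hidden_dim] tensors,
        # one per interaction block of the pretrained GNN.
        h = torch.stack(layer_outputs, dim=0)        # [L, num_atoms, D]
        s = self.scores(h.mean(dim=1)).squeeze(-1)   # [L] pooled layer scores
        alpha = torch.softmax(s, dim=0)              # attention over layers
        # Weighted combination of the per-layer atomic representations.
        return torch.einsum("l,lnd->nd", alpha, h)   # [num_atoms, D]
```

In use, a forward pass would collect each interaction block's atomic representations into `layer_outputs` and feed the weighted combination to the prediction head, so important layers receive larger transfer weights during fine-tuning.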
