The crystallographic group is an important descriptor of crystal structure, but it is difficult to identify when only the chemical composition is given. Here, we present a machine-learning method that predicts the crystallographic group of a crystal structure from its chemical formula. We investigate 34,528 stable compounds in 230 crystallographic groups, using 72% of the data set for training, 8% for validation, and 20% for testing. The resulting model places the correct crystallographic group among its top-1, top-5, and top-10 predictions with estimated accuracies of 60.8%, 76.5%, and 82.6%, respectively. A comparison between the validation set and the test set indicates that the deep-learning model generalizes well. Additionally, the 230 crystallographic groups are regrouped into 19 new labels: 18 heavily represented crystallographic groups, each containing more than 400 compounds, and one combined label for the remaining compounds in the other 212 crystallographic groups. A deep-learning model trained on these 19 labels identifies the crystallographic group with an estimated accuracy of 72.2%. Our results provide a promising approach to identifying the crystallographic group of a crystal structure from its chemical composition alone.
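
The quantities quoted in the abstract (the 72%/8%/20% split, top-k accuracy, and the regrouping of rare crystallographic groups into one combined label) can be illustrated with a minimal Python sketch. The function names, the rounding rule used for the split sizes, and the `"other"` label are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from collections import Counter


def split_counts(n, train=0.72, val=0.08):
    """Sizes of the training, validation, and test sets for n samples.

    Assumption: train and validation sizes are rounded and the test set
    takes the remainder; the exact counts used in the study are not stated.
    """
    n_train = round(n * train)
    n_val = round(n * val)
    return n_train, n_val, n - n_train - n_val


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted from highest to lowest predicted score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += true_label in ranked[:k]
    return hits / len(labels)


def merge_rare_labels(labels, min_count=400, other="other"):
    """Keep labels with more than min_count samples; merge the rest."""
    counts = Counter(labels)
    return [g if counts[g] > min_count else other for g in labels]


if __name__ == "__main__":
    print(split_counts(34528))

    # Toy scores for three samples over three classes (hypothetical data).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(scores, labels, k))
```

With `min_count=400`, `merge_rare_labels` would leave the heavily represented groups unchanged and collapse all other groups into a single label, which mirrors the 18 + 1 labelling scheme described above.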
