A neural network-based first-principles method for predicting the heat of formation (HOF) was previously shown to achieve chemical accuracy across a broad spectrum of target molecules [L. H. Hu et al., J. Chem. Phys. 119, 11501 (2003)]. However, its accuracy deteriorates as molecular size increases. A closer inspection reveals a systematic correlation between the prediction error and the molecular size, which appears correctable by further statistical analysis and calls for a more sophisticated machine learning algorithm. Despite the apparent difference between simple and complex molecules, all the essential physical information is already present in a carefully selected set of representative small molecules. A model that captures the fundamental physics should therefore be able to predict the properties of large, complex molecules from information extracted from a database of small molecules alone. To this end, a size-independent, multi-step multi-variable linear regression-neural network–B3LYP method is developed in this work. Trained on smaller molecules only, it improves the overall prediction accuracy and, in particular, reduces the calculation errors for larger molecules to the same magnitude as those for smaller molecules. Specifically, the method is based on a database of 164 molecules composed of hydrogen and carbon. Four molecular descriptors were selected to encode each molecule's characteristics, including the raw HOF calculated with B3LYP and the molecular size. After the size-independent machine learning correction, the mean absolute deviation (MAD) of the B3LYP/6-311+G(3df,2p)-calculated HOF is reduced from 16.58 to 1.43 kcal/mol for the training set and from 17.33 to 1.69 kcal/mol for the testing set of small molecules. Furthermore, the MAD of the testing set of large molecules is reduced from 28.75 to 1.67 kcal/mol.
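
The idea of correcting a size-correlated error using small molecules only, and of scoring the result by MAD, can be illustrated with a minimal sketch. This is not the multi-step linear regression-neural network method of this work: it assumes a single hypothetical size descriptor, a plain one-variable least-squares fit, and made-up toy values constructed to be exactly linear. All function and variable names are illustrative assumptions.

```python
def mean_absolute_deviation(predicted, reference):
    """MAD between predicted and reference values (same units, e.g. kcal/mol)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Toy, made-up data (NOT from the paper): the raw error grows linearly with size.
small_sizes = [2, 3, 4, 5]            # hypothetical size descriptor
small_raw = [10.0, 13.0, 16.0, 19.0]  # hypothetical raw calculated HOF
small_ref = [9.0, 11.0, 13.0, 15.0]   # hypothetical reference HOF

large_sizes = [10, 12]
large_raw = [40.0, 50.0]
large_ref = [31.0, 39.0]

# Fit the size dependence of the error on the small molecules only.
small_errors = [raw - ref for raw, ref in zip(small_raw, small_ref)]
intercept, slope = fit_line(small_sizes, small_errors)

# Apply the fitted correction to the large molecules, which were never fitted.
large_corrected = [
    raw - (intercept + slope * size) for raw, size in zip(large_raw, large_sizes)
]

mad_before = mean_absolute_deviation(large_raw, large_ref)
mad_after = mean_absolute_deviation(large_corrected, large_ref)
print(f"MAD before: {mad_before:.2f}, after: {mad_after:.2f}")
```

Because the toy errors are exactly linear in size, the correction removes them completely here; real data would leave a residual, which is where additional descriptors and a nonlinear model become relevant.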

