This paper presents techniques for Fock matrix construction that are designed for high performance on shared and distributed memory parallel computers when using Gaussian basis sets. Four main techniques are considered. (1) To calculate electron repulsion integrals, we demonstrate batching together the calculation of multiple shell quartets of the same angular momentum class so that the calculation of large sets of primitive integrals can be efficiently vectorized. (2) For multithreaded summation of entries into the Fock matrix, we investigate using a combination of atomic operations and thread-local copies of the Fock matrix. (3) For distributed memory parallel computers, we present a globally accessible matrix class for accessing distributed Fock and density matrices. The new matrix class introduces a batched mode for remote memory access that can reduce the synchronization cost. (4) For density fitting, we exploit both symmetry (of the Coulomb and exchange matrices) and sparsity (of 3-index tensors) and give a performance comparison of density fitting and the conventional direct calculation approach. The techniques are implemented in an open-source software library called GTFock.
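As an illustration of technique (1), the following minimal sketch groups shell quartets by angular momentum class so that each batch could be passed to a single vectorized integral kernel. It is not GTFock's interface: the names `am_class`, `batch_by_am_class`, and `max_batch`, and the toy shell list, are assumptions made for this example, and the integral evaluation itself is omitted.

```python
from collections import defaultdict
from itertools import product


def am_class(quartet, shell_am):
    """Return the angular momentum class (la, lb, lc, ld) of a shell quartet."""
    return tuple(shell_am[s] for s in quartet)


def batch_by_am_class(quartets, shell_am, max_batch=64):
    """Group shell quartets into batches that share one angular momentum class.

    max_batch is a hypothetical tuning parameter: a batch is emitted as soon
    as it holds max_batch quartets, and partially filled batches are flushed
    at the end.
    """
    pending = defaultdict(list)
    batches = []
    for q in quartets:
        key = am_class(q, shell_am)
        pending[key].append(q)
        if len(pending[key]) == max_batch:
            # Full batch: hand it off and start a fresh one for this class.
            batches.append((key, pending.pop(key)))
    # Flush the batches that never reached max_batch.
    for key, qs in pending.items():
        batches.append((key, qs))
    return batches


# Toy system (assumed): three shells with angular momenta s, p, s.
shell_am = [0, 1, 0]
quartets = list(product(range(len(shell_am)), repeat=4))
batches = batch_by_am_class(quartets, shell_am, max_batch=8)
```

Every quartet in a batch has the same class, so a kernel specialized for that class can loop over the batch with uniform control flow, which is the property that makes vectorization across quartets possible.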
