Coupled cluster theory is a cornerstone of electronic structure theory and is being applied to ever-larger systems. Stochastic approaches to quantum chemistry have grown in importance and offer compelling advantages over traditional deterministic algorithms in terms of computational demands, theoretical flexibility, or lower scaling with system size. We present a highly parallelizable algorithm for the coupled cluster Monte Carlo method that samples clusters of excitors over multiple time steps. The behavior of the algorithm is investigated on the uniform electron gas and the water dimer at coupled cluster levels including up to quadruple excitations. We also describe two improvements to the original sampling algorithm: full non-composite sampling and multi-spawn sampling. A stochastic approach to coupled cluster results in an efficient and scalable implementation at arbitrary truncation levels in the coupled cluster expansion.
REFERENCES
Our implementation does not require canonical orbitals, so we are free to choose any transformation of the single-particle orbitals; see, for example, the study in Ref. 50.
We set in order to ensure a normalised probability.
Booth et al. used a custom hash function based upon the list of occupied orbitals. We find hashing the bit string representation of the determinant simpler and computationally more efficient, whilst giving at least as good a distribution over processors when a hash function of sufficient quality is used, and we therefore use the MurmurHash2 function.
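The idea of hashing a determinant's bit string to choose its processor can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 32-bit variant of MurmurHash2, a zero seed, a little-endian byte layout for the bit string, and a simple modulo to map the hash onto a processor index. The helper names `determinant_bits` and `owner_rank` are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def murmurhash2_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash2 over little-endian input (assumed variant)."""
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32

    # Mix the input four bytes at a time.
    n_blocks = length // 4
    for i in range(n_blocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Fold in the remaining 0-3 bytes.
    tail = data[4 * n_blocks:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32

    # Final avalanche so that every input bit affects the low bits.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def determinant_bits(occupied, n_orbitals: int) -> bytes:
    """Pack occupied spin-orbital indices into a bit string (hypothetical helper)."""
    bits = 0
    for orb in occupied:
        if not 0 <= orb < n_orbitals:
            raise ValueError(f"orbital index {orb} out of range")
        bits |= 1 << orb
    return bits.to_bytes((n_orbitals + 7) // 8, "little")


def owner_rank(occupied, n_orbitals: int, n_procs: int, seed: int = 0) -> int:
    """Processor index that stores this determinant (modulo mapping is an assumption)."""
    return murmurhash2_32(determinant_bits(occupied, n_orbitals), seed) % n_procs


if __name__ == "__main__":
    # Four electrons in eight spin-orbitals, distributed over four processors.
    print(owner_rank((0, 1, 2, 3), n_orbitals=8, n_procs=4))
```

Because the hash acts on the packed bit string rather than on the list of occupied orbitals, the result does not depend on the order in which the orbitals are listed, and every processor can compute the owner of any determinant without communication.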
A node may consist of a single processor or multiple processors. Within the MPI paradigm, we distribute over MPI ranks, where each rank contains one or more threads.
We note that available computational resources rarely grow polynomially with system size!
Using MRCC (Ref. 28), a single iteration was not completed within a week on a 32-core node on the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (http://www.csd3.cam.ac.uk/). This provides a lower estimate of 2 node-months, or approximately 46,000 core hours (2 months × 30 days × 24 hours × 32 cores, taking a month as 30 days).
These approaches were conceived and implemented some time before the work in Ref. 49, where they are briefly mentioned, and are reported in fuller detail here for completeness.