We provide a very simple adaptation of our recently published quantum Monte Carlo algorithm in full configuration-interaction (Slater determinant) spaces which dramatically reduces the number of walkers required to achieve convergence. A survival criterion is imposed for newly spawned walkers. We define a set of initiator determinants such that progeny of walkers spawned from such determinants onto unoccupied determinants are able to survive, while the progeny of walkers not in this set can survive only if they are spawned onto determinants which are already occupied. The set of initiators is originally defined to be all determinants constructable from a subset of orbitals, in analogy with complete-active spaces. This set is dynamically updated so that if a noninitiator determinant reaches an occupation larger than a preset limit, it becomes an initiator. The new algorithm allows sign-coherent sampling of the FCI space to be achieved with relatively few walkers. Using the $N2$ molecule as an illustration, we show that rather small initiator spaces and numbers of walkers can converge with submilliHartree accuracy to the known full configuration-interaction (FCI) energy (in the cc-pVDZ basis), in both the equilibrium geometry and the multiconfigurational stretched case. We use the same method to compute the energy with cc-pVTZ and cc-pVQZ basis sets, the latter having an FCI space of over $1015$ with very modest computational resources.

We have recently published an exact algorithm for simulation of Fermion systems in Slater determinant spaces based on stochastic population dynamics of a set of annihilating positive and negative walkers, which simulates the underlying imaginary time Schrödinger equation of the interacting Hamiltonian.^{1} We demonstrated that the algorithm converges onto the full configuration-interaction (FCI) energy and wave function, as long as the number of walkers in the simulation exceeds a (system-dependent) critical number $(Nc)$. $Nc$ was generally found to scale with the size of the FCI space $(Ndet)$, although for some systems the fraction $fc=Nc/Ndet$ could be as small as $fc\u224810\u22125$ whereas for other systems it was close to 1, meaning that we needed almost as many walkers as there are determinants to achieve convergence.

Discovering methods which reduce the number of walkers required to achieve convergence has been the source of much investigation in our group. In this communication we will describe a very simple strategy which achieves this in a dramatic way. For example, we have been able to reduce the number of walkers needed to converge $N2$ cc-pVDZ (with an FCI space of $5.4\xd7108$ determinants) from $2.7\xd7108$ to $105$ while achieving a sub-$mEh$ level of accuracy. Translated into computer resources, this implies a gain of three orders of magnitude. Furthermore, we show that similar numbers of walkers can be used to converge the vastly larger problem of $N2$ with a cc-pVQZ basis which has an FCI space exceeding $1015$.

We briefly outline the original algorithm; more details being provided in Ref. 1. We assume that we have at our disposal a set of $2M$ orthonormal one-particle orbitals [usually, but not necessarily, restricted Hartree–Fock (HF) orbitals] and the associated one- and two-body molecular integrals (which have been generated by the HF solver). The latter are used to compute matrix elements of the full Hamiltonian operator according to standard prescriptions. The basic entity in our simulation is a signed walker which permanently lives on the Slater determinant it was born. At each cycle of the algorithm, a walker on determinant $Di$ attempts to die with a probability given by $pd=\tau (\u27e8Di|H\u2212E0|Di\u27e9\u2212S)$ where $S$ is an energy shift and $\tau $ the timestep. In addition to this death/cloning step, each walker attempts to spawn a new walker at a connected determinant $Dj$ with a probability $ps=[\tau |\u27e8Dj|H|Di\u27e9|]/[pgen(j\u2223i)]$, where $pgen(j\u2223i)$ is the generation probability of $j$ from $i$. The sign of the child spawned is determined by the sign of the Hamiltonian matrix element and the parent; it has the same sign as the parent if the matrix element is negative and opposite to the parent if the matrix element is positive. After each walker has attempted the two steps above, there is a global annihilation step in which walkers of the opposite sign on the same determinant are removed from the simulation. The simulation can be run in constant shift $(S)$ mode or in constant of number of walkers $(Nw)$ mode. In the latter case, $S$ is adjusted regularly in order to keep $Nw$ approximately fixed. As discussed in Ref. 1 in a converged calculation, the time-averaged value of $S$ equals the correlation energy. The total energy can also be measured directly from the long-time ensemble average of the walker distribution, according to the projection formula

where $j$ runs over the single and double excitations of a reference determinant $D0$ (here taken to be the HF determinant) and $Nj$ being the population of walkers on determinant $j$. $E0=\u27e8D0|H|D0\u27e9$ is the energy of the reference determinant $D0$.^{2}

We now introduce an additional rule in the algorithm, *which defines a survival criterion for newly spawned walkers*: some determinants (dubbed *initiator* determinants by analogy to chain-reactions) are endowed with the ability to spawn progeny onto unoccupied determinants. However, progeny of the remaining noninitiator determinants survive only if they are spawned onto a determinant which is already occupied. The exception to this rule is the special case whereby two noninitiator determinants spawn walkers of the same sign onto a third previously unoccupied determinant in one iteration. This is deemed to be a sign-coherent event and thus simultaneously spawned walkers such as these are allowed to live. The set of initiator determinants is chosen in two ways. First, a fixed set of initiator determinants are chosen according to a complete-active-space (CAS) criterion; i.e., a set $(n,m)$ of $n$ electrons distributed in $m$ spatial orbitals about the Fermi energy, such that a determinant where all $n$ electrons reside within the $m$ orbitals is an initiator, all other determinants being noninitiators. During the simulation, however, the set of initiator *determinants* can also be dynamically enlarged so that if a noninitiator determinant acquires a population which exceeds a preset value $(na)$, then it becomes an initiator determinant, i.e., acquires the ability to spawn progeny onto unoccupied determinants, and stays so as long as its population exceeds $na$. With this, the rigidity of the fixed CAS space is avoided and the initiator space is free to adopt any set of determinants, decided by the Hamiltonian of the system. This is a significant advantage when comparing to multireference (MR) methods.

Clearly, the above rules tend to the original, exact algorithm in three limits: first, as the number of walkers becomes very large, all determinants acquire a population and so all progeny survive, regardless of the state of the parent. Second, as we enlarge the fixed initiator space toward the full space, again all progeny will survive since all parent walkers are initiators. Finally, as $na$ tends toward 0, all determinants are included in the initiator space. On the other hand, as we shall show below, the new rules effectively allow a dramatic reduction in the number of walkers.

We have tested this idea on the $N2$ molecule in various basis sets. The $N2$ molecule has been extensively studied as a prototype of a strongly correlated system, particularly as it is stretched to dissociation and has proven to be very challenging. $N2$ is thus a stringent test for this method and accurate results indicate an ability to cope with many subsequent systems. This will be tested in studies yet to come in which a wide range of molecules are to be investigated. The FCI energy of $N2$ is known in the cc-pVDZ basis, but the larger cc-pVTZ and cc-pVQZ basis sets are out of reach of conventional FCI calculations. However, these larger basis sets are needed for a proper description of the dynamical correlation.

As a first example, we consider the $N2$ molecule in the cc-pVDZ basis. In Ref. 1, we showed that the FCI energy of this system could be obtained with sub-$mEh$ accuracy but required $2.7\xd7108$ and $3.45\xd7108$ walkers at equilibrium and stretched geometries, respectively. By contrast, consider the evolution of the system with the additional rules starting with a (10,8) initiator space—which includes only 396 determinants—and setting $na=3$. We started the simulation with one walker on $D0$, and let the population grow until 100 000 walkers had been attained. After this, we changed to constant $Nw$ mode with the shift varying and allowed the population dynamics to run for $5\xd7105$ time steps (of 0.001 a.u. imaginary time). During this run, the dynamic enlargement added a further $\u27e8Nini\u27e9=2250$ determinants to the initiator space. Figure 1 shows the energy computed according to Eq. (1). Runs with different random number of seeds are shown, all of which converge to within sub-$mEh$ accuracy of the known exact result.

We also show the pattern of growth in both the total number of walkers and the number at $D0$ $(\u27e8N0\u27e9)$. This behavior is markedly different from the original algorithm which in constant $S$ mode would show an initial explosive (fast exponential) growth until the “annihilation plateau” is reached. This is followed by a period of time in this plateau, then a slower exponential rise whose rate is determined by the correlation energy. The population at $D0$ typically would only begin to reach meaningful levels once in the second exponential growth phase. With the additional rules, however, the initial fast exponential growth phase and the annihilation plateau either disappears completely or is severely reduced. Instead we obtain essentially from the very beginning, a slow exponential growth in the number of walkers which resembles the second growth phase of the original algorithm. When the shift is allowed to vary, it quickly settles down to fluctuate around the correlation energy. The number of walkers at $D0$ also grows exponentially from the outset. This is important because the population at $D0$ determines the accuracy to which the projected energy $E$ of Eq. (1) can be measured. As a general rule, a simulation which achieves several thousand walkers on $D0$ will obtain the correlation energy to within one part in $103$ which is sub-$mEh$ accuracy for these systems. $N0$ thus provides a measure of the total number of walkers required in the simulation.

We have looked at variations in both the fixed initiator size and the value of $na$, as well as different numbers of walkers, ranging from $104$ to $107$. The results are presented in Table I. The following can be seen. First, as the initiator spaces are enlarged, an annihilation plateau appears, and it is necessary to go to larger numbers of walkers before convergence can be achieved. However, the accuracy of the final result does not depend on the choice of fixed (CAS) initiator space. In fact, unnecessarily large spaces appear to be inefficient, as they make the accumulation of walkers at $D0$ more difficult. Smaller fixed initiator spaces are therefore preferred and dynamic enlargement of the space, shown by $\u27e8Nini\u27e9$, becomes crucial to the accuracy. A very large $na$ prevents any dynamic enlargement at all and determinants beyond single and double excitations of the initiator space may only be accessed via the special case of simultaneous spawning from two noninitiator determinants. The resulting energy is consequently too high. (Note if this special case is disallowed and $na\u2192\u221e$, then in the limit of large $Nw$, this method becomes comparable to MRCISD in a basis of HF orbitals). However, by reducing $na$ until we no longer lower the energy, the point at which all significant determinants are being included in the initiator space may be deduced. The optimal value for this case appears to be $na=3$, with which sub-$mEh$ accuracy is obtained with only a few thousand initiator determinants.

Initiator space . | $Nw$ . | $Nc/103$ . | $\u27e8N0\u27e9/103$ . | $\u27e8Nini\u27e9/103$ . | $(E\u2212EFCI$)/$mEh$ . |
---|---|---|---|---|---|

Equilibrium VDZ $(2.068a0)$ | |||||

(10,7) | $106$ | 0 | 18 | 35 | 0.02(6) |

(10,8) | $104$ | 0 | 0.330 | 0.22 | 2.2(8) |

(10,8) | $105$ | 0 | 2.25 | 2.24 | 0.1(2) |

$(10,8)na=3$ | $106$ | 0 | 17.2 | 33 | 0.01(7) |

$(10,8)na=5$ | $106$ | 0 | 13.8 | 18.7 | 0.19(5) |

$(10,8)na\u2192\u221e$ | $106$ | 0 | 368 | 0 | 2.43(4) |

(10,8) | $107$ | 0 | 154.0 | 349 | 0.02(3) |

(10,11) | $106$ | 4.4 | 16.8 | 30.8 | −0.07(9) |

(10,14) | $106$ | 248 | 13.0 | 21 | 0.04(7) |

CCSD | 13.46 | ||||

CCSD(T) | 1.71 | ||||

CASSCF (10,8) | 174 | ||||

CASPT2 (10,8) | 18.7 | ||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | 7.31 | ||||

Stretched VDZ $(4.2a0)$ | |||||

(10,7) | $106$ | 0 | 4 | 32 | 0.4(8) |

(10,8) | $104$ | 0 | 0.08 | 0.14 | 16(5) |

(10,8) | $105$ | 0 | 0.5 | 2.84 | 1(3) |

(10,8) | $106$ | 0 | 4 | 29 | −0.4(7) |

(10,8) | $107$ | 0 | 40 | 403 | 0.05(60) |

(10,11) | $106$ | 50 | 3.5 | 27 | 0.6(8) |

(10,14) | $106$ | 630 | 1 | 12 | 1(1) |

CASSCF (10,8) | 185 | ||||

CASPT2 (10,8) | 12.4 | ||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | 8.20 |

Initiator space . | $Nw$ . | $Nc/103$ . | $\u27e8N0\u27e9/103$ . | $\u27e8Nini\u27e9/103$ . | $(E\u2212EFCI$)/$mEh$ . |
---|---|---|---|---|---|

Equilibrium VDZ $(2.068a0)$ | |||||

(10,7) | $106$ | 0 | 18 | 35 | 0.02(6) |

(10,8) | $104$ | 0 | 0.330 | 0.22 | 2.2(8) |

(10,8) | $105$ | 0 | 2.25 | 2.24 | 0.1(2) |

$(10,8)na=3$ | $106$ | 0 | 17.2 | 33 | 0.01(7) |

$(10,8)na=5$ | $106$ | 0 | 13.8 | 18.7 | 0.19(5) |

$(10,8)na\u2192\u221e$ | $106$ | 0 | 368 | 0 | 2.43(4) |

(10,8) | $107$ | 0 | 154.0 | 349 | 0.02(3) |

(10,11) | $106$ | 4.4 | 16.8 | 30.8 | −0.07(9) |

(10,14) | $106$ | 248 | 13.0 | 21 | 0.04(7) |

CCSD | 13.46 | ||||

CCSD(T) | 1.71 | ||||

CASSCF (10,8) | 174 | ||||

CASPT2 (10,8) | 18.7 | ||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | 7.31 | ||||

Stretched VDZ $(4.2a0)$ | |||||

(10,7) | $106$ | 0 | 4 | 32 | 0.4(8) |

(10,8) | $104$ | 0 | 0.08 | 0.14 | 16(5) |

(10,8) | $105$ | 0 | 0.5 | 2.84 | 1(3) |

(10,8) | $106$ | 0 | 4 | 29 | −0.4(7) |

(10,8) | $107$ | 0 | 40 | 403 | 0.05(60) |

(10,11) | $106$ | 50 | 3.5 | 27 | 0.6(8) |

(10,14) | $106$ | 630 | 1 | 12 | 1(1) |

CASSCF (10,8) | 185 | ||||

CASPT2 (10,8) | 12.4 | ||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | 8.20 |

Also in Table I are presented results for stretched $N2$, using the same parameters as above. It can be seen that this problem behaves similarly to the equilibrium geometry, except that generally somewhat more walkers are necessary to achieve convergence. Increasing numbers of walkers from $104$ to $107$, the error in the energy drops from $16mEh$ to $0.05mEh$, essentially by an order of magnitude for each order of magnitude increase in walker number. Two sources of error can be identified: a systematic error due to insufficient numbers of walkers and a stochastic error (as measured by blocking analysis) which can be reduced by running the simulations for longer. The results show that the systematic error is strongly reduced as the number of walkers is increased, and that one needs some $106$ walkers to obtain sub-$mEh$ accuracy. Consistent with the discussion above, this corresponds to a few thousand walkers on $D0$ and some $3\xd7104$ initiator determinants. It should be emphasized that although the computational demands of these calculations are one order of magnitude greater than at equilibrium geometry, this nevertheless represents a very cheap calculation (using about 25 Mbyte memory) to converge the FCI energy.

The experience developed above provides us with two useful simulation protocols. First, runs with different numbers of walkers should be done, monitoring the total energy for each. Convergence should be obtained in the sense that increasing walker number should not lead to a statistically significant lowering of the energy. Second, it is useful to monitor the number of walkers on $D0$. In constant $Nw$ mode, this number should ideally remain well above $103$ while the projected energy is being averaged.

The results for the cc-pVTZ and cc-pVQZ basis sets are shown in Table II. The FCI spaces have $2.62\xd71012$ and $1.55\xd71015$ determinants, respectively (which is reduced by effectively a factor of two using time-reversal symmetry). For comparison we have also included CCSD, CCSD(T), CASSCF, CASPT2, and MRCISD results for equilibrium geometry, and CASSCF, CASPT2, and MRCISD at the stretched geometry (the coupled cluster methods diverge at this geometry). For both basis sets, we used the (10,8) fixed initiator spaces. At equilibrium geometry, simulations with $106$ walkers result in an $N0$ which exceed $103$, and the corresponding energy can be calculated to sub-$mEh$ accuracy. Our predicted energies are between $1mEh$ and $2mEh$ below CCSD(T) which at equilibrium is a very good theory. At stretched geometry, we need some $107$ walkers before $N0$ substantially exceeds $103$, but we still are able to obtain error bars in the sub-$mEh$ range. The CASSCF and CASPT2/MRCISD energies are higher in both cases.

. | $Nw$ . | VTZ . | VQZ . | ||||
---|---|---|---|---|---|---|---|

$N0/103$ . | $\u27e8Nini\u27e9/103$ . | $Etot$/$Eh$ . | $N0/103$ . | $\u27e8Nini\u27e9/103$ . | $Etot$/$Eh$ . | ||

Equilibrium$(2.068a0)$ | $105$ | 0.85 | 1.3 | −109.373 9(2) | 0.6 | 0.85 | −109.400 1(2) |

$106$ | 5.5 | 14 | −109.375 2(2) | 3.3 | 9.761 | −109.404 8(2) | |

$107$ | 35 | 192 | −109.374 9(1) | 18.7 | 115 | −109.405 2(1) | |

CCSD | −109.355 35 | −109.384 26 | |||||

CCSD(T) | −109.373 70 | −109.404 30 | |||||

CASSCF (10,8) | −109.131 57 | −109.139 44 | |||||

CASPT2 (10,8) | −109.352 79 | −109.384 52 | |||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | −109.361 09 | −109.389 47 | |||||

Stretched$(4.2a0)$ | $105$ | 240 | 1.2 | −109.028(3) | 0.15 | 0.84 | −109.044(2) |

$106$ | 1.3 | 16 | −109.034(3) | 0.9 | 11 | −109.057 0(15) | |

$107$ | 8.9 | 188 | −109.042 1(5) | 4.7 | 130 | −109.061 2(8) | |

CASSCF (10,8) | −108.799 45 | −108.799 45 | |||||

CASPT2 (10,8) | −109.031 22 | −109.031 22 | |||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | −109.025 36 | −109.025 36 |

. | $Nw$ . | VTZ . | VQZ . | ||||
---|---|---|---|---|---|---|---|

$N0/103$ . | $\u27e8Nini\u27e9/103$ . | $Etot$/$Eh$ . | $N0/103$ . | $\u27e8Nini\u27e9/103$ . | $Etot$/$Eh$ . | ||

Equilibrium$(2.068a0)$ | $105$ | 0.85 | 1.3 | −109.373 9(2) | 0.6 | 0.85 | −109.400 1(2) |

$106$ | 5.5 | 14 | −109.375 2(2) | 3.3 | 9.761 | −109.404 8(2) | |

$107$ | 35 | 192 | −109.374 9(1) | 18.7 | 115 | −109.405 2(1) | |

CCSD | −109.355 35 | −109.384 26 | |||||

CCSD(T) | −109.373 70 | −109.404 30 | |||||

CASSCF (10,8) | −109.131 57 | −109.139 44 | |||||

CASPT2 (10,8) | −109.352 79 | −109.384 52 | |||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | −109.361 09 | −109.389 47 | |||||

Stretched$(4.2a0)$ | $105$ | 240 | 1.2 | −109.028(3) | 0.15 | 0.84 | −109.044(2) |

$106$ | 1.3 | 16 | −109.034(3) | 0.9 | 11 | −109.057 0(15) | |

$107$ | 8.9 | 188 | −109.042 1(5) | 4.7 | 130 | −109.061 2(8) | |

CASSCF (10,8) | −108.799 45 | −108.799 45 | |||||

CASPT2 (10,8) | −109.031 22 | −109.031 22 | |||||

$MRCISD(CASSCF\u2002ref.)$ (10,8) | −109.025 36 | −109.025 36 |

Why does this simple algorithm prove so effective? We believe the reason is this: the new rule of “the survival of the fittest” means that whenever we add a new determinant to the system, it comes from a determinant whose sign has been well-established, either by virtue of being in a select group of CAS determinants, or by developing a sufficient population that makes it probable that its progeny will be sign-coherent as well. As a result the whole space can be sign-coherently sampled much more easily than with the original algorithm in which sign-coherence arose only when annihilation events became probable. This generally required a large number of walkers. It is remarkable that a very small number of determinants is able to endow sign-coherence to the whole space, even in the strongly multiconfigurational problem of stretched $N2$. This is an extremely interesting feature of the algorithm, as once sign-coherence is achieved, the correct sampling of enormous FCI spaces becomes possible. It means that the dynamical correlation which large basis sets capture can be calculated with essentially FCI accuracy. This is a significant advance over MR perturbation methods such as CASPT2 which at present are the only viable way to treat such difficult problems. The algorithmic simplicity of the method is also an attractive feature. We will be examining the scaling properties of the new algorithm in a forthcoming paper, as well as applying it to other strongly correlated systems where both dynamic and static correlation are integral to the system.