Quadratic Scaling Bosonic Path Integral Molecular Dynamics

Bosonic exchange symmetry leads to fascinating quantum phenomena, from exciton condensation in quantum materials to the superfluidity of liquid Helium-4. Unfortunately, path integral molecular dynamics (PIMD) simulations of bosons are computationally prohibitive beyond $\mathord{\sim} 100$ particles, due to a cubic scaling with the system size. We present an algorithm that reduces the complexity from cubic to quadratic, allowing the first simulations of thousands of bosons using PIMD. Our method is orders of magnitude faster, with a speedup that scales linearly with the number of particles and the number of imaginary time slices (beads). Simulations that would have otherwise taken decades can now be done in days. In practice, the new algorithm eliminates most of the added computational cost of including bosonic exchange effects, making them almost as accessible as PIMD simulations of distinguishable particles.

Exchange symmetry between indistinguishable particles is central to quantum phenomena such as Bose-Einstein condensation and superfluidity. However, enforcing it poses a formidable challenge to PIMD: it requires summing over all permutations of the identical particles in the quantum partition function, leading to a different set of ring polymers for each permutation, where the rings of exchanged particles are connected [23]. As a result, a straightforward implementation of PIMD for indistinguishable particles scales exponentially, $O(PN!)$, where $N$ is the number of particles and $P$ is the number of imaginary time slices (beads). For that reason, while efficient sampling of permutations is possible in Path Integral Monte Carlo (PIMC) [23-32], exchange effects were not included in PIMD simulations until recently, except for very few particles [33-35].
In previous work, we showed [36] that bosonic PIMD simulations can be performed with only cubic scaling, $O(PN^3)$. Based on a recurrence relation for the ring polymer potential and forces, the algorithm allowed the first PIMD simulations of moderately large bosonic systems; the largest application to date consists of N = 128 atoms [37]. Fermionic systems whose sign problem is not too severe were also simulated using a related approach [38-43]. Nevertheless, even for bosonic systems, the cubic scaling hindered applications to systems with more than ∼100 particles.
In this paper, we present an algorithm that reduces the scaling of bosonic PIMD from cubic to quadratic, $O(N^2 + PN)$. Our new algorithm generates the exact same potential, forces, and trajectories as the previous algorithm, only more efficiently. As a result, bosonic PIMD simulations are $O(PN)$ times faster than before. This significant improvement allows us to perform simulations of thousands of bosonic particles, which were previously beyond reach using PIMD. For example, a simulation of N = 1372 $^4$He atoms in the superfluid liquid phase takes ∼7 days, but would require over 14 years with the previous algorithm. We therefore dramatically reduce the added computational overhead of bosonic exchange in PIMD simulations in comparison to distinguishable particles.
In Section II A, we first describe the theory behind the original bosonic PIMD algorithm, and then explain how to reduce the scaling from cubic to quadratic in the evaluation of the ring polymer potential and forces. In Section II B, we verify the improved scaling of the algorithm numerically, and present the observed speedup of our method in simulations of more than 1000 ultracold trapped atoms and of superfluid liquid $^4$He. Section III presents our conclusions, while Section IV provides additional computational details.

A. Theory
This section explains the new algorithm and how it achieves the reduced scaling. We start by reviewing bosonic PIMD and the original, cubic-scaling algorithm.
1. Background: cubic-scaling bosonic PIMD

We consider a standard quantum Hamiltonian of $N$ bosons of mass $m$, interacting through the potential $\hat{U}$, given by

$$\hat{H} = \frac{1}{2m}\sum_{\ell=1}^{N}\hat{\mathbf{p}}_\ell^2 + \hat{U}(\hat{\mathbf{r}}_1, \ldots, \hat{\mathbf{r}}_N).$$

The canonical partition function at inverse temperature $\beta = (k_B T)^{-1}$ is $Z = \mathrm{Tr}\left[e^{-\beta\hat{H}}\right]$, where the trace is taken over a properly symmetrized position basis. The resulting path integral expression [44] for this partition function is

$$Z \propto \frac{1}{N!}\sum_{\sigma} \int e^{-\beta\left(E^{\sigma} + \bar{U}\right)}\, dR_1 \cdots dR_N. \qquad (1)$$

Equation (1) is the isomorphism [4] between the partition function of bosonic quantum particles and that of a system of classical ring polymers, where $R_\ell = \left(\mathbf{r}_\ell^1, \ldots, \mathbf{r}_\ell^P\right)$ are the positions of the $P$ polymer beads that correspond to particle $\ell$. It is exact in the limit $P \to \infty$, and should be interpreted as follows: 1) The summation is over all the permutations $\sigma$ of the $N$ particles. 2) Beads of the same index associated with different particles interact through a scaled potential $\bar{U} = \frac{1}{P}\sum_{j=1}^{P} U\left(\mathbf{r}_1^j, \ldots, \mathbf{r}_N^j\right)$, which is invariant under particle permutations. We do not consider the potential $\bar{U}$ further, since it is the same for bosonic and distinguishable particles. 3) $E^{\sigma}$ is the spring energy of the ring polymer configuration that corresponds to $\sigma$, as we now explain.
The spring energy is

$$E^{\sigma} = \frac{1}{2} m\omega_P^2 \sum_{\ell=1}^{N}\sum_{j=1}^{P}\left(\mathbf{r}_\ell^{j+1} - \mathbf{r}_\ell^{j}\right)^2, \qquad \mathbf{r}_\ell^{P+1} = \mathbf{r}_{\sigma(\ell)}^{1}, \qquad (2)$$

where $\omega_P = \sqrt{P}/(\beta\hbar)$. Figure 1 depicts all the permutations of $N = 3$ particles and the ring polymer configurations that correspond to them.
Equation (1) provides an expression for the bosonic partition function that can be sampled, at least in principle, through simulations of a classical system with the ring polymer potential $-\frac{1}{\beta}\ln\left[\frac{1}{N!}\sum_\sigma e^{-\beta E^{\sigma}}\right]$. However, the number of permutations, $N!$, grows exponentially, making this naive approach intractable.
Recently, we showed [36] that the exponential scaling can be avoided by sampling the ring polymer potential $V^{[1,N]} = V^{[1,N]}(R_1, \ldots, R_N)$, defined through the following recurrence relation:

$$V^{[1,N]} = -\frac{1}{\beta}\ln\left[\frac{1}{N}\sum_{k=1}^{N} e^{-\beta\left(V^{[1,N-k]} + E^{[N-k+1,N]}\right)}\right]. \qquad (3)$$

The recursion is terminated by setting $V^{[1,0]} = 0$.
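As an illustration, the recurrence of Equation (3) can be sketched in a few lines of Python, treating the cycle energies $E^{[u,v]}$ as given inputs. The function and variable names here are ours, not those of the reference implementations, and the exponentials are evaluated naively, without the stability shift discussed in the appendix:

```python
import math

def potential_recurrence(E, N, beta):
    """Evaluate V^[1,v] for v = 0..N via the recurrence of Eq. (3).

    E[(u, v)] is the cycle energy E^[u,v] (1-indexed, u <= v).
    Returns a list V with V[v] = V^[1,v]; V[0] = 0 terminates the recursion.
    """
    V = [0.0] * (N + 1)
    for v in range(1, N + 1):
        # V^[1,v] = -(1/beta) ln[(1/v) sum_k exp(-beta (V^[1,v-k] + E^[v-k+1,v]))]
        s = sum(math.exp(-beta * (V[v - k] + E[(v - k + 1, v)]))
                for k in range(1, v + 1))
        V[v] = -math.log(s / v) / beta
    return V
```

Note that for a single particle the recurrence collapses to $V^{[1,1]} = E^{[1,1]}$, as expected for one distinguishable ring polymer.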
In Equation (3), $E^{[N-k+1,N]}$ is the spring energy of the ring polymer obtained by connecting the beads of particles $N-k+1, \ldots, N$ into a single ring,

$$E^{[u,v]} = \frac{1}{2}m\omega_P^2\sum_{\ell=u}^{v}\sum_{j=1}^{P}\left(\mathbf{r}_\ell^{j+1} - \mathbf{r}_\ell^{j}\right)^2, \qquad (4)$$

where $\mathbf{r}_\ell^{P+1} = \mathbf{r}_{\ell+1}^{1}$ for $u \le \ell < v$ and $\mathbf{r}_v^{P+1} = \mathbf{r}_u^{1}$. In the following, we refer to $E^{[N-k+1,N]}$ as cycle energies [45]. The potential $V^{[1,N]}$ differs from the potential obtained by the sum over all permutations, but it samples the bosonic partition function in an equivalent manner [36].
The recurrence relation defining $V^{[1,N]}$ allows computing both the potential and the derived forces in $O(PN^3)$ time. To evaluate the potential, the $O(N^2)$ cycle energies are evaluated first, each of which costs $O(PN)$ due to the double sum in Equation (4), resulting in $O(PN^3)$ overall. The forces due to the potential, i.e., $-\nabla_{\mathbf{r}_\ell^j} V^{[1,N]}$ for all $\ell$ and $j$, are also computed in $O(PN^3)$ time, using a recurrence relation obtained from Equation (3) by taking the derivative. See the Supporting Information of Ref. 36 for the full complexity analysis.
In this paper, we improve on this complexity and show that the potential and forces can be computed in $O(N^2 + PN)$ time, leading to significantly faster bosonic PIMD simulations. As in the previous method, the algorithm first computes the potential and then the forces on all beads. We achieve a faster evaluation of the potential through a recursive formula for the cycle energies $E^{[N-k+1,N]}$ as well, and a faster evaluation of the forces by writing a probabilistic expression for them. To reduce the computational cost of evaluating the potential, we evaluate the cycle energies by incrementally extending cycles one particle at a time. We obtain each $E^{[u,v]}$ from $E^{[u+1,v]}$ by "breaking" and "reforming" the cycle:

$$E^{[u,v]} = E^{[u+1,v]} - \frac{1}{2}m\omega_P^2\left(\mathbf{r}_{u+1}^{1} - \mathbf{r}_v^{P}\right)^2 + \frac{1}{2}m\omega_P^2\left(\mathbf{r}_{u+1}^{1} - \mathbf{r}_u^{P}\right)^2 + E^{(u)}_{\mathrm{int}} + \frac{1}{2}m\omega_P^2\left(\mathbf{r}_u^{1} - \mathbf{r}_v^{P}\right)^2. \qquad (5)$$

In Equation (5), we first open the ring by removing the spring between the last bead of $v$ and the first bead of $u+1$. We then connect $u$ to $u+1$, add the contribution from the interior springs of $u$ ($E^{(u)}_{\mathrm{int}}$, defined below), and finally close the ring by connecting $v$ to $u$. Figure 2b illustrates this graphically.

FIG. 1. Ring polymer configurations for all permutations of three particles. The first and last beads are marked in green (solid) and red (dashed), respectively. Each permutation is given in two notations: the two-line notation, $\begin{pmatrix}1 & 2 & \ldots & N \\ \sigma(1) & \sigma(2) & \ldots & \sigma(N)\end{pmatrix}$, and the cycle notation, where each cycle is a sequence $(\ell, \sigma(\ell), \sigma(\sigma(\ell)), \ldots)$, which must close due to the properties of permutations, and each element appears in exactly one cycle (see the supporting information, SI). Permutations that contribute to the potential $V^{[1,N]}$ are enclosed in a square.
FIG. 2. (a) The interior springs of particle 1 are the same in all the cycles in which the particle participates. (b) The cycle energy $E^{[1,3]}$ can be obtained from $E^{[2,3]}$ by removing the term corresponding to the spring closing the cycle from 3 to 2 (red, dotted), connecting particle 1 to the beginning of 2 (purple, solid), adding the interior springs of particle 1 (gold, solid), and closing the cycle from 3 to 1 (purple, solid).
The term $E^{(u)}_{\mathrm{int}}$ sums over the springs that are unaffected by particle exchange (see Figure 2a),

$$E^{(u)}_{\mathrm{int}} = \frac{1}{2}m\omega_P^2\sum_{j=1}^{P-1}\left(\mathbf{r}_u^{j+1} - \mathbf{r}_u^{j}\right)^2. \qquad (6)$$

Note that $j = P$ is not included in the summation. Extending cycles begins with cycles that contain just one particle:

$$E^{[u,u]} = E^{(u)}_{\mathrm{int}} + \frac{1}{2}m\omega_P^2\left(\mathbf{r}_u^{1} - \mathbf{r}_u^{P}\right)^2. \qquad (7)$$

With this procedure, all the cycle energies are computed in $O(N^2 + PN)$ time: 1) summing over the $P-1$ interior springs of $N$ particles (Equation (6)) takes $O(PN)$ time; 2) forming the single-particle cycles of Equation (7) takes an additional $O(N)$; and 3) evaluating each cycle energy $E^{[u,v]}$ from $E^{[u+1,v]}$ (Equation (5)) is $O(1)$, with $O(N^2)$ such cycles.
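The cycle-extension procedure of Equations (5)-(7) can be sketched as follows, again with hypothetical names of our own choosing; `R` holds the bead positions and `m_omega_P_sq` stands for $m\omega_P^2$:

```python
import numpy as np

def cycle_energies(R, m_omega_P_sq):
    """All cycle energies E^[u,v] in O(N^2 + PN) time via Eqs. (5)-(7).

    R has shape (N, P, d): N particles, P beads, d spatial dimensions.
    Returns a dict E with E[(u, v)] for 1 <= u <= v <= N (1-indexed).
    """
    N, P, _ = R.shape
    spring = lambda a, b: 0.5 * m_omega_P_sq * np.sum((a - b) ** 2)
    # Eq. (6): interior springs of each particle, O(PN) in total.
    E_int = [sum(spring(R[l, j + 1], R[l, j]) for j in range(P - 1))
             for l in range(N)]
    E = {}
    for v in range(1, N + 1):
        # Eq. (7): single-particle cycle.
        E[(v, v)] = E_int[v - 1] + spring(R[v - 1, 0], R[v - 1, P - 1])
        for u in range(v - 1, 0, -1):
            # Eq. (5): open the spring closing [u+1, v], connect u to u+1,
            # add u's interior springs, and close the cycle from v to u.
            E[(u, v)] = (E[(u + 1, v)]
                         - spring(R[u, 0], R[v - 1, P - 1])       # open v -> u+1
                         + spring(R[u, 0], R[u - 1, P - 1])       # connect u -> u+1
                         + E_int[u - 1]                           # interior of u
                         + spring(R[u - 1, 0], R[v - 1, P - 1]))  # close v -> u
    return E
```

Each extension step touches only two bead pairs, which is what brings the cycle-energy evaluation down from $O(PN^3)$ to $O(N^2 + PN)$.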
Once the cycle energies are known, evaluating $V^{[1,N]}$ takes only $O(N^2)$, since it amounts to evaluating all $N$ potentials $V^{[1,v]}$, each of which costs $O(N)$ due to the sum in Equation (3). This part of the algorithm is shown in lines 1 to 11 of Algorithm 1 in Appendix A. Reducing the time required to compute the potential does not directly reduce the scaling of the force evaluation. In the original algorithm, the recurrence for the force is $O(N^2)$ once the cycle energies are known. However, it needs to be evaluated separately per bead, which amounts to $O(PN^3)$ overall. The techniques we used above cannot improve this further. To overcome this, we rewrite the forces in a new, probabilistic form.
a. Potential as a sum over representative permutations. To this end, we rewrite the potential of Equation (3). We show that it can be equivalently defined by a sum over the permutations, provided that they are transformed by a function $G$:

$$V^{[1,N]} = -\frac{1}{\beta}\ln\left[\frac{1}{N!}\sum_{\sigma} e^{-\beta E^{G[\sigma]}}\right]. \qquad (8)$$

Given a permutation $\sigma$, the function $G$ returns a permutation $G[\sigma]$ that is called the representative permutation of $\sigma$. For example, for three particles, we choose $G[(132)] = (123)$ and $G[(13)(2)] = (1)(23)$, and $G[\sigma] = \sigma$ otherwise (the permutations are identified through their cycle notation; see Figure 1 and Appendix C). With this choice, Equation (8) for $N = 3$ gives

$$V^{[1,3]} = -\frac{1}{\beta}\ln\left[\frac{1}{3!}\left(e^{-\beta E^{(1)(2)(3)}} + e^{-\beta E^{(12)(3)}} + 2e^{-\beta E^{(1)(23)}} + 2e^{-\beta E^{(123)}}\right)\right], \qquad (9)$$

which is exactly the result of expanding the recurrence of Equation (3). For a general $N$, $G[\sigma]$ is defined by first ordering the cycles of $\sigma$ in ascending order of their largest element, then replacing all elements by $1, \ldots, N$ consecutively. The precise definition appears in Methods, and the proof of Equation (8) appears in Theorem 1. The reason that Equation (8) transforms permutations using $G$ is that $V^{[1,N]}$ does not include all permutations. For example, Equation (9) does not include terms for the permutations $(132)$ and $(13)(2)$ (see Figure 1). However, Equation (8) proves that $V^{[1,N]}$ indeed samples the correct quantum partition function, since $G[\sigma]$ preserves the topology of $\sigma$ by construction.
It might seem counter-intuitive to rewrite the recursive potential as a sum over representative permutations, because their number also grows exponentially (as shown in Appendix D). However, we never use this expression to evaluate the potential; rather, we use it to derive an efficient expression for the force, as we now explain.
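For $N = 3$, the equivalence between the recurrence and the sum over representative permutations can be checked numerically by treating the cycle energies as free parameters. This sketch (with names of our own choosing) evaluates both forms:

```python
import math

def V_recursive(E, beta):
    """V^[1,3] from the recurrence of Eq. (3), for N = 3."""
    V = [0.0] * 4
    for v in range(1, 4):
        s = sum(math.exp(-beta * (V[v - k] + E[(v - k + 1, v)]))
                for k in range(1, v + 1))
        V[v] = -math.log(s / v) / beta
    return V[3]

def V_representative(E, beta):
    """V^[1,3] as the sum over representative permutations, Eq. (9)."""
    terms = [E[(1, 1)] + E[(2, 2)] + E[(3, 3)],  # (1)(2)(3), weight 1
             E[(1, 2)] + E[(3, 3)],              # (12)(3),   weight 1
             E[(1, 1)] + E[(2, 3)],              # (1)(23),   weight 2
             E[(1, 1)] + E[(2, 3)],
             E[(1, 3)],                          # (123),     weight 2
             E[(1, 3)]]
    return -math.log(sum(math.exp(-beta * t) for t in terms) / 6.0) / beta
```

For arbitrary values of the six cycle energies, the two functions agree to machine precision, mirroring the expansion of the recurrence into Equation (9).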
b. Force as an expectation value. Since the potential is an average over representative permutations, per Equation (8), taking the derivative shows that the force is a weighted average over the representative permutations:

$$-\nabla_{\mathbf{r}_\ell^j} V^{[1,N]} = \sum_{\sigma} p(\sigma)\left(-\nabla_{\mathbf{r}_\ell^j} E^{G[\sigma]}\right), \qquad (10)$$

where

$$p(\sigma) = \frac{e^{-\beta E^{G[\sigma]}}}{\sum_{\sigma'} e^{-\beta E^{G[\sigma']}}}. \qquad (11)$$

For given positions of the beads, Equation (11) defines a Boltzmann distribution of permutations. Equation (10) states that the force on each bead $\mathbf{r}_\ell^j$ is the expectation value of the force over this distribution.
In itself, this expression is still not amenable to efficient computation, due to the sum over the permutations. However, it can be greatly simplified by observing that, in any permutation, the force on a bead depends only on its immediate neighbors. When taking the expectation value, we can group together all permutations that exert the same force on the bead. The calculation then splits into two cases: interior beads ($j = 2, \ldots, P-1$) and exterior beads ($j = 1, P$).
c. Force on interior beads. The force on each interior bead is the same in all permutations: $\mathbf{r}_\ell^j$ is always connected to $\mathbf{r}_\ell^{j+1}$ and $\mathbf{r}_\ell^{j-1}$, for $j = 2, \ldots, P-1$ and any $\ell$ (see Figure 2a). Therefore, its expectation value is trivial, given by the standard PIMD expression,

$$-\nabla_{\mathbf{r}_\ell^j} V^{[1,N]} = -m\omega_P^2\left(2\mathbf{r}_\ell^j - \mathbf{r}_\ell^{j+1} - \mathbf{r}_\ell^{j-1}\right). \qquad (12)$$

This expression requires $O(1)$ time to evaluate, resulting in $O(PN)$ time for all the interior beads. Thus, our algorithm computes the force on most beads in the same way as ordinary PIMD, which already removes much of the added cost of exchange effects. But to achieve quadratic scaling of the entire algorithm, we must compute the force on the exterior beads efficiently as well.

d. Force on exterior beads. For beads $j = 1, P$ of each particle $\ell$, different permutations do exert different forces, depending on its neighbors. We group together all permutations in which the neighbor is the same:

$$-\nabla_{\mathbf{r}_\ell^1} V^{[1,N]} = -m\omega_P^2\sum_{\ell'=1}^{N}\Pr\left[G[\sigma](\ell') = \ell\right]\left(2\mathbf{r}_\ell^1 - \mathbf{r}_\ell^{2} - \mathbf{r}_{\ell'}^{P}\right) \qquad (13)$$

and

$$-\nabla_{\mathbf{r}_\ell^P} V^{[1,N]} = -m\omega_P^2\sum_{\ell'=1}^{N}\Pr\left[G[\sigma](\ell) = \ell'\right]\left(2\mathbf{r}_\ell^P - \mathbf{r}_\ell^{P-1} - \mathbf{r}_{\ell'}^{1}\right). \qquad (14)$$

In these expressions, $\Pr\left[G[\sigma](\ell) = \ell'\right]$ represents the probability that bead $P$ of particle $\ell$ is connected to bead $1$ of particle $\ell'$. It is defined by summing the probabilities of Equation (11) over all permutations for which this condition holds in $G[\sigma]$.
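As a minimal sketch of the interior-bead case (0-indexed beads; the function name is ours), Equation (12) is a one-liner that involves only the two neighboring beads of the same particle:

```python
import numpy as np

def interior_force(R_l, j, m_omega_P_sq):
    """Eq. (12): spring force on interior bead j (0-indexed, 1 <= j <= P-2)
    of a single particle. It is the same in every permutation, exactly as in
    ordinary PIMD for distinguishable particles."""
    return -m_omega_P_sq * (2.0 * R_l[j] - R_l[j + 1] - R_l[j - 1])
```

The sign convention can be validated against a finite-difference derivative of the spring energy of a single closed ring.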
To demonstrate Equation (14) for $N = 3$, the force on bead $P$ of particle 1 is

$$-\nabla_{\mathbf{r}_1^P} V^{[1,3]} = -m\omega_P^2\left[\Pr\left[G[\sigma](1) = 1\right]\left(2\mathbf{r}_1^P - \mathbf{r}_1^{P-1} - \mathbf{r}_1^{1}\right) + \Pr\left[G[\sigma](1) = 2\right]\left(2\mathbf{r}_1^P - \mathbf{r}_1^{P-1} - \mathbf{r}_2^{1}\right)\right],$$

where

$$\Pr\left[G[\sigma](1) = 1\right] = \frac{e^{-\beta E^{(1)(2)(3)}} + 2e^{-\beta E^{(1)(23)}}}{3!\, e^{-\beta V^{[1,3]}}}, \qquad \Pr\left[G[\sigma](1) = 2\right] = \frac{e^{-\beta E^{(12)(3)}} + 2e^{-\beta E^{(123)}}}{3!\, e^{-\beta V^{[1,3]}}}$$

(see Equation (9) and Figure 1). Equations (13) and (14) imply that once the connection probabilities $\Pr\left[G[\sigma](\ell) = \ell'\right]$ are known for all $\ell, \ell'$, the force on all exterior beads can be computed in $O(N^2)$ time: there are $2N$ of them, and each is computed in $O(N)$ time due to the sum in Equations (13) and (14). Our final goal, then, is to precompute the connection probabilities efficiently. This is a mighty challenge, because the connection probability sums over an exponential number of permutations (as shown in Appendix D).
e. Connection probabilities. To derive efficient expressions for the connection probabilities, we define another set of potentials, $V^{[N,N]}, V^{[N-1,N]}, \ldots, V^{[1,N]}$, through a similar recursion relation,

$$V^{[u,N]} = -\frac{1}{\beta}\ln\left[\sum_{v=u}^{N}\frac{1}{v}\, e^{-\beta\left(E^{[u,v]} + V^{[v+1,N]}\right)}\right]. \qquad (15)$$

Note that the first index changes, as opposed to Equation (3), and that the recursion is terminated by setting $V^{[N+1,N]} = 0$. As before, these potentials can be computed in $O(N^2)$ time, since we already have the cycle energies.
Once the potentials $V^{[u,N]}$ are known, we show that all the connection probabilities can be computed in $O(N^2)$ time: there are $N^2$ such probabilities, and we provide expressions that compute each one in $O(1)$ time. The intuition is that $V^{[\ell+1,N]}$ accounts for all the permutations over the particles $\ell+1, \ldots, N$, similarly to how $V^{[1,\ell]}$ accounts for permutations over $1, \ldots, \ell$. In this way, for example, we can express the connection probability

$$\Pr\left[G[\sigma](\ell) = \ell+1\right] = 1 - e^{-\beta\left(V^{[1,\ell]} + V^{[\ell+1,N]} - V^{[1,N]}\right)}, \qquad (16)$$

which is the complement of the probability that $\ell$ and $\ell+1$ do not belong to the same cycle in representative permutations. Expressions for the rest of the connection probabilities appear in Methods.
We provide a pseudocode of the force evaluation algorithm in Algorithm 1 of the SI.

B. Benchmarks & Applications
This section presents our numerical results for the improved performance of the algorithm.
a. Scaling. The improved scaling of our algorithm, quadratic rather than cubic, is shown numerically in Figure 3a for our LAMMPS [46] implementation. Here, we measure the time per simulation step as a function of the number of particles $N$ on a log-log scale, for a system of trapped cold noninteracting bosons with $P = 32$ beads. The slope of the linear fit is 1.6 for our new algorithm (blue), compared to 2.5 for the original algorithm (green). In our i-PI [47] implementation, the slopes are 1.9 and 3.2, respectively (see Appendix B 1).
b. Speedup. The reduced scaling of the current algorithm implies a speedup of $O(PN)$ over the original algorithm. Figure 3b shows the speedup (the ratio of the time per step between the original and current algorithms) for the same system as in panel (a). The new algorithm achieves acceleration by orders of magnitude, ranging from a ×21 speedup for simulations of $N = 32$ bosons to a ×944 speedup for $N = 1024$. The speedup is close to linear in the number of particles ($R^2 = 0.993$), with a slope of ∼0.9. We note that the cost of evaluating pairwise interactions, if present, is the same in the current and original algorithms (and in ordinary PIMD). This reduces the speedup in such systems, but it remains significant; we find a linear speedup with slope 0.8 in liquid $^4$He, ×6 for $N = 32$ and ×1069 for $N = 1372$ (see Appendix B).
c. Simulating > 1000 particles. Such speedups allow us to simulate systems far beyond what was previously possible in bosonic PIMD. Figure 4 shows a simulation of superfluid liquid $^4$He, in which we compute the pair correlation of $N = 1372$ atoms with $P = 96$ beads. The pair correlation is in very good agreement with the PIMC results of Ceperley [23] (see Appendix B). This simulation takes ∼7 days with our algorithm, but would take longer than 14 years with the original algorithm (see Methods). Similarly, simulations of $N = 1600$ trapped cold bosons, with and without a Gaussian repulsive interaction (see Appendix B), take less than ∼9 days, whereas the previous algorithm is estimated to require over 20 years to complete on the same hardware (see Methods).
d. Bosonic vs. ordinary PIMD. The new algorithm eliminates most of the added computational cost of bosonic exchange, and brings the scale of bosonic PIMD largely on par with ordinary PIMD. Figure 3a compares the new algorithm to the PIMD algorithm in LAMMPS for distinguishable particles (black). For 64 particles, there is only a small difference (37%) in the time per simulation step. For 1024 particles, bosonic simulations are merely ∼16 times slower than for distinguishable particles. With the original algorithm, bosonic simulations of 1024 particles would have been ∼15000 times more expensive than simulations of distinguishable particles.

FIG. 3. (a) The scaling of the original and current algorithms with the number of particles for a system of $N$ noninteracting cold trapped bosons, on a log-log scale. Blue circles and green squares represent the current and original algorithms, respectively. Black triangles represent simulations of distinguishable particles. Lines represent linear fits to the simulation data. (b) The speedup gained by using the proposed algorithm, defined as the ratio between the time per step in the original and current implementations.

III. CONCLUSIONS
This paper presents an algorithm for bosonic PIMD with improved computational complexity, reducing the cost from cubic to quadratic in the number of particles. Our improved algorithm relies on two observations: 1) the interior beads of each ring polymer are not affected by exchange, and 2) the bosonic ring polymer force can be evaluated efficiently as the expectation value of the force over a Boltzmann distribution of particle permutations.
Using our method, simulations of condensed phase bosonic systems with thousands of particles can be done in days instead of years, allowing the first PIMD simulations of thousands of interacting particles including bosonic exchange symmetry.
We believe that our algorithm is a significant step towards including bosonic exchange effects in PIMD simulations as routinely as performing standard simulations of distinguishable particles. The new algorithm will enable studies of systems such as exotic phases of superfluids under confinement using PIMD. Our insights might also be used to accelerate other algorithms in which a similar recursion has been identified [48-50]. Finally, it would be exciting to see whether approximate methods for obtaining time correlation functions based on PIMD, such as Matsubara dynamics, Ring Polymer Molecular Dynamics, or Centroid Molecular Dynamics [3], could also be devised for bosonic systems.

A. Simulation details
We implemented our algorithm in development branches of LAMMPS [46] and i-PI [47]. Unless otherwise stated, the results were obtained using the LAMMPS implementation. We validated the proposed algorithm by reproducing results obtained previously with the original method [36] on several bosonic systems, with and without interactions, and by extending some of them to larger system sizes, as detailed in Appendix B.
For the trapped bosons presented in Figure 3, we use $P = 36$, a time step of 1 fs, and average the time over 1000 simulation steps. For the superfluid liquid $^4$He presented in Figure 4, the simulation was performed with the interaction potential of Aziz et al. [51], at a temperature of 1.379 K and a density of 0.02182 Å$^{-3}$, with periodic boundary conditions in all spatial dimensions, for $2.5\times10^6$ steps of 0.5 fs each, where the first 20% were used as equilibration. For the two-dimensional density of trapped bosons with and without a Gaussian repulsive interaction (whose results are displayed in Appendix B), we use $P = 36$, appropriate for the temperature of 11.6 K ($\beta\hbar\omega = 3$), and a time step of 1 fs. These simulations were performed for $3\times10^6$ steps, with the first 20% discarded as equilibration. The repulsive potential is the same as described in Ref. 36, with $g = 3$.

B. Estimating the run-time of the original algorithm
To estimate the time required for the original algorithm to complete the simulation of $N = 1600$ trapped cold bosons with a Gaussian repulsive interaction, we simulated 1000 steps with the implementations of both the current and the original algorithm, and extrapolated to $3\times10^6$ steps. The result was in agreement with the actual time required to perform the simulation with the new method, and was actually slightly shorter, rendering our prediction conservative. Specifically, with the new method, 1000 steps lasted ∼150 seconds for $g = 0$ and ∼200 seconds for $g = 3$, which extrapolates to ∼7 days for $g = 0$ and 5 days for $g = 3$; our actual simulations were slightly slower, ∼9 days for $g = 0$ and 7 days for $g = 3$, likely due to the presence of other CPU-heavy processes on the same machine. In contrast, with the original method, 1000 steps took ∼80 hours for $g = 0$ and ∼64 hours for $g = 3$, which extrapolates to ∼27 years and ∼22 years, respectively, for $3\times10^6$ steps.
We estimated the time to simulate the system of superfluid $^4$He in a similar manner. With the current algorithm, simulating 1000 steps with $N = 1372$ and $P = 64$ took ∼171 seconds, which extrapolates to ∼5 days. In contrast, with the original algorithm, 1000 steps required ∼51 hours with $P = 64$, which extrapolates to ∼14.5 years. This is a conservative estimate for the simulation with the original algorithm and $P = 96$, because increasing the number of beads increases the required time. We did not base the estimate on the time to perform 1000 steps with $P = 96$, because the original implementation uses a simple numerical stability procedure when evaluating sums of exponentials (see Appendix A 1), which is not sufficiently stable for large $P$ and $N$.
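The extrapolation arithmetic used above is easy to reproduce; a quick sketch (function name ours):

```python
# Sanity check of the run-time extrapolations from 1000-step timings.
HOURS_PER_YEAR = 24 * 365

def years(hours_per_1000_steps, total_steps):
    """Extrapolate a 1000-step timing (in hours) to a full simulation."""
    return hours_per_1000_steps * (total_steps / 1000) / HOURS_PER_YEAR

print(years(80.0, 3.0e6))  # trapped bosons, g = 0: about 27 years
print(years(64.0, 3.0e6))  # trapped bosons, g = 3: about 22 years
print(years(51.0, 2.5e6))  # liquid 4He, P = 64: about 14.5 years
```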

C. Definition of G[σ]
In the context of rewriting the potential as a sum over representative permutations (Equation (8)), for a general $N$, the representative permutation $G[\sigma]$ is defined by transforming the cycle notation of $\sigma$ in two steps. First, the cycles are sorted in ascending order according to the largest element in each cycle. For example, $\sigma = (13)(2)$ is rewritten as $(2)(13)$, because the highest element of the cycle $(13)$ is larger than the highest element of $(2)$. Second, the elements are replaced by the numbers $1, \ldots, N$ consecutively, while maintaining the same length and order of the cycles. In the same example, $(2)(13)$ turns into $(1)(23)$, and overall $G[(13)(2)] = (1)(23)$.
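The two-step construction of $G[\sigma]$ translates directly into code. In this sketch (names ours), a permutation is given as a dict mapping each element to its image, and the result is the cycle notation of $G[\sigma]$ as a list of tuples:

```python
def cycles(perm):
    """Cycle notation of a permutation given as a dict {i: perm(i)} on 1..N."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        result.append(cycle)
    return result

def representative(perm):
    """G[sigma]: sort cycles by their largest element, then relabel 1..N.

    Each relabeled cycle is a consecutive range of integers, so only the
    cycle lengths and their order survive the transformation.
    """
    relabeled, next_label = [], 1
    for c in sorted(cycles(perm), key=max):
        relabeled.append(tuple(range(next_label, next_label + len(c))))
        next_label += len(c)
    return relabeled
```

For example, `representative({1: 3, 2: 2, 3: 1})`, i.e., $\sigma = (13)(2)$, returns `[(1,), (2, 3)]`, i.e., $G[\sigma] = (1)(23)$, reproducing the worked example above.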

D. Expressions for connection probabilities
In the context of evaluating the forces (Equations ( 13) and ( 14)), we describe how to compute the connection probabilities based on the potentials defined in Equation (15).
Consider the connection probability $\Pr\left[G[\sigma](\ell) = \ell'\right]$. For $\ell' > \ell$, the probability is nonzero only when particle $\ell$ is not the particle with the highest index in its ring, due to the properties of $G[\sigma]$ (see Figure 1). In that case, $\ell$ is connected to particle $\ell+1$. We prove in Theorem 2 (Appendix E) that this probability is given by Equation (16) above.
On the other hand, a connection $\ell' \le \ell$ happens only when particle $\ell$ is the particle with the highest index in its ring. We prove in Theorem 3 (Appendix E) that for $\ell' \le \ell$,

$$\Pr\left[G[\sigma](\ell) = \ell'\right] = \frac{1}{\ell}\, e^{-\beta\left(V^{[1,\ell'-1]} + E^{[\ell',\ell]} + V^{[\ell+1,N]} - V^{[1,N]}\right)}. \qquad (17)$$

The intuition underlying the proof is that $V^{[\ell+1,N]}$ accounts for all the permutations over the particles $\ell+1, \ldots, N$, similarly to how $V^{[1,\ell]}$ accounts for the permutations over $1, \ldots, \ell$. Therefore, in Equation (16), we compute the complement of the probability that $\ell$ and $\ell+1$ do not belong to the same cycle. Similarly, in Equation (17), we account for all configurations that include a ring polymer that starts with particle $\ell'$ and ends with particle $\ell$, together with all possible permutations over the other particles.
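Equations (16) and (17), together with the recursions for $V^{[1,\ell]}$ and $V^{[\ell+1,N]}$, can be cross-checked against a brute-force enumeration of all permutations for small $N$. This sketch (names ours) treats the cycle energies as free parameters and uses the fact that each cycle of a representative permutation is a consecutive range of particles:

```python
import math
from itertools import permutations

def rep_cycles(sigma):
    """Cycles of G[sigma] as (first, last) consecutive ranges (Definition 1)."""
    seen, cs = set(), []
    for s in range(1, len(sigma) + 1):
        if s not in seen:
            c, i = [], s
            while i not in seen:
                seen.add(i); c.append(i); i = sigma[i]
            cs.append(c)
    out, start = [], 1
    for L in [len(c) for c in sorted(cs, key=max)]:
        out.append((start, start + L - 1)); start += L
    return out

def conn_probs_brute(E, N, beta):
    """Pr[G[sigma](l) = l'] by explicit enumeration over S_N (Eq. (11))."""
    pr = {(l, lp): 0.0 for l in range(1, N + 1) for lp in range(1, N + 1)}
    Z = 0.0
    for p in permutations(range(1, N + 1)):
        sigma = {i + 1: p[i] for i in range(N)}
        ranges = rep_cycles(sigma)
        w = math.exp(-beta * sum(E[r] for r in ranges))
        Z += w
        for a, b in ranges:               # cycle (a, a+1, ..., b)
            for l in range(a, b):
                pr[(l, l + 1)] += w       # G[sigma](l) = l + 1
            pr[(b, a)] += w               # the cycle closes at its largest element
    return {k: v / Z for k, v in pr.items()}

def conn_probs_formula(E, N, beta):
    """Pr[G[sigma](l) = l'] from Eqs. (15)-(17), via cycle energies only."""
    Vf = [0.0] * (N + 1)                  # Vf[v] = V^[1,v], Eq. (3)
    for v in range(1, N + 1):
        s = sum(math.exp(-beta * (Vf[v - k] + E[(v - k + 1, v)]))
                for k in range(1, v + 1))
        Vf[v] = -math.log(s / v) / beta
    Vb = [0.0] * (N + 2)                  # Vb[u] = V^[u,N], Eq. (15)
    for u in range(N, 0, -1):
        s = sum(math.exp(-beta * (E[(u, v)] + Vb[v + 1])) / v
                for v in range(u, N + 1))
        Vb[u] = -math.log(s) / beta
    pr = {}
    for l in range(1, N + 1):
        for lp in range(1, N + 1):
            if lp == l + 1:               # Eq. (16)
                pr[(l, lp)] = 1.0 - math.exp(-beta * (Vf[l] + Vb[l + 1] - Vf[N]))
            elif lp <= l:                 # Eq. (17)
                pr[(l, lp)] = math.exp(
                    -beta * (Vf[lp - 1] + E[(lp, l)] + Vb[l + 1] - Vf[N])) / l
            else:
                pr[(l, lp)] = 0.0
    return pr
```

For arbitrary cycle energies, the closed-form probabilities agree with the enumeration, and for each $\ell$ they sum to one.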
In this section, we provide the full pseudocode of the algorithm presented in the main text. The code appears in Algorithm 1. The algorithm first computes the potential $V^{[1,N]}$ (lines 1 to 11), and then the forces $-\nabla_{\mathbf{r}_\ell^j} V^{[1,N]}$ (lines 12 to 23).
For the potential, the algorithm first sums the spring energies due to interior springs (lines 1 to 3), which are then used to evaluate all cycle energies $E^{[u,v]}$ through a recurrence relation on $u$ (lines 4 to 7). A recurrence relation on $V^{[1,1]}, V^{[1,2]}, \ldots, V^{[1,N-1]}, V^{[1,N]}$, which uses the cycle energies, yields the potential (lines 8 to 11).
For the forces, the algorithm first evaluates the potentials $V^{[1,N]}, V^{[2,N]}, \ldots, V^{[N-1,N]}, V^{[N,N]}$ through another recurrence relation (lines 12 to 15). The connection probabilities are computed based on all the aforementioned potentials and cycle energies (lines 16 to 20). These are used to evaluate all the forces, separately on beads $1, P$ of all particles (lines 24 to 25) and on the interior beads $j = 2, \ldots, P-1$ of all particles (lines 21 to 23). The expressions for the force on beads $1, P$ are slightly different from the expressions in the main text: there, the sum is over all potential neighbors $\ell' = 1, \ldots, N$, whereas line 24 has different limits for the sum, excluding terms whose connection probabilities vanish, $\Pr\left[G[\sigma](\ell) = \ell'\right] = 0$. For comparison, pseudocode for the original method [36] is shown in Algorithm 2. It has the same overall structure: the potential is computed (lines 1 to 8) by evaluating the cycle energies (lines 3 to 4) and using them in the recurrence relation (lines 5 to 8). However, the evaluation of the cycle energies is less efficient in this algorithm, and scales as $O(PN^3)$ rather than $O(N^2 + PN)$. The original LAMMPS implementation did not separate the evaluation of the cycle energies from the recurrence relation; this has no effect on the asymptotic complexity, but may change the constant factor. Then, the forces are evaluated by a recurrence relation (lines 9 to 16), first evaluating spatial derivatives of the cycle energies (lines 10 to 12) and then using a recurrence relation for the derivatives of the potential (lines 14 to 16). The latter step takes $O(N^2)$ time for every bead, and $O(PN^3)$ overall, worse than in the new algorithm.

Numerical Stability
Computing the potentials $V^{[1,v]}$ (lines 8 to 11 in Algorithm 1) and $V^{[u,N]}$ (lines 12 to 15) involves sums of exponentials, each of which may be too large to represent in ordinary floating-point formats. As described in the SI of Ref. 36, the trick is to shift all exponents by a constant amount $\tilde{E}$ (e.g., the minimal exponent appearing in the sum). Line 11 of Algorithm 1 is replaced by

$$V^{[1,v]} = \tilde{E} - \frac{1}{\beta}\ln\left[\frac{1}{v}\sum_{k=1}^{v} e^{-\beta\left(V^{[1,v-k]} + E^{[v-k+1,v]} - \tilde{E}\right)}\right].$$

FIG. 7. The energy and radial density of $N = 4$ interacting particles with Gaussian repulsion of strength $g$ (in harmonic-oscillator units) in a harmonic trap, and their agreement with previous results from numerically diagonalizing the Hamiltonian [52].
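The shift amounts to the standard log-sum-exp trick. A sketch of one shifted recurrence step (names ours; the choice $\tilde{E} = \min_k(\cdot)$ is one common option):

```python
import math

def stable_potential_step(terms, beta, n):
    """One recurrence step with the exponent-shift trick.

    `terms` holds the exponents V^[1,v-k] + E^[v-k+1,v] for k = 1..v (here
    n = v); shifting by their minimum keeps every exponential representable,
    and the shift cancels exactly in the returned value.
    """
    shift = min(terms)
    s = sum(math.exp(-beta * (t - shift)) for t in terms)
    return shift - math.log(s / n) / beta
```

With exponents of order 1000, the unshifted sum underflows to zero and the logarithm fails, while the shifted form returns the correct value.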

Numerical Validation
In this section, we present simulation results validating the correctness of our improved algorithm. Figure 6 shows the energy of bosons in a harmonic trap: 1) as a function of $N$, at a temperature corresponding to $\hbar\beta\omega_0 = 6$, and 2) as a function of temperature, with $N = 16$. We use $P = 36$ and a time step of 1 fs. The simulations were performed for $10^6$ steps, with the first 20% discarded as equilibration. Three independent runs were used; the error bars of one standard deviation are smaller than the marker size. The results of the simulation are compared with the analytical result; the maximal relative deviation is 0.5%. Figure 7 shows results for trapped bosons interacting through a Gaussian repulsive interaction (see Ref. 36 for details). We calculate the energy and radial density for $N = 4$ bosons for different interaction strengths, controlled by the parameter $g$. We use $P = 72$ and a time step of 1 fs. The simulations were performed for $10\times10^6$ steps, with the first 20% discarded as equilibration. Three independent runs were used; the error bars of one standard deviation are smaller than the marker size. The results are compared to those of Mujal et al. [52]. The maximum relative deviation in the energy is 3.3% and the mean deviation is 1.5%, close to the deviations of Hirshberg et al. [36], which were 2.9% and 1.3%, respectively. The mean relative deviation in the radial density is at most ∼5.3%.

FIG. 8. Simulations of superfluid liquid $^4$He. Left: the pair correlation of $N = 1372$ atoms at a temperature of 1.379 K and a density of 0.02182 Å$^{-3}$, compared with the PIMC results of Ceperley [23]. Right: the speedup of the improved algorithm over the original method as a function of the number of $^4$He atoms, with $P = 64$ beads, computed at 1.21 K and the same density.

Additional benchmarks and applications
Figure 8 compares the pair correlation function of superfluid liquid $^4$He to the pair correlation from a PIMC simulation of Ref. 23 (see Fig. 16 there). The number of beads was chosen based on a convergence graph for $N = 108$ atoms at 1.21 K (see Figure 9).
The speedup of the new algorithm over the original one in a system of superfluid liquid $^4$He is displayed in Figure 8. The simulation speedup as a function of the number of atoms $N$, at a temperature of 1.21 K and a density of 0.02182 Å$^{-3}$ with $P = 64$, is approximately linear ($R^2 = 0.997$) with a slope of ∼0.8. We attribute the reduced speedup factor, compared to the case of trapped bosons displayed in the main text, to the cost of evaluating particle-particle interactions, which is not sped up by the new algorithm.
Figure 10 shows the result of simulations of $N = 1600$ trapped cold bosons, with and without a Gaussian repulsive interaction. (For comparison, the largest trapped system studied with previous bosonic PIMD methods had $N = 64$ [36].) We use $P = 36$, appropriate for the temperature of 11.6 K ($\beta\hbar\omega = 3$), and a time step of 1 fs. The simulations were performed for $3\times10^6$ steps, with the first 20% discarded as equilibration. The repulsive potential is the same as described in Ref. 36, with $g = 3$. As expected, the repulsive interaction between the particles pushes them to spread in the trap, resulting in a wider two-dimensional density.

FIG. 10. Two-dimensional density of $N = 1600$ bosons in a harmonic trap without interaction (left) and with Gaussian repulsive interaction $g = 3$ (right). The simulations last ≤ 9 days, instead of > 20 years with the previous algorithm.
Our alternative form for the potential uses the concept of representative permutations:

Definition 1. Given a permutation σ over N elements, the representative permutation G[σ] ∈ S[1, N] is the permutation obtained from the cycle notation of σ as follows (and as explained in the main text). Start from the canonical cycle notation of σ, which crucially sorts the cycles of σ in increasing order of their highest element, σ = c₁ ⋯ c_m, where max cᵢ < max cᵢ₊₁. Construct G[σ] = c′₁ ⋯ c′_m by replacing the elements in the cycles with 1, …, N in that order, that is,

c′ᵢ = (max c′ᵢ₋₁ + 1, max c′ᵢ₋₁ + 2, …, max c′ᵢ₋₁ + |cᵢ|),

with max c′₀ = 0. Note that G[σ] is indeed always a permutation of 1, …, N, provided that σ is, and that it has the same cycle type, since |c′ᵢ| = |cᵢ|.

Our expression for the bosonic ring polymer potential is as follows:

Definition 2. We define the potential V^{[1,N]} by

e^{−βV^{[1,N]}} = (1/N!) Σ_{σ ∈ S[1,N]} e^{−βE_{G[σ]}},

where E_{G[σ]} is the ring polymer energy of the representative permutation G[σ].

Our first insight is that this definition captures exactly the bosonic ring polymer potential of Hirshberg et al. [36], as the following theorem shows:

Theorem 1. The boson potentials V^{[1,1]}, …, V^{[1,N]} defined in Definition 2 satisfy the recursion formula

e^{−βV^{[1,N]}} = (1/N) Σ_{k=1}^{N} e^{−β(E_N^{(k)} + V^{[1,N−k]})},

with the notation V^{[1,0]} = 0, where E_N^{(k)} is the ring polymer energy of the single cycle over particles N − k + 1, …, N.
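As a concrete illustration, the construction of G[σ] in Definition 1 can be sketched in a few lines of Python. This is a minimal sketch of ours, not the paper's implementation; the input format (a dict mapping each element to its image) and the function names are our own choices.

```python
def cycle_decomposition(perm):
    """Cycles of a permutation given as a dict {i: sigma(i)} over 1..N."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = perm[i]
        cycles.append(cyc)
    return cycles

def representative(perm):
    """G[sigma] per Definition 1: sort cycles by their largest element, then
    relabel elements with 1..N in order, so each cycle becomes a consecutive
    interval; the cycle type is preserved by construction."""
    cycles = sorted(cycle_decomposition(perm), key=max)
    out, nxt = [], 1
    for cyc in cycles:
        out.append(list(range(nxt, nxt + len(cyc))))
        nxt += len(cyc)
    return out

# sigma = (1 3)(2): canonical order puts (2) before (1 3), so G[sigma] = (1)(2 3)
print(representative({1: 3, 2: 2, 3: 1}))  # -> [[1], [2, 3]]
```

Note that two permutations of the same cycle type can have different representatives: the intervals are laid out in the canonical order of the original cycles, sorted by their highest element.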
The intuition underlying the proof is that the recursive definition is a manifestation of a recursive structure in G. The last cycle in G[σ] is of length k and always includes N. The other cycles in G[σ] repeat this process for the remaining N − k particles. The sum in Definition 2 corresponds to the different possible values of k = 1, …, N, and the prefactor accounts for the fraction of permutations in which N is in a cycle of length k. The recursive structure of G and the ring polymer energy is formalized in the following lemma:

Lemma 1. Let σ ∈ S[1, N] be a permutation where the cycle to which N belongs contains k elements, that is, |c_σ(N)| = k. Consider the permutation σ ∖ c_σ(N), that is, the permutation over N − k elements that is constructed from the cycles of σ excluding the cycle c_σ(N). Then

E_{G[σ]} = E_N^{(k)} + E_{G[σ ∖ c_σ(N)]}.

Proof. In the canonical cycle notation of σ, the last cycle is always c_σ(N). Hence, in the representative permutation, according to Definition 1, this cycle is transformed to (N − k + 1, …, N). The claim follows because E_{G[σ]} factors into a sum over the cycles of G[σ].
We are now ready to prove the theorem.
Proof of Theorem 1. We split the sum over all permutations according to the size of the cycle to which the element N belongs:

e^{−βV^{[1,N]}} = (1/N!) Σ_{k=1}^{N} Σ_{σ ∈ S[1,N] : |c_σ(N)| = k} e^{−βE_{G[σ]}}. (E2)

By Lemma 1, each term factors as e^{−βE_{G[σ]}} = e^{−βE_N^{(k)}} e^{−βE_{G[σ ∖ c_σ(N)]}}, so

e^{−βV^{[1,N]}} = (1/N!) Σ_{k=1}^{N} e^{−βE_N^{(k)}} Σ_{σ : |c_σ(N)| = k} e^{−βE_{G[σ ∖ c_σ(N)]}}. (E4)

Consider the inner sum on the right, the sum over all permutations over N elements where the cycle of N is of length k. We will transform it to a sum over permutations over N − k elements only. To this end, define a function f that, given a permutation σ over N − k possibly non-consecutive elements, returns a permutation of [1, N − k] by renaming the elements in σ to 1, …, N − k while respecting the order. It is easy to see that the representative permutation is unaltered by f: G[σ] = G[f(σ)], seeing that the renaming of the cycles that f performs does not change the relative order between them in the canonical cycle notation. Thus, it is valid to replace the representative of σ ∖ c_σ(N) with the representative of f(σ ∖ c_σ(N)). We now introduce a sum over permutations over [1, N − k]; at first, this sum is trivial, over only one element, because f is a function:

Σ_{σ : |c_σ(N)| = k} e^{−βE_{G[σ ∖ c_σ(N)]}} = Σ_{σ : |c_σ(N)| = k} Σ_{σ′ ∈ S[1,N−k] : f(σ ∖ c_σ(N)) = σ′} e^{−βE_{G[σ′]}}.

Changing summation order, and noting that the summand depends on σ′ but not on σ,

Σ_{σ : |c_σ(N)| = k} e^{−βE_{G[σ ∖ c_σ(N)]}} = Σ_{σ′ ∈ S[1,N−k]} |{σ : |c_σ(N)| = k, f(σ ∖ c_σ(N)) = σ′}| e^{−βE_{G[σ′]}}. (E9)

Consider the set cardinality in Equation (E9). We now show that it is (N−1 choose k−1) · (k−1)! for every σ′. To see this, consider h_c(σ) = f(σ ∖ c_σ(N)), defined on the set of permutations σ ∈ S[1, N] such that c_σ(N) equals a specific cycle c; there are (N−1 choose k−1) choices for the elements of c besides N and (k−1)! cyclic orderings of each such choice. We argue that h_c is one-to-one from this set onto S[1, N − k]. To see that h_c is surjective, given σ′ ∈ S[1, N − k], we can construct σ such that c_σ(N) = c and h_c(σ) = σ′ by adding to c the cycles of σ′ under the "inverse" renaming, from the elements [1, N − k] to the elements [1, N] ∖ c while respecting the order. To see that h_c is injective, suppose h_c(σ₁) = h_c(σ₂); then, because σ₁ ∖ c_{σ₁}(N) and σ₂ ∖ c_{σ₂}(N) are both permutations over the same set of elements, the renaming that f applies to both is the same, which means that σ₁ ∖ c_{σ₁}(N) = σ₂ ∖ c_{σ₂}(N), and thus also σ₁ = σ₂, which is what we wanted. Plugging this result into Equation (E9) yields

Σ_{σ : |c_σ(N)| = k} e^{−βE_{G[σ ∖ c_σ(N)]}} = [(N−1)!/(N−k)!] Σ_{σ′ ∈ S[1,N−k]} e^{−βE_{G[σ′]}}.

We can now introduce the potential over fewer elements, Σ_{σ′ ∈ S[1,N−k]} e^{−βE_{G[σ′]}} = (N−k)! e^{−βV^{[1,N−k]}}, by Definition 2. Plugging this expression back into Equation (E4),

e^{−βV^{[1,N]}} = (1/N!) Σ_{k=1}^{N} e^{−βE_N^{(k)}} (N−1)! e^{−βV^{[1,N−k]}} = (1/N) Σ_{k=1}^{N} e^{−β(E_N^{(k)} + V^{[1,N−k]})}.

The claim follows because (N−1)!/N! = 1/N.
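The equivalence established by Theorem 1 can be checked numerically on a small toy model. The sketch below is our own illustration, not the paper's code: it uses a random 1D configuration with unit spring constant and unit β, evaluates Definition 2 by brute force over all N! permutations, and compares the result with the recursion of Theorem 1.

```python
import math
import random
from itertools import permutations

random.seed(0)
N, P, beta, ks = 4, 3, 1.0, 1.0
# 1D toy configuration: r[i][j] is bead j of particle i (1-indexed; slot 0 unused)
r = [[0.0] + [random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N + 1)]

def E_interval(u, l):
    """Spring energy of a single ring-polymer cycle threading particles u..l."""
    e = 0.0
    for i in range(u, l + 1):
        for j in range(1, P):                     # interior springs of particle i
            e += 0.5 * ks * (r[i][j + 1] - r[i][j]) ** 2
        nxt = u if i == l else i + 1              # bead P links to bead 1 of next ring
        e += 0.5 * ks * (r[nxt][1] - r[i][P]) ** 2
    return e

def E_rep(sigma):
    """E_{G[sigma]}: cycles of the representative are consecutive intervals."""
    perm = {i: sigma[i - 1] for i in range(1, N + 1)}
    seen, cycles = set(), []
    for s in range(1, N + 1):
        if s in seen:
            continue
        c, i = [], s
        while i not in seen:
            seen.add(i)
            c.append(i)
            i = perm[i]
        cycles.append(c)
    cycles.sort(key=max)                           # canonical cycle order
    e, start = 0.0, 1
    for c in cycles:
        e += E_interval(start, start + len(c) - 1)
        start += len(c)
    return e

# Definition 2: brute-force average over all N! permutations
Z = sum(math.exp(-beta * E_rep(s)) for s in permutations(range(1, N + 1)))
V_direct = -math.log(Z / math.factorial(N)) / beta

# Theorem 1 recursion, with e^{-beta V^{[1,0]}} = 1
ev = [1.0]
for n in range(1, N + 1):
    ev.append(sum(math.exp(-beta * E_interval(n - k + 1, n)) * ev[n - k]
                  for k in range(1, n + 1)) / n)
V_rec = -math.log(ev[N]) / beta

assert abs(V_direct - V_rec) < 1e-9
```

The two evaluations agree to machine precision, as the theorem guarantees; the brute force costs O(N! · N · P), the recursion O(N² · P) with naive cycle-energy evaluation.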

Force as Expected Force
The expression of the potential as a sum over representative permutations yields an expression for the forces derived from the potential. As in the main text, we define the connection probability distribution over permutations by

Definition 3. P(σ) = e^{−βE_{G[σ]}} / (N! e^{−βV^{[1,N]}}), for σ ∈ S[1, N].
Then the force due to V^{[1,N]} can be written as the expectation value of the force in representative permutations:

Lemma 2. For every bead j of particle ℓ,

−∂V^{[1,N]}/∂r_ℓ^j = Σ_{σ ∈ S[1,N]} P(σ) (−∂E_{G[σ]}/∂r_ℓ^j).

Proof. According to Definition 2,

V^{[1,N]} = −(1/β) ln[(1/N!) Σ_{σ ∈ S[1,N]} e^{−βE_{G[σ]}}].

Taking the derivative gives

∂V^{[1,N]}/∂r_ℓ^j = Σ_{σ} e^{−βE_{G[σ]}} (∂E_{G[σ]}/∂r_ℓ^j) / Σ_{σ′} e^{−βE_{G[σ′]}},

which yields the desired expression since e^{−βE_{G[σ]}} / Σ_{σ′} e^{−βE_{G[σ′]}} = P(σ), where the equality Σ_{σ′} e^{−βE_{G[σ′]}} = N! e^{−βV^{[1,N]}} uses Definition 2.
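Lemma 2 can be sanity-checked by finite differences on the same kind of 1D toy model (again a hedged sketch of ours, with arbitrary unit constants): the numerical derivative of V^{[1,N]} with respect to one bead coordinate should match the P(σ)-weighted average of the numerical derivatives of E_{G[σ]}.

```python
import math
import random
from itertools import permutations

random.seed(1)
N, P, beta, ks = 3, 2, 0.7, 1.0
r = [[0.0] + [random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N + 1)]

def E_interval(cfg, u, l):
    """Spring energy of one cycle threading particles u..l of configuration cfg."""
    e = 0.0
    for i in range(u, l + 1):
        for j in range(1, P):
            e += 0.5 * ks * (cfg[i][j + 1] - cfg[i][j]) ** 2
        nxt = u if i == l else i + 1
        e += 0.5 * ks * (cfg[nxt][1] - cfg[i][P]) ** 2
    return e

def E_rep(cfg, sigma):
    """E_{G[sigma]} for a permutation given as a tuple of images of 1..N."""
    perm = {i: sigma[i - 1] for i in range(1, N + 1)}
    seen, cycles = set(), []
    for s in range(1, N + 1):
        if s in seen:
            continue
        c, i = [], s
        while i not in seen:
            seen.add(i)
            c.append(i)
            i = perm[i]
        cycles.append(c)
    cycles.sort(key=max)
    e, start = 0.0, 1
    for c in cycles:
        e += E_interval(cfg, start, start + len(c) - 1)
        start += len(c)
    return e

def V(cfg):
    Z = sum(math.exp(-beta * E_rep(cfg, s)) for s in permutations(range(1, N + 1)))
    return -math.log(Z / math.factorial(N)) / beta

sigmas = list(permutations(range(1, N + 1)))
w = [math.exp(-beta * E_rep(r, s)) for s in sigmas]
p = [x / sum(w) for x in w]                 # P(sigma) of Definition 3

# Central finite differences with respect to bead 1 of particle 2
h, pi, pj = 1e-5, 2, 1
def shifted(d):
    cfg = [row[:] for row in r]
    cfg[pi][pj] += d
    return cfg

dV = (V(shifted(h)) - V(shifted(-h))) / (2 * h)
dE_mean = sum(pk * (E_rep(shifted(h), s) - E_rep(shifted(-h), s)) / (2 * h)
              for pk, s in zip(p, sigmas))
assert abs(dV - dE_mean) < 1e-6             # -dV is the expected force, per Lemma 2
```

The agreement is limited only by the O(h²) truncation error of the central difference on V (the spring energies are quadratic, so their central differences are exact).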

Connection Probabilities
This section derives the expressions for the connection probabilities and the recurrence on the potentials used to compute these probabilities.
First, we define the potentials V^{[1,N]}, V^{[2,N]}, …, V^{[N−1,N]}, V^{[N,N]}, which summarize the potential on subsets of the particles. We define them here in a way that differs from the definition in the main text; the latter, an equivalent form that is amenable to efficient computation, is recovered in Theorem 4. The definition that we use here is similar to the recursion relation that holds for V^{[1,1]}, V^{[1,2]}, …, V^{[1,N−1]}, V^{[1,N]} per Theorem 1, except that here, the recursion is terminated earlier:

Definition 4. For 1 ≤ u ≤ N, the potential V^{[u,N]} is defined by the recursion

e^{−βV^{[u,n]}} = (1/n) Σ_{k=1}^{n−u+1} e^{−β(E_n^{(k)} + V^{[u,n−k]})}, for n = u, …, N,

with V^{[u,u−1]} = 0.

Notice that for u = 1 this agrees with the definition of V^{[1,N]}, per Theorem 1.
Intuitively, V^{[u,N]} is the potential of the ring polymers on particles [u, N] in the representative permutations, which is why it is useful for deriving the probabilities that a cycle ends at a particular particle. Although the recurrence in this definition takes O(N²) time to evaluate for each u, resulting in O(N³) overall, we will show that this can be reduced to O(N²) using a different recurrence relation for V^{[u,N]}. For now, this definition is useful for deriving expressions for the connection probabilities.
Theorem 2. For every 1 ≤ ℓ ≤ N, the permutations whose representative "snips" between ℓ and ℓ + 1, that is, permutations σ such that every cycle of G[σ] is contained in [1, ℓ] or in [v, N] for some v ≥ ℓ + 1, satisfy

(1/N!) Σ_{σ : G[σ] snips at ℓ} e^{−βE_{G[σ]}} = e^{−β(V^{[1,ℓ]} + V^{[ℓ+1,N]})}.

(Intuitively, as G "scans" the interval [1, N] according to the cycles of σ, it chooses to cut a cycle between ℓ and ℓ + 1.) The proof is by induction on N ≥ ℓ. For the base case N = ℓ, the snipping condition is vacuous, and the claim reads (1/N!) Σ_σ e^{−βE_{G[σ]}} = e^{−βV^{[1,N]}}, seeing that V^{[N+1,N]} = 0, and from the definition of V^{[1,N]} (Definition 2). For the induction step, assume that the claim holds for all v < N, and prove it for N. By the same reasoning as in Equations (E2) to (E10), considering the first cycle that G creates and representing the rest of the construction of G[σ] as the application of G to permutations on a subset of the elements, the sum splits according to the length of the cycle containing N; applying the induction hypothesis and the recursive definition of V^{[u,N]} then collapses the sum to e^{−βV^{[1,ℓ]}} e^{−βV^{[ℓ+1,N]}}, as required.

Theorem 3. The probability that bead P of particle ℓ is connected to bead 1 of particle u ≤ ℓ, that is, that [u, ℓ] is a cycle of the representative permutation, is

Pr[ℓ → u] = (1/ℓ) e^{−β(V^{[1,u−1]} + E^{[u,ℓ]} + V^{[ℓ+1,N]})} / e^{−βV^{[1,N]}},

where E^{[u,ℓ]} is the energy of the single cycle over particles u, …, ℓ. We prove this by induction on N ≥ ℓ. Essentially, the proof uses the same inductive argument as in the previous theorems to "peel away" the part [ℓ + 1, N] through any number of cycles, then cuts the cycle [u, ℓ], and continues to consider all permutations on the elements [1, u − 1].
For the base case N = ℓ, the condition that G[σ] snips at ℓ + 1 = N + 1 is vacuous, and we are in exactly the scenario of the proof of Theorem 1, where we are interested in the sum over all permutations where the cycle of N has k elements, for k = N − u + 1. Exactly as shown there, this sum is (1/N) e^{−β(E_N^{(k)} + V^{[1,N−k]})}, which is the desired (1/ℓ) e^{−β(V^{[1,u−1]} + E^{[u,ℓ]})}, since N = ℓ, N − k = u − 1, and E_N^{(k)} = E^{[u,ℓ]}. The induction step follows the same pattern: summing over all the ways to choose k elements for the cycle containing N, and summing over all the permutations on N − k elements on which G snips at the appropriate places.
By the induction hypothesis (for N − k), the claim follows. Finally, these are the only possible connections: it is easy to see from the definition of G that every cycle of G[σ] connects bead P of particle ℓ either to bead 1 of particle ℓ + 1 (if ℓ and ℓ + 1 are part of the same cycle) or to bead 1 of a particle u ≤ ℓ (if ℓ is the endpoint of a cycle).
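The closed form of Theorem 3 can be verified by brute force on a small system: enumerate all N! permutations, measure the Boltzmann-weighted fraction whose representative contains [u, ℓ] as a cycle, and compare with (1/ℓ) e^{−β(V^{[1,u−1]} + E^{[u,ℓ]} + V^{[ℓ+1,N]})} / e^{−βV^{[1,N]}}. The following is an illustrative sketch of ours (1D, unit constants), with the V^{[u,N]} evaluated by the early-terminated recursion of Definition 4.

```python
import math
import random
from itertools import permutations

random.seed(2)
N, P, beta, ks = 4, 2, 1.0, 1.0
r = [[0.0] + [random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N + 1)]

def E_interval(u, l):
    """Spring energy of one cycle threading particles u..l."""
    e = 0.0
    for i in range(u, l + 1):
        for j in range(1, P):
            e += 0.5 * ks * (r[i][j + 1] - r[i][j]) ** 2
        nxt = u if i == l else i + 1
        e += 0.5 * ks * (r[nxt][1] - r[i][P]) ** 2
    return e

def rep_intervals(sigma):
    """Cycles of G[sigma] as consecutive intervals (start, end)."""
    perm = {i: sigma[i - 1] for i in range(1, N + 1)}
    seen, cycles = set(), []
    for s in range(1, N + 1):
        if s in seen:
            continue
        c, i = [], s
        while i not in seen:
            seen.add(i)
            c.append(i)
            i = perm[i]
        cycles.append(c)
    cycles.sort(key=max)
    out, start = [], 1
    for c in cycles:
        out.append((start, start + len(c) - 1))
        start += len(c)
    return out

sigmas = list(permutations(range(1, N + 1)))
w = [math.exp(-beta * sum(E_interval(u, l) for u, l in rep_intervals(s)))
     for s in sigmas]
expV = sum(w) / math.factorial(N)           # e^{-beta V^{[1,N]}}

def ev(u):
    """Table of e^{-beta V^{[u,n]}}, n = u-1..N, by the terminated recursion."""
    table = {u - 1: 1.0}
    for n in range(u, N + 1):
        table[n] = sum(math.exp(-beta * E_interval(n - m + 1, n)) * table[n - m]
                       for m in range(1, n - u + 2)) / n
    return table

ev1 = ev(1)
for l in range(1, N + 1):
    for u in range(1, l + 1):
        # brute force: probability that [u, l] is a cycle of the representative
        brute = sum(wi for wi, s in zip(w, sigmas)
                    if (u, l) in rep_intervals(s)) / sum(w)
        closed = ev1[u - 1] * math.exp(-beta * E_interval(u, l)) \
                 * ev(l + 1)[N] / (l * expV)
        assert abs(brute - closed) < 1e-10
```

All N(N + 1)/2 connection probabilities match the closed form; in a simulation these are exactly the weights needed to sample ring-polymer connectivity.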
Finally, we use the above results about connection probabilities to derive a recurrence relation that enables an efficient computation of V^{[u,N]}:

Theorem 4. e^{−βV^{[u,N]}} = Σ_{ℓ=u}^{N} (1/ℓ) e^{−β(E^{[u,ℓ]} + V^{[ℓ+1,N]})}, with V^{[N+1,N]} = 0.

Evaluated in order of decreasing u, this recurrence yields all of V^{[1,N]}, …, V^{[N,N]} in O(N²) operations once the cycle energies E^{[u,ℓ]} are known.
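The two routes to V^{[u,N]} (the early-terminated recursion of Definition 4, O(N²) per u and O(N³) overall, versus the backward recurrence of Theorem 4, O(N²) total) can be compared directly, together with an O(N² + PN) incremental evaluation of the cycle energies E^{[u,ℓ]}. The sketch below is our own toy implementation (1D, unit constants), not the production code.

```python
import math
import random

random.seed(3)
N, P, beta, ks = 5, 3, 1.0, 1.0
r = [[0.0] + [random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N + 1)]

def spring(a, b):
    return 0.5 * ks * (a - b) ** 2

def E_direct(u, l):
    """Direct O((l-u+1)P) evaluation of the cycle energy over particles u..l."""
    e = 0.0
    for i in range(u, l + 1):
        for j in range(1, P):
            e += spring(r[i][j + 1], r[i][j])
        nxt = u if i == l else i + 1
        e += spring(r[nxt][1], r[i][P])
    return e

# All cycle energies E[u][l] in O(N^2 + PN): precompute per-particle interior
# terms once, then extend each interval by one particle in O(1).
interior = [0.0] * (N + 1)
for i in range(1, N + 1):
    interior[i] = sum(spring(r[i][j + 1], r[i][j]) for j in range(1, P))
E = [[0.0] * (N + 2) for _ in range(N + 2)]
for u in range(1, N + 1):
    E[u][u] = interior[u] + spring(r[u][1], r[u][P])
    for l in range(u + 1, N + 1):
        E[u][l] = (E[u][l - 1]
                   - spring(r[u][1], r[l - 1][P])   # drop the old closing spring
                   + spring(r[l][1], r[l - 1][P])   # link particle l-1 to l
                   + interior[l]                    # interior springs of l
                   + spring(r[u][1], r[l][P]))      # new closing spring
        assert abs(E[u][l] - E_direct(u, l)) < 1e-9

def ev_slow(u):
    """e^{-beta V^{[u,N]}} via the early-terminated recursion: O(N^2) per u."""
    table = {u - 1: 1.0}
    for n in range(u, N + 1):
        table[n] = sum(math.exp(-beta * E[n - m + 1][n]) * table[n - m]
                       for m in range(1, n - u + 2)) / n
    return table[N]

# Theorem 4 backward recurrence: all e^{-beta V^{[u,N]}} in O(N^2) total
ev_fast = {N + 1: 1.0}
for u in range(N, 0, -1):
    ev_fast[u] = sum(math.exp(-beta * E[u][l]) * ev_fast[l + 1] / l
                     for l in range(u, N + 1))

for u in range(1, N + 1):
    assert abs(ev_fast[u] / ev_slow(u) - 1.0) < 1e-9
```

Each interval energy is obtained in O(1) from E^{[u,ℓ−1]} by swapping the closing spring, which is what brings the overall cost down to O(N² + PN).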

2. Reducing the potential to O(N² + PN)

3. Reducing the force to O(N² + PN)

FIG. 3. (a) The scaling of the original and current algorithms with the number of particles for a system of N noninteracting cold trapped bosons, on a log-log scale. Blue circles and green squares represent the current and original algorithms, respectively. Black triangles represent simulations of distinguishable particles. Lines represent linear fits to the simulation data. (b) The speedup gained by using the proposed algorithm, defined as the ratio between the time per step in the original and current implementations.

FIG. 4. The pair correlation of N = 1372 ⁴He atoms in the superfluid liquid phase at a temperature of 1.379 K with P = 96 beads. The simulation lasts ≤ 7 days, instead of > 14 years with the previous algorithm.

FIG. 6. The energy of non-interacting particles in a harmonic trap as a function of N at a fixed temperature corresponding to βℏω = 6 (left), and as a function of temperature at fixed N = 16 (right), and their agreement with the analytical result. P = 36.
FIG. 8. Simulations of superfluid liquid ⁴He. Left: the pair correlation of N = 1372 atoms at a temperature of 1.379 K and a density of 0.02182 Å⁻³, compared with the PIMC results of Ceperley [23]. Right: the speedup of the improved algorithm over the original method as a function of the number of ⁴He atoms, with P = 64 beads, computed at 1.21 K and the same density.

FIG. 9. Bead convergence of the pair correlation for N = 108 ⁴He atoms, with different values of P. The temperature is 1.21 K and the density is 0.02182 Å⁻³.
FIG. 10.Two-dimensional density of N = 1600 bosons in a harmonic trap without interaction (left) and with Gaussian repulsive interaction g = 3 (right).The simulations last ≤ 9 days, instead of > 20 years with the previous algorithm.