Atomic-scale simulations have progressed tremendously over the past decade, largely thanks to the availability of machine-learning interatomic potentials. These potentials combine the accuracy of electronic structure calculations with the ability to reach extensive length and time scales. The i-PI package facilitates integrating the latest developments in this field with advanced modeling techniques thanks to a modular software architecture based on inter-process communication through a socket interface. The choice of Python for implementation facilitates rapid prototyping but can add computational overhead. In this new release, we carefully benchmarked and optimized i-PI for several common simulation scenarios, making such overhead negligible when i-PI is used to model systems up to tens of thousands of atoms using widely adopted machine-learning interatomic potentials, such as Behler–Parrinello, DeePMD, and MACE neural networks. We also present the implementation of several new features, including an efficient algorithm to model bosonic and fermionic exchange, a framework for uncertainty quantification to be used in conjunction with machine-learning potentials, a communication infrastructure that allows for deeper integration with electronic-driven simulations, and an approach to simulate coupled photon–nuclear dynamics in optical or plasmonic cavities.

## I. INTRODUCTION

The atomic-scale modeling of molecules and materials is a constantly evolving field, driven by researchers' interest in discovering new fundamental phenomena, interpreting complex experimental results, and, together with an increasing number of commercial enterprises, improving, discovering, and developing new materials.^{1–6} Several computational developments in recent years have dramatically extended the range of problems that can be addressed by atomistic simulations, far exceeding what the steady improvement of the underlying hardware alone would allow. Arguably, the largest contribution to this fast progress can be attributed to the development of machine learning interatomic potentials (MLIPs).

MLIPs approach the accuracy of the underlying *ab initio* data at a small fraction of the computational cost. They have enabled access to system sizes and time scales that were previously reachable only with empirical interatomic potentials.^{7–10} This has driven renewed interest in complex simulation algorithms that were previously considered inapplicable to studies requiring a first-principles description of interatomic interactions, such as the investigation of catalytic reactions^{11–13} or of the thermodynamic stability of materials.^{14–17} However, the lack of easy-to-use implementations of such approaches has become an important factor hindering their more widespread adoption.

Since its first release in 2014,^{18} one of i-PI’s primary objectives has been “lowering the implementation barrier to bring state-of-the-art sampling and atomistic modeling techniques to *ab initio* and empirical interatomic potentials programs.” The first version of i-PI was designed for path integral molecular dynamics (PIMD) simulations, hence the code name. Since then, and thanks to a community effort, it has evolved into a “universal force engine for advanced molecular simulations,” including both standard and advanced simulation algorithms, as described in Secs. III–V.^{19}

In the last few years, our efforts have adapted to the changing needs of the field. While continuing with the implementation of new and advanced features, we have improved the scalability of the code backend, as required by newly available and computationally efficient MLIPs. We also adopted automatic testing workflows to strengthen code stability and enhance our reproducibility standards.

i-PI’s capabilities as a versatile tool for atomistic simulations can be demonstrated by the wide range of applications in which it has been employed. These examples range from the dynamics of charge density wave materials,^{20} interpretation of neutron scattering experiments,^{21} and nuclear motion associated with electronic excitations^{22–24} to the elucidation of the thermodynamic properties of aqueous electrons,^{25–27} the calculation of first-principles phase diagrams,^{28–30} and the study of aqueous solutions and interfaces.^{31–36} Moreover, by virtue of its modular structure, i-PI is already being used as a simulation engine with newly developed MLIP architectures,^{9,37,38} making it easy to use them in combination with advanced sampling techniques. It can also be used as a toolbox to combine different simulation “modes” to design sophisticated modeling workflows that involve multiple types of interatomic potentials, free-energy-sampling software,^{39} as well as post-processing of trajectory data.^{40}

This article reports on the progress made in the five years since the release of version 2.0 by describing the newly adopted maintenance strategies, the improved performance, and the newly implemented features. This article is organized as follows: In Sec. II, we start with a short review of the code architecture, followed by a description of the improvements made to the code’s robustness and usability and a list of the main features available in this release. In Sec. III, we present a detailed discussion of the code efficiency for production and large-scale simulations, while in Sec. IV, we describe selected new features since the last code release, discussing their implementation and examples of their application. In Sec. V, we show how the modularity and flexibility of i-PI help realize advanced simulation setups that would otherwise require new codes to be written, and Sec. VI concludes this article.

## II. PROGRAM OVERVIEW

i-PI is a force engine that operates with a client–server paradigm. i-PI acts as a server in command of the evolution of the nuclear positions, while one or more instances of the client code handle the evaluation of energy, forces, and possibly other properties for individual configurations. Designed to be universal, i-PI’s architecture is agnostic to the identity of the force providers, with a general and flexible backend that accommodates a wide range of client codes.

The code is distributed under a dual GPL and MIT license, and it is written in Python 3, a high-level, general-purpose interpreted programming language known for its emphasis on code readability. This choice facilitates rapid prototyping of new ideas and relatively easy code maintenance when compared to compiled languages traditionally used in scientific computing. Additionally, the availability of highly optimized numerical libraries allows Python-based software to deliver competitive performance. Indeed, even computing-intensive electronic structure codes have released Python implementations in recent years.^{41,42} A detailed discussion of the code efficiency and improvements since the previous release is presented in Sec. III.

The communication between i-PI and the client codes is carried out through sockets, enabling local connections by shared-memory inter-process communication (UNIX domain sockets) or between two hosts connected to a local network or to the Internet (TCP/IP sockets). During the initialization, i-PI creates the sockets and awaits the connection of one or more clients. When the connection is established, the simulation loop starts: (i) i-PI communicates the nuclear positions and the cell lattice vectors to the active clients, (ii) the clients evaluate the required properties—typically energies, forces, and stresses—and send these back, and (iii) i-PI uses these data to displace the atoms and to calculate the new nuclear positions, restarting the loop. As i-PI is designed to be client-agnostic, clients must be initialized independently before they can connect. Importantly, the identity and the number of atoms must be defined consistently in the client code according to its own input mechanism and remain unchanged throughout the simulation.
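The request–reply loop above relies on a simple wire format: fixed-width ASCII message headers followed by raw binary payloads (cell matrix, atom count, and flattened coordinate arrays). The sketch below illustrates this style of framing in Python; the fixed header width matches i-PI's convention, but the payload layout is simplified for illustration and omits fields (such as the inverse cell matrix) that the actual protocol exchanges.

```python
import struct

HEADER_LEN = 12  # headers are fixed-width ASCII strings, space-padded


def pack_header(msg: str) -> bytes:
    """Pad a message name to the fixed header width for transmission."""
    return msg.ljust(HEADER_LEN).encode("ascii")


def unpack_header(raw: bytes) -> str:
    """Recover the message name from a received fixed-width header."""
    return raw.decode("ascii").strip()


def pack_positions(cell, positions):
    """Serialize a 3x3 cell matrix and N atomic positions as a binary
    payload: 9 float64 cell entries, an int32 atom count, then the
    flattened float64 coordinates (a simplified position message)."""
    flat_cell = [x for row in cell for x in row]
    flat_pos = [x for p in positions for x in p]
    natoms = len(positions)
    return (struct.pack("<9d", *flat_cell)
            + struct.pack("<i", natoms)
            + struct.pack(f"<{3 * natoms}d", *flat_pos))
```

A client would read the header first, dispatch on the message name, and then read the fixed-size payload that follows, which is why the atom count must remain constant during a run.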

A schematic representation of the code structure and the server–client communication is presented in Fig. 1. i-PI is structured in a modular way that represents the underlying physics of the target simulation as faithfully as possible. To achieve this, the code is organized around the *System* class, which encodes all the information related to the physical system, such as the number and identity of atoms, the ensemble to be sampled, the number of replicas in the path integral simulations, the initialization procedure, and the algorithm for evolving the nuclear positions. Forces are managed through the *Forces* class, which can combine multiple force components, each of which is computed following the strategy specified in a *Forcefield* class. This two-layer approach is particularly advantageous for algorithms where the total forces and energies are obtained as the sum of terms computed at different accuracy levels^{24,43–47} or for different portions of the system.^{48} Simulation techniques that require the evolution of many systems simultaneously make use of the *SMotion* (Systems Motion) class. This class enables the definition of evolution steps that combine the (possibly many) different systems, facilitating, for example, exchanges between system replicas as done in replica exchange simulations, or the application of biases, such as in metadynamics. Finally, estimators can be defined within the code or computed by the provided post-processing tools or user-generated scripts. An updated list of code features is reported in Sec. II C.
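To make the two-layer force handling concrete, an input fragment summing two force components, each backed by its own forcefield, might look as follows (the tag and attribute names follow i-PI's XML input format; the forcefield names, addresses, and weights are placeholders chosen for illustration):

```xml
<ffsocket name="baseline" mode="unix">
  <address>baseline-driver</address>
</ffsocket>
<ffsocket name="correction" mode="unix">
  <address>correction-driver</address>
</ffsocket>

<forces>
  <!-- total energy and force = 1.0 * baseline + 1.0 * correction -->
  <force forcefield="baseline" weight="1.0"/>
  <force forcefield="correction" weight="1.0"/>
</forces>
```

Each *Forcefield* manages its own client connection, so the two components can be evaluated by entirely different codes, or at different levels of theory.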

### A. Code philosophy and development model

*Many available client codes*: i-PI is connected to a multitude of electronic structure and machine learning codes, either directly or through its ASE^{49} or LAMMPS^{50} interfaces. This wide-ranging connectivity means that a single implementation of an advanced simulation method is readily available to all the users of these codes, which significantly amplifies the potential impact of each contribution to i-PI. Furthermore, this capability enables a direct comparison between different client codes within an identical simulation setup.

*Ease of implementation of new interfaces*: Adding a new interface to i-PI is simple. For codes that provide a Python interface, this can be implemented by adding a small new file to the Python driver directory (drivers/py/pes/). For compiled software, the driver loop can usually be rather self-contained, modeled after the molecular dynamics or geometry optimization loops. The FORTRAN driver distributed with i-PI provides a good example, as well as utilities to open and communicate over sockets that can be reused in third-party codes, while the LAMMPS^{50} implementation of fix_ipi can be used as a template for a C++ interface. In all cases, the structure (atom numbers and types) should be initialized *before* communication is established. Note also that many approaches in i-PI, such as Monte Carlo, do not guarantee that successive calls to a force provider involve similar structures. Thus, operations such as neighbor-list updates or wavefunction extrapolation that assume continuity between successive structures may need to be handled carefully on the force evaluator side.
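The essence of a new Python-side interface is a self-contained callable that maps positions to an energy, forces, and a virial. The toy harmonic potential below illustrates the kind of evaluator one would wrap in a new driver file; the function name and return convention are illustrative, not the exact base-class API of drivers/py/pes/.

```python
import numpy as np


def harmonic_pes(positions: np.ndarray, k: float = 1.0):
    """Toy harmonic potential V = (k/2) * sum(r^2), illustrating the
    minimal contract of a force evaluator: given an (N, 3) array of
    positions, return the potential energy, forces, and a virial."""
    potential = 0.5 * k * np.sum(positions ** 2)
    forces = -k * positions           # F = -dV/dr for the harmonic well
    virial = np.zeros((3, 3))         # placeholder virial tensor
    return potential, forces, virial
```

Because the evaluator is stateless with respect to previous calls, it is automatically safe for Monte Carlo moves that present discontinuous structures.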

*Lower development and maintenance effort*: In scenarios where a developer aims to use a newly implemented algorithm for nuclear motion with different types of codes, such as a density-functional theory (DFT) plane wave code to study periodic systems or a quantum chemistry code to study gas phase molecules at a higher electronic structure level, two separate implementations would be required. By integrating the algorithm into i-PI, only one implementation is needed and there is only one code to maintain.

*Automatic (re)evaluation of physical quantities*: The infrastructure of i-PI automatically manages updates of energy, forces, stresses, and other structure-dependent properties by establishing dependencies between physically related variables. This architecture enables developers to concentrate solely on the algorithm’s implementation, making the development process more focused and efficient.

*Participative development model*: The modular structure of the code simplifies the contribution of self-contained implementations of new methods. These methods are explicitly listed in a dedicated section of the documentation, giving developers visibility and recognition for their contributions. Many of the applications presented in this paper are examples of this model.

*Code maintenance and long-term sustainability*: As with most open-source projects, i-PI is developed and maintained by a community of volunteer contributors. Several of the authors of this article, who are core developers and hold permanent or long-term academic positions, are involved in the governance of the code. We encourage power users to also help maintain the core infrastructure and the documentation; in preparation for this release, we have placed special focus on improving the development infrastructure, as we discuss in Sec. II B.

### B. Code robustness and usability

The increased number of i-PI users and contributors since the last release required enhancing the maintainability, robustness, and standardization of the code. These improvements involve five distinct aspects:

*Documentation*: We refactored the documentation of i-PI, which is now generated using Sphinx, is deployed automatically when new features are merged into the main branch, and is linked to the main code webpage.^{51} The documentation has been updated to reflect the current status of the code, and a section on how to contribute new features has been added to lower the barrier for new developers. Automatic parsing of the Python docstrings throughout i-PI ensures a complete listing of all possible keywords and their explanations, which is updated as soon as new features are implemented and merged into the main branch.

*Examples*: We have completely restructured the provided examples to make them easier to navigate and as intuitive as possible for newcomers. The examples are now organized into three directories: *features*, *clients*, and *hpc_scripts*. The *features* directory contains extensive examples of the different implemented functionalities, all of which can be run locally using the built-in Python or Fortran driver, without requiring additional software installation. The *clients* directory includes examples tailored to specific driver codes, highlighting the particular syntax and tags needed for a proper setup when used in connection with i-PI. At the moment, we provide examples for ASE,^{49} CASTEP,^{52} CP2K,^{53} LAMMPS,^{50} DFTB+,^{54} ffsGDML,^{55} FHI-aims,^{56} librascal,^{57} Quantum ESPRESSO,^{58–60} VASP,^{61} elphmod,^{62} and YAFF.^{63} We reiterate that many other codes can also be connected to i-PI through their ASE and LAMMPS interfaces. Finally, motivated by the number of user queries on this topic, the *hpc_scripts* directory provides example submission scripts for running simulations on high-performance computing platforms.

*Demonstrations and tutorials*: We have added a *demos* directory that contains examples of more complicated setups, to illustrate the application of i-PI in realistic simulation scenarios, combining several of the features shown in the *features* directory. Some of these demonstrations also illustrate the use of the pre- and post-processing scripts accompanying the code. Users specifically interested in learning the theory of path integral simulations and obtaining hands-on practice with i-PI are referred to the online course developed with this goal.^{64}

*Continuous integration*: We have implemented continuous integration workflows to automatically test the code’s functionality. One automatic workflow reviews all the simulations in the *examples* directory by running them for a few steps and verifying that they run correctly. A second workflow performs regression tests: the simulations are run for the required number of steps, and the outputs are compared against pre-computed reference values. Together, these workflows guarantee that any new modification to the code neither alters the results of the tested simulations nor breaks the provided examples. Additional workflows, less critical for the functioning of the code, check the formatting syntax using Flake8^{65} and Black^{66} or manage the updates of PyPI releases.

*Developers’ documentation and workflows*: We have included and updated the development guidelines, which are discussed in the main documentation as well as in the CONTRIBUTING.md file in the main repository. Contributors are encouraged to use standard *git* workflows, such as creating a fork and opening a pull request, and can expect code review addressing both the correctness of the algorithm and consistency with i-PI’s coding style and quality standards.

### C. Main features

We conclude this section by listing the main features currently available in i-PI. A comprehensive list of features, indicating the main contributors, is maintained as part of the documentation.

#### 1. Features available in the 2.0 release

*Classical and path integral molecular dynamics*: the *NVE*, *NVT*, *NPT*, and constant stress (*NsT*) ensembles.

*Thermostats*: stochastic velocity rescaling;^{67} path-integral Langevin equation thermostats;^{68} generalized Langevin equation (GLE) thermostats, including the optimal sampling,^{69,70} quantum,^{71} *δ*,^{72} hot-spot,^{73} and PIGLET thermostats;^{74} GLE thermostats for use with path-integral correlation functions;^{75} the fast forward Langevin thermostat;^{76} and Langevin sampling for noisy and/or dissipative forces.^{77,78}

*Barostats*: isotropic barostats,^{18,79} anisotropic barostats,^{80} and their path integral adaptations.^{47,81}

*Geometry optimizers*: steepest descent, conjugate gradient, L-BFGS, trust radius BFGS, nudged elastic band,^{34,82,83} and mode following saddle point search.^{84}

*Enhanced sampling and free energy calculations*: thermodynamic integration,^{85} metadynamics, replica exchange MD,^{86} and quantum alchemical transformations.^{87,88}

*Multiple time-stepping (MTS)*: MTS in real time and imaginary time, also known as ring polymer contraction.^{43,44,46,48}

*Approximate quantum dynamics calculations*: ring polymer molecular dynamics (RPMD),^{89,90} centroid molecular dynamics (CMD),^{91,92} thermostatted RPMD,^{75,93} and ring polymer instanton rates and tunneling splittings.^{94}

*Advanced path integral simulations and estimators*: fourth-order path integrals,^{95} perturbed path integrals,^{96} the scaled-coordinate heat capacity estimator,^{97} isotope fractionation estimations,^{98} particle momentum distribution,^{99} and reweighted fourth-order path integral MD.^{100,101}

#### 2. Features added after the 2.0 release

*Barostats*: MTTK fully flexible barostat,^{103} MTS implementation, and barostat implementation for the Suzuki–Chin^{47} PIMD.

*Advanced path integral simulations and estimators*: Suzuki–Chin double virial operator estimator for calculating heat capacity^{47} and open-chain path integrals^{104} for estimating momentum dependent operators.

*Vibrational analysis*: through the harmonic Hessian, independent mode framework, the vibrational self-consistent field, and self-consistent methods.^{105}

*Particle exchange Monte Carlo*: for efficient sampling of poorly ergodic mixtures.^{106}

*Path Integral Molecular Dynamics (PIMD) for indistinguishable particles*: Methods for simulating bosons,^{107} with improved quadratic scaling,^{108} and fermions, through a reweighting procedure.^{109}

*Cavity MD (CavMD)*: Simulation of systems confined in optical or plasmonic cavities.^{110,111}

*Nuclear propagation by additional dynamical variables*: non-adiabatic tunneling rates in metallic systems^{112,113} and molecular dynamics with time-dependent external fields.

*Ensemble models*: committee models, baseline energy model, and uncertainty estimation.^{106,114}

*Extracting dynamical information from thermostatted simulations*: for vibrational spectra and diffusion coefficients from PIGLET thermostatted simulations.^{115}

*Non-linear spectroscopy*: Evaluation of non-linear spectroscopic observables by equilibrium–non-equilibrium simulations for classical and quantum nuclei.^{116–118}

*T*_{e} *path integral coarse-graining simulations (PIGSs)*: for approximate quantum dynamics using molecular dynamics on an MLIP representing the centroid potential of mean force.^{119,120}

## III. EFFICIENCY OF LARGE-SCALE SIMULATIONS

i-PI’s socket-based communication and Python architecture offer convenience and flexibility to implement new and advanced simulation algorithms, facilitating integration with other codes. This flexibility, however, comes at the price of a computational overhead associated with UNIX- and TCP-socket communications and Python’s inherent inefficiency compared to lower-level languages, such as Fortran, C, and C++.

To analyze i-PI’s overhead on the performance of the simulations, it is instructive to separate the intrinsic wall-clock time needed to evaluate energy and forces, *τ*_{F}, and the time overhead associated only with i-PI, *τ*_{O}, such that the length of simulation that can be obtained for a certain computational effort is proportional to 1/(*τ*_{F} + *τ*_{O}). The initial focus of i-PI was on *ab initio* force providers, where *τ*_{F} is of the order of several seconds, and therefore, an overhead of *τ*_{O} ∼ 10 ms was inconsequential (see Fig. 2). However, with MLIPs becoming the primary tool for large-scale, reactive calculations, an overhead of several ms can become the performance bottleneck. Therefore, we have improved the efficiency of i-PI for this release.

### A. Estimating the overhead of i-PI

The cost per MD step of i-PI simulations can be roughly modeled as a constant term plus a per-atom contribution. This overhead can be estimated as *τ*_{O} ≈ *A* + *BN*_{a}. Note that, for such a wide range of system sizes, it is not advisable to determine the coefficients by linear regression because the intercept would be affected by large relative errors. Therefore, we determined *A* as the time per step of the smaller simulation (eight atoms) and *B* as the time per step per atom of the largest one (2^{16} atoms).
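The fitting procedure described above can be sketched in a few lines (the per-step timings used here are placeholders, not measured values from the benchmarks):

```python
def estimate_overhead(t_small: float, t_large: float, n_large: int):
    """Estimate the coefficients of tau_O ~ A + B * N_a from two runs:
    A is taken as the per-step time of the smallest system, and B as
    the per-step, per-atom time of the largest one."""
    A = t_small            # constant term, dominated by the small run
    B = t_large / n_large  # per-atom slope, dominated by the large run
    return A, B


# hypothetical timings: 0.1 ms/step for 8 atoms, 6.6 ms/step for 2**16 atoms
A, B = estimate_overhead(0.1e-3, 6.6e-3, 2 ** 16)
tau_O = A + B * 1000       # predicted overhead for a 1000-atom system (s)
```

Taking the endpoints directly, rather than fitting a regression line, avoids the large relative error that the intercept would otherwise carry over this range of system sizes.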

We start by analyzing the simplest possible case: a constant-energy simulation of an ideal gas, with no forces acting between atoms and therefore *τ*_{F} ≈ 0 (Fig. 3). In this case, the advantage of a C++ implementation such as that in LAMMPS is evident. For the largest system sizes, LAMMPS is about ten times faster than i-PI, but for a few atoms, the difference is much larger: LAMMPS can perform millions of steps per second, whereas i-PI 2.6 saturates at about 1000 steps per second [Fig. 3(a)].

It is also instructive to partition *τ*_{O} into a contribution associated with the Python overhead and one associated with the socket communication. We estimated the former by disabling force evaluation altogether, which can be achieved by setting weight="0" as an attribute of the <force> components. We estimated the latter by subtracting the Python overhead from the overall cost of a simulation communicating with a dummy driver that simply returns zero energy and forces [Figs. 3(b) and 3(c)].
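For reference, the Python-only measurement described above only requires disabling the force contribution in the input, along these lines (the forcefield name is a placeholder):

```xml
<forces>
  <!-- weight="0" skips the force evaluation while keeping the full
       Python bookkeeping, isolating the interpreter overhead -->
  <force forcefield="driver" weight="0"/>
</forces>
```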

Figure 3 also compares the computational efficiency of the stable 2.6 version with that of the current 3.0 release. In the latter, we performed several optimizations to reduce the overhead associated with accessing the i-PI *depend* objects,^{18} substituted explicit loops with array operations, and removed other obvious bottlenecks from the implementation. This reduced the overhead of the NVE integrator by a factor of 3–5 and the communication overhead by a factor of 2.

In Table I, we report the values of the *A* and *B* coefficients for different simulation setups, enabling the calculation of the communication overhead, while in Table II, we report the values of *A* and *B* associated with the Python overhead for different nuclear motion algorithms. The total i-PI overhead, *τ*_{O}, can thus be obtained by adding the TCP/IP or UNIX socket timings from Table I to the algorithm-specific times from Table II. In all cases, the new release dramatically reduces both constants, so that for most classical MD simulations of up to a few thousand atoms, one can now expect an overhead of a few ms per step. It is thus possible to perform tens of millions of simulation steps per day when disregarding the force evaluation costs.
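As a worked example of combining the two tables, the i-PI 3.0 UNIX-socket communication entry (Table I) added to the NVT/SVR Python overhead (Table II) predicts, for a 1000-atom classical MD run:

```python
# Coefficients taken from Tables I and II (i-PI 3.0, classical MD):
comm_A, comm_B = 0.25e-3, 0.06e-6  # UNIX socket communication overhead
alg_A, alg_B = 0.14e-3, 0.03e-6    # NVT ensemble with the SVR thermostat
n_atoms = 1000

# total overhead per step, in seconds: sum the A terms and the B terms
tau_O = (comm_A + alg_A) + (comm_B + alg_B) * n_atoms
steps_per_day = 86400 / tau_O      # upper bound, ignoring tau_F
```

This yields an overhead of about half a millisecond per step, consistent with the statement above; larger systems, more replicas, or TCP/IP communication increase *τ*_{O} accordingly.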

**Table I.** Coefficients *A* (ms) and *B* (μs/*N*_{a}) of the overhead model *τ*_{O} ≈ *A* + *BN*_{a} for simulations with and without socket communication; the "communication overhead" rows give the difference with respect to the corresponding socket-free setup.

| Type | Replicas | Socket | Ensemble | Thermostat | *A* (ms), i-PI 2.6 | *A* (ms), i-PI 3.0 | *B* (μs/*N*_{a}), i-PI 2.6 | *B* (μs/*N*_{a}), i-PI 3.0 |
|---|---|---|---|---|---|---|---|---|
| MD | 1 | None | NVE | None | 0.47 | 0.09 | 0.06 | 0.02 |
| MD | 1 | UNIX | NVE | None | 1.03 | 0.34 | 0.17 | 0.08 |
| UNIX socket communication overhead | | | | | 0.56 | 0.25 | 0.11 | 0.06 |
| MD | 1 | TCP/IP | NVE | None | 21.54 | 1.26 | 0.64 | 0.10 |
| TCP/IP socket communication overhead | | | | | 21.07 | 1.17 | 0.56 | 0.08 |
| PIMD | 2 | None | NPT | pile_l | 1.39 | 0.47 | 0.94 | 0.60 |
| PIMD | 2 | UNIX | NPT | pile_l | 2.62 | 1.27 | 1.32 | 0.76 |
| UNIX socket communication overhead | | | | | 1.23 | 0.80 | 0.38 | 0.16 |
| PIMD | 2 | TCP/IP | NPT | pile_l | 32.64 | 2.45 | 1.68 | 0.77 |
| TCP/IP socket communication overhead | | | | | 31.25 | 1.98 | 0.74 | 0.17 |


**Table II.** Coefficients *A* (ms) and *B* (μs/*N*_{a}) of the Python overhead for different nuclear-motion algorithms, measured without socket communication.

| Type | Replicas | Socket | Ensemble | Thermostat | *A* (ms), i-PI 2.6 | *A* (ms), i-PI 3.0 | *B* (μs/*N*_{a}), i-PI 2.6 | *B* (μs/*N*_{a}), i-PI 3.0 |
|---|---|---|---|---|---|---|---|---|
| MD | 1 | None | NVT | SVR | 0.56 | 0.14 | 0.07 | 0.03 |
| MD | 1 | None | NVT | Langevin | 0.86 | 0.22 | 0.24 | 0.21 |
| MD | 1 | None | NPT | Langevin | 1.45 | 0.47 | 0.37 | 0.32 |
| MD | 1 | None | NPT | SVR | 1.13 | 0.38 | 0.22 | 0.13 |
| PIMD | 32 | None | NPT | Langevin | 2.73 | 0.67 | 17.21 | 12.97 |
| REMD | 4 × 6 | None | SC-NPT | Langevin | 27.80 | 16.86 | 11.29 | 12.50 |


The Python overhead is larger when considering simulations with multiple replicas and/or complicated setups that involve, e.g., high-order path integration or replica exchange, as evidenced in Table II. Often, however, these advanced simulation techniques decrease the overall cost of a calculation by reducing the number of force evaluations necessary to reach the desired statistical accuracy. This is a key advantage of using i-PI over faster, but less feature-rich, alternatives. An example, also relevant to classical MD trajectories, is the use of BAOAB integrators,^{122} which allow doubling the time step in simulations of liquid water. A technical detail we also want to highlight from Table I is the >20× reduction of the constant overhead for TCP/IP communication. This was achieved by disabling Nagle’s algorithm,^{123} which was found to interfere negatively with the transfer of small messages in the i-PI communication protocol. This change must also be applied on the client side, as is currently done in the driver codes we distribute with i-PI, so we urge developers of packages that contain an i-PI interface to apply similar changes.
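Disabling Nagle's algorithm is a one-line change on a TCP socket, which client-side interfaces should mirror. A minimal Python illustration:

```python
import socket


def make_lowlatency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled, so that the
    small request/reply messages of the i-PI protocol are transmitted
    immediately instead of being buffered by the kernel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

In C or C++ clients, the equivalent call is `setsockopt` with the `TCP_NODELAY` option on the connected socket.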

### B. Efficient performance of i-PI on realistic parallelized simulations

Even though this release significantly cuts the i-PI overhead *τ*_{O}, and we are committed to further reducing it in the future, it is important to determine whether and when it matters in practical use cases. To this end, we consider the task of simulating 4096 water molecules at room temperature, in the canonical *NVT* ensemble, using a stochastic velocity rescaling thermostat. Based on Table I, we expect that using i-PI 3.0 should entail an overhead of 1.3 ms when using a UNIX socket and 2.5 ms when using a TCP/IP socket, as necessary for simulations running across many nodes.

We start by considering a DeePMD neural network, which is used both to perform MD directly through its LAMMPS implementation^{7,125,126} and as a driver for running classical MD and path integral molecular dynamics with i-PI. On the hardware we use for this test, detailed in the caption of Fig. 4, the evaluation of energies and forces on one central processing unit (CPU) requires ∼10 s/step, which is several orders of magnitude larger than the estimated *τ*_{O}.

In practical scenarios, however, the MLIP calculation will be heavily parallelized, either using a multi-CPU implementation or using one or more graphics processing units (GPUs). As shown in Fig. 4, i-PI and LAMMPS have essentially equivalent performance up to the highest parallelization level for CPUs (up to 4 nodes, 288 cores), consistent with the small predicted *τ*_{O}. The GPU implementation of DeePMD is much faster and can be parallelized efficiently up to about eight GPUs (four nodes), for which (with a time step of 0.5 fs) LAMMPS clocks almost 2.5 ns of simulation per day. In our initial benchmarks, we observed the empirical *τ*_{O} with LAMMPS DeePMD to be up to ten times larger than when using the dummy driver, which we could trace to unnecessary reconstructions of the neighbor list on the LAMMPS side of the interface. We optimized the LAMMPS interface of the driver, minimizing the number of neighbor list updates and reducing the observed overhead to about 3 ms for the 8-GPU run. This is still almost two times larger than predicted because of the additional bookkeeping that is needed to interface i-PI with a feature-rich code, such as LAMMPS, in comparison to the minimalistic dummy driver. Nevertheless, these optimizations have reduced *τ*_{O} to less than 20% of the overall simulation step even for the highly efficient multi-GPU setup, making it possible to perform more than 2 ns of simulation per day with a relatively large number of atoms. For a PIMD setup, i-PI allows exploiting an additional level of parallelism by running multiple clients in parallel. Given that this is a trivial type of parallelization, it can improve the scaling whenever the client code shows less-than-ideal scaling, as observed for the GPU implementation of DeePMD in Fig. 4(b), even though the speedup is minimal in this case.

### C. Near-ideal performance of i-PI with popular MLIPs

These experiments show that, in practical use cases, the overhead introduced by i-PI is negligible or small up to high levels of parallelization of the MLIP code, although one should reassess these conclusions depending on the hardware, the specific MLIP implementation, and the type of simulation performed. As shown in Fig. 2, a practical recipe to determine the throughput one can expect from combining i-PI with a given machine-learning code is to determine the overhead *τ*_{O}, either from Table I or by running a short simulation with a dummy driver. A dummy calculation is also useful for tuning the simulation parameters (and potentially for removing bottlenecks from other parts of the simulation setup; some examples of profiling runs can be found in the *test* directory). The cost of evaluating energy and forces, *τ*_{F}, also depends on the type of MLIP, the system size, and the hardware of choice. Thus, by comparing *τ*_{O} and *τ*_{F}, it is possible to determine whether *τ*_{O} is significant.

As shown in Table III, for both small and medium-sized supercells and for DeePMD, n2p2, and MACE models, a standard constant-temperature MD simulation is limited by *τ*_{F}. Even though, as shown by the case of LAMMPS, the overhead of a specific client might be larger than that of the minimalistic driver distributed with i-PI, this type of analysis helps determine whether i-PI is a suitable choice for the modeling task or whether one should look for (or code) a more performant, compiled-language implementation. In summary, using i-PI as a driver does not prevent reaching a few million energy evaluations per day, corresponding to several ns for typical MD time steps. In addition, it allows access to advanced simulation methods, such as path integral molecular dynamics, as well as facile interfacing with new MLIP implementations, which might make i-PI the preferred choice even in cases in which it entails a small performance penalty.

| Model | Architecture | *n*_{CPU} or *n*_{GPU} | *n*_{H₂O} | *τ*_{F} (s) | Throughput, direct (ns/day) | Throughput, w/ i-PI (ns/day) |
|---|---|---|---|---|---|---|
| DeePMD | CPU (Xeon) | 288 | 4096 | 0.066 | 0.65 | 0.56 |
| DeePMD | GPU (V100) | 8 | 4096 | 0.017 | 2.54 | 2.16 |
| MACE | GPU (A100) | 1 | 4096 | 1.47 | 0.03 | 0.03 |
| MACE | GPU (A100) | 1 | 64 | 0.047 | 0.94 | 0.92 |
| BPNN | CPU (Xeon) | 768 | 4096 | 0.015 | 2.79 | 2.10 |


## IV. DESCRIPTION OF SELECTED FEATURES

### A. Bosonic and fermionic path integral molecular dynamics

Exchange effects, arising from bosonic and fermionic statistics, are important in simulations of phenomena such as superfluidity,^{128} supersolidity,^{129} and superconductivity. However, ordinary PIMD simulations assume that the atoms are distinguishable particles and obey Boltzmann statistics. The biggest challenge in going beyond this assumption and performing PIMD simulations of indistinguishable particles is including all the possible identical particle permutations in the partition function. These are taken into account by connecting rings of permuted particles to form larger polymers.^{130} Since there are *N*! permutations of *N* particles, the simulation, in principle, should sum over the spring forces from *N*! configurations, which quickly becomes intractable.

This problem was addressed by an algorithm proposed by Hirshberg *et al.*^{107,108} In this approach, PIMD simulations of bosons scale quadratically,^{108} rather than exponentially, with *N*. This is achieved by replacing the standard ring-polymer potential with $V_B^{(N)}$, which is defined by the recurrence relation

$$e^{-\beta V_B^{(N)}} = \frac{1}{N} \sum_{k=1}^{N} e^{-\beta \left( V_B^{(N-k)} + E_N^{(k)} \right)},$$

where $E_N^{(k)}$ is the spring energy of the ring polymer connecting particles *N* − *k* + 1, …, *N* in a cycle, and $V_B^{(0)} = 0$. This potential and the forces it induces are computed in quadratic time with *N* and in linear time with the number of beads.^{108}

The i-PI implementation allows for the efficient simulation of systems composed of ∼1000 particles. A comparison of the scalability of bosonic and standard PIMD simulations in i-PI is presented in Fig. 5, using the same integrator (bab, see below) in both cases, for a system of trapped non-interacting particles. The bosonic simulation has quadratic scaling, while the distinguishable one scales linearly. For *N* = 64 particles, the cost of bosonic exchange effects is ∼12 times that of standard PIMD, and for *N* = 1024, it is only 44 times slower with the same integrator.

Performing bosonic PIMD in i-PI is straightforward and involves adding the following block to the XML input file:
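A minimal sketch of such a block (the bosons, normal_modes, and nmts tags follow the text; the remaining structure and values are assumptions):

```xml
<motion mode='dynamics'>
  <dynamics mode='nvt'>
    <timestep units='femtosecond'> 0.25 </timestep>
    <!-- Cartesian 'bab' integrator with multiple time stepping for the springs -->
    <normal_modes propagator='bab'>
      <nmts> 5 </nmts>
    </normal_modes>
  </dynamics>
</motion>
<!-- bosonic atoms selected by label (all He atoms, an assumed example) -->
<bosons id='label'> ['He'] </bosons>
```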

The bosonic atoms are listed using the bosons tag. This can be done as a list of atom indices (zero-based), such as <bosons id='index'> [0, 1, 2] </bosons>, or by a list of labels, such as <bosons id='label'> ['He'] </bosons>. At the moment, the integration should be done in Cartesian coordinates (normal_modes propagator='bab') because an analytic expression for the normal-modes propagator of the free ring-polymer part of the Hamiltonian is not known for bosons. Because the frequencies of the springs are typically higher than the physical frequencies, the use of multiple time stepping in the Cartesian integrator is recommended.^{131} This is done by setting the nmts tag (standing for the number of iterations at each multiple-time-stepping level) to M, where M is the ratio between the time step used for the physical forces and that used for the spring term.^{19} In Fig. 6, we present the average energy of 512 trapped, non-interacting cold bosons. Comparison with distinguishable-particle simulations makes the relevance of exchange effects evident.

To monitor the importance of exchange effects, i-PI provides dedicated estimators: (i) exchange_distinct_prob, the probability of the ring-polymer configuration in which all bosons are distinguishable, i.e., each particle *i* is in a separate ring. This quantity tends to 1 in the regimes where the exchange is negligible. (ii) exchange_all_prob, the probability of the ring-polymer configuration connecting *all* the bosons, i.e., all *N* particles joined in one large ring, scaled by 1/*N*. This quantity is high in regimes of effective exchange between all the particles. In Fig. 7, we show the average of exchange_all_prob in converged simulations of trapped cold noninteracting bosons as a function of temperature. This quantity tends to 1 in the limit *T* → 0, *β* → ∞, in which all permutations are equally probable.^{107}

Several estimators require modifications when performing bosonic PIMD simulations. For example, the primitive estimator for the kinetic energy (kinetic_td) is modified according to Eq. (1) [as shown in Eq. (2) in the supplementary material of Hirshberg *et al.*^{107}], while the standard virial kinetic energy estimator can be used for bosons in confined systems. On the other hand, the centroid-virial kinetic estimator is not applicable due to the lack of an appropriate centroid definition.

Finally, fermionic statistics can be obtained by first performing bosonic simulations and then reweighting the results.^{109,132,133} This is done by using the fermionic sign estimator (the fermionic_sign property of i-PI). For every position-dependent operator, the corresponding fermionic estimator is obtained from the bosonic estimator by multiplying with the fermionic sign and dividing by the average fermionic sign in the simulation [see Eq. (9) of Ref. 109]. The average sign is always positive, but for simulations at low temperatures or with large *N*, where the sign problem is severe, it gets close to zero. Figure 8 illustrates the behavior of the fermionic energy estimator for *N* = 3 non-interacting, trapped fermions at several temperatures. While the distribution of the bosonic energy estimator throughout the simulation is always positive, the fermionic energy estimator changes sign and its distribution is bi-modal around *E* = 0 and more symmetric at lower temperatures. The near-complete cancellations of positive and negative contributions make the mean fermionic energy harder to converge in comparison to the bosonic case.

### B. Nuclear propagation by additional dynamical variables

i-PI 3.0 can employ numerical quantities provided by the driver (beyond the usual forces and stress arrays) to run advanced types of dynamics. This supplementary communication is achieved through an additional string that can be passed in the extras field of i-PI within the same socket communication. When such a string is JSON-formatted, it is automatically parsed, and the additional quantities returned by the driver are made available in the code and accessible as a dictionary. More importantly, when the data are numeric, they are added as a property of the force object, guaranteeing a seamless integration with other i-PI features and becoming available to be used in any type of algorithm. In Secs. IV B 1 and IV B 2, we present two examples of advanced simulations that employ electronic structure quantities as "additional" variables to modify the atomic forces and therefore alter the nuclear dynamics.
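As an illustration of this mechanism, a driver could serialize additional per-step quantities as follows (a minimal sketch; the field names dipole and friction are placeholders, not a fixed i-PI schema):

```python
import json

def pack_extras(dipole, friction):
    # Package extra per-step quantities as the JSON-formatted string sent
    # through the i-PI "extras" field; numeric entries are parsed by i-PI
    # into properties of the force object.
    return json.dumps({"dipole": dipole, "friction": friction})

# Toy example: a dipole vector and a 1x1 friction tensor.
extras_string = pack_extras([0.0, 0.0, 1.85], [[0.12]])
```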

#### 1. Non-adiabatic tunneling rates in metallic systems

The vanishing bandgap in metals allows nuclear movements to induce electronic excitations, even at relatively low velocities.^{134,135} These effects are not captured by the Born–Oppenheimer approximation and, therefore, are collectively referred to as non-adiabatic effects (NAEs). NAEs in metals are qualitatively different from the ones found in molecules due to the continuous manifold of available empty electronic states present in the former, and thus, they require a different set of theoretical approaches. A frequently used technique to simulate these types of NAEs is to decorate the otherwise Born–Oppenheimer dynamics with an electronic friction kernel and a stochastic force.^{136–138} The electronic friction formalism has been derived from first principles starting from the exact factorization of the wave function, initially at 0 K,^{139,140} and more recently at finite temperatures.^{141} Moreover, in the case of a theory of non-interacting electrons, such as DFT, the friction kernel can be computed explicitly from *ab initio* calculations.^{142}

When studying the kinetics of light atoms diffusing in metals, nuclear quantum effects (NQEs) can become as important as NAEs. To address this complex problem, some of the authors recently derived a theoretical framework that captures both types of effects.^{112} The method, called ring polymer instanton with explicit friction (RPI-EF), was derived by combining semiclassical instanton rate theory in its path-integral discretized form^{143} with the *ab initio* electronic friction formalism. The framework is general enough to incorporate the spatial and frequency dependence of the friction tensor, and owing to the computational efficiency of the parent methods,^{144,145} it is suitable for first-principles calculations of reaction rates in complex, high-dimensional systems.^{113}

In the limiting case of a position-independent friction kernel (see Ref. 112 for a more general expression that includes the position dependence of the friction kernel), the discretized Euclidean action *S*_{P} can be expressed as^{146}

$$\frac{S_P}{\hbar} = \beta_P \sum_{k=1}^{P} V\!\left(\mathbf{q}^{(k)}\right) + \beta_P \sum_{j=1}^{3N} \sum_{l=0}^{P-1} \left[ \frac{m_j \omega_l^{2}}{2} + \frac{|\omega_l|\,\tilde{\eta}(|\omega_l|)}{2} \right] \left( Q_j^{(l)} \right)^{2},$$

where *P* is the number of discretization points or beads, *N* is the number of atoms, *β* is the inverse temperature, *β*_{P} = *β*/*P*, $Q_j^{(l)}$ represents the free ring-polymer normal mode coordinates, *ω*_{l} are the free ring-polymer (RP) normal mode frequencies, and $\tilde{\eta}$ is the Laplace transform of the (electronic) friction kernel. *S*_{P} is related to the potential energy of the ring polymer *U*_{P} by *S*_{P}/*ℏ* = *β*_{P}*U*_{P}. The RPI-EF calculation consists of a series of geometry optimizations on *U*_{P} using progressively more discretization points until the converged instanton pathway is achieved. A detailed description of the individual steps involved in RPI and RPI-EF simulations, as well as practical guidelines, can be found elsewhere.^{113,147,148}

The i-PI implementation of RPI-EF builds upon the RPI implementation from the previous i-PI release. The primary distinction regards the additional terms that contain the friction kernel. In the case of position-dependent friction, these terms require updates at each step of the geometry optimization, which has been made possible by the automatic parsing of JSON-formatted data returned by the client. The friction terms are enabled by setting the frictionSD flag inside the instanton block. Since the theory assumes that the frequency dependence of the friction kernel is the same for all positions, the user needs to provide the normalized spectral density, $\tilde{\eta}(\omega)$, which can be specified with the flag fric_spec_dens, and the energy at which it has been normalized to one by using the flag fric_spec_dens_ener. A minimal example of the motion section to run RPI-EF simulations with i-PI is
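A sketch of such a motion section (only the frictionSD, fric_spec_dens, and fric_spec_dens_ener flags are taken from the text; the surrounding tags, file name, and values are assumptions):

```xml
<motion mode='instanton'>
  <instanton mode='rate'>
    <!-- enable the friction terms in the discretized action -->
    <frictionSD> True </frictionSD>
    <!-- normalized spectral density of the friction kernel (assumed file) -->
    <fric_spec_dens mode='file'> fric_spec_dens.dat </fric_spec_dens>
    <!-- energy at which the spectral density is normalized to one -->
    <fric_spec_dens_ener units='electronvolt'> 0.1 </fric_spec_dens_ener>
  </instanton>
</motion>
```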

In Fig. 9, we present the instanton geometry obtained for a 1D double well potential without including any friction, including a constant friction profile, and including an analytical position-dependent friction profile. The inclusion of constant friction causes the particle to behave more classically, resulting in the shrinkage of the instanton geometry when compared to the calculation without friction. The result with position-dependent friction shows that the instanton geometry avoids regions with large friction. As discussed in Refs. 113 and 149, this indicates that non-adiabatic effects can steer the main tunneling pathway and, consequently, steer quantum dynamics.

#### 2. Dynamics with time-dependent external field

Recent experiments have shown that the excitation of infrared-active phonons with THz laser pulses leads to a vast phenomenology in different kinds of materials.^{150} These phenomena are driven non-thermally by microscopic processes rather than originating from ultrafast heating effects.^{151} The possibility of driving materials, ranging from traditional solids to liquids and interfaces, across phase transitions and into metastable states that are not accessible at thermal equilibrium has opened the door to dynamical materials design.^{152,153}

A popular approach to simulate these experiments and capture the associated nonlinear dynamics is based on the electric dipole approximation (EDA).^{154,155} Within the EDA, one can assume that the changes induced by the electric field $\mathbf{E}$ on the electronic structure are not significant and expand the electric enthalpy function of a system around the vanishing-field condition up to first order. This leads to the following expression for the (external-field-dependent) potential energy:

$$V_{\mathbf{E}}(\mathbf{q}, t) = V(\mathbf{q}) - \boldsymbol{\mu}(\mathbf{q}) \cdot \mathbf{E}(t),$$

where $\boldsymbol{\mu}(\mathbf{q})$ is the dipole moment of the system in the absence of the electric field and $\mathbf{q}$ represents the atomic positions. Equation (5) can be evaluated with a variety of theories. For example, when employing DFT, evaluating the expression above would require a full self-consistent convergence of the Kohn–Sham (KS) energy and the dipole corresponding to this converged electronic density. In periodic systems, the dipole would have to be expressed in terms of the system polarization, calculated with the Berry-phase formalism.^{156}

The forces $\mathbf{F}$ entering the equations of motion follow by differentiating Eq. (5),

$$\mathbf{F} = -\frac{\partial V}{\partial \mathbf{q}} + \left(\frac{\partial \boldsymbol{\mu}}{\partial \mathbf{q}}\right)^{\!T} \mathbf{E}(t),$$

whose evaluation requires the *dynamical charges* or Born effective charges (BECs) $Z^{*}_{\alpha j}$, defined as

$$Z^{*}_{\alpha j} = \frac{1}{e} \frac{\partial \mu_{\alpha}}{\partial q_{j}},$$

where *e* is the elementary charge (1 in atomic units), *α* = *x*, *y*, *z* runs through the Cartesian components of the total dipole, and *j* = 1, …, 3*N* runs through the degrees of freedom of the system. With this definition, the last term in Eq. (6), henceforth *F*^{ext}, reads

$$F_{j}^{\text{ext}}(t) = e \sum_{\alpha} Z^{*}_{\alpha j}\, E_{\alpha}(t).$$

For a fixed atomic configuration, the BEC can be computed by finite differences of the dipole values with respect to the atomic positions. In the context of DFT, it is also possible to exploit density functional perturbation theory (DFPT).^{157,158}

The i-PI implementation for driven vibrational motion is based on the definition of a time-dependent integrator for the equations of motion, called EDAIntegrator, built on top of the existing integrators. The role of this integrator is simply to add *F*^{ext}(*t*) to the momenta within the Verlet algorithm. This was implemented in a new *Motion* class of i-PI, called DrivenDynamics, which inherits all the attributes and methods of the *Dynamics* class.

The implemented field has the form of a plane wave modulated by a Gaussian envelope,

$$\mathbf{E}(t) = \mathbf{E}_{\text{amp}} \cos(\omega t)\, \exp\!\left[ -\frac{(t - m)^{2}}{2\sigma^{2}} \right],$$

where *ω* is the frequency of the plane wave, $\mathbf{E}_{\text{amp}}$ is the polarization vector of the electric field, and *m* and *σ*^{2} are the mean and variance of the Gaussian envelope function, respectively. Other types of field profiles can be added straightforwardly.

The BECs and dipoles are communicated by the client at every new nuclear configuration through the i-PI extras field as a JSON formatted string. An example of the communication of such a client is provided in the Python driver code distributed with i-PI (see driverdipole). Alternatively, BECs can be provided as a txt file and kept fixed during the dynamics.

An example of the section motion to run dynamics driven by a Gaussian-enveloped electric field with fixed BECs reads
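A sketch of such a section (the amp, freq, peak, and sigma tags and the bec.txt file follow the text; the enclosing tag names and values are assumptions):

```xml
<motion mode='driven_dynamics'>
  <driven_dynamics mode='nve'>
    <timestep units='femtosecond'> 0.25 </timestep>
    <efield>
      <amp units='atomic_unit'> [0.0, 0.0, 5e-3] </amp>   <!-- E_amp -->
      <freq units='inversecm'> 3836 </freq>               <!-- omega -->
      <peak units='picosecond'> 0.5 </peak>               <!-- m -->
      <sigma units='picosecond'> 0.1 </sigma>             <!-- sigma -->
    </efield>
    <bec mode='file'> bec.txt </bec>
  </driven_dynamics>
</motion>
```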

In this example, the amp, freq, peak, and sigma tags refer to $\mathbf{E}_{\text{amp}}$, *ω*, *m*, and *σ* as defined in Eq. (9), respectively. The file bec.txt contains a 2D array of shape (3*N*, 3). The columns correspond to the Cartesian components of the dipole, and the rows correspond to the degrees of freedom *j* in Eq. (7). The current implementation only supports the NVE ensemble, but it can be easily combined with other ensembles.

As a benchmark example, a single water monomer has been driven using fixed BECs and the Partridge–Schwenke water model.^{159} The parameters adopted for the electric field are the ones shown in the input example above. The resulting trajectory initialized with vanishing velocities has then been projected along the three vibrational modes of the system, which have been computed using the harmonic vibrational analysis provided by i-PI. The vibrational modes are shown in Fig. 10(a). The oscillating frequency of the field at around 3836 cm^{−1} (115 THz) was chosen to be resonant with the symmetric-stretch mode.

Figure 10(b) shows the energy given by the sum of the three harmonic vibrational mode energies (E_{bend}, E_{sym.stretch}, E_{asym.stretch}). E_{anharm} is computed as the difference between the total energy and the sum of the harmonic energies. The profile of the applied field is also shown in Fig. 10. As expected, the resonant vibrational mode is excited and the energy of the system increases while the driving field assumes non-negligible values. Afterward, the field decays to zero, but the system remains excited due to the lack of a dissipation mechanism. By comparing the different components of the energy, it is evident that the harmonic contribution is dominated by the symmetric-stretch mode and that in this simple example, the anharmonicity plays a relatively minor role.

### C. Ensemble models and error estimation

This version of i-PI implements error estimation for MLIPs based on ensembles. The idea is that, for each configuration *A*, multiple potentials *V*^{(k)}(*A*) are computed, together with the associated forces. The simulation is then evolved based on the mean of the ensemble, $\bar{V}(A) = \frac{1}{n_{\text{ens}}} \sum_k V^{(k)}(A)$, where *n*_{ens} is the number of potentials in the ensemble, and the error is estimated as the standard deviation *σ*(*A*). While, in general, ensemble approaches entail a large computational overhead because *n*_{ens} potentials have to be evaluated at each time step, there are cases in which this overhead can be largely avoided, e.g., when employing linear or kernel methods^{160} or when applying a high level of weight sharing between different neural networks, e.g., using a "shallow ensemble" on the last layer.^{114,161–165} i-PI provides two alternative modes (Fig. 11) to simulate nuclear (thermo)dynamics, including ensemble-based uncertainty estimates, depending on the characteristics of the underlying framework.
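In code, the ensemble reduction amounts to nothing more than (a minimal sketch, not the i-PI implementation):

```python
from statistics import mean, pstdev

def committee_stats(potentials):
    # Given the n_ens member potentials V^(k)(A) for one configuration,
    # return the committee mean that drives the dynamics and the
    # (population) standard deviation used as the uncertainty estimate.
    return mean(potentials), pstdev(potentials)

mean_v, sigma_v = committee_stats([-1.02, -1.00, -0.98])
```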

The first entails independent models that are run as separate clients, by specifying multiple socket connectors as part of a <ffcommittee> tag. For instance,
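A committee of two independently trained models, each served through its own socket, might be declared as follows (a sketch: the ffcommittee and alpha tags follow the text, while the socket attributes and addresses are assumptions):

```xml
<ffcommittee name='my_committee'>
  <!-- each member is an independent client connected via its own socket -->
  <ffsocket name='model-0' mode='unix'>
    <address> model-0 </address>
  </ffsocket>
  <ffsocket name='model-1' mode='unix'>
    <address> model-1 </address>
  </ffsocket>
  <!-- optional linear calibration of the uncertainty -->
  <alpha> 1.0 </alpha>
</ffcommittee>
```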

The associated class takes care of collecting potentials, forces, and stresses from the various sockets and averaging them. Even though a ffcommittee section contains multiple sockets, one should use a single <force> block referring to the name of the committee (my_committee in this example) in the definition of the system. A linear committee calibration scaling^{160} can also be applied with the keyword <alpha>.

In the second mode, which is suitable when using a shallow ensemble model with weight sharing, a single client computes all the ensemble members. In this case, the client passes the average energies, forces, and virials to i-PI in the standard way and also passes the ensemble information as a JSON string in the extras field. The string contains the list of potentials, forces, and virials in fields named committee_pot, committee_force, and committee_virial. We provide a reference implementation of such a client in the FORTRAN driver distributed with i-PI. It is then sufficient to specify a single client in the committee section
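For example (a sketch: the parse_json flag follows the text, the socket details are assumptions):

```xml
<ffcommittee name='my_committee'>
  <!-- single client returning the whole shallow ensemble via JSON extras -->
  <parse_json> True </parse_json>
  <ffsocket name='shallow-ensemble' mode='unix'>
    <address> ensemble </address>
  </ffsocket>
</ffcommittee>
```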

and to activate the flag parse_json that instructs the committee class to parse the ensemble data out of the string. In addition, in this case, the forces section should refer to my_committee and not to the name of the inner forcefield.

For both modes, committee potentials and forces, as well as the computed uncertainty, are collected into an *extras* dictionary following the same schema used for the JSON string. The different diagnostics can then be printed out using the generic output features, e.g., specifying extra_type='committee_pot' to print the energies of all members of the ensemble.

The ffcommittee forcefield also implements a *weighted baseline* model,^{106} in which one forcefield, whose name is specified in the <baseline_name> flag, is interpreted as a reference potential, to which the model described by a committee is added as a correction. The baseline is assumed to have a constant uncertainty *σ*_{base}, which can be set with the option <baseline_uncertainty>. Rather than simply computing the overall potential as $V = V_{\text{base}} + \bar{V}$, the weighted baseline potential combines the two terms with a weight that suppresses the committee correction whenever its uncertainty *σ*(*A*) becomes large relative to *σ*_{base}.

#### 1. Computing the solid–liquid chemical potential for water

We discuss a representative example of the use of ffcommittee to compute errors of the solid–liquid chemical potential Δ*μ* for an ice/water system, which demonstrates several of the advanced features in i-PI. This type of simulation can be used to estimate the melting point of water *T*_{m} by repeating the procedure at different temperatures and finding the temperature at which Δ*μ* = 0. As shown in Ref. 106, once a collection of Δ*μ* values associated with the different ensemble members has been obtained, it is easy to determine the error on *T*_{m} by ensemble propagation.

The potential is a machine-learning model^{166} based on smooth-overlap of atomic positions (SOAP)^{167} features and trained on the dataset of Ref. 14. The model generates an ensemble of five energy predictions, differing only by the last-layer weights and trained by optimizing the negative log-likelihood following the direct propagation of shallow ensembles (DPOSE) protocol.^{114} We run a simulation with 336 water molecules, arranged in an elongated simulation cell containing coexisting water and ice. A restraining *pinning potential* couples the fraction of solid-like molecules, *n*_{S}(*A*), to an anchor point, $\bar{n}$,^{168} as implemented in PLUMED^{39} and here used through its interface with i-PI. At the melting point, the potential restrains the solid fraction to the target value $\bar{n}$, whereas away from equilibrium, the mean deviation is proportional to the solid–liquid molar free-energy difference, $\Delta\mu \propto \langle n_S(A) - \bar{n} \rangle$. Further details on the setup can be inferred from the example files included in the demos/ensemble-deltamu directory. It should be noted that both the size and duration of the simulations are insufficient to obtain a converged estimate of the melting point, as this is only intended as a demonstrative example.

Figure 12 shows a small segment of a trajectory obtained with the DPOSE model.^{114} The mean potential undergoes thermal fluctuations, and the ensemble of predictions has a varying spread along the trajectory. Similar to what was observed in Ref. 114, the members of the ensemble differ mostly by a rigid shift, reflecting a systematic bias in the prediction of the potential. Meanwhile, the order parameter fluctuates about a value that is slightly below the target, indicating that the simulation temperature (290 K) is above the melting point for this water model—at least for this under-converged supercell size. There is no obvious pattern in the correlation between the total potential energy and the order parameter: at this temperature, and for such a comparatively small system, random fluctuations dominate over the enthalpy change associated with the changes in the solid fraction. Importantly, the cross-correlations between the solid fraction and the ensemble potential shift $(V^{(k)}(A) - \bar{V}(A))$ show approximately Gaussian behavior (see the inset in Fig. 12). This simplifies the estimation of the average deviation $(n_S(A) - \bar{n})$ from a single trajectory driven by $\bar{V}$. The basic idea is to use Torrie–Valleau reweighting,^{169} i.e., weighting each snapshot by $e^{-\beta (V^{(k)}(A) - \bar{V}(A))}$.

Direct reweighting, however, is only reliable as long as the fluctuations of the potential-energy difference are small in comparison with *k*_{B}*T*. The resulting statistical inefficiency^{100} would become worse as the system size is increased, eventually making the procedure completely ineffective. Following Ref. 106, one can use instead a cumulant expansion approximation (CEA) that estimates the reweighted average under the assumption that the target observable and the potential are joint Gaussians and yields

$$\langle n_S \rangle_{(k)} \approx \langle n_S \rangle - \beta \left( \langle n_S\, \Delta V^{(k)} \rangle - \langle n_S \rangle \langle \Delta V^{(k)} \rangle \right), \qquad \Delta V^{(k)} = V^{(k)} - \bar{V}.$$

The distributions are *not* exactly Gaussian but, as can be seen in Fig. 12, the qualitative behavior reflects the one seen for the direct reweighting. The statistical error is well-behaved also for large simulations, making the CEA a more generally applicable strategy. We provide a commented example to perform this kind of simulation in the demos/ensemble-deltamu directory; the post-processing needed to evaluate the reweighted averages is implemented in the i-pi-committee-reweight script.
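The two averaging strategies can be sketched as follows (illustrative code, not the i-pi-committee-reweight script itself):

```python
import math

def reweighted_mean(obs, dv, beta):
    # Direct Torrie-Valleau reweighting of an observable sampled along a
    # trajectory driven by the mean potential, toward ensemble member k;
    # dv[i] = V^(k)(A_i) - Vbar(A_i) for snapshot i.
    shift = min(dv)  # subtract the minimum exponent for numerical stability
    w = [math.exp(-beta * (d - shift)) for d in dv]
    return sum(wi * oi for wi, oi in zip(w, obs)) / sum(w)

def cea_mean(obs, dv, beta):
    # First-order cumulant expansion approximation of the same average,
    # valid when obs and dv are approximately joint Gaussian.
    n = len(obs)
    mo, md = sum(obs) / n, sum(dv) / n
    cov = sum(o * d for o, d in zip(obs, dv)) / n - mo * md
    return mo - beta * cov
```

When the ensemble spread vanishes, both estimators reduce to the plain trajectory average, as expected.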

By repeating the simulation at different temperatures, one can determine the temperature dependence of Δ*μ* for each ensemble member separately and use the separately computed values of *T*_{m} to estimate the error in the determination of the melting point. It is worth noting that in Ref. 106, the melting point was determined to be 290 ± 5 K, with a setup similar to the one discussed here but using a deep ensemble of models fitted on the same dataset. This estimate is consistent with our observation that $\langle n_S - \bar{n} \rangle$ at 290 K is very close to zero. The discrepancy with Ref. 14, where the classical melting point for the reference DFT was determined by free-energy perturbation to be at ≈275 K, can be attributed to finite-size effects, which tend to destabilize the liquid and increase the melting point.

### D. Cavity molecular dynamics for polaritonics

For molecules confined in optical cavities, molecular transitions may hybridize with cavity photon modes to form polaritons, i.e., hybrid light–matter states.^{170–174} Of particular interest is the vibrational strong coupling regime,^{175,176} in which vibrational polaritons are formed when a molecular vibrational mode is nearly resonant with an IR cavity photon mode. Under vibrational strong coupling, many molecular properties, including chemical reaction rates and energy transfer rates, can be significantly modified in the liquid phase.^{177–179} For a better understanding of these exciting experiments, both the molecules and the cavity photons need to be propagated self-consistently with realistic parameters. This requirement is beyond the scope of standard molecular dynamics schemes, in which the time-dependent dynamics of the photon field are not explicitly propagated.

The cavity molecular dynamics (CavMD) scheme addresses this need.^{110,111} Within the framework of CavMD, both nuclei and cavity photons in the IR domain are treated classically, propagated by the following light–matter Hamiltonian under the ground electronic state: *H*_{QED} = *H*_{M} + *H*_{F},^{110,180} where *H*_{M} denotes the standard molecular Hamiltonian composed of the kinetic and potential terms, and

$$H_F = \sum_{\mathbf{k},\lambda} \frac{\tilde{p}_{\mathbf{k},\lambda}^{2}}{2 m_{\mathbf{k},\lambda}} + \frac{1}{2} m_{\mathbf{k},\lambda} \omega_{\mathbf{k},\lambda}^{2} \left( \tilde{q}_{\mathbf{k},\lambda} + \frac{\tilde{\epsilon}_{\mathbf{k},\lambda}\, d_{\lambda}}{m_{\mathbf{k},\lambda} \omega_{\mathbf{k},\lambda}^{2}} \right)^{2},$$

where $\tilde{p}_{\mathbf{k},\lambda}$, $\tilde{q}_{\mathbf{k},\lambda}$, *ω*_{**k**,λ}, and *m*_{**k**,λ} denote the momentum, position, frequency, and auxiliary mass for the cavity photon mode defined by a wave vector **k** and polarization direction *ξ*_{λ} with **k** · *ξ*_{λ} = 0. For example, when the cavity is placed along the *z* direction, *λ* = *x*, *y*. Here, *d*_{λ} is the electronic ground-state dipole moment of the molecular system projected along the direction of *ξ*_{λ}, and $\tilde{\epsilon}_{\mathbf{k},\lambda}$ represents the effective light–matter coupling strength.

The basic implementation strategy of CavMD is as follows. The cavity photon modes are stored in i-PI as **L** (light) atoms. The coordinates and velocities of both the nuclei and the cavity photon modes are propagated in i-PI. A new *FFCavPhSocket* class, which inherits from the original *FFSocket* class, evaluates the forces of both the nuclei and the cavity photon modes. Similar to *FFSocket*, *FFCavPhSocket* calls external client codes to evaluate the nuclear forces associated with *H*_{M}, via the socket interface. After obtaining this component of the total forces, *FFCavPhSocket* performs additional calculations to obtain the photonic forces and cavity modifications on the nuclear forces as prescribed by Eq. (13). Because this implementation encapsulates the cavity-related calculations within the *FFCavPhSocket* class, many advanced features in i-PI, including RPMD^{110} and ring-polymer instantons, can be directly applied to study vibrational strong coupling without additional modifications to the i-PI code.

The CavMD implementation was used to simulate the IR spectra of liquid water under vibrational strong coupling.^{110} The key i-PI input section to simulate vibrational strong coupling is as follows:
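A sketch of such a forcefield section (the *FFCavPhSocket* class and the charge_array field follow the text; every tag name and value below is an assumption, not the verbatim i-PI input):

```xml
<ffcavphsocket name='cavity' mode='unix'>
  <address> h2o-cavity </address>
  <!-- cavity photon frequency, resonant with the OH stretch band -->
  <omega_c units='inversecm'> 3550 </omega_c>
  <!-- effective light-matter coupling strength, atomic units -->
  <E0> 4e-4 </E0>
  <!-- partial charges used to evaluate the molecular dipole moment -->
  <charge_array> [ -0.8476, 0.4238, 0.4238, ... ] </charge_array>
</ffcavphsocket>
```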

In the *FFCavPhSocket* section above, a cavity mode with frequency *ω*_{c} = 3550 cm^{−1} was coupled to liquid water with an effective light–matter coupling strength $\tilde{\epsilon} = 4 \times 10^{-4}$ a.u. Under the default setting, the cavity mode is polarized along both the *x* and *y* directions. The partial charges of all molecular atoms are given in the charge_array section of *FFCavPhSocket* and are used to evaluate the molecular dipole moment in Eq. (13). In Fig. 14, we show a schematic picture of the setup along with the IR spectra of water inside and outside the cavity. As shown in Fig. 14(c), in classical CavMD simulations the wide OH band of liquid water splits into a pair of polariton states, the upper polariton (UP) and the lower polariton (LP). Beyond this study, CavMD has also been used to explore novel energy transfer and chemical reaction pathways under vibrational strong coupling.^{181–183}

## V. ADVANCED SIMULATION SETUPS

### A. Path integral coarse-grained simulations

While PIMD has demonstrated success in incorporating quantum nuclear motion for equilibrium properties, extracting accurate dynamical information from path-integral simulations is a longstanding challenge.^{184} For instance, in estimating vibrational spectra, established path-integral approximations to quantum dynamics, such as RPMD,^{89} CMD,^{91,92} and their variants,^{75,93,185–187} present artifacts such as unphysically shifted or broadened peaks.^{93,188} In addition, the computational overhead of path-integral simulations and the impact of the artifacts become more pronounced at lower temperatures.

Musil *et al.*^{119} have recently introduced a machine learning framework that significantly enhances the accuracy and efficiency of approximating quantum dynamics based on CMD. Their approach, referred to as T_{e} PIGS, has two components. First, the PIGS method estimates an effective potential with an MLIP that represents the centroid potential of mean force, which is used to drive the dynamics of the centroid. Second, the T_{e} *ansatz* estimates the potential of mean force at an elevated temperature to effectively address the "curvature problem" of standard CMD.^{188} This *ansatz* attenuates the spurious redshift in the centroid-based vibrational spectra of high-frequency quantized modes and results in a temperature-transferable effective potential for approximate quantum dynamics. Combining these components, the T_{e} PIGS method yields accurate modeling of vibrational spectra of large systems via inexpensive classical MD on an effective centroid MLIP. When combined with an accurate potential energy surface, it achieves quantitative agreement with experiments for IR, Raman, and sum-frequency generation vibrational spectra in bulk and interfacial water.^{120}

This approach is easily implemented using the flexible functionality of i-PI, as demonstrated for the calculation of accurate IR and Raman spectra of liquid water (see Fig. 15) using the MACE-OFF23(S) general organic MLIP^{38} and an ML model for the polarization and polarizability of bulk water.^{120} A fully documented example with scripts to generate the spectra is provided in the i-PI *demos* directory. For a given potential energy surface *V*(**q**), with **q** being the Cartesian positions, the T_{e} PIGS ML centroid potential of mean force can be trained using a single high-temperature PIMD simulation. The training data include the centroid positions $\bar{\mathbf{q}} = \sum_{j=1}^{P} \mathbf{q}^{(j)}/P$, with **q**^{(j)} being the position of the *j*th replica, the instantaneous estimator of the mean centroid force $\bar{\mathbf{f}}_{c} = \sum_{j=1}^{P} \mathbf{f}^{(j)}/P$, where **f**^{(j)} is the force acting on the *j*th replica, and the physical force acting on the centroid $\bar{\mathbf{f}}_{p} = -\partial V(\bar{\mathbf{q}})/\partial \bar{\mathbf{q}}$. The elevated temperature is set to *T*_{e} = 500 K, and the centroid potential of mean force is fit from a 10 ps long path integral simulation with *P* = 8, with the centroid positions and forces sampled every 10 fs (20 timesteps). The centroid positions and instantaneous forces ($\bar{\mathbf{q}}$ and $\bar{\mathbf{f}}_{c}$) can be printed out as follows:
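A minimal sketch of the corresponding output block in the i-PI input file, using the `x_centroid` and `f_centroid` trajectory knobs; the prefix, filenames, and output format are illustrative:

```xml
<output prefix="pigs">
  <!-- bead-averaged (centroid) positions, every 20 steps = 10 fs -->
  <trajectory filename="xc" stride="20" format="ase"> x_centroid </trajectory>
  <!-- instantaneous estimator of the centroid mean force -->
  <trajectory filename="fc" stride="20" format="ase"> f_centroid </trajectory>
</output>
```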

The quantity $\bar{\mathbf{f}}_{p}$ can be calculated in a post-processing step with a “replay” simulation, which allows for the calculation and output of properties on a list of centroid positions. Alternatively, the physical force on the centroid can be printed on the fly by defining a “dummy” force component with a zero weight that receives centroid positions using
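A sketch of such a force component, assuming the physical potential is exposed through a forcefield named `dummy_pes` (the name is illustrative); `nbeads="1"` contracts the ring polymer so that the component receives only the centroid configuration, and `weight="0"` excludes it from the dynamics:

```xml
<!-- zero-weight component: evaluated and printed, but does not drive the dynamics -->
<force forcefield="dummy_pes" nbeads="1" weight="0"> </force>
```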

Printing out this force component requires a single line:
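A sketch of that line, assuming the dummy component is the second force component defined in the input (index 1) and that the raw, unweighted component force is exposed through a trajectory knob such as `forces_component_raw`:

```xml
<trajectory filename="fp" stride="20" format="ase"> forces_component_raw(1) </trajectory>
```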

With these data, a force matching method can be used to obtain the T_{e} PIGS MLIP by regressing the centroid force against the centroid position. This can be done transparently using standard MLIP packages, such as n2p2,^{189} nequip,^{190} and mace,^{191} by replacing the physical positions and forces, as is typically done when training MLIPs, with the positions and forces associated with the centroid. To simplify the training problem, we advise learning the difference between $\bar{\mathbf{f}}_{c}$ and $\bar{\mathbf{f}}_{p}$. The scripts necessary for dataset generation, curation, and training with mace are provided in the i-PI *demos* directory.

After the MLIP is trained in this fashion, performing a T_{e} PIGS simulation is as simple as running classical molecular dynamics on the sum of the potential energy surface *V*(**q**) and the T_{e} PIGS MLIP. This can also be done in i-PI by exploiting again the modularity of forces. We only need to define two forcefield sockets for the potential energy surface and the T_{e} PIGS MLIP:
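A sketch of the two forcefield definitions; the socket names, communication mode, and addresses are illustrative:

```xml
<!-- physical potential energy surface V(q) -->
<ffsocket name="pes" mode="unix">
  <address> pes </address>
</ffsocket>
<!-- Te PIGS correction trained on centroid forces -->
<ffsocket name="pigs" mode="unix">
  <address> pigs </address>
</ffsocket>
```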

The addition of the potentials, along with the calculation of all derivatives, such as the force and stress, is done under the hood by defining the forces of the system class comprising two force components that communicate with the forcefield sockets:
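A sketch of the corresponding forces block, assuming the two forcefields are named `pes` and `pigs` as above; both components enter with the default weight of one, and i-PI sums their potentials, forces, and virials internally:

```xml
<forces>
  <force forcefield="pes"> </force>
  <force forcefield="pigs"> </force>
</forces>
```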

### B. 2D IR–Raman spectra of liquid water

Here, we show an example provided in the i-PI *demos* directory, which simulates the two-dimensional IR–Raman spectrum of liquid water,^{193} in which the sample interacts first with two mid- or far-IR pulses through a dipole interaction and then with a near-IR or visible pulse through a Raman process.^{192,194} The spectrum is obtained as a two-dimensional Fourier sine transform of the two-time response function

$$R(t_1, t_2) = -\frac{1}{\hbar^2}\left\langle \left[\left[\hat{\alpha}(t_1+t_2), \hat{\mu}(t_1)\right], \hat{\mu}(0)\right]\right\rangle, \qquad (14)$$

where $\hat{\mu}$ and $\hat{\alpha}$ are the dipole moment and polarizability operators. The response function is evaluated here with ring-polymer molecular dynamics; for *P* = 1, the method corresponds to the MD approach of Hasegawa and Tanimura.^{116} In passing, we note that other methods based on imaginary-time path integrals have been reported to calculate non-linear response functions using approximations to fully symmetrized multi-time Kubo correlation functions.^{195–197}

For each sampled initial condition (*q*_{0}, *p*_{0}), three trajectories are propagated—one backward equilibrium trajectory along which we evaluate the dipole moment and two nonequilibrium trajectories that are initialized with a momentum kick, i.e., starting from (*q*_{±,0} = *q*_{0}, *p*_{±,0} = *p*_{0} ± *ɛμ*′(*q*_{0})/2) [see schematics in Fig. 16(a)]. The two-time response function is then computed as

$$R(t_1, t_2) \approx \frac{\beta}{\varepsilon}\left\langle \dot{\bar{\mu}}(-t_1)\left[\bar{\alpha}_{+}(t_2) - \bar{\alpha}_{-}(t_2)\right]\right\rangle, \qquad (15)$$

where the angular brackets denote an average over initial conditions sampled from the thermal ensemble at inverse temperature *β*. *ɛ* is a parameter that controls the magnitude of the perturbation (“kick”) applied to the nonequilibrium trajectories. Position-dependent operators *α*(*q*) and *μ*(*q*) in Eq. (15) are evaluated as averages over the *P* beads of the ring polymer, as in the case of linear RPMD correlation functions.^{89}

The evaluation of the nonequilibrium trajectories, two-time response function, and two-dimensional spectrum is implemented in the scripts provided in demos/2D-IR–Raman. The noneqm-traj.py script uses checkpoint files and *z*-dipole nuclear gradients stored along a longer RPMD trajectory to run multiple shorter nonequilibrium trajectories [see Fig. 16(a)]. In this way, the long RPMD trajectory serves for both sampling initial conditions and computing the response function. To properly sample the canonical ensemble, the whole calculation is typically repeated with independent RPMD trajectories. The noneqm-response.py script then computes the response function by reading in the dipole moments and polarizabilities along the long equilibrium trajectory and nonequilibrium trajectories. Finally, two-dimensional IR–Raman spectroscopy depends strongly on the nonlinearity of the polarizabilities and dipole moments (electrical anharmonicity). To study these and other phenomena that are sensitive to such nonlinearities, we implemented a simple truncated dipole-induced-dipole model for water in i-PI’s internal Fortran driver (option -m water_dip_pol). Classical and TRPMD two-dimensional IR–Raman spectra of liquid water at 300 K are displayed in Fig. 16(b), showing that NQEs play a minor role in this observable. Overall, this example illustrates how minor scripting and modifications outside of the core i-PI code can be used to implement new methods and study various types of dynamical properties.

### C. Melting of charge density waves in 2D materials

Downfolded lattice models aim to reduce the complexity of the full electronic problem by treating only a few low-energy electronic states (around the Fermi energy) explicitly and defining a simplified Hamiltonian where all other high-energy states are integrated out.^{198} The parameters required for this dimensionality reduction can be obtained directly from *ab initio* calculations. There are several ways to include the effect of screening of the high-energy electrons on the interaction terms related to the explicit degrees of freedom of the downfolded Hamiltonian,^{199} and making appropriate choices in this respect is necessary to reproduce the fully *ab initio* potential energy surface of certain materials.^{20}

i-PI has recently been employed in this context through its new interface with the elphmod code.^{62} With elphmod, it is possible to set up downfolded models for the electronic interactions and for the electron–nuclear interactions, the latter being far less common in the literature. Having access to the nuclear degrees of freedom in the Hamiltonian makes it possible to calculate nuclear forces efficiently, on the order of milliseconds per MD step per atom.^{20}

In connection with i-PI, this capability allows for the simulation of different types of molecular dynamics with such downfolded models. While MLIPs could play a similar role, there is an added value of physical interpretability when employing these models. Moreover, even if downfolded models cannot be applied to all types of materials, for suitable systems, they are straightforward to parameterize.

In Ref. 20, large-scale replica-exchange and path-integral replica exchange simulations were performed with the i-PI/elphmod combination in order to study the charge-density-wave (CDW) phase transition in monolayer 1H–TaS_{2}. The downfolded model was obtained from PBE density-functional theory calculations. In particular, the phase transition between the 3 × 3 CDW and the symmetric structure was studied. We note that capturing the nuclear dynamics involved in such phase transitions calls for techniques that include non-perturbative anharmonic effects, such as (PI)MD. In addition, as explicitly shown in Ref. 20, finite-size effects on the phase transition are significant for small unit cells of these systems, calling for potentials that are efficient and accurate enough to allow for the simulation of larger unit cells containing thousands of atoms.

Experimentally, the “melting” temperature of the 3 × 3 CDW in bulk TaS_{2} is measured at around 75 K.^{200} While the phase-transition temperature is not known for the monolayer, it is accepted that it should not deviate strongly from the bulk value. In Fig. 17, we show the ratio between classical and quantum **q**-resolved structure factors in (a) and the intensity of the 3 × 3 CDW diffraction peaks with varying temperature in (b). The inflection point of these curves is shown to correlate with a peak in the heat capacity calculated from the simulations. Nuclear quantum effects decrease the transition temperature by around 20 K, bringing it into closer agreement with the experimental estimate.

Such simulations open the path for a closer analysis of particular phonon modes that contribute to the CDW phase transition, an analysis of the co-existence of different CDW phases at metastable regions, and the mapping of more complex CDW phase diagrams. Given the close connection between CDW phases and superconductivity, this is a promising field of study. An example of how to set up and run the elphmod code in connection with i-PI is included in the *examples/clients* directory.

## VI. CONCLUSIONS

In this paper, we described the current capabilities of the i-PI code. The fundamental reasoning behind the design decisions made during its early development, namely, an architecture that allows for rapid prototyping in Python and utilities that facilitate implementing complicated sampling algorithms, still proves to be very useful. However, some of the underlying assumptions made in the past are no longer valid. For example, the widespread adoption of MLIPs has dramatically reduced the cost of energy evaluation relative to explicit electronic-structure calculations, making the overhead associated with the evolution of the atomic coordinates a more pressing concern. In this paper, we showed that relatively superficial changes in the implementation of some of the core classes in i-PI substantially alleviated the most severe bottlenecks, ensuring that i-PI can now be used in combination with state-of-the-art MLIPs without adding significant overhead, even when pushing the energy evaluation to high levels of parallelism.

At the same time, the modular design of i-PI makes it easy to implement radically new modeling ideas, bringing algorithmic speed-ups and useful new features to the codebase. A bosonic PIMD implementation with quadratic scaling, the implementation of the ring polymer instanton with explicit friction method, a framework for dynamics driven by an electric field, a framework for simulations of light–matter coupling, and a better interface with ensembles of MLIPs, including uncertainty estimation, are the five examples we highlight. Many useful and sophisticated simulation workflows can also be realized without major implementation efforts, by simply combining existing methods with external scripts and suitable post-processing. We demonstrate this with three applications: path integral coarse-grained simulations for vibrational spectra, calculations of the 2D IR–Raman spectrum of water, and the melting of 2D charge-density waves.

As MLIPs become faster and enable simulations with larger length and time scales, it might be necessary to revisit some of the deeper architectural decisions of i-PI to further reduce the overhead inherent to the serial Python-based implementation. The work done in preparation for this release demonstrates that these kinds of optimizations can coexist with a modular design that supports the fast implementation of new and exciting methods for advanced molecular simulations.

## ACKNOWLEDGMENTS

Y.L. was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Project No. 467724959. V.K. acknowledges support from the Ernest Oppenheimer Early Career Fellowship and the Sydney Harvey Junior Research Fellowship, Churchill College, University of Cambridge. V.K. is grateful for computational support from the Swiss National Supercomputing Centre under Project No. s1209, the UK national high-performance computing service, ARCHER2, for which access was obtained via the UKCP consortium and the EPSRC under Grant No. EP/P022561/1, and the Cambridge Service for Data-Driven Discovery (CSD3). T.B. acknowledges financial support from the Swiss National Science Foundation through the Early Postdoc Mobility Fellowship (Grant No. P2ELP2-199757). B.H. acknowledges support from the USA–Israel Binational Science Foundation (Grant No. 2020083) and the Israel Science Foundation (Grant Nos. 1037/22 and 1312/22). Y.F. was supported by Schmidt Science Fellows, in partnership with the Rhodes Trust. T.E.L. was supported by start-up funds from the University of Delaware Department of Physics and Astronomy. M.R. and E.S. acknowledge computer time from the Max Planck Computing and Data Facility (MPCDF) and funding from the IMPRS-UFAST program and the Lise–Meitner Excellence program. M.R. thanks Jan Berges for a careful read of Sec. V C. M.C., M.K., and D.T. acknowledge funding from the Swiss National Science Foundation (SNSF) under the projects CRSII5_202296 and 200020_214879. M.C. and D.T. also acknowledge the support from the MARVEL National Centre of Competence in Research (NCCR). M.C. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program under Grant No. 101001890-FIAMMA. We thank Eli Fields for helpful comments.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Yair Litman**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Venkat Kapil**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Yotam M. Y. Feldman**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Davide Tisi**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Tomislav Begušić**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Karen Fidanyan**: Software (equal). **Guillaume Fraux**: Software (equal). **Jacob Higer**: Software (equal). **Matthias Kellner**: Software (equal). **Tao E. Li**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Eszter S. Pós**: Software (equal). **Elia Stocco**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **George Trenins**: Software (equal). **Barak Hirshberg**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Mariana Rossi**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Michele Ceriotti**: Software (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

i-PI can be downloaded from https://github.com/i-pi/i-pi. The tag v3.0.0-beta was used to generate the examples and benchmarks reported in this article. The data required to reproduce the figures and benchmark simulations are available at https://github.com/i-pi/ipiv3_data.

## REFERENCES
