We present RASPA3, a molecular simulation code for computing adsorption and diffusion in nanoporous materials and thermodynamic and transport properties of fluids. It implements force field based classical Monte Carlo/molecular dynamics in various ensembles. In this article, we introduce the new additions and changes compared to RASPA2. RASPA3 is rewritten from the ground up in C++23 with speed and code readability in mind. Transition-matrix Monte Carlo is added to compute the density of states and free energies. The Monte Carlo code for rigid molecules is based on quaternions, and the atomic positions needed in the energy evaluation are recreated from the center of mass position and quaternion orientation. The expanded ensemble methodology for fractional molecules, with a scaling parameter *λ* between 0 and 1, now also keeps track of analytic expressions of d*U*/d*λ*, allowing independent verification of the chemical potential using thermodynamic integration. The source code is freely available under the MIT license on GitHub. Using this code, we compare four Monte Carlo (MC) insertion/deletion techniques: unbiased Metropolis MC, Configurational-Bias Monte Carlo (CBMC), Continuous Fractional Component MC (CFCMC), and CB/CFCMC. We compare particle distribution shapes, acceptance ratios, accuracy and speed of isotherm computation, enthalpies of adsorption, and chemical potentials, over a wide range of loadings and systems, for the grand canonical ensemble and for the Gibbs ensemble.

## I. INTRODUCTION

Monte Carlo (MC) codes are rarer than Molecular Dynamics (MD) codes. The MD algorithm is based on Newton’s equations of motion, while MC is based on probability distributions.^{1,2} MC is therefore limited to static properties on the one hand, but on the other hand, it has considerable opportunities to sample system states more efficiently than MD. MC can be easily applied to a wide variety of ensembles, such as the NPT-ensemble to compute fluid properties, the Grand-Canonical (GC) ensemble to compute adsorption, and the Gibbs ensemble to compute Vapor–Liquid Equilibrium (VLE).^{3,4} MC is also very suitable for advanced biased sampling and extended ensembles, using techniques such as Configurational-Bias MC (CBMC),^{4} Continuous Fractional Component MC (CFCMC),^{5–9} and CB/CFCMC.^{10,11} Several established MC codes and techniques were reviewed in Ref. 12, but some new codes have appeared in recent years.

GOMC^{13} is open-source software for simulating molecular systems using the Metropolis Monte Carlo algorithm. The software has been written in object-oriented C++ and uses OpenMP and NVIDIA CUDA to allow for execution on multicore central processing unit (CPU) and graphics processing unit (GPU) architectures, respectively. GOMC employs the widely used simulation file types (PDB, PSF, CHARMM-style parameter files) and can be used to study vapor–liquid equilibria, adsorption in porous materials, surfactant self-assembly, and condensed phase structure for complex molecules.

A relatively new code is Brick-CFCMC.^{9,14,15} This software package is designed for performing force field-based molecular simulations of single- and multi-component fluids using state of the art CFCMC techniques. Various ensembles can be combined: NVT, NPT, the Gibbs ensemble, the reaction ensemble, the grand-canonical ensemble, and the osmotic ensemble. Properties such as excess chemical potentials, fugacity coefficients, partial molar enthalpies, and partial molar volumes can be directly obtained from single simulations. Brick-CFCMC can also perform thermodynamic integration to compute free energies (including ionic systems).

RASPA2^{16} is our own open source software package for simulating adsorption and diffusion of molecules in flexible nanoporous materials. The code implements the latest state-of-the-art algorithms for molecular dynamics and Monte Carlo in various ensembles, including symplectic/measure-preserving integrators, Ewald summation, configurational-bias Monte Carlo, continuous fractional component Monte Carlo, reactive Monte Carlo, and Baker’s minimization. Applications of RASPA include computing coexistence properties, adsorption isotherms for single and multiple components, self- and collective diffusivities, reaction systems, and visualization. Gowers *et al.*^{17} analyzed several Monte Carlo codes, RASPA,^{16} Cassandra,^{18} DL Monte,^{19} Music,^{20} and Towhee,^{21} and found that their computational performance was quite comparable.

Integrating simulation code with visualization and application-based models is especially interesting for design pipelines. iRASPA^{22} is a GPU-accelerated visualization package (with editing capabilities) aimed at materials science. iRASPA is available on macOS, Linux, and Windows and leverages the latest visualization technologies with high performance. iRASPA extensively utilizes GPU computing. For example, void-fractions and surface areas can be computed in a fraction of a second for small/medium structures and in a few seconds for very large unit cells. RUPTURA^{23} is a free and open-source software package for (1) the simulation of gas adsorption breakthrough curves; (2) mixture prediction using methods such as the Ideal Adsorption Solution Theory (IAST), segregated-IAST, and explicit isotherm models; and (3) fitting of isotherm models on computed or measured adsorption isotherm data. The RASPA and RUPTURA software enables computation of breakthrough curves directly from adsorption simulations in the grand-canonical ensemble. The combination of the RASPA simulation code with iRASPA visualization and RUPTURA breakthrough modeling is an example of such a begin-to-end design pipeline.

The fundamental concept in adsorption science is the adsorption isotherm, i.e., the equilibrium relation between the quantity of the adsorbed material and the pressure or concentration in the bulk fluid phase at a constant temperature.^{24} One of the main applications of RASPA is to compute adsorption isotherms and to understand the atomic-level mechanism of adsorption and separations. This type of information is very hard to obtain experimentally, but in simulations, the positions of the molecules can be examined and “kinks” in isotherms can be explained.^{25–27}^{,} Figure 1 shows a volume-rendered picture of the density of methane in ZSM-5 (MFI-type) zeolite and that of CO_{2} in the Cu-BTC-type metal–organic framework (MOF). MFI is a three-dimensional silicon/oxygen structure with interconnected straight and zigzag ten membered rings.^{28} Copper benzene-1,3,5-tricarboxylate (Cu-BTC) is a pocket/channel structure, with a pore network that has a simple cubic symmetry.^{29} The density of adsorbates at a certain temperature and pressure can be analyzed to identify the location of strong adsorption. However, competition and segregation can also be elucidated.^{30}

RASPA2 has previously been used extensively in molecular simulation studies of adsorption and diffusion in nanoporous materials.^{16} The Grand-Canonical Monte Carlo (GCMC) simulation engine is used to computationally predict the adsorption behavior and can be used stand-alone or in support of performed experimental work. As such, one could validate and explain the adsorption of small gas molecules in synthesized porous materials.^{31–37} Structural and thermodynamic analyses can be carried out to analyze molecular (binding) geometries.^{38–42} Large-scale screening efforts are supported by the easy featurization of porous materials into surface area, pore volume, and adsorption isotherms.^{43–48} More recently, machine learning efforts have been using these featurizations at the core of dataset design for training generative models.^{49–52} With all these applications in mind, it is essential to improve RASPA’s user experience, computational efficiency, and interface.

In this paper, we present the new version of RASPA. RASPA3 is rewritten from the ground up in C++23, with large improvements in user experience and computational efficiency. In Sec. II, we describe our design decisions and programming details, followed by the new features in Sec. III. In Sec. IV, we validate RASPA3 using adsorption studies of methane and CO_{2} in MFI, an equimolar mixture of CO_{2} and N_{2} in the zeolite MFI, and CO_{2} in the Cu-BTC MOF, and VLE Gibbs simulations of methane. We show that RASPA3 reproduces the results from RASPA2, but RASPA3 is a factor of 3–6 times faster compared to RASPA2. Using RASPA3, we will show that the modern CBMC technique is a significant improvement over the standard conventional unbiased Metropolis MC technique and that recent techniques, such as CFCMC and CB/CFCMC, have overcome the downside of standard MC and CBMC, namely the drop in insertion/deletion rates in open ensembles at low temperatures and high densities.

## II. DESIGN CHOICES

RASPA3 is redesigned and rewritten from the ground up in C++23. This is a major improvement from the functional C code in RASPA2, both in terms of computational efficiency and in terms of a reduction in technical debt. Technical debt is caused by sub-optimally written and designed code that builds over time and severely limits the extensibility of the code. User experience has been improved through a variety of changes in input- and output files as well as the definition of an Application Programming Interface (API) for the user to interact with.

Major improvements in efficiency result from the implementation of C++ code based on composition and value semantics (in contrast to inheritance and reference semantics). Composition is a structural approach in which complex objects are built from simple building blocks. Value semantics avoid sharing mutable state and uphold the independence of values to support local reasoning. References are only used implicitly, at function boundaries, and are never stored. This style of coding has many advantages. Avoiding complex inheritance hierarchies and reducing the need for virtual functions lead to major code simplifications. It encourages code clarity and local reasoning. With composition, components are directly included in an object, removing the necessity for pointers, manual memory allocations, and indirections. This leads to straightforward memory patterns and access safety. Lifetimes of objects are coupled with composition, eliminating the need for explicit memory management. These lead to better compiler optimization and, therefore, performance of the code.

Value semantics make the code much easier to write and much easier to understand. The coding style is essentially C-like but uses convenient C++ abstractions and features from the C++ standard library. The standard library offers containers such as std::vector and std::set, which avoid any manual memory allocations. Containers offer convenient functionality for memory operations, such as insertion, appending, deletion, swapping, and sorting. The main data item in the code is the Atom structure, which contains the position of an atom, the type, the component-identifier, the molecule-identifier, and a group-identifier (for thermodynamic integration). The atoms are stored in a std::vector<Atom>. Rigid molecules are handled using center of mass positions and an orientation. We therefore also need a vector of molecules, std::vector<Molecule>, containing Molecule objects, which hold a center of mass position and a quaternion (to describe the orientations). After generating the Molecule structure, the atom positions for the molecule are generated since the computation of the energy is based on potentials that operate on the positions of the atoms. The order in the atom and molecule vectors are framework atoms first and the molecule atoms of the components one after the other.

The C++20 and C++23 language additions have recently offered substantial opportunities to write better code. The first is std::span. Spans offer views into data, such as a vector, and are a much better alternative than passing a memory reference and length. Spans have reference semantics but are only used as function arguments and not as data members. It is dangerous to keep references to vector elements around as any memory insertion or deletion to the vector object might cause an internal reallocation. Using spans, the routines of computing the energy can be quite generic and based on spans into the atom- and molecule-vectors. For example, the same functions can be used to compute framework–molecule energies of all components and to compute the interaction of the framework with a single component, as the function takes only the span as argument. Molecule–molecule energy computation differs fundamentally from framework–molecule energy calculations in that for molecule–molecule interactions, self-exclusion must be applied, i.e., a particle does not interact with itself. In C++23, span is extended to the multi-dimensional span std::mdspan. Data in a container, such as a vector or an array, can be described as multi-dimensional. This circumvents the manual mapping from 1D to multi-dimensional data. Mdspan is a non-owning view of a contiguous sequence of elements in memory, interpreted as a multi-dimensional array.

The second new addition, included in C++17, is std::optional. This is convenient in the routine to compute the difference in energy between a particle at its old position and at its new position. First, the energy is computed at the new position, but when an overlap is detected, the code bails out early by returning std::nullopt. When no overlap is detected, the routines return the energy value. Optional return values are therefore a very convenient way of stating that two cases can occur: (1) either the computation has failed (e.g., overlap detected) or (2) the computation succeeded and the value is returned.

Another C++20 feature employed are modules. Modules eliminate or reduce many of the problems associated with the use of header files. Modules do away with header files and offer significant compile-time improvements and isolation of macros. Modules are only imported once and therefore indifferent to which order the modules are imported.

## III. NEW FEATURES

### A. GitHub

GitHub is an online software development platform, owned by Microsoft since 2018, for software development and version control. GitHub is centered around the Git version control system, providing all the features of Git for managing versions and source code. The system ensures that all the team members are working on the latest version of the code and can work simultaneously on the project. If users want to contribute to the project, they can use a “fork and pull request” workflow. GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows automation of the build, test, and deployment pipeline. Using GitHub Actions, after a git push command or a pull request, the unit tests are automatically run. Similarly, using GitHub Actions, the rpm-, deb-, and tzst-packages for Red Hat-, Debian-, and Arch-based Linux distributions are automatically generated (see Fig. S1). The RASPA3 package is available (MIT license) from GitHub at https://github.com/iraspa/RASPA3.

### B. JSON

The input of RASPA3 has been changed to JSON format.^{53} JSON stands for JavaScript Object Notation. It is a lightweight data-interchange format in plain text written in JavaScript Object Notation. It is language independent. Figure 2 shows an example input for a Gibbs ensemble simulation at 240 K using two boxes, one for the gas phase and one for the liquid phase.

As can be seen in Fig. 2 for the Gibbs ensemble input, the input for components and the systems is coupled. There are two systems (see Systems-keyword), and one therefore needs to specify how many molecules one would like for system 0 and for system 1, respectively, defined by the CreateNumberOfMolecules-keyword. Using JSON, all data are immediately available to inspect and this cross-type of input is easier to parse in JSON than a top-down approach, which requires several sequential parses.

Similarly, the JSON format is now also used for the output. Where previously all initial data (e.g., hardware info and unit conversion factors) and statistics (e.g., CPU timings and MC move statistics) were written to a text file, these are now written as a nested dictionary to a JSON file. We also support the universal standard archive file-format for writing out adsorption data.^{54}

### C. CMake

CMake is likely the most popular and most powerful build system generator for C++. The build system generator CMake is platform- and compiler-independent. When porting a project from one system to another, the code does not have to be rewritten for CMake. There is a language called the CMake language, which allows embedding of intricate logic into CMake scripts.

CMake 3.28 and later support C++ modules. C++20 named modules are now supported by the Ninja Generator 1.11 or newer in combination with the LLVM/Clang 16.0 and newer, MSVC toolset 14.34 and newer, or GCC 14 and newer. We recommend LLVM/Clang 18 or higher for compiling RASPA3. This version has support for std::format, std::print, and std::jthread. The output-files of RASPA3 are in UTF-8 encoding and contain Unicode characters (e.g., for Å).

CMake contains the CPack system, which allows the creation of cross-platform packages and installers for Linux, Windows, and macOS. These packages are automatically created in GitHub upon release, using Docker-images for each operating system, and the CPack system; see Fig. 3 and Fig. S1.

### D. Doxygen documentation

Doxygen is a documentation generator and static analysis tool for software source tree documentation. Information is extracted from specially formatted comments within the code when used as a documentation generator. When Doxygen is used for analysis, it uses its parse tree to generate diagrams and charts of the code structure. Documentation and code can be compared in Doxygen, making it easy for users when interpreting API documentation with code side by side.

Doxygen has built-in support for generating inheritance diagrams for C++ classes and composition visualization (see Fig. 4). Doxygen can use the graphviz tool to generate these advanced diagrams and graphs. Graphviz is an open-source, cross-platform graph drawing toolkit. The RASPA3 documentation can be found at https://iraspa.github.io/RASPA3/.

### E. Unit tests

Software testing helps ensure code quality and is an important part of software development. Unit tests test the smallest functional units of code and prevent developers breaking existing code when refactoring or adding new code. The GoogleBenchmark framework is used for benchmarking,^{55} and the GoogleTest framework is used for testing and mocking.^{56} It is a platform-independent device that automates testing. Using GitHub Actions, the code is automatically built and tested (see Fig. 5).

The unit tests in RASPA3 are arranged hierarchically. Atoms are placed at several distances, and the Lennard-Jones potentials are tested and compared with the analytically computed energy. Then, molecules are placed within a zeolite framework at predetermined positions and compared to the energies computed with RASPA2. The various energy routines used in biased-sampling are tested by comparing to the general routines. The various routines for the gradients are tested by comparing the computed gradients to the values computed by finite difference schemes based on the energy. Likewise, the strain-derivative tensor (related to the pressure) is tested by comparing to the values computed by finite difference schemes based on the energy of a strained cell. Some NIST reference calculations of intermolecular energy for the SPC/E water system have also been included in the unit tests (see Table S1).

### F. Python interface

The RASPA3 C++ simulation engine is made available to Python via a Pybind11 interface.^{57} For use in Python, RASPA3 is built as a shared library, allowing its functions to be used by Python users and enabling seamless interactions with the simulation routines. Through the API, the library allows for invocation of RASPA3’s simulation routines from Python scripts, calling the same simulation routines as via the JSON input. This execution directly from Python enables the ability to prototype simulation settings and incorporate RASPA3 into existing workflows.

Moreover, the RASPA3 library functions can directly be called from Python, enabling more advanced scripting and building pipelines. Such pipelines, for example, can be built by having modules interact with the simulation results and runtime settings. As such, one could device feedback loops that check convergence or check on multiple simulations simultaneously. In addition, in line with modern (generative) Artificial Intelligence (AI) applications, it allows the users to have an external agent modify the current state of the system.

As of publication (2024), current pip and Pybind11 build systems do not yet support C++23 modules, necessitating the build to be carried out as a shared library from CMake. While this does not impact functionality, it does limit the option to build the RASPA3 library in line with PEP-518 requirements. Examples demonstrating the functionality of the Python interface and pipelines are implemented and documented in the examples directory on GitHub. These examples serve as practical guides, illustrating how to effectively utilize the interface. An example of a simulation carried out with interfaced RASPA3 code is shown in Fig. S2.

### G. Cube-file output

The Gaussian software package defines the cube file format for describing volumetric data and atom positions. The file has a header that contains the atom’s information, box vectors, and size of the volumetric data. The volumetric data follow one scalar per voxel element. All aspects of the file are text (human readable). Originally, the numerical values were five characters wide for integers that started each header line, and floating point values were formatted 12.6 (12 characters wide with six decimal places). The file contains one line for each atom consisting of five numbers: the first is the atom number, the second is the charge, and the last three are the x, y, z coordinates of the atom center. This is followed by a list of *N*^{3} floating point numbers for each specified voxel that gives the normalized number density at that point.

This file format is used in RASPA3 to output the average density measured on the voxelgrid in a simulation. These densities can be used to visualize the density of an adsorbate in a nanoporous material as in Fig. 1, where methane is shown in MFI and CO_{2} is shown in Cu-BTC. RASPA3 stores the data in a unit-cube format using fractional positions and hence can handle triclinic cells (viewable with iRASPA).

### H. HDF5 output

RASPA3 includes a Hierarchical Data Format version 5 (HDF5) writer for logging data produced during the simulation.^{58} It offers many benefits compared to the previously used text output. HDF5 is a file format that uses a file system-like hierarchy to structure data outputs. The advantages include data compression, parallel I/O support, and easy accessibility from both python (with h5py and xarray) and other languages using the HDF5 library. Crucially, the dataset can be accessed without loading the full dataset into memory.

An HDF5 file is hierarchically structured as represented in the graph in Fig. 6, where the data are split into groups, datasets, and metadata. Groups act like directories, which hold multi-dimensional arrays called datasets and named attributes, herein named metadata. Accessing, for example, the RDF dataset can be done in a POSIX-like syntax by selecting /computed_properties/rdf.

In RASPA3, the output data are split into several groups to maintain clarity and ease of access. The primary groups created are runtime and computed_properties, containing data generated during the run and statistics of computed properties, respectively. Both of these groups hold datasets, which hold multi-dimensional data. The size of the dataset is determined by the sampling frequency and the dimensionality of the data itself. For example, a dataset BoxLengths in the group runtime_log will have size (*N*, 3, 3) for the current 3 × 3 box matrix at each of the *N* write steps.

### I. Quaternions

Flexible molecules have no internal constraints and are specified using internal potentials, such as bond-, bend-, and torsion-potentials. For efficiency reasons, many small molecules are modeled as rigid. Examples are water, CO_{2}, N_{2}, CO, O_{2}, and CH_{4}. Rigid molecules are described by a center of mass and an orientation.

*q*

_{0},

*q*

_{1},

*q*

_{2},

*q*

_{3}}, is introduced in order to generate a minimal, nonsingular representation of the rotation matrix from a space-fixed system, denoted “s,” to a body-fixed coordinate system, denoted “b,”

*q*. In the body-fixed coordinate system, the moment of inertia tensor

*I*is diagonal. Upon reading the molecule file, the moment of inertia tensor is computed and then diagonalized. The positions are then recomputed in the body-fixed coordinate system.

Monte Carlo moves on rigid molecules have been re-implemented to make use of quaternions. Translation moves change the center of mass, while rotation moves operate on the quaternion. Trial positions are then generated using Eq. (2). The advantage is that MD/MC hybrid moves are now consistent between MC and MD. Moreover, it is no longer necessary to compute quaternions from atomic positions. The MD integration of the equations of motions for rigid molecules in RASPA3 is based on the quaternion implementation of Miller *et al*.^{59}

### J. Expanded ensembles

Measuring chemical potentials in open ensembles at high densities is difficult due to the lack of sufficient insertions and deletions of particles. The Widom-insertion methodology similarly fails at high densities and/or low temperatures.^{11} As single-step insertions are not efficient, it is natural to consider insertions in multiple steps in such a way that the surrounding of a molecule that is inserted can simultaneously adapt. This is the central idea that is applied in expanded-ensemble methods. In the expanded ensemble methods in general, interactions of a so-called fractional molecule are scaled with a coupling parameter *λ* in such a way that interactions between the fractional molecule and the surrounding “whole” molecules (either other adsorbates or framework atoms) vanish for *λ* = 0 and that those interactions are fully developed for *λ* = 1.^{10,11} By including Monte Carlo trial moves for sampling *λ*, different (expanded) ensembles, such as the NPT-, grand-canonical-, reaction-, and Gibbs ensemble, can be simulated. The CFCMC method considerably improves the insertion or deletion of molecules while allowing for a direct computation of chemical potentials and partial molar properties.^{8,10,14}

*λ*-sampling implementation, as opposed to the continuous

*λ*-sampling often employed. Using

*n*

_{sample}sample points, with $\Delta \lambda = 1 n sample \u2212 1 $, we can map the indices

*i*= 0 …

*n*

_{sample}to

*λ*-values,

*λ*= 0, and the last sample-index

*n*corresponds to

*λ*= 1. Figure 7 shows a graphical representation of the mapping for

*n*

_{sample}= 7. It is preferable that an odd number of sample points are used and, hence, the numbers of intervals is even, because then $\lambda = 1 2 $ is explicitly defined at sample index (

*n*

_{sample}− 1)/2.

### K. Thermodynamic integration

^{8}The integration runs from state A, with

*λ*= 0, to state B, with

*λ*= 1, where the interactions of the inserted molecule are gradually scaled from 0 to that of a full molecule. The change in free energy between states

*A*with

*λ*= 0 and state

*B*with

*λ*= 1, i.e., the insertion of a particle, can be computed from the integral of the ensemble averaged derivatives of the potential energy over the coupling parameter

*λ*,

*λ*values, calculating the ensemble-averaged derivative of $U \lambda $ with respect to

*λ*at a series of

*λ*values from 0 to 1, and finally computing the integral over the ensemble-averaged derivatives.

*λ*

_{vdw}and

*λ*

_{el}. Increasing

*λ*

_{vdw}smoothly scales from 0 to 1 between $\lambda \u2208 [ 0 , 1 2 ] $ and

*λ*

_{el}smoothly scales from 0 to 1 in $\lambda \u2208 [ 1 2 , 1 ] $, ensuring that when inserting a molecule, the van der Waals interactions are first activated and then the Coulomb interactions. For the Coulomb interactions,

*λ*

_{el}scales the charges, whereas for the scaled Lennard-Jones, a soft-core potential is used to avoid divergence at

*r*= 0 when

*λ*→ 0. The value of

*∂U*/

*∂λ*is computed using the chain rule,

^{15}

*λ*-implementation of RASPA3 has the advantage that the sample point probes the exact

*λ*value, instead of an average over the bin. All states, including the states

*λ*= 0 and

*λ*= 1, are

*explicitly*sampled. Moreover, Simpson’s integration rules for thermodynamic integration can then be conveniently applied.

^{60}The Simpson’s 1/3 rule assumes a quadratic behavior for $f \lambda i $,

### L. Transition-matrix Monte Carlo

^{61,62}The goal of GC-TMMC is to calculate the particle number probability distribution (PNPD) instead of direct calculation of ensemble averages. A biasing scheme encourages accurate examination of metastable states. At its core, the method computes the transition matrix

**with probabilities of changing particle number. This transition matrix can be used to obtain the PNP,**

*P**μ*,

*V*,

*T*), where Π is the probability of observing N particles. The transition matrix

**can be computed through a biased sampling, where a biasing factor**

*P**η*is added to the acceptance criterion to sample uniformly from particle number

*N*,

*α*(o → n) is the probability of generating the new configuration n from o and

*π*

_{o}is the probability of being in state o. The biasing function

*η*is iteratively optimized to match

*η*(

*N*) = −ln Π(

*N*;

*μ*,

*V*,

*T*). The histogram reweighting of the PNPD at one thermodynamic state point can be used to generate that of another state point without running an additional simulation, using

The newly generated PNPD can be used to generate any of the ensemble averages and thermophysical properties that were derived from the initial PNPD. Thus, a single GC-TMMC simulation can produce an entire adsorption isotherm while also identifying the conditions of phase coexistence, limits of stability, and other properties of interest, at a particular temperature.

### M. Radial distribution function

*g*(

*r*), is one of the most useful tools to measure the structure of a fluid at molecular length scales. It is often defined as the ratio of the average local number density of particles $ \rho r $ at a distance

*r*from a given particle to the bulk density of particles

*ρ*,

^{63}Mathematically, the one-body density distribution is defined as

*δ*indicates the Dirac distribution, and

**r**

_{i}is the position of particle

*i*. The standard particle-based approach to sample $\rho r $ is to discretize the Dirac function and to count events in a histogram, labeled by position

*r*and with bins of a certain size Δ

*V*. Normalization by Δ

*V*and by the number of sampling sweeps ensures the correct normalization $\u222b\rho r dr=N$, where

*N*is the total number of particles. The problem of histogram-based (i.e., “binning”) estimates is that the variance diverges as the bin size h decreases.

^{64}

It is not so much the density that matters but rather the variations in density with respect to distance. The idea behind force-sampling strategies is to sample these variations using estimators involving the force acting on the atoms.^{65,66} Borgis *et al.*^{65} proposed a virial-like estimator as an alternative to Eq. (14). The variance of this force-based estimator is significantly reduced compared to histograms. Force sampling was shown to be ∼5 times more accurate than counting.^{66} The scheme can be seen as an integration of *g*′(*r*) = *dg*/*dr* from *r* → *∞*, where *g* = 1. Note that this is different from the standard method where the plateau value differs from 1 by a $O N \u2212 1 $ correction.^{64}

Figure 8 shows the RDF for water using the counting method vs the force-based method. For a rigid molecule, **f**_{i} is the sum of forces acting on particle *i* and all particles participating in a constraint with *i* (in particular, those belonging to the same rigid molecule as *i*) and where the average is made over configurations satisfying all constraints. In practice, the gradient of the water oxygen (respectively, hydrogen) density is computed by assigning the total force acting on each molecule to the position of the O atom (respectively, H atoms).^{64} Both RDF methods give the same results within statistical error.

## IV. RESULTS

### A. Validation

#### 1. Adsorption

In adsorption studies, one would like to know the amount of molecules adsorbed as a function of pressure and temperature of the reservoir with which the adsorbent is in contact. Therefore, the natural ensemble to use is the grand-canonical ensemble (or *μ*, *V*, *T* ensemble). In this ensemble, the temperature *T*, the volume *V*, and the chemical potential *μ* are fixed. The equilibrium conditions are that the temperature and chemical potential of the gas inside and outside the adsorbent must be equal. The imposed chemical potential *μ* can be related to the fugacity *f* computed directly from the equation of state of the vapor in the reservoir. RASPA3 was checked by comparing the computation of isotherms to RASPA2 in several systems: methane adsorption in MFI, CO_{2} in MFI, an equimolar mixture of methane and CO_{2} in MFI, and CO_{2} adsorption in Cu-BTC. The used force fields and settings are listed in the supplementary material (Tables S2, S9, S16, and S20). For all systems, 1 × 10^{6} cycles were used for the production time.

The adsorption results for all systems show statistically identical results for absolute adsorption loadings (Tables S3, S10, S17, and S21), enthalpies of adsorption (Tables S6, S11, S18, and S24), and chemical potentials (Tables S7 and S25) over a wide range of pressures. The chemical potentials are known in advance, since the chemical potential is directly related to the imposed chemical potential or imposed fugacity. In Tables S8, S15, and S26, we show that the measured fugacity is statistically identical to the imposed fugacity, which is a very good indicator of the consistency and correctness of the results.

In Fig. 9, we show that the particle distributions, closely resembling the PNPD, sampled with CB/CFCMC^{10} are identical in RASPA2 and RASPA3. We compared the results of four insertion/deletion methods: conventional unbiased insertion/deletion, CBMC, CFCMC, and CB/CFCMC. The particle distributions produced by all four used Monte Carlo insertion/deletion methods show statistically identical particle distributions as shown in Figs. S6 and S11. The distribution and the fluctuations are related to the slope of the adsorption isotherm.^{67} The steeper the slope, the larger the amount of fluctuations, which makes steep inflections (e.g., water) hard to sample. Even though the particle distributions might seem similar, properties that are computed based on these fluctuations do show differences depending on what insertion/deletion method is used. For example, the conventional insertion/deletion method shows larger error-bars in the enthalpy of adsorption compared to CBMC or CFCMC methods. This shows that the conventional swap becomes inadequate at higher loadings to predict the adsorption behavior of these systems.

Similarly to the enthalpy of adsorption, conventional or even biased Widom insertions are only valid to compute the chemical potential at low densities and fail at high loadings. We computed the chemical potential using three methods: (1) using the logarithm of the ensemble average of the Rosenbluth weights, (2) using *∂U*/*∂λ* with Simpson’s rule as described in Sec. III K, and (3) using the difference in probabilities *β*^{−1} log(*p*(*λ* = 1)/*p*(*λ* = 0)), as described in Ref. 68. The calculated chemical potential for methane in MFI is shown in Fig. 10, and it shows that for higher densities, the calculations based on the *λ* particles with CFCMC result in lower errors. A similar trend can be seen for the chemical potential of CO_{2} in Cu-BTC in Fig. S7. The results for the chemical potential and error are shown in Tables S7 and S25.

We also compared the acceptance rates of the insertion moves and see that for high fugacities, the acceptance rates for the conventional and CBMC swap methods drop off exponentially. An acceptance rate as low as 10^{−4} is observed for the CBMC swap method at saturation pressures as shown in Figs. S3 and S8. Acceptance rates this low cause the effective correlation time to increase and lead to large errors in the predicted observables.

#### 2. Gibbs-ensemble for VLE

^{4}

^{,}

*T*is the temperature,

*p*is the pressure,

*μ*

_{i}is the chemical potential of component

*i*, and

*m*is the number of components. Simulations in the Gibbs Ensemble (GE) are frequently used to study Vapor–Liquid Equilibrium (VLE) of pure components and mixtures.

^{69}The vapor and liquid phases are simulated using two boxes, which for single components, can change individual volumes (while keeping the total volume constant), leading to equal pressure in both boxes. In addition, particle transfer between the boxes is performed to obtain equal chemical potentials in both boxes. For mixtures, the volume move operates on the individual boxes. VLE-GE is routinely used to develop force fields, because the force field parameters are very sensitive to changes in environment. If a model successfully reproduces VLE data, it reproduces fluid densities over a wide range of pressures and temperatures. The TraPPE model

^{70–75}is an example of such a force field fitted to VLE data.

Similar to simulations in the grand-canonical ensemble, GE simulations rely on sufficiently large acceptance probabilities for particle exchanges between the simulation boxes. However, the acceptance probability for particle exchange can be very low when molecules are large or when the densities are high. When the acceptance probability for insertion/deletion is low, it is not straightforward to verify if the two phases have reached chemical equilibrium and that the chemical potentials of a certain component are equal in the simulation boxes. To overcome this problem, one possible solution is applying expanded ensemble methods.^{76–78} The CFCMC method, recently introduced by Shi and Maginn,^{5,6} is one of the most commonly used expanded ensemble approaches. In the CFCMC GE, there are fractional particles with reduced interactions with the surrounding molecules (instead of inserting whole molecules at once). The strength of this interaction is controlled by a coupling parameter lambda (*λ*) over which we integrate in the partition function, hence the name expanded ensemble. Poursaeidesfahani *et al.* introduced a more efficient formulation of GE combined with the CFCMC technique.^{8} In this formulation, there is only a single fractional component per molecule type that can be in either one of the boxes. This increases the acceptance ratio for particle insertion/deletion. The advantage of the CFCMC method lies within the fact that it does not depend on the occurrence of spontaneous cavities in the system (as in CBMC). The molecules are inserted gradually in the system instead of a single trial move. In the new formulation of the CFCMC GE, the chemical potential can be computed directly without any extra calculations.

In the supplementary material, we have listed the simulation data. The force field was listed in S27. From Table S28, we see that RASPA2 and RASPA3 give similar results, for both the CBMC and Poursaeidesfahani CFCMC method. Tables S29 and S30 show that we indeed have obtained mechanical and chemical equilibrium, since the pressure and the chemical potentials of the vapor and liquid phase are equal to within statistical error. Table S31 shows the acceptance ratios for the CFCMC-GE scheme. The fractional particle indeed shows near equal *λ*-occupancy in the vapor and liquid phases, due to the biasing applied. The *swap* move swaps a randomly chosen integer molecule of another box with the current fractional molecule. The acceptance of this move for this system is higher than 40%. The move to change the *λ* value is scaled to achieve 50% acceptance. The *shuffle* move to shuffle the fractional move to one of the other boxes is around 10%–15%. The *shuffle* move is very effective at low *λ* values, while the *swap* move has high acceptance at near unity *λ* values. Finally, Table S32 shows that the chemical potentials computed from Widom particle insertion and from the CFCMC *λ*-histograms and thermodynamic integration are very similar for this simple system. For more difficult systems, Widom particle integration will start to fail, while the CFCMC-methodology remains effective.^{15}

#### 3. Molecular dynamics

Molecular Dynamics (MD) mimics nature by computing the positions ** x**(

*t*) and velocities

**(**

*v**t*) of particles as a function of time

*t*according to the equations of motion of Newton:

^{3}

^{,}

**=**

*f**m*

**, which relates the force**

*a***to the mass**

*f**m*times the acceleration

**. Newton’s equations of motion yield a constant energy ensemble. To extend MD to other ensembles, the system of equations needs to be extended. The Nosé–Hoover equations of motion introduce temperature control.**

*a*^{79}Martyna and Tuckerman and co-workers derived equations of motion for temperature and pressure control.

^{79–82}For each case, a conserved quantity can be derived that can be used to check the integration error. Figures S12 and S13 show the excellent energy conservation for an NVT simulation of over 100 ns of methane and CO

_{2}, respectively, in 2 × 2 × 2 unit cells of IRMOF-1. Even though the Nosé–Hoover equations of motion perturb the motions, the changes are gentle enough that the obtained values are similar to the values obtained from NVE MD at the same average temperature. For diffusion, no significant influence is observed over a wide range of thermostat masses.

^{4}

*α*is computed by taking the slope of the mean-squared displacement (MSD) at long times,

*N*

_{α}is the number of molecules of component

*α*,

*d*is the spatial dimension of the system,

*t*is the time, and $ r i \alpha $ is the center-of-mass of molecule

*i*of component

*α*. The order-

*n*algorithm for measuring MSDs conveniently and efficiently captures correlations over short, medium, and long times.

^{83}In Fig. S14, we show simulated diffusivities of small gasses in IRMOF-1 at 298 K, compared to previous results of results of Skoulidas and Sholl.

^{84}

### B. Timings

The same simulations used in Sec. IV A were used to measure comparative efficiency of the simulation codes. Figure 11 shows that a significant speedup is observed. The speedup for the simulation of methane in MFI (non-Ewald) is 4–6×, and the speedup of the CO_{2} in Cu-BTC is about 2–4×. This reduces the total core hours necessary to do these isotherm calculations from 170 to 36 core hours for methane in MFI and from 408 to 113 core hours for CO_{2} in Cu-BTC (with CBMC/CFCMC).

In both cases, a higher speedup is observed in the higher fugacity ranges. In these ranges, a higher density is expected, and therefore more particles. Specifying the speedup per type of Monte Carlo move in Figs. S4 and S9 shows that the speedup is most noticeable in the CBMC Widom and reinsertion moves, with speedups of around a factor of 7. This likely explains the fugacity dependence of the speedup. Also promising is the factor 3.5 speedup of the translation move, which is a key move in all MC simulations.

These results clearly show that the speedup is a result of the design choices described in Sec. II. The improved memory layout for the atom lists and the value-based semantics make accessing the particles faster and more reliable. Hence, we see an improved performance for every Monte Carlo move. These results greatly improve the capacity to quickly screen porous materials by computing adsorption isotherms.

### C. Parallelization

The OpenMP API defines a portable, scalable model with a simple and flexible interface for developing parallel applications in C/C++ and Fortran. By adding some additional pragma statements, these OpenMP directives can exploit shared memory parallelism by defining various types of parallel regions. Figure 12 shows some preliminary results on parallelization of the framework–molecule energy computation used in the CBMC routine. As can be seen, parallelization for small systems can be detrimental because not enough work can be given to the cores to overcome the cost of spawning the threads and combining the results. For sufficiently large systems, it starts to pay off, but the gains are marginal. However, for obtaining an equilibrated system, this might be acceptable. Once an equilibrated system is available, task-farming parallelism can be exploited. GPU-implementations show substantial speedups compared to serial CPU implementations of Monte Carlo, and also compared to thread-based/OpenMP CPU parallelization.^{62} Currently, RASPA3 has no GPU acceleration.

## V. CONCLUSIONS

RASPA3 is a highly efficient Monte Carlo code with a particular emphasis on simulations of adsorption in porous materials and thermodynamic properties of fluids. It represents a significant upgrade to state-of-the-art technologies, such as CMake, unit-tests, and Doxygen documentation. The code was re-implemented in modern C++23, resulting in a factor of 3–6 speedup compared to RASPA2 while doing more work like the computation and bookkeeping of d*U*/d*λ*. A further quick speedup of a factor of 1.5–2 can be achieved by multi-threading using OpenMP, but only for relatively large systems. The code is freely available at GitHub and licensed under the MIT license.

Using the RASPA3 code, we have shown that the modern CBMC technique is a significant improvement over the standard conventional unbiased Metropolis MC technique and that recent techniques, such as CFCMC and CB/CFCMC, have overcome the downside of standard MC and CBMC, namely the drop in insertion/deletion rates in open ensembles at low temperatures and high densities. Even though the sampled particle distributions using all four methods look similar, there are significant differences observed at high densities for properties that are computed via fluctuation formulas and for the chemical potential. We recommend the use of CB/CFCMC for adsorption and vapor–liquid equilibrium computations.

## SUPPLEMENTARY MATERIAL

The supplementary material contains simulation results on the adsorption of methane and CO2 in MFI, an equimolar mixture of CO2 and N2 in MFI, and CO2 in Cu-BTC; a Gibbs VLE simulation of methane; and diffusivity results on small gas molecules in IRMOF-1.

## ACKNOWLEDGMENTS

R.Q.S. gratefully acknowledges support from the U.S. Department of Energy, Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences and Biosciences under Award No. DE-SC0023454. S.R.G.B. acknowledges Grant No. RYC2022-036070-I funded by MICIU/AEI/10.13039/501100011033 and by “ESF+”.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Y. A. Ran**: Investigation (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). **S. Sharma**: Methodology (equal); Software (equal). **S. R. G. Balestra**: Methodology (equal); Software (equal). **Z. Li**: Methodology (equal); Software (equal). **S. Calero**: Methodology (equal); Software (equal); Supervision (equal). **T. J. H Vlugt**: Methodology (equal); Software (equal); Supervision (equal). **R. Q. Snurr**: Methodology (equal); Software (equal); Supervision (equal). **D. Dubbeldam**: Investigation (equal); Methodology (equal); Software (equal); Supervision (equal).

## DATA AVAILABILITY

All our software is freely available from GitHub, and the simulations results are written down in the supplementary material.

## REFERENCES

*Computer Simulation of Liquids*

*Understanding Molecular Simulation*

_{3}(TMA)

_{2}(H

_{2}O)

_{3}]

_{n}

_{2}capture

*n*-alkanes

*Statistical Mechanics: Theory and Molecular Simulations*

*n*algorithm for computing correlation functions