We present the Python-based Molecule Builder for ESPResSo (pyMBE), an open source software application to design custom coarse-grained (CG) models, as well as pre-defined models of polyelectrolytes, peptides, and globular proteins in the Extensible Simulation Package for Research on Soft Matter (ESPResSo). The Python interface of ESPResSo offers a flexible framework, capable of building custom CG models from scratch. As a downside, building CG models from scratch is prone to mistakes, especially for newcomers in the field of CG modeling, or for molecules with complex architectures. The pyMBE module builds CG models in ESPResSo using a hierarchical bottom-up approach, providing a robust tool to automate the setup of CG models and helping new users prevent common mistakes. ESPResSo features the constant pH (cpH) and grand-reaction (G-RxMC) methods, which have been designed to study chemical reaction equilibria in macromolecular systems with many reactive species. However, setting up these methods for systems, which contain several types of reactive groups, is an error-prone task, especially for beginners. The pyMBE module enables the automatic setup of cpH and G-RxMC simulations in ESPResSo, lowering the barrier for newcomers and opening the door to investigate complex systems not studied with these methods yet. To demonstrate some of the applications of pyMBE, we showcase several case studies where we successfully reproduce previously published simulations of charge-regulating peptides and globular proteins in bulk solution and weak polyelectrolytes in dialysis. The pyMBE module is publicly available as a GitHub repository (https://github.com/pyMBE-dev/pyMBE), which includes its source code and various sample and test scripts, including the ones that we used to generate the data presented in this article.

Computer simulations using molecular models are a valuable tool to rationalize experimental observations, validate the applicability of simplified theories, and provide numerical results beyond the capabilities of analytical calculations. The most popular molecular models are atomistic models, in which each atom is represented by an explicit particle. While atomistic models provide a detailed description of a system, they are often limited to nanoscopic scales due to the current computing capabilities. Coarse-graining techniques circumvent this problem by reducing the number of degrees of freedom of a system, at the cost of reducing the model resolution. Coarse-grained (CG) models are often preferred to investigate macromolecular systems, because they allow one to reach longer time and length scales than atomistic models. Despite the popularity of CG models to simulate macromolecular systems, most of the current software packages for molecular modeling are designed for atomistic models. Among software packages that natively support CG models, remarkable examples are the Extensible Simulation Package for Research on Soft Matter (ESPResSo),1,2 ESPResSo++,3 Faunus,4,5 Free Energy and Advanced Sampling Simulation Toolkit (FEASST),6 GROMACS,7 Highly Optimized Object-Oriented Molecular Dynamics blue edition (HOOMD-blue),8 the modular molecular simulation (MOLSIM),9 and the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS).10 

CG models are also well-suited for Monte Carlo methods, which can be used to sample reversible chemical reactions in reactive systems. These reactions are usually not considered in molecular simulations. Nevertheless, they can significantly affect the properties of various macromolecular systems by affecting their ionization states.12–14 Constant pH Monte Carlo15 (cpH) simulations are a popular tool to study pH-sensitive molecules in bulk solution, such as solutions of weak polyelectrolytes,16–18 peptides,19–22 proteins,12,23–26 polyelectrolyte microgels,27–30 and charged nanoparticles.31,32 In contrast, the grand-reaction (G-RxMC) method has been designed to study two-phase systems,33,34 termed “system” and “reservoir.” Macromolecules or nanoparticles are present only in the system and cannot be exchanged with the reservoir. However, small molecules and ions can be exchanged between the system and reservoir. Examples of such two-phase systems include weak polyelectrolyte hydrogels,35–39 coacervates,40,41 brushes,42–47 and solutions of polyelectrolyte chains33 or proteins.48 The cpH and the G-RxMC methods are available in ESPResSo; however, setting them up for systems with many different reaction types often leads to repetitive and lengthy scripts, prone to errors, which may be difficult to identify. Therefore, a tool to automate the setup of these methods is highly desirable.

The first step before performing a computer simulation is building the molecular model in the software. Many molecule builders such as Avogadro,49 the TopoTools plugin for Visual Molecular Dynamics50 (VMD), the Builder tool in PyMOL,51 the link, edit, and parm (LEaP) molecular builder52 from the AmberTools suite,53 the mBuild54,55 tool in Molecular Simulation Design Framework (MoSDeF),56 Python-based Simulation Interface for Molecular Modeling (PySIMM),57 pyPolyBuilder,58 or Atlas59 are available to facilitate the setup of atomistic models in software for molecular simulation. While some of these all-atom molecule builders can be repurposed for CG modeling by defining custom molecular residue types, this procedure is time-consuming and requires detailed knowledge of the internal representation of the residues in the software.

Unlike all-atom models, CG models are not uniquely defined by the molecular structure. Therefore, they are often problem-specific, and providing a universal builder for CG models is not as straightforward as for all-atom models. Consequently, many users design their own scripts to build CG models for a particular simulation software application, which can consume a significant amount of their working time. Building CG models from scratch is susceptible to errors and often leads to repeating common mistakes in the simulation setup, especially by new users of a software application.

To bypass these shortcomings of setting up CG models, there are a growing number of tools to automate the coarse-graining of existing all-atom models: martinize.py,60 Martinize2,61 Auto_Martini,62 MArtini Database (MAD),63 Protein Data Bank → Unified Nanotechnology Format (PDB → UNF) Converter,64 and the Generalized-Ensemble Simulation System (GENESIS) CG tool.65 There are also some special-purpose CG molecule builders, for example, for DNA origami66,67 and lipid membranes.68 However, general-purpose CG molecule builders capable of generating arbitrary polymer topologies are still very scarce. Currently, the only molecule builders with such capabilities are Moltemplate69 and MoSDeF-GOMC.70 Moltemplate is designed to facilitate the setup of input files for LAMMPS, whereas MoSDeF-GOMC is a Python interface for GPU Optimized Monte Carlo (GOMC),71 designed to set up Monte Carlo simulations of CG models of small molecules and crystals in various statistical ensembles.

ESPResSo offers a flexible Python scripting interface capable of building custom CG models from scratch, resulting in a convenient software application to perform computer simulations of CG models. However, building CG models from scratch is an error-prone and tedious task for beginners, especially for molecules with complex structures. In practice, this has prevented some users of ESPResSo from using the software to simulate models with complex architectures or force-fields. Although Moltemplate can be used with a deprecated ESPResSo v3.3.1, none of the above-mentioned CG molecule builders is compatible with the current version of ESPResSo.

In this article, we present the Python-based Molecule Builder for ESPResSo (pyMBE), an open-source software developed as a collaborative project between various research groups modeling weak polyelectrolytes and biomacromolecules, which require rather complex model setups, including acid–base reactions and chemical equilibria in two-phase systems. The pyMBE module is a molecule builder that has been designed to build CG models of macromolecules with different complex architectures, ranging from flexible polyelectrolytes and peptides to globular proteins with a rigid structure. In addition, it facilitates the setup of the cpH and G-RxMC methods in ESPResSo, lowering the barrier to entry for new users of these methods. In Fig. 1, we present the logo of pyMBE with its mascot: a python snake that loves drinking espresso while coding. To demonstrate the applications of the pyMBE module, we showcase examples where it successfully reproduces reference data from previous studies done with ESPResSo and other simulation software. Altogether, pyMBE aims to aid researchers in the field to ensure both reproducibility and reusability of their simulations.

FIG. 1.

Logo of the Python-based Molecule Builder for ESPResSo (pyMBE). The logo has been edited from an image originally generated using Artificial Intelligence powered tools from Canva.11 

FIG. 1.

Logo of the Python-based Molecule Builder for ESPResSo (pyMBE). The logo has been edited from an image originally generated using Artificial Intelligence powered tools from Canva.11 

Close modal

pyMBE is distributed under the GPL-3.0 license, which guarantees its users to freely run, study, share, and modify the software. The pyMBE module has been designed as a toolbox to aid the setup of simulations in ESPResSo. Therefore, the standard use of pyMBE is to be imported as a library together with ESPResSo in Python scripts for molecular simulation. However, the functionalities of pyMBE as a molecule builder for coarse-grained models are not restricted to ESPResSo, and they could be extended to other software for molecular simulation. The pyMBE module is under active development, and new functionalities are developed depending on the needs of the community. Following the “four-eye” principle, every new functionality is reviewed by one or more members of the pyMBE development team before being merged into the main branch of pyMBE. The main branch of pyMBE is automatically periodically tested to reproduce previous reference data from our research groups. pyMBE is an open community, and we welcome new users and developers to join the project and contribute to its development. A stable version of pyMBE is publicly available as a GitHub repository (https://github.com/pyMBE-dev/pyMBE), including the source code, a documentation of the Application Programming Interface (API), a Jupyter notebook tutorial for beginners, and several sample scripts and test scripts. The API documentation is automatically generated from the source code of the library using the Python library pdoc,72 and it can be consulted in the GitHub website of pyMBE (https://pymbe-dev.github.io/pyMBE). Currently, pyMBE uses various Python libraries, listed in  Appendix A. However, we note that pyMBE is a live project, so the readers should consult the project repository for an up-to-date list of dependencies and documentation. We outline in Table I the main features of pyMBE, which we will explain throughout this section.

TABLE I.

Features of pyMBE.

• Molecule builder designed for ESPResSo 
• Build custom polymer molecules using simple blocks 
• Support for coarse-grained models of peptides and proteins 
• Automated setup of cpH and G-RxMC simulations 
• Bookkeeping of the topology of the molecules 
• Molecule builder designed for ESPResSo 
• Build custom polymer molecules using simple blocks 
• Support for coarse-grained models of peptides and proteins 
• Automated setup of cpH and G-RxMC simulations 
• Bookkeeping of the topology of the molecules 

The first step when using pyMBE is to import it as a Python library,

Code snippet 1. Import pyMBE and create an instance of the library.

which enables its use with standard ESPResSo scripts. When creating an instance of the library, it is mandatory to provide a for the various random number generators. pyMBE employs the Pint73 library to facilitate the work with physical quantities and units. When the user creates an instance of pyMBE, it creates an instance of a UnitRegistry object from Pint, which is stored as an attribute of pyMBE. The user can access this attribute to perform operations with units, as conventionally done when using Pint:

Code snippet 2. Defining physical quantities with units.

Here, k_B, T, and kT are pint.Quantity objects, storing information about the magnitude and the units of each variable. When operating with pint.Quantity objects, Pint internally handles unit conversion and it resolves the resulting dimensionality. For example, in Code snippet 2, Pint assigns the unit of Joule to kT when it executes the operation k_B∗T:

Code snippet 3. Printing a pint.Quantity object.

The use of Pint permits an easy and reliable transformation from the international system of units to the reduced system of units used in ESPResSo. By default, pyMBE uses a set of internal units, based on the commonly used set of reduced units in simulations of Lennard-Jones (LJ) systems.74 In this system of units, energies are divided by the thermal energy kBT, distances are divided by the particle diameter, σ, and charges are divided by the elementary charge e. In pyMBE, the internal unit of energy is the thermal energy kBT, and the default value of temperature is T = 298.15 K. The internal unit of charge is the elementary charge e = 1.602 × 10−19 C, and the internal unit of length is σ = 0.355 nm. In addition, the strength of electrostatic interactions is set by the Bjerrum length, lB ≈ 0.71 nm = 2σ, which corresponds to aqueous solutions at T = 298.15 K. This ratio between particle size and Bjerrum length ensures that activity coefficients (excess chemical potentials) of simple electrolyte solutions, modeled as Lennard-Jones spheres in implicit solvent, approximately match the experimental activity coefficients of aqueous solutions of NaCl.33 The same value has been commonly used in other simulations by some of the authors of the current article. The values of various quantities in the system of reduced units used by ESPResSo are then obtained by dividing their values in SI units by the internal units. For example, if the desired box length is L = 3.55 nm, then the value of the reduced box length will be l = 3.55 nm/0.355 nm = 10. The above choice of internal units is by no means universal, and a different choice may be preferred for some purposes. Therefore, user can choose a different set of internal units in pyMBE and print the active set of reduced units in the library:

Code snippet 4. Setting up a custom set of internal units and printing it to screen.

By default, there is no pre-defined internal unit of mass in pyMBE, because this choice does not influence the equilibrium properties of the system in the absence of an external gravitational field, which is a common case in molecular modeling. The conversion between the system of reduced units used by ESPResSo and the Système international (SI) units in which the users usually obtain their input parameters is a common problem, which leads to mistakes in the simulation setup. The pyMBE module alleviates this issue by allowing the user to provide the inputs in SI units and internally ensures that the conversion to reduced units is done correctly.

1. Particles

The pyMBE module uses a hierarchical object-oriented set of fundamental building blocks: particles, residues, and molecules, as schematically shown in Fig. 2. This hierarchical approach is partly inspired by the architecture used in the MOLSIM software.9 The smallest building block in pyMBE is called a particle. It can represent an atom or a coarse-grained group of atoms. Users can use pyMBE to define the properties of various particle types:

FIG. 2.

Schematic representation of the essential building blocks used in pyMBE: particles, residues, and molecules. (a) A particle is the smallest building block in the pyMBE module, representing an atom or a coarse-grained group of atoms. (b) Residues consist of a central particle, referred to as a central bead, with one or multiple side chains bonded to it. (c) Molecules are built as a linear sequence of residues with an arbitrary user-defined composition.

FIG. 2.

Schematic representation of the essential building blocks used in pyMBE: particles, residues, and molecules. (a) A particle is the smallest building block in the pyMBE module, representing an atom or a coarse-grained group of atoms. (b) Residues consist of a central particle, referred to as a central bead, with one or multiple side chains bonded to it. (c) Molecules are built as a linear sequence of residues with an arbitrary user-defined composition.

Close modal

Code snippet 5. Defining particle properties.

The only required argument is used as an identifier by which the user later refers to this particle type when building residues and molecules, setting up their interactions and chemical reactions. All other arguments are optional. The arguments and are used in the setup of pairwise Lennard-Jones (LJ) interactions. In addition, the user can provide the arguments and that allow further tuning of the LJ interactions. We provide a detailed explanation of these LJ-related parameters in Sec. II E. The optional argument corresponds to the charge number of the particle and thus sets a permanent charge q = ze of a particle. Alternatively, the user can specify of the particle, which can be or . In such a case, the particle may have two states: one neutral and one with z = −1 if it is acidic or z = +1 if it is basic. When setting the , the user must also specify the value of the acidity constant, . The default value of the is None, which means that the particle has only one state with a permanent charge q. The and are used when setting up the constant-pH or the grand-reaction method, as explained in Sec. II C.

2. Residues

Residues in pyMBE are composed of one or multiple particles, and they are used as monomeric units to build molecules. A residue is defined as a central particle to which other particles are connected, as shown in Fig. 2(b):

Code snippet 6. Defining simple residues.

More complex residues can be created by using multiple particles or residues as side chains, also shown in Fig. 2(b):

Code snippet 7. Defining complex residues.

Here, is the identifier of the residue. The should be the name of a particle, and should be a list of particles or residues defined in pyMBE. The names listed in determine which objects will be bonded to . These names must correspond to particle or residue objects that have been previously defined. If the user provides the name of a residue object, pyMBE defines that the of that residue will be bonded to the of the defined residue.

3. Custom molecules

Custom chain molecules are built as sequences of residues, as shown in Fig. 2(c):

Code snippet 8. Defining a molecule.

Here, is the identifier of the molecule, composed of residues specified in . The sequence of residues in the list defines the order in which they are connected to form the molecule. Chain molecules with multiple repeating units of the same type can be conveniently created with pyMBE using the built-in Python functionality for creating lists with repeating elements, for example (cf. Fig. 3):

FIG. 3.

Schematic representation of various chain molecules that can be created using pyMBE. (a) A polyacid. (b) An alternating polyampholyte copolymer. (c) A diblock polyampholyte copolymer.

FIG. 3.

Schematic representation of various chain molecules that can be created using pyMBE. (a) A polyacid. (b) An alternating polyampholyte copolymer. (c) A diblock polyampholyte copolymer.

Close modal

Code snippet 9. Defining various chain architectures.

In principle, pyMBE could also be used to create more complex branched molecules; however, it has been primarily designed to create linear chain molecules with short side chains. A generalization to conveniently create molecules with arbitrary architectures is planned for the future.

4. Loading parameters from a file

As an alternative to defining each particle and residue in the Python script, pyMBE also features a helper function to load these parameters from a file:

Code snippet 10. Loading interaction parameters.

where contains a path to the local file, which should be structured as a JavaScript Object Notation (JSON) dictionary:

Code snippet 11. General parameterization in JSON format.

More complete examples of such files can be found in the pyMBE repository, folder “parameters/peptides”. A similar helper function is available to load the and values for a set of particles

Code snippet 12. Loading acid–base properties.

where the input file has the following structure:

Code snippet 13. Acid–base properties in JSON format.

In the pyMBE repository, we provide some specific sets of pKA values for amino acid residues, based on well-established compilations in the literature: Bienkiewicz and Lumb,75, CRC Handbook of Chemistry and Physics,76 Hass and Mulder,77 Nozaki and Tanford,78 Platzer et al.,79 and Thurlkill et al.80 These pre-defined sets can be found in the pyMBE repository in the folder “parameters/pka_sets”. In addition to the above, some less common sets of pKA values are included in the repository. These sets refer to studies that we used for benchmarking the results produced by pyMBE. They are required for internal purposes, namely for integration and system testing of pyMBE within the continuous integration framework.

5. Pre-defined molecules

In addition to building custom molecules defined by the user, the pyMBE module also provides specific functions to build pre-defined coarse-grained models of flexible peptides and rigid proteins. To begin with, it is necessary to load a set of particle parameters and pKA values for amino acids from the pyMBE repository, as described in Sec. II B 4. Then, peptides can be defined in a way similar to custom molecules:

Code snippet 14. Defining a peptide.

where is again the identifier of the peptide molecule. The argument uses the standard one-letter or three-letter codes for amino acids.81 For example, valid inputs for the peptide Cys2His3 are , or . The small letters and in the peptide sequence specify that an additional residue should be added at the end of the molecule, corresponding to the amine group at the N-terminus and carboxylate group at the C-terminus of the peptide. If these are not explicitly specified, then the C-terminal and N-terminal groups are not added. The function internally defines a residue object in pyMBE for each type of amino acid in . The argument defines how individual amino acids are represented as residues. In a one-bead CG model (cf. Fig. 4(a)), , each amino acid residue is represented by a single particle. In a two-bead CG model (cf. Fig. 4(b)), , each amino acid residue is represented by two particles. These particles are internally identified in pyMBE as for the central bead (same for all residue types) and a side-chain particle with the same given by the corresponding one-letter code of the amino acid. Exceptions to this rule are glycine and the and end group residues, which are always represented by a single particle, named by the respective one-letter code. Although uses a nomenclature similar to the FASTA standard, pyMBE supports other non-standard amino acids beyond the FASTA alphabet with custom parameters provided by the user. Naturally, pyMBE is not restricted to using pre-defined parameters from the repository, but the user is free to define custom parameters of the amino acid particles.

FIG. 4.

Schematic representation of the peptide Cys2His3 using two pre-defined coarse-grained models: a one-bead representation (a) and a two-bead representation (b).

FIG. 4.

Schematic representation of the peptide Cys2His3 using two pre-defined coarse-grained models: a one-bead representation (a) and a two-bead representation (b).

Close modal

Unlike flexible peptides, coarse-grained models of globular proteins typically use protein structures from the Protein Data Bank (PDB), which are then assumed to be fixed throughout the simulation. The first step when setting up a globular protein using the pyMBE module is to load its topology. Currently, the easiest way to load a CG model of a globular protein into pyMBE is to provide a VTF file (VTF Trajectory Format82) with the coordinates of the beads and the information about the protein topology. To help creating such a VTF file, we provide a supporting script in the pyMBE repository, lib/create_cg_from_pdb.py. This script creates a VTF file with the coarse-grained model of a protein from its corresponding PDB file

Code snippet 15. Using the supporting script to create a pre-defined CG model of a globular protein from a PDB file.

The script supports two options to read a PDB file of a protein: (1) from a local PDB file using the argument --filename or (2) by downloading the PDB file from the database, using the argument --download_pdb and providing the PDB code. Optionally, the user can select a sub-chain of the protein by providing the argument --chain_id.

From the PDB coordinates, the script creates a CG model consisting of either one bead per amino acid (--model 1beadAA) or two beads per amino acid (--model 2beadAA), based on the same logic as explained for peptides. For the two-bead model, the script uses the same procedure as described by Torres et al.25,26 One backbone bead is placed at the position of the α-carbon of the amino acid with a radius equal to the radius of the alpha carbon. A second bead is placed at the center of mass of the side chain of the amino acid residue with a radius equal to its radius of gyration. For the one-bead model, one bead is placed in the center of mass of each amino acid residue. We chose this particular representation in order to enable validation of the pyMBE module against already published simulation results. In the future, we plan to make this procedure more general, allowing the user to define a custom coarse-graining approach.

The script creates a VTF file with all coordinates and radii of the beads. We provide examples of VTF files in the pyMBE repository in the folder “/parameters/globular_proteins/.” To read such files, the pyMBE module provides a function:

Code snippet 16. Reading a protein structure from a VTF file.

The function returns a dictionary (topology), which can be used as an input to another helper function that defines both protein particles and residues in pyMBE:

Code snippet 17. Defining a protein.

Internally, this function defines one residue and one or two particles for all different amino acids in the protein, but it does not define particles and residues for those amino acids that are not present in the protein. In Fig. 5, we depict an example of a coarse-grained model of α-lactalbumin (PDB code 1F6S83) built using pyMBE, consisting of a rigid object with two beads per amino acid ().

FIG. 5.

Snapshot of the coarse-grained model of α-lactalbumin (PDB code 1F6S83) using two beads per amino acid. An additional particle is placed in the interior of the protein, representing the Ca2+ ion caged in the protein structure. The particles are colored as follows: acidic bead (red), basic bead (blue), inert side chain bead (green), inert backbone bead (gray), and Ca2+ ion (brown). The size of the beads has been scaled by an arbitrary value for visualization purposes.

FIG. 5.

Snapshot of the coarse-grained model of α-lactalbumin (PDB code 1F6S83) using two beads per amino acid. An additional particle is placed in the interior of the protein, representing the Ca2+ ion caged in the protein structure. The particles are colored as follows: acidic bead (red), basic bead (blue), inert side chain bead (green), inert backbone bead (gray), and Ca2+ ion (brown). The size of the beads has been scaled by an arbitrary value for visualization purposes.

Close modal

The pyMBE module supports the automated configuration of two Monte Carlo methods to sample reversible acid–base reactions: the constant-pH method15 for the simulation of acid–base equilibria in a single phase and the grand-reaction method,33,34 specifically designed for two-phase systems. In Secs. II C 1 and II C 2, we give a short introduction to these methods and their usage in pyMBE. Later, we demonstrate the use of the constant-pH method in our benchmarks for charge-regulating peptides (Sec. III B) and globular proteins (Sec. III C) in bulk solution. In Sec. III D, we showcase the use of the G-RxMC method that we use to benchmark weak polyelectrolytes under dialysis.

1. The constant-pH method (cpH)

The constant-pH (cpH) method was originally developed by Reed and Reed15 to simulate the pH response of weak polyelectrolyte chains at a given pH in a buffer solution, as shown schematically in Fig. 6(a). In the cpH method, the pH of the buffer solution is only considered implicitly and it is a direct input parameter of the method. The method uses Monte Carlo moves to sample different ionization states of weak acid and/or base groups according to the following chemical equations:
(1)
(2)
with the protonated state of an acid (HA) and a base (HB+) and the corresponding deprotonated states A and B. X+ is a generic neutralizing counterion whose chemical identity could correspond to any monovalent cation (H+, Na+, K+, etc.). Since in the cpH method, the pH is only considered implicitly, it is decoupled from the activity of the explicit X+ ions. This feature of the algorithm limits the applicability of the cpH method to pH values where the ionic screening is dominated by the added salt rather than by H+ or OH ions.13,84,85 In the cpH method, trial moves consist of changing the ionization state of a charge-regulating particle and randomly inserting/deleting a neutralizing counterion according to Eq. (1) or Eq. (2). These trial moves are accepted with a probability given by15,
(3)
Here, β = 1/kBT is the inverse thermal energy, ξ is the extent of reaction (ξ = +1 for deprotonation and ξ = −1 for protonation), and ΔU is the change in potential energy between the old state and the proposed new state. pH = log10(aH) and pKA = log10(Ka), where aH is the activity of proton and Ka is the acid dissociation equilibrium constant. One of the outputs of cpH simulations is the average degree of protonation of each acidic and basic group. Notably, the above implementation produces results, which are systematically shifted on the pH scale with respect from the correct result, as explained in Refs. 85 and 86. In many situations, this deviation is small enough to be neglected. Nevertheless, it should be considered when presenting results obtained using the method.
FIG. 6.

Schematic representation of weak polyelectrolyte systems (here represented by weak polyacids with monomers HA) in different statistical ensembles. (a) In the constant-pH-ensemble (cpH), a single-phase system at a given buffer pH and salt concentration is considered. (b) In the grand-reaction method (G-RxMC), a two-phase system at a fixed reservoir composition is considered. The polyelectrolyte chains cannot leave the “system” phase, while small ions can be partitioned between the “system” and the “reservoir,” leading to a Donnan equilibrium.

FIG. 6.

Schematic representation of weak polyelectrolyte systems (here represented by weak polyacids with monomers HA) in different statistical ensembles. (a) In the constant-pH-ensemble (cpH), a single-phase system at a given buffer pH and salt concentration is considered. (b) In the grand-reaction method (G-RxMC), a two-phase system at a fixed reservoir composition is considered. The polyelectrolyte chains cannot leave the “system” phase, while small ions can be partitioned between the “system” and the “reservoir,” leading to a Donnan equilibrium.

Close modal

The pyMBE module automates the setup of the cpH method in ESPResSo for all particles that the user defined as having either an or a :

Code snippet 18. Setting up the cpH method.

As inputs, one has to provide the identifier of the generic neutralizing counterion X+ () and the pH at which the system is simulated (). The pKA-values of each type of acidic and basic particles must have been previously defined for each particle, as explained in Sec. II B. It returns an instance of the reaction_methods.ConstantpHEnsemble object of the ESPResSo library (cpH) and a list containing the of the particles for which reactions have been set up (labels).

2. The grand-reaction method (G-RxMC)

In contrast to the cpH method, the G-RxMC method developed by Landsgesell et al.33,34 was conceived to study the behavior of two-phase systems, as shown schematically in Fig. 6(b). In these systems, the charge-regulating macromolecules are confined to one phase, which we term the “system.” This confinement can be due to covalent cross-linking in the case of hydrogels,38,39 due to a semi-permeable membrane in the case of dialysis,33 or due to spontaneous phase separation in the case of polyelectrolyte complexes.41,87 While the macromolecules are confined to the system phase, small ions (H+, Na+, OH, Cl) can freely partition between the system and the other phase. This other phase is termed the “reservoir,” and it contains only small ions and other solutes that can be exchanged with the system. The macroscopic constraint that each of the phases has to be electroneutral leads to an uneven partitioning of small ions between the system and the reservoir, the so-called Donnan equilibrium.33 

The G-RxMC method was designed to mimic a setting where the reservoir is much larger than the system and thus the reservoir properties are not affected by the partitioning of ions. This allows one to study the properties of the system at a given fixed composition of the reservoir. By construction, the reservoir composition in the grand-reaction method is specified using linear combinations of the chemical potentials of the exchangeable species (ions). The coupling of the system to the reservoir can be formally represented by the following virtual chemical reactions:33,
(4)
(5)
(6)
(7)
where ∅ denotes an empty set. The above reactions effectively result in the insertion or deletion of the respective ion pairs in the system, conserving the overall electroneutrality. Furthermore, the rules for the acceptance probability of these insertions ensure that the chemical potential of each ion pair is the same as in the reservoir.33 It can be shown that these insertions are equivalent to a grand-canonical simulation. If only the exchange of salt ions with the reservoir is desired, the function pmb.setup_gcmc allows us to set up a pure grand-canonical Monte Carlo (GCMC) simulation. However, if there are also acid–base reactions taking place in the system,
(8)
(9)
then one needs to set up the G-RxMC method instead. The acceptance probability for Eqs. (4)(9) is given by33,88,89
(10)
where Kc is the concentration-based equilibrium constant of the considered reaction, V denotes the box volume, NA is the Avogadro constant, Ni the number of particles of type i, νi is their stoichiometric coefficient, and ν̄=iνi.

Notably, the numerical value of Kc depends on the chosen reference concentration, which in the system of reduced units of ESPResSo corresponds to (cf. Code snippet 4). Therefore, the user needs to convert the equilibrium constants from the literature, based on the reference concentration c = 1 mol/kg ≈ 1 mol/l, to the system of reduced units used in ESPResSo.13 This conversion often results in errors, which can be avoided by the tools provided in the pyMBE module. Furthermore, there is a complicated relation between the desired reservoir composition (pH value and salt concentration) and the equilibrium constants required as inputs, as explained in  Appendix C. It can be determined numerically either by running a set of auxiliary simulations of the reservoir33,90 or by using an approximate theoretical model for the excess chemical potential of a salt solution, such as the Debye–Hückel limiting law. Only then, the user can run G-RxMC simulations at a given salt concentration and a given pH of the reservoir. pyMBE currently only supports setting up a reservoir containing a monovalent salt. In principle, the G-RxMC method can also accommodate multivalent salts, but the setup becomes more complicated in this case.90 

The pyMBE module provides a convenient and user-friendly way of specifying the reservoir composition and the corresponding equilibrium constants. In pyMBE, the user specifies the desired pH in the reservoir () and the concentration of added monovalent salt (e.g., NaCl) in the reservoir (c_salt_red) in the following way:

Code snippet 19. Setting up the G-RxMC method.

In addition to the concentration of added salt and pH in the reservoir, this function takes as required arguments the names of the various small ions (, etc.). The argument is a function that calculates the activity coefficient of an ion pair in a solution of ions as a function of the ionic strength. For example, using the Debye–Hückel limiting law for the activity coefficient in an aqueous solution under standard conditions,91 a possible implementation of the function would look like this:

Code snippet 20. Calculating the activity coefficient.

Alternatively, one can use a more sophisticated model such as the Davies equation91 or data of the excess chemical potential from simulations, e.g., obtained using the method of Widom particle insertion.92 Given the above inputs, pyMBE internally solves a coupled system of nonlinear equations, relating the required equilibrium constants to and , until a self-consistent solution is reached (more details are provided in  Appendix C). When solving these equations, pyMBE assumes that the solution contains only monovalent salt and that the pH was adjusted to the desired value by adding a strong base (NaOH) or a strong acid (HCl) to the salt solution. Eventually, it uses the determined equilibrium constants to set up the full set of reactions required for the G-RxMC simulation. Overall, the function greatly simplifies the setup of G-RxMC simulations, helping the user to automate multiple non-trivial steps that are susceptible to errors.

Instead of distinguishing between different ion types of the same charge (which differ only in their label but not with regards to their interactions in the coarse-grained representation), one can also formulate the G-RxMC method in terms of unified ion types.93 To set up the G-RxMC method using the implementation in terms of unified ion types, the function is employed in fashion analogous to .

Before creating all the molecules in ESPResSo, the user needs to provide some additional information about the desired interactions. First, parameters of the bonding potentials need to be provided using the following function:

Code snippet 21. Defining bonds.

Here, is a string identifying the bond type. Currently, pyMBE supports the harmonic potential () and the Finite Extension Non-linear Elastic potential () as model interactions for bonding the particles.1,2 is a dictionary with the values and the keys of the parameters for the potential energy of the bond. For consistency, the variable names of the parameters of each type of bond are the same as the ones in ESPResSo. is a list with a set of pairs of particle names, which identifies types of particles to be bonded by pyMBE with the defined bond type. All bonds defined in pyMBE are stored in the pyMBE dataframe, as explained in  Appendix B.

The user can add all bonds defined in the pyMBE dataframe in the ESPResSo system using the helper function

Code snippet 22. Adding bonds to ESPResSo.

Here, is an instance of an ESPResSo system from the ESPResSo library.

The pyMBE module has another helper function that automatically sets up Lennard-Jones (LJ) interactions between the particles using the following pairwise potential:
(11)
where r is the distance between the interacting particles i and j, σ determines the effective range of the repulsion, ɛ is the depth of the attractive well, rcut is the distance at which the potential is cut off to 0, and roff is an offset that shifts the values of the potential. The constant C=4ε((σ/rcut)12(σ/rcut)6) is added to make the potential continuous at rcut. The augmented LJ potential presented in Sec. II D reduces to the classical LJ potential with roff = 0 and rcut. The user can set up the LJ potential between all particles that have been previously defined in pyMBE using the helper function

Code snippet 23. Setting up the Lennard-Jones interactions.

The values of the LJ parameters σ, ɛ, rcut, and roff for each particle pair are calculated according to the Lorentz–Berthelot combining rules94,95 using the corresponding , , , and parameters provided by the user when defining each particle type in pyMBE (cf. Code snippet 5). By default, pyMBE sets rcut=26σ and roff = 0, corresponding to a purely repulsive LJ potential, also known as the Weeks–Chandler–Andersen potential. If the user defines a particle with = 0, pyMBE will not set up LJ interaction between that particle type and any other particle, corresponding to the case of an ideally behaving particle. If the user has not defined either the value of or of a particle type (cf. Code snippet 5), pyMBE will not set up any LJ interaction between particles of that type and the rest of particles.

After defining particles, residues, molecules, chemical reactions, and interactions, everything is ready to create the particle objects in ESPResSo. However, to make it all work, the above steps have to be executed in a particular order. First, one should define all particles, residues, and molecules:

Code snippet 24. Defining the properties of the different types of particles, residues, and molecules in the model.

In the next step, one defines the interactions and chemical reactions, which requires that the corresponding particles and residues have been previously defined in pyMBE:

Code snippet 25. Defining the interactions in the model and setting up chemical reactions.

Note that if one defines new particles after setting up LJ interactions, then these new particles will not be assigned any interactions, which will probably lead to an undesired behavior. In contrast, activating electrostatic interactions should be done only after creating all objects in ESPResSo and relaxing the initial simulation setup.

After completing the above steps, pyMBE is ready to create these objects in the ESPResSo system. For example, if has been defined (cf. Code snippet 9), it can be created using the following helper function:

Code snippet 26. Creating a pyMBE object in ESPResSo.

In the above code snippet, is the pyMBE string identifier of the previously defined pyMBE object, which will be created in ESPResSo. In practice, it means that the corresponding particles are created in the ESPResSo system and bonds are assigned to the specific particle pairs. The parameter is the number of individual objects to be created. Currently, the above function can create the following types of pyMBE objects: particles, residues, molecules, and peptides. By default, these objects will be created in the ESPResSo system at a random position in the simulation box. By providing an optional argument , the user can specify the position of the first particle of a given object. For simplicity, the initial state of flexible molecules is fully stretched. For acidic and basic particles, whose charges fluctuate during the simulation, the initial charge state is set to match that of the protonated form, i.e., q = 0 for an acid and q = +1e for a base. In principle, the choice of the initial state is arbitrary and its impact on the simulated system should vanish during equilibration of the system. When creating these objects, pyMBE assigns a specific identifier to each particle, residue, and molecule created in the ESPResSo system. This identifier can be accessed via the pyMBE dataframe, as described in Sec. II F.

For the purpose of creating rigid models of globular proteins in ESPResSo, the pyMBE module uses a different function. Once a protein topology has been loaded to pyMBE (cf. Code snippet 17), it can be created in the ESPResSo system as follows:

Code snippet 27. Creating a protein in ESPResSo.

Here, is the identifier of the protein object previously defined in pyMBE, is an instance of an ESPResSo system from the ESPResSo library, and is the number of proteins of that type to be created in es_system. By default, the proteins are created at random positions. Similar to the previous case, pyMBE assigns a unique identifier to each created particle, residue, and protein, and this identifier can be accessed via the pyMBE dataframe, as described in Sec. II F.

When creating proteins in ESPResSo, pyMBE uses the rigid object framework of ESPResSo to ensure that the protein as a whole remains rigid, i.e., it does not deform. In addition, pyMBE fixes the position of all particles belonging to the proteins. This setup is convenient for simulations with a single protein in the simulation box, for which the module features the option to center the protein in the simulation box:

Code snippet 28. Centering a molecule.

Here, is the numeric identifier within the pyMBE dataframe of the protein to be centered in the simulation box. Although this function was designed for this particular application, its implementation is general and it can be used with any molecule defined using the pyMBE module. To set up simulations having multiple proteins in the simulation box, pyMBE permits us to enable the motion of the rigid object as a whole through the simulation box:

Code snippet 29. Enabling motion.

where is the identifier of the type of proteins whose motion should be enabled.

The pyMBE module internally uses a Pandas dataframe96,97 to store all information about the particles, residues, molecules, bonds, and interactions defined by the user. It also stores all information about additional parameters that have been defined automatically by calling various helper functions. The dataframe is stored as an attribute of the pyMBE library, and it can be accessed as follows:

Code snippet 30. Example of how to access the pyMBE dataframe to print it.

The pyMBE dataframe can be operated on as a standard Pandas dataframe. It allows the user to easily access all parameters defined in the pyMBE module. To facilitate this access, the user can filter the dataframe by specifying the pyMBE object of interest in the argument :

Code snippet 31. Example of how to filter the pyMBE dataframe by particle.

The argument supports pyMBE objects of the following types: , , , and . In Table II, we show an example of the output of this function when filtering by = . In this example, each column corresponds to a parameter of a pyMBE particle object and each row to a different particle in the system. Additional examples of subsets of the pyMBE dataframe can be found in  Appendix B.

TABLE II.

Example of a subset of the pyMBE dataframe, filtered by = .

state_onestate_two
namepmb_typeparticle_idresidue_idmolecule_idaciditypKAsigma (nm)epsilonlabel es_type zlabel es_type z
particle NaN NaN 0.3 1 reduced_energy NaN NaN NaN 
particle acidic 4.0 0.4 1 reduced_energy AH −1 
particle basic 9.0 0.5 1 reduced_energy BH +1 
state_onestate_two
namepmb_typeparticle_idresidue_idmolecule_idaciditypKAsigma (nm)epsilonlabel es_type zlabel es_type z
particle NaN NaN 0.3 1 reduced_energy NaN NaN NaN 
particle acidic 4.0 0.4 1 reduced_energy AH −1 
particle basic 9.0 0.5 1 reduced_energy BH +1 

Internally, the pyMBE module assigns a numerical identifier to each individual particle (), residue (), or molecule () object that pyMBE creates in the ESPResSo system. These identifiers are stored in the pyMBE dataframe in which pyMBE creates one row per each individual object. They permit us to map which specific particles belong to specific residues and molecules, allowing us to trace back the exact topology of the system. The identifiers and store information about the possible ionization states of the particles. Each state has a secondary header with the following indexes: , , and . The string-like identifier labels a particular ionization state of the particle. The numerical identifier stores the particle type used in ESPResSo for that particular type of pyMBE particle. The parameter stores the valency of a particle in a given state. Inert particles are defined with only one state () whose label matches with its used by the pyMBE module. For acidic and basic particles, corresponds to the protonated state and corresponds to the deprotonated state. The pyMBE module labels with the of the particle with an extra “H” character at the end, while is labeled using the same label as in the of the particle. In addition, pyMBE assigns the charge of each state using the of the particle. For an acidic particle, z is set to 0 in (corresponding to a protonated monoprotic acid) and z is set to −1e in (corresponding to a deprotonated monoprotic acid). For a basic particle, is set to +1e in (corresponding to a protonated monoprotic base) and z is set to 0 in (corresponding to a deprotonated monoprotic base).

The user can store the pyMBE dataframe in a local file:

Code snippet 32. Locally storing the pyMBE dataframe.

This function uses the Pandas native function pandas_df.to_csv() to save the pyMBE dataframe into a file in the comma-separated values (CSV) format. Due to the multi-index organization of the pyMBE dataframe, the user cannot directly use the Pandas native function to load files in CSV files. However, the pyMBE module provides a helper function to read the information from a pyMBE dataframe stored in a local CSV file:

Code snippet 33. Reading an existing pyMBE dataframe.

where is the path to the local CSV file with the pyMBE dataframe.

Here, we show a set of benchmark results, where we used pyMBE to reproduce the results from some of our previous publications. Some of these previous results were obtained using ESPResSo, while others were obtained using different software packages. Nonetheless, their common feature is that they all calculated the net charge of various molecules containing weakly acidic or basic groups. Before describing the results, which we used as benchmarks, we describe some additional helper functions, provided by pyMBE to facilitate the calculation of acid–base properties and net charge of molecules in complex simulation setups.

As a reference to evaluate the effect of interactions, we use the Henderson–Hasselbalch (HH) equation, which is the analytical solution of the acid–base chemical equilibrium (Eqs. (1) and (2)) under ideal conditions, i.e., in the absence of interactions. Under this assumption, the degree of ionization, αi, of a monoprotic acid or base i is
(12)
where KAi=10pKAi is the acidity constant and pH=logaH+ is defined in terms of the proton activity aH+. The charge number of the group i in the ionized state is denoted by zi, i.e., zi = +1 for a base and zi = −1 for an acid. The average net charge Q of a macromolecule with N ionizable groups can be calculated by summing over the average degrees of ionization of all its ionizable groups,
(13)
The pyMBE module provides a function that calculates the ideal net charge of any molecule defined with a given :

Code snippet 34. Calculating the ideal net charge of a molecule.

Here, is the list of pH-values at which the Henderson–Hasselbalch equation is evaluated. This function is particularly handy for molecules with many different acidic and basic groups, for example peptides and proteins. A comparison of simulation results with the above ideal theory serves as an important metric for quantifying the effects of interactions on the ionization behavior of macromolecules.

The deviations from the ideal theory, observed in simulations of isolated macromolecules in bulk solution, are predominantly caused by electrostatic interactions between various ionizable groups within the molecule, which has been termed the “polyelectrolyte effect.”33 In two-phase systems in which macromolecules are confined in one phase (cf. Fig. 6(b)), there is an additional contribution to deviations from the ideal behavior, which has been termed the “Donnan effect.”33 This name refers to the Donnan potential ψDon, which is the electrostatic potential difference between the two phases due to the uneven partitioning of the small ions. The Donnan potential leads to a difference in pH between the system (pHsys) and the reservoir (pHres),33,38
(14)
Both the Donnan effect and polyelectrolyte effect contribute to the shift of the ionization curves in two-phase systems, whereas only the polyelectrolyte effect is present in single-phase systems (bulk solutions).

The pyMBE module provides a function , which allows the user to calculate the charge of the molecules including the Donnan effect in the ideal gas limit, i.e., assuming that the acid–base equilibrium can be calculated using the Henderson–Hasselbalch equation:

Code snippet 35. Calculating the ideal net charge accounting for the Donnan effect.

In contrast to the function , one needs to provide a dictionary containing the concentration all charged molecules in the system as well as the salt concentration in the reservoir . For example, the setup in Code snippet 35 corresponds to a system with two types of peptides: pep_1 and pep_2. This additional information is required to correctly calculate the Donnan partitioning, which depends on the ionic strength of the reservoir and the concentration of charges in the system that cannot be exchanged with the reservoir. The performed calculation consists of solving a system of coupled non-linear equations involving the degrees of ionization according to the Henderson–Hasselbalch equations for the various ionizable particles in the system and partition coefficients of small ions according to the ideal Donnan theory (see  Appendix D). The function returns a dictionary containing the calculated charges of the various species, a list of the pH-values in the system pHsys, and a list of the partition coefficients of monovalent cations.

Finally, pyMBE provides a function , which enables the calculation of the instantaneous net charge of a given molecule type, existing in the simulation box of ESPResSo. If more molecules of such type exist in the simulation box, then net charge is averaged over all molecules of the same type:

Code snippet 36. Calculating the net charge of a molecule.

The returned dictionary contains the mean net charge per molecule, as well as a dictionary containing the mean net charge of the individual residues. All three helper functions described above can be conveniently used in post-processing of the benchmark results described in Secs. III B–III D.

As a benchmark of the pyMBE module, we calculated the net charge of various peptides as a function of pH. The first two peptides, Lys5Asp5 and Glu5His5, each contain a block of five acidic and five basic amino acids. These artificial amino acid sequences have been custom synthesized and used by Lunkad et al.98 to validate the net charge predicted by their coarse-grained model against experimental measurements. These peptide sequences have been selected to exhibit a rather simple response to a change of the pH. The amino acids in Lys5Asp5 have a rather big difference in pKA values. Therefore, the corresponding ionization curve contains two distinct inflection points. In contrast, the amino acids in Glu5His5 have a much smaller difference in pKA values, resulting in an almost linear decrease in net charge in a broad range of pH values. The model used by Lunkad et al.98 consisted of two beads per amino acid. Their results were obtained using the constant pH (cpH) ensemble implemented in ESPResSo.

As explained in Sec. II B 5, the pyMBE module allows us to easily set up two-bead models of peptides in ESPResSo, simply by specifying the amino acid sequence and selecting the desired set of model parameters. In Fig. 7(a) and Fig. 7(b), we show that the simulations with the two-bead model created using pyMBE quantitatively reproduce the results of Lunkad et al.98 

FIG. 7.

Net charge Q of various peptides as a function of pH. Panels (a) and (b): titration curve of the synthetic peptides Lys5Asp5 (a) and Glu5His5 (b) obtained with the constant pH Monte Carlo (cpH) simulation using ESPResSo by Lunkad et al. (blue markers).98 Panel (c): titration curve of histatin-5 computed by the cpH simulation using MOLSIM by Blanco et al. (green markers).19 The data from these references can be easily reproduced using the pyMBE module to set up these systems in ESPResSo (orange markers). The black lines follow the Henderson–Hasselbalch (HH) equation (Eqs. (12) and (13)) using different sets of pKA-values: the Chemical Rubber Company (CRC) Handbook of Chemistry and Physics76 (a) and (b) and Nozaki and Tanford78 (c).

FIG. 7.

Net charge Q of various peptides as a function of pH. Panels (a) and (b): titration curve of the synthetic peptides Lys5Asp5 (a) and Glu5His5 (b) obtained with the constant pH Monte Carlo (cpH) simulation using ESPResSo by Lunkad et al. (blue markers).98 Panel (c): titration curve of histatin-5 computed by the cpH simulation using MOLSIM by Blanco et al. (green markers).19 The data from these references can be easily reproduced using the pyMBE module to set up these systems in ESPResSo (orange markers). The black lines follow the Henderson–Hasselbalch (HH) equation (Eqs. (12) and (13)) using different sets of pKA-values: the Chemical Rubber Company (CRC) Handbook of Chemistry and Physics76 (a) and (b) and Nozaki and Tanford78 (c).

Close modal

The third peptide is histatin-5, a natural peptide present in human saliva with antifungal and antibacterial properties.99–101 Unlike the previous two, this natural peptide contains a more complex combination of acidic and basic groups. Specifically, it contains 15 basic groups of four different types: N-terminal amine, lysine, histidine, arginine; and five acidic groups of four different types: C-terminal carboxylic acid, tyrosine, aspartic acid, and glutamic acid. Therefore, the net charge of this protein as a function of pH exhibits a more complex trend. This peptide was simulated by Blanco et al.19 using a one-bead model and constant-pH ensemble simulations performed in MOLSIM. In Fig. 7(c) , we show that our simulation in the cpH ensemble with the one-bead model created using pyMBE reproduced the original results for histatin-5 from Ref. 19.

In summary, the above results demonstrate the use of pyMBE for setting up constant-pH simulations of flexible peptides, using the amino acid sequence and pre-defined sets of pKA values as inputs of the simulations. Importantly, the use of pyMBE dramatically simplifies setting up of these simulations in ESPResSo, making it use more user-friendly and less prone to errors.

Simulations of rigid globular proteins represent another example of the advantages of using pyMBE to create complex molecules, composed of many sub-units of different types and acidity. To demonstrate this, we used two globular proteins most abundant in milk whey: α-lactalbumin and β-lactoglobulin. The acid–base properties of these proteins were simulated by Torres et al.25,26 They were represented as rigid objects with two beads per amino acid residue, using the crystallographic structures available in the Protein Data Bank (PDB). The original simulations were performed using an in-house Monte Carlo software application developed in their group.25,26 We used the pyMBE functions for creating the models of globular proteins from their PDB structures, as described in Sec. II B 5. In the case of α-lactalbumin, an additional charge of +2e is included to account for the Ca2+ ion caged in the protein structure. In Fig. 8, we show that constant-pH simulations in ESPResSo reproduced the net charge Z as a function of pH, reported in the original study by Torres et al.,25,26 obtained using their own Monte Carlo software.

FIG. 8.

Net charge Q of α-lactalbumin [panel (a)] and β-lactoglobulin [panel (b)] as a function of the pH. The pyMBE module has been used to set up a coarse-grained model of each protein from their PDB crystallographic data: α-lactalbumin (PDB code: 1F6S83) and β-lactoglobulin (PDB code: 1BEB102). The net charge measured using pyMBE (orange circles) matches the reference data reported by Torres et al.25,26 (blue squares) within the estimated error. For reference, the analytical solution of the Henderson–Hasselbalch (HH) equations(12) and (13) for each protein is plotted as a black line.

FIG. 8.

Net charge Q of α-lactalbumin [panel (a)] and β-lactoglobulin [panel (b)] as a function of the pH. The pyMBE module has been used to set up a coarse-grained model of each protein from their PDB crystallographic data: α-lactalbumin (PDB code: 1F6S83) and β-lactoglobulin (PDB code: 1BEB102). The net charge measured using pyMBE (orange circles) matches the reference data reported by Torres et al.25,26 (blue squares) within the estimated error. For reference, the analytical solution of the Henderson–Hasselbalch (HH) equations(12) and (13) for each protein is plotted as a black line.

Close modal

As our final benchmark, we present a simulation of an acid–base equilibrium in a two-phase system that requires the grand-reaction (G-RxMC) method in order to correctly model the interplay of charge regulation and ion partitioning. The model system consists of a solution of weak polyacid chains, which is separated by a semi-permeable membrane from an aqueous solution of small ions, termed reservoir. This system has been used by Landsgesell et al.33 to validate the G-RxMC method. In brief, the polyelectrolyte is represented by a bead–spring polymer model derived from the Kremer–Grest model.103 The simulated system contains 16 chains, each composed of N = 50 weakly acidic monomers with pKA = 4.0. The simulations are performed at different reservoir compositions (salt concentration and pH) and concentrations of the polyelectrolyte within the system. A full description of the model and its parameters can be found in Ref. 33.

The G-RxMC simulation requires setting up of multiple chemical reactions, which are mutually coupled. Some of these reactions represent the acid–base ionization, and others represent the exchange of ion pairs between the system and the reservoir. To ensure that the simulation works correctly and efficiently in a broad range of pH and salt concentrations, additional reactions should be set up, obtained as linear combinations of the previous ones. Because the number of reactions is greater than the number of independent parameters, an error in the setup may result in an unphysical behavior of the system. The use of pyMBE in combination with ESPResSo facilitates the setup of such a system, helping the user avoid common errors.

In Fig. 9, we plot the degree of ionization α of the polyacid chains as a function of the pH in the reservoir, at a monomer concentration inside the system of cmon = 435 mM and salt concentration in the reservoir of cNaClres=10mM. Unlike the previous benchmark results, Fig. 9 contains two sets of reference curves. The solid line represents the ideal Henderson–Hasselbalch equation, using the pH in the reservoir as an input, and disregarding the Donnan partitioning of H+ ions. It was obtained using the function from pyMBE. The dashed line represents the ideal Henderson–Hasselbalch prediction, using the pH in the system as an input, i.e., accounting for the Donnan partitioning of ions (HH+Don). It was obtained using the function of pyMBE. The key feature of the G-RxMC method is that it accounts for both, Donnan partitioning and direct electrostatic interactions between the particles. Figure 9 shows that our simulation results, obtained with ESPResSo and using pyMBE to automatically set up all the required reactions, quantitatively agree with the reference results.

FIG. 9.

Degree of ionization α of a weak polyacid solution coupled to a reservoir vs the pH in the reservoir, obtained for a salt concentration of cNaClres=10mM in the reservoir and a monomer concentration of cmon = 435 mM in the solution. The reference data by Landsgesell et al.33 were also obtained using ESPResSo. “HH” corresponds to the result calculated using the ideal Henderson–Hasselbalch equation (Eq. (12)), while “HH+Don” results from a coupled systems of equations involving the Henderson–Hasselbalch equation and the ideal Donnan theory.

FIG. 9.

Degree of ionization α of a weak polyacid solution coupled to a reservoir vs the pH in the reservoir, obtained for a salt concentration of cNaClres=10mM in the reservoir and a monomer concentration of cmon = 435 mM in the solution. The reference data by Landsgesell et al.33 were also obtained using ESPResSo. “HH” corresponds to the result calculated using the ideal Henderson–Hasselbalch equation (Eq. (12)), while “HH+Don” results from a coupled systems of equations involving the Henderson–Hasselbalch equation and the ideal Donnan theory.

Close modal

We presented the Python-based Molecule Builder for ESPResSo (pyMBE). pyMBE is an open-source software application that facilitates the setup of coarse-grained (CG) models of complex molecules in the ESPResSo simulation software. It provides a user-friendly way of building models of flexible peptides and disordered proteins, as well as rigid globular proteins. In addition, it facilitates an automated setup of acid–base reactions for constant-pH simulations, as well as exchange of small ions between two phases in simulations using the grand-reaction method. The use of pyMBE reduces the risk of user errors when setting up such complex simulations, thereby lowering the barrier for new users to start using these methods, implemented in the ESPResSo simulation software. The development of a common tool to build molecules and set up chemical reactions in such simulations represents also an important step toward reproducibility of results.

In a set of benchmark results, we showed that simulation setups using pyMBE reproduced reference data from constant-pH and grand-reaction simulations produced by different research groups and software. In addition to benchmarking, these reference results will also serve as a set of test cases for future development of the pyMBE module.

The pyMBE module currently supports only a limited set of interaction potentials (force fields). These force fields have been selected to enable reproducing the reference results mentioned above. Therefore, they may not necessarily represent the most common set of force fields used for modeling of peptides and proteins. However, thanks to pyMBE, replacing a force field or using a different set of interaction parameters becomes a straightforward task.

The pyMBE module is still under active development as a collaborative project between researchers in the field of physics and chemistry of weak polyelectrolytes and biomacromolecules. We plan to continue developing the pyMBE module by extending it to build CG models of other molecules, such as polymer networks, dendrimers, and other branched structures and nanoparticles. We also plan to extend the pyMBE module to include other established force fields for CG models of biomolecules, such as MARTINI.104 

We thank the early users of the pyMBE module who have contributed to its development by providing valuable feedback on their experience when using the library: Marius Aarsten, Corinna Dannert, Rita S. Dias, Sergio Madurga, Alberto Martinez-Serra, Francesc Mas, Magdaléna Nejedlá, Cristina Landa Barrio, and Raju Lunkad. D.B. acknowledges the German Research Foundation (DFG) for funding within the Research Unit FOR2811 “Adaptive Polymer Gels with Model Network Structure” under Grant No. 423791428 along with Grant No. 397384169 (TP7). P.B.T. acknowledges Ph.D. fellowship from CONICET. S.P. and P.K. acknowledge the financial support of the Czech Science Foundation, Grant No. 21-31978J and grant agency of the Charles University, GAUK Project No. 150224. C.F.N. acknowledges the financial support from AGENCIA I+D+i (FONCyT), Grant No. PICT-2021-GRFTI-00090 and from Universidad Tecnológica Nacional (Grant Nos. PIDs PATCASR0008459 and PATCASR0008463). J.-N.G. acknowledges the DFG for funding under Grant No. 528726435 (PI: Holm) and funding by the European Union; this work has received funding from the European High Performance Computing Joint Undertaking (JU) and countries participating in the project under Grant Agreement No. 101093169. P.M.B. acknowledges the financial support from the Spanish Ministry of Universities (Margarita Salas Grant No. MS98), from the Generalitat de Catalunya (Grant No. 2021SGR00350), and from the European Union’s Horizon Europe research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 101062456 (ModEMUS).

The authors have no conflicts to disclose.

D.B. and P.B.T. contributed equally to this work.

David Beyer: Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Paola B. Torres: Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Sebastian P. Pineda: Software (equal); Visualization (equal); Writing – original draft (equal). Claudio F. Narambuena: Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). Jean-Noël Grad: Data Curation (equal); Software (equal); Writing – review & editing (equal). Peter Košovan: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). Pablo M. Blanco: Conceptualization (lead); Funding acquisition (equal); Project administration (equal); Software (lead); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

The original data showcased in this article in Figs. 79 are available in the pyMBE repository in the folder “/testsuite/data/src” with the permission of its original authors. These datasets were originally published in Refs. 19, 25, 26, 33, and 98. We also provide the set of scripts that we used to reproduce the data of these articles with pyMBE in the repository of the software in the folder “/samples/Beyer2024”. These scripts are part of pyMBE release 0.8.0105 available on Zenodo.

Currently, the pyMBE module depends on the following libraries: ESPResSo,1,2 NumPy,106 Pandas,96,97 Pint,73 Pint-Pandas,107 and, only for developers, pdoc.72 Continuous testing and documentation generation are automated using European Environment for Scientific Software Installations (EESSI)108 via GitHub Action eessi/github-action-eessi.109 We note that these dependencies apply to the current version of pyMBE at the time of writing, and they might change in the future as pyMBE is a live project. We, therefore, recommend the reader to visit the repository of pyMBE for up-to-date information about the dependencies of the pyMBE module.

To extend the description of the pyMBE dataframe that we provided in Sec. II F, let us consider one of the example cases presented in Sec. II B 3. For simplicity, let us consider a user that uses the pyMBE module to create two molecules of the type with a small number of monomeric units, for example two. When creating objects in ESPResSo, pyMBE adds one row per object into the pyMBE dataframe to bookkeep the information of every individual object created. For example, if the user filters the pyMBE dataframe by , pyMBE returns the dataframe depicted in Table III. The first two rows correspond to residues belonging to a molecule with = 0, while the last two correspond to a second molecule with a = 1. The user can also filter by = to check how pyMBE has bonded the particles to create the two molecules in the system as shown in Table IV. In this case, the subset of the pyMBE dataframe displays six rows, corresponding to the number of bonds needed to build the two example molecules. Numerical identifiers of the two bonded particles within the ESPResSo system are stored in and . The stores the instance of the bond object from the ESPResSo library used to set up the bonding potential between the particle, in which the parameters of the bonding potential can be consulted. If the user filters the pyMBE dataframe by = , pyMBE returns the dataframe depicted in Table V. Each row of the dataframe corresponds to a different molecule in the ESPResSo system, where the user can observe that the molecules with of 0 and 1 correspond to molecules of the type given by =

TABLE III.

Example of a subset of the pyMBE dataframe, filtered by .

Namepmb_typeresidue_idmolecule_idcentral_beadside_chains
IA residue [I, A] 
IB residue [I, B] 
IA residue [I, A] 
IB residue [I, B] 
Namepmb_typeresidue_idmolecule_idcentral_beadside_chains
IA residue [I, A] 
IB residue [I, B] 
IA residue [I, A] 
IB residue [I, B] 
TABLE IV.

Example of a subset of the pyMBE dataframe, filtered by .

namepmb_typeparticle_idparticle_id2bond_object
I–A bond HarmonicBond() 
I–I bond HarmonicBond() 
I–B bond HarmonicBond() 
I–A bond HarmonicBond() 
I–I bond HarmonicBond() 
I–B bond HarmonicBond() 
namepmb_typeparticle_idparticle_id2bond_object
I–A bond HarmonicBond() 
I–I bond HarmonicBond() 
I–B bond HarmonicBond() 
I–A bond HarmonicBond() 
I–I bond HarmonicBond() 
I–B bond HarmonicBond() 
TABLE V.

Example of a subset of the pyMBE dataframe, filtered by .

namepmb_typemolecule_idresidue_list
alternating_polymer molecule [IA, IB] 
alternating_polymer molecule [IA, IB] 
namepmb_typemolecule_idresidue_list
alternating_polymer molecule [IA, IB] 
alternating_polymer molecule [IA, IB] 
As explained in Sec. II C 2, composition of the reservoir in the grand-reaction method is specified via the chemical potentials of ions in the reservoir. These ions are exchanged between the system and reservoir as electroneutral ion pairs. Within the reaction ensemble framework, these exchanges are represented as virtual chemical reactions, Eqs. (4)(7). The acceptance probabilities of these virtual chemical reactions are the same as the exchange of ion pairs in the grand canonical ensemble.33 The equilibrium constant for each ion pair is related to the sum of their chemical potentials in the reservoir:
(C1)
Furthermore, the value of Kij is related to the concentrations of these ions in the reservoir,
(C2)
where γ±=(exp(βμiex)+exp(βμjex))/2 is the mean activity coefficient at the given composition of the reservoir and μiex is the excess chemical potential of ion i. In an ideal system, γ± = 1, and the constants Kij are simply given by the products of concentrations of individual ions. The self-ionization of water is given by its equilibrium constant at room temperature,
(C3)
Out of the three remaining constants, only two can be chosen independently, because the last one is determined by the requirement of electroneutrality in the reservoir. In a non-ideal system containing only monovalent ions, γ± depends on the ionic strength of the solution,
(C4)
where the summation runs over all small ions present in the reservoir. Therefore, constants Kij depend not only on the concentrations of individual ions but also on the ionic strength.
In contrast with simulations in the grand-reaction ensemble, the reservoir composition in experiments would be typically specified by the concentrations of individual ionic species and by pH in the reservoir, defined as
(C5)
For simplicity, we assume that the reservoir contains only monovalent salt ions, which we call Na+ and Cl, and the H+ and OH ions, which are inevitably present in aqueous solutions. This could be prepared by dissolving a given amount of NaCl in pure water and then adjusting the pH by adding extra NaOH or HCl. Because pyMBE currently implements all ions as point charges with WCA potential for steric repulsion, the discussion below applies to any reservoir that contains only monovalent ions represented in the same way.
When the ionic strength of the solution is determined predominantly by the NaCl concentration, one can use for example the extended Debye–Hückel theory to estimate the activity coefficients from the ionic strength,91 
(C6)
where a is the effective diameter of the ion in nanometers and A and B are constants, whose values for aqueous solutions at room temperature are A = 0.5085 l1/2 mol−1/2 and B = 0.3281 l1/2 mol−1/2 nm−1. The calculation becomes more complex at extreme pH values, when the amount of added NaOH or HCl exceeds the amount of NaCl. In the general case, the activity coefficients (excess chemical potentials) are uniquely determined by the ionic strength, which depends on the concentrations of all ions,
(C7)
At low ionic strengths, f can be approximated by Eq. (C6), while at high ionic strengths, it can be estimated from simulations using the Widom insertion method.92 As explained in Sec. II C 2, in pyMBE, the functional form of f is not fixed but has to be provided by the user when setting up the G-RxMC method. In the general case, Eqs. (C3), (C5), and (C7) form a coupled system of nonlinear equations that can be solved numerically for the concentrations and the excess chemical potential, given values for pHres and cNaClres. In pyMBE, this calculation is performed iteratively until self-consistency.

In reactive two-phase systems, charge regulation and ion partitioning are inherently coupled in a highly non-linear fashion, as shown in Fig. 6(b). For ideal systems, this leads to a system of equations that can be solved numerically to obtain the Donnan potential and ionization degrees under the given conditions. As in the case of single-phase systems, this ideal description is often an important reference point to assess the role of (electrostatic) interactions.

In an ideal model, the partitioning of small ions is described by the Donnan theory, and the charge regulation of the weak acid and base groups is described by the Henderson–Hasselbalch equation. The partitioning of exchangeable ionic species is determined by the equality of electrochemical potentials,
(D1)
where n ∈ {H+, OH, Na+, Cl}. Here, qn is the charge of ion n and ψDon is the Donnan potential, which plays the role of a Lagrange multiplier that ensures the electroneutrality of both phases.33 Consequently, the resulting equilibrium is also often termed the “Donnan equilibrium.” Assuming an ideal system, i.e., neglecting all interactions, the Donnan potential can be related to the partition coefficient of cations,
(D2)
and a closed expression for the Donnan potential can be obtained as follows:33 
(D3)
In this equation,
(D4)
is the ionic strength of the reservoir, taking into account the small ions H+, OH, Na+, and Cl. The sum iziαici runs over all ionizable groups in the system, which cannot be exchanged with the reservoir. More specifically, zi is the charge number of group i when it is ionized, αi is the corresponding degree of ionization, and ci is the total concentration of groups of type i in the system, irrespective of the ionization state. For charge-regulating systems, the degrees of ionization αi can be calculated from the pH in the system using the Henderson–Hasselbalch equation (12). However, the pH in the system differs from pH in the reservoir by the following Donnan contribution:
(D5)
Under the ideal gas approximation, the degree of ionization αi is thus given by
(D6)
where pHsys depends on pHres and ψDon through Eq. (D5). Equations (D3) and (D6) form a system of non-linear equations relating the degrees of ionization and the Donnan potential; charge regulation and ion partitioning are thus inherently coupled. Inserting the equations into each other, it is possible to reduce the set of equations to a non-linear equation for a single variable. In practice, it is convenient to formulate this equation in terms of the partition coefficient ξ+,
(D7)
where
(D8)
While not solvable analytically, this equation can be solved numerically using standard root finding techniques. In the pyMBE module, this calculation can be performed using the function , as demonstrated in Sec. III D.
1.
F.
Weik
,
R.
Weeber
,
K.
Szuttor
,
K.
Breitsprecher
,
J.
de Graaf
,
M.
Kuron
,
J.
Landsgesell
,
H.
Menke
,
D.
Sean
, and
C.
Holm
,
Eur. Phys. J.: Spec. Top.
227
,
1789
(
2019
).
2.
R.
Weeber
,
J.-N.
Grad
,
D.
Beyer
,
P. M.
Blanco
,
P.
Kreissl
,
A.
Reinauer
,
I.
Tischler
,
P.
Košovan
, and
C.
Holm
, in
Comprehensive Computational Chemistry
, 1st ed., edited by
M.
Yáñez
and
R. J.
Boyd
(
Elsevier
,
Oxford
,
2024
), pp.
578
601
.
3.
H. V.
Guzman
,
N.
Tretyakov
,
H.
Kobayashi
,
A. C.
Fogarty
,
K.
Kreis
,
J.
Krajniak
,
C.
Junghans
,
K.
Kremer
, and
T.
Stuehn
,
Comput. Phys. Commun.
238
,
66
(
2019
).
4.
M.
Lund
,
M.
Trulsson
, and
B.
Persson
,
Source Code Biol. Med.
3
,
17
(
2008
).
5.
B.
Stenqvist
,
A.
Thuresson
,
A.
Kurut
,
R.
Vácha
, and
M.
Lund
,
Mol. Simul.
39
,
1233
(
2013
).
6.
H. W.
Hatch
,
N. A.
Mahynski
, and
V. K.
Shen
,
J. Res. Natl. Inst. Stand. Technol.
123
,
123004
(
2018
).
7.
M. J.
Abraham
,
T.
Murtola
,
R.
Schulz
,
S.
Páll
,
J. C.
Smith
,
B.
Hess
, and
E.
Lindahl
,
SoftwareX
1–2
,
19
(
2015
).
8.
J. A.
Anderson
,
J.
Glaser
, and
S. C.
Glotzer
,
Comput. Mater. Sci.
173
,
109363
(
2020
).
9.
R.
Jurij
and
L.
Per
,
J. Comput. Chem.
36
,
1259
(
2015
).
10.
A. P.
Thompson
,
H. M.
Aktulga
,
R.
Berger
,
D. S.
Bolintineanu
,
W. M.
Brown
,
P. S.
Crozier
,
P. J.
in’t Veld
,
A.
Kohlmeyer
,
S. G.
Moore
,
T. D.
Nguyen
,
R.
Shan
,
M. J.
Stevens
,
J.
Tranchida
,
C.
Trott
, and
S. J.
Plimpton
,
Comput. Phys. Commun.
271
,
108171
(
2022
).
11.
The Canva Development Team
, Canva: Visual suite for everyone,
2023
.
12.
M.
Lund
and
B.
Jönsson
,
Q. Rev. Biophys.
46
,
265
(
2013
).
13.
J.
Landsgesell
,
L.
Nová
,
O.
Rud
,
F.
Uhlík
,
D.
Sean
,
P.
Hebbeker
,
C.
Holm
, and
P.
Košovan
,
Soft Matter
15
,
1155
(
2019
).
14.
P. M.
Blanco
,
C. F.
Narambuena
,
S.
Madurga
,
F.
Mas
, and
J. L.
Garcés
,
Polymers
15
,
2680
(
2023
).
15.
C. E.
Reed
and
W. F.
Reed
,
J. Chem. Phys.
96
,
1609
(
1992
).
16.
A.
Laguecir
,
S.
Ulrich
,
J.
Labille
,
N.
Fatin-Rouge
,
S.
Stoll
, and
J.
Buffle
,
Eur. Polym. J.
42
,
1135
(
2006
).
17.
M.
Ullner
,
B.
Jönsson
,
B.
Söderberg
, and
C.
Peterson
,
J. Chem. Phys.
104
,
3048
(
1996
).
18.
M.
Ullner
and
B.
Jönsson
,
Macromolecules
29
,
6645
(
1996
).
19.
P. M.
Blanco
,
S.
Madurga
,
J. L.
Garcés
,
F.
Mas
, and
R. S.
Dias
,
Soft Matter
17
,
655
(
2021
).
20.
R.
Lunkad
,
A.
Murmiliuk
,
Z.
Tošner
,
M.
Štěpánek
, and
P.
Košovan
,
Polymers
13
,
214
(
2021
).
21.
R.
Lunkad
,
P.
Biehl
,
A.
Murmiliuk
,
P. M.
Blanco
,
P.
Mons
,
M.
Štěpánek
,
F. H.
Schacher
, and
P.
Košovan
,
Macromolecules
55
,
7775
(
2022
).
22.
S. P.
Pineda
,
R.
Staňo
,
A.
Murmiliuk
,
P. M.
Blanco
,
P.
Montes
,
Z.
Tošner
,
O.
Groborz
,
J.
Pánek
,
M.
Hrubý
,
M.
Štěpánek
, and
P.
Košovan
,
JACS Au
4
(
5
),
1775
1785
(
2024
).
23.
M.
Lund
and
B.
Jönsson
,
Biochemistry
44
,
5722
(
2005
).
24.
F. L. B.
da Silva
and
B.
Jönsson
,
Soft Matter
5
,
2862
(
2009
).
25.
P. B.
Torres
,
E.
Quiroga
,
A. J.
Ramirez-Pastor
,
V.
Boeris
, and
C. F.
Narambuena
,
J. Phys. Chem. B
123
,
8617
(
2019
).
26.
P. B.
Torres
,
P. M.
Blanco
,
J. L.
Garcés
, and
C. F.
Narambuena
,
J. Chem. Phys.
157
,
205101
(
2022
).
27.
T.
Hoare
and
R.
Pelton
,
Macromolecules
37
,
2544
(
2004
).
28.
C.
Hofzumahaus
,
P.
Hebbeker
, and
S.
Schneider
,
Soft Matter
14
,
4087
(
2018
).
29.
C.
Hofzumahaus
,
C.
Strauch
, and
S.
Schneider
,
Soft Matter
17
,
6029
(
2021
).
30.
C.
Strauch
and
S.
Schneider
,
Soft Matter
19
,
938
(
2023
).
31.
A.
Clavier
,
M.
Seijo
,
F.
Carnal
, and
S.
Stoll
,
Phys. Chem. Chem. Phys.
17
,
4346
(
2015
).
32.
M.
Stornes
,
P. M.
Blanco
, and
R. S.
Dias
,
Colloids Surf., A
628
,
127258
(
2021
).
33.
J.
Landsgesell
,
P.
Hebbeker
,
O.
Rud
,
R.
Lunkad
,
P.
Košovan
, and
C.
Holm
,
Macromolecules
53
,
3007
(
2020
).
34.
D.
Beyer
,
J.
Landsgesell
,
P.
Hebbeker
,
O.
Rud
,
R.
Lunkad
,
P.
Košovan
, and
C.
Holm
,
Macromolecules
55
,
1088
(
2022
).
35.
J.
Rička
and
T.
Tanaka
,
Macromolecules
17
,
2916
(
1984
).
36.
J.
Tang
,
T.
Katashima
,
X.
Li
,
Y.
Mitsukami
,
Y.
Yokoyama
,
N.
Sakumichi
,
U.-i.
Chung
,
M.
Shibayama
, and
T.
Sakai
,
Macromolecules
53
,
8244
(
2020
).
37.
O. E.
Philippova
,
D.
Hourdet
,
R.
Audebert
, and
A. R.
Khokhlov
,
Macromolecules
30
,
8278
(
1997
).
38.
J.
Landsgesell
,
D.
Beyer
,
P.
Hebbeker
,
P.
Košovan
, and
C.
Holm
,
Macromolecules
55
,
3176
(
2022
).
39.
D.
Beyer
,
P.
Košovan
, and
C.
Holm
,
Macromolecules
55
,
10751
(
2022
).
40.
C.
Love
,
J.
Steinkühler
,
D. T.
Gonzales
,
N.
Yandrapalli
,
T.
Robinson
,
R.
Dimova
, and
T. D.
Tang
,
Angew. Chem., Int. Ed.
59
,
5950
(
2020
).
41.
R.
Staňo
,
J. J.
van Lente
,
S.
Lindhoud
, and
P.
Košovan
,
Macromolecules
57
(
3
),
1383
1398
(
2024
).
42.
G.
Ferrand-Drake del Castillo
,
R. L. N.
Hailes
,
Z.
Adali-Kaya
,
T.
Robson
, and
A.
Dahlin
,
Chem. Commun.
56
,
5889
(
2020
).
43.
G.
Ferrand-Drake del Castillo
,
R. L. N.
Hailes
, and
A.
Dahlin
,
J. Phys. Chem. Lett.
11
,
5212
(
2020
).
44.
J.
Yuan
and
Y.
Wang
,
J. Phys. Chem. B
125
,
10589
(
2021
).
45.
D.
Beyer
,
P.
Košovan
, and
C.
Holm
,
Phys. Rev. Lett.
131
,
168101
(
2023
).
46.
X.
Yuan
,
H. W.
Hatch
,
J. C.
Conrad
,
A. B.
Marciel
, and
J. C.
Palmer
,
Soft Matter
19
,
4333
(
2023
).
47.
C.
Balzer
and
Z.-G.
Wang
,
Eur. Phys. J. E
46
,
82
(
2023
).
48.
T.
Briskot
,
N.
Hillebrandt
,
S.
Kluters
,
G.
Wang
,
J.
Studts
,
T.
Hahn
,
T.
Huuk
, and
J.
Hubbuch
,
J. Membr. Sci.
648
,
120333
(
2022
).
49.
M. D.
Hanwell
,
D. E.
Curtis
,
D. C.
Lonie
,
T.
Vandermeersch
,
E.
Zurek
, and
G. R.
Hutchison
,
J. Cheminf.
4
,
17
(
2012
).
50.
W.
Humphrey
,
A.
Dalke
, and
K.
Schulten
,
J. Mol. Graphics
14
,
33
(
1996
).
51.
Schrödinger, LLC
, The PyMOL molecular graphics system, version 2.5.7,
2023
.
52.
R.
Salomon-Ferrer
,
D. A.
Case
, and
R. C.
Walker
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
3
,
198
(
2012
).
53.
D. A.
Case
,
H. M.
Aktulga
,
K.
Belfon
,
D. S.
Cerutti
,
G. A.
Cisneros
,
V. W. D.
Cruzeiro
,
N.
Forouzesh
,
T. J.
Giese
,
A. W.
Götz
,
H.
Gohlke
,
S.
Izadi
,
K.
Kasavajhala
,
M. C.
Kaymak
,
E.
King
,
T.
Kurtzman
,
T.-S.
Lee
,
P.
Li
,
J.
Liu
,
T.
Luchko
,
R.
Luo
,
M.
Manathunga
,
M. R.
Machado
,
H. M.
Nguyen
,
K. A.
O’Hearn
,
A. V.
Onufriev
,
F.
Pan
,
S.
Pantano
,
R.
Qi
,
A.
Rahnamoun
,
A.
Risheh
,
S.
Schott-Verdugo
,
A.
Shajan
,
J.
Swails
,
J.
Wang
,
H.
Wei
,
X.
Wu
,
Y.
Wu
,
S.
Zhang
,
S.
Zhao
,
Q.
Zhu
,
T. E.
Cheatham
,
D. R.
Roe
,
A.
Roitberg
,
C.
Simmerling
,
D. M.
York
,
M. C.
Nagan
, and
K. M.
Merz
,
J. Chem. Inf. Model.
63
,
6183
(
2023
).
54.
J.
Sallai
,
G.
Varga
,
S.
Toth
,
C.
Iacovella
,
C.
Klein
,
C.
McCabe
,
A.
Ledeczi
, and
P. T.
Cummings
,
Procedia Comput. Sci.
29
,
2034
(
2014
).
55.
C.
Klein
,
J.
Sallai
,
T. J.
Jones
,
C. R.
Iacovella
,
C.
McCabe
, and
P. T.
Cummings
, “
A hierarchical, component based approach to screening properties of soft matter
,” in
Molecular Modeling and Simulation
(
Springer
,
Singapore
,
2016
), pp.
79
92
.
56.
P. T.
Cummings
,
C.
MCabe
,
C. R.
Iacovella
,
A.
Ledeczi
,
E.
Jankowski
,
A.
Jayaraman
,
J. C.
Palmer
,
E. J.
Maginn
,
S. C.
Glotzer
,
J. A.
Anderson
,
J.
Ilja Siepmann
,
J.
Potoff
,
R. A.
Matsumoto
,
J. B.
Gilmer
,
R. S.
DeFever
,
R.
Singh
, and
B.
Crawford
,
AIChE J.
67
,
e17206
(
2021
).
57.
A. G.
Demidov
,
B. L. A.
Perera
,
M. E.
Fortunato
,
S.
Lin
, and
C. M.
Colina
,
Living J. Comput. Mol. Sci.
4
,
1561
(
2022
).
58.
M. C.
Ramos
,
P. K.
Quoika
,
V. A. C.
Horta
,
D. M.
Dias
,
E. G.
Costa
,
J. L. M.
do Amaral
,
L. M.
Ribeiro
,
K. R.
Liedl
, and
B. A. C.
Horta
,
J. Chem. Inf. Model.
61
,
1539
(
2021
).
59.
Molydyn Ltd.
, Atlas: Supporting your simulations,
2023
.
60.
D. H.
de Jong
,
G.
Singh
,
W. F. D.
Bennett
,
C.
Arnarez
,
T. A.
Wassenaar
,
L. V.
Schäfer
,
X.
Periole
,
D. P.
Tieleman
, and
S. J.
Marrink
,
J. Chem. Theory Comput.
9
,
687
(
2012
).
61.
P. C.
Kroon
,
F.
Grünewald
,
J.
Barnoud
,
M. v.
Tilburg
,
P. C. T.
Souza
,
T. A.
Wassenaar
, and
S.-J.
Marrink
, arXiv:2212.01191v3 (
2024
).
62.
T.
Bereau
and
K.
Kremer
,
J. Chem. Theory Comput.
11
,
2783
(
2015
).
63.
C.
Hilpert
,
L.
Beranger
,
P. C. T.
Souza
,
P. A.
Vainikka
,
V.
Nieto
,
S. J.
Marrink
,
L.
Monticelli
, and
G.
Launay
,
J. Chem. Inf. Model.
63
,
702
(
2023
).
64.
D.
Kuťák
,
E.
Poppleton
,
H.
Miao
,
P.
Šulc
, and
I.
Barišić
,
Molecules
27
,
63
(
2021
).
65.
C.
Tan
,
J.
Jung
,
C.
Kobayashi
,
D. U. L.
Torre
,
S.
Takada
, and
Y.
Sugita
,
PLoS Comput. Biol.
18
,
e1009578
(
2022
).
66.
J. P. K.
Doye
,
T. E.
Ouldridge
,
A. A.
Louis
,
F.
Romano
,
P.
Šulc
,
C.
Matek
,
B. E. K.
Snodin
,
L.
Rovigatti
,
J. S.
Schreck
,
R. M.
Harrison
, and
W. P. J.
Smith
,
Phys. Chem. Chem. Phys.
15
,
20395
(
2013
).
67.
J. M.
Majikes
and
J. A.
Liddle
,
J. Res. Natl. Inst. Stand. Technol.
126
,
126001
(
2021
).
68.
T. A.
Wassenaar
,
H. I.
Ingólfsson
,
R. A.
Böckmann
,
D. P.
Tieleman
, and
S. J.
Marrink
,
J. Chem. Theory Comput.
11
,
2144
(
2015
).
69.
A. I.
Jewett
,
D.
Stelter
,
J.
Lambert
,
S. M.
Saladi
,
O. M.
Roscioni
,
M.
Ricci
,
L.
Autin
,
M.
Maritan
,
S. M.
Bashusqeh
,
T.
Keyes
,
R. T.
Dame
,
J.-E.
Shea
,
G. J.
Jensen
, and
D. S.
Goodsell
,
J. Mol. Biol.
433
,
166841
(
2021
).
70.
B.
Crawford
,
U.
Timalsina
,
C. D.
Quach
,
N. C.
Craven
,
J. B.
Gilmer
,
C.
McCabe
,
P. T.
Cummings
, and
J. J.
Potoff
,
J. Chem. Inf. Model.
63
,
1218
(
2023
).
71.
Y.
Nejahi
,
M. S.
Barhaghi
,
G.
Schwing
,
L.
Schwiebert
, and
J.
Potoff
,
SoftwareX
13
,
100627
(
2021
).
72.
A.
Gallant
and
M.
Hils
(
2023
). “
Pdoc–generate API documentation for Python projects
,” GitHub.
73.
H. E.
Grecco
and
J.
Chéron
(
2023
). “
Pint: makes units easy
,” GitHub.
74.
D.
Frenkel
and
B.
Smit
,
Understanding Molecular Simulation: From Algorithms to Applications
, 3rd ed. (
Elsevier
,
2023
).
75.
E. A.
Bienkiewicz
and
K. J.
Lumb
,
J. Biomol. NMR
15
,
203
(
1999
).
76.
CRC Handbook of Chemistry and Physics
, 72nd ed., edited by
D. R.
Lide
(
CRC Press
,
Boca Raton, FL
,
1991
).
77.
M. A. S.
Hass
and
F. A. A.
Mulder
,
Annu. Rev. Biophys.
44
,
53
(
2015
).
78.
Y.
Nozaki
and
C.
Tanford
, in
Methods in Enzymology
,
Enzyme Structure Vol. 11
, edited by
C. H. W.
Hirs
(
Academic Press
,
New York, NY
,
1967
), pp.
715
734
.
79.
G.
Platzer
,
M.
Okon
, and
L. P.
McIntosh
,
J. Biomol. NMR
60
,
109
(
2014
).
80.
R. L.
Thurlkill
,
G. R.
Grimsley
,
J. M.
Scholtz
, and
C. N.
Pace
,
Protein Sci.
15
,
1214
(
2006
).
82.
O.
Lenz
,
A.
Mezger
, and
H.
Menke
(
2014
). “
VTF plugin, version 2.4
,” GitHub.
83.
E. D.
Chrysina
,
K.
Brew
, and
K. R.
Acharya
,
J. Biol. Chem.
275
,
37021
(
2000
).
84.
J.
Landsgesell
,
C.
Holm
, and
J.
Smiatek
,
Eur. Phys. J.: Spec. Top.
226
,
725
(
2017
).
85.
P.
Košovan
,
J.
Landsgesell
,
L.
Nová
,
F.
Uhlík
,
D.
Beyer
,
P. M.
Blanco
,
R.
Staňo
, and
C.
Holm
,
Soft Matter
19
,
3522
(
2023
).
86.
C.
Labbez
and
B.
Jönsson
, in
Applied Parallel Computing: State of the Art in Scientific Computing
,
Lecture Notes in Computer Science Vol. 4699
, edited by
B.
Kågström
,
E.
Elmroth
,
J.
Dongarra
, and
J.
Waśniewski
(
Springer
,
Berlin, Heidelberg
,
2007
), pp.
66
72
.
87.
R.
Staňo
,
P.
Košovan
,
A.
Tagliabue
, and
C.
Holm
,
Macromolecules
54
,
4769
(
2021
).
88.
W. R.
Smith
and
B.
Triska
,
J. Chem. Phys.
100
,
3019
(
1994
).
89.
J. K.
Johnson
,
A. Z.
Panagiotopoulos
, and
K. E.
Gubbins
,
Mol. Phys.
81
,
717
(
1994
).
90.
D.
Beyer
and
C.
Holm
,
J. Chem. Phys.
159
,
014905
(
2023
).
91.
P. W.
Atkins
,
J.
de Paula
, and
J.
Keeler
,
Atkins’ Physical Chemistry
, 12th ed. (
Oxford University Press
,
Oxford, United Kingdom
,
2023
).
92.
B.
Widom
,
J. Chem. Phys.
39
,
2808
(
1963
).
93.
T.
Curk
,
J.
Yuan
, and
E.
Luijten
,
J. Chem. Phys.
156
,
044122
(
2022
).
95.
D.
Berthelot
,
C. R. Acad. Sci.
126
,
1703
(
1898
).
96.
W.
McKinney
, in
Proceedings of the 9th Python in Science Conference
, edited by
S.
van der Walt
and
J.
Millman
(
2010
), pp.
56
61
.
97.
The Pandas Development Team
(
2023
). “
pandas-dev/pandas: Pandas, version 1.5.3
,” Zenodo. https://doi.org/10.5281/zenodo.3509134
98.
R.
Lunkad
,
A.
Murmiliuk
,
P.
Hebbeker
,
M.
Boublík
,
Z.
Tošner
,
M.
Štěpánek
, and
P.
Košovan
,
Mol. Syst. Des. Eng.
6
,
122
(
2021
).
99.
K.
Hyltegren
,
T.
Nylander
,
M.
Lund
, and
M.
Skepö
,
J. Colloid Interface Sci.
467
,
280
(
2016
).
100.
B. J.
MacKay
,
L.
Denepitiya
,
V. J.
Iacono
,
S. B.
Krost
, and
J. J.
Pollock
,
Infect. Immun.
44
,
695
(
1984
).
101.
S.
Puri
and
M.
Edgerton
,
Eukaryotic Cell
13
,
958
(
2014
).
102.
S.
Brownlow
,
J. H. M.
Cabral
,
R.
Cooper
,
D. R.
Flower
,
S. J.
Yewdall
,
I.
Polikarpov
,
A. C. T.
North
, and
L.
Sawyer
,
Structure
5
,
481
(
1997
).
103.
G. S.
Grest
and
K.
Kremer
,
Phys. Rev. A
33
,
3628
(
1986
).
104.
P. C. T.
Souza
,
R.
Alessandri
,
J.
Barnoud
,
S.
Thallmair
,
I.
Faustino
,
F.
Grünewald
,
I.
Patmanidis
,
H.
Abdizadeh
,
B. M. H.
Bruininks
,
T. A.
Wassenaar
,
P. C.
Kroon
,
J.
Melcr
,
V.
Nieto
,
V.
Corradi
,
H. M.
Khan
,
J.
Domański
,
M.
Javanainen
,
H.
Martinez-Seara
,
N.
Reuter
,
R. B.
Best
,
I.
Vattulainen
,
L.
Monticelli
,
X.
Periole
,
D. P.
Tieleman
,
A. H.
de Vries
, and
S. J.
Marrink
,
Nat. Methods
18
,
382
(
2021
).
105.
D.
Beyer
,
P. B.
Torres
,
S. P.
Pineda
,
C. F.
Narambuena
,
J.-N.
Grad
,
P.
Košovan
, and
P. M.
Blanco
, pyMBE release 0.8.0,
2024
.
106.
C. R.
Harris
,
K. J.
Millman
,
S. J.
van der Walt
,
R.
Gommers
,
P.
Virtanen
,
D.
Cournapeau
,
E.
Wieser
,
J.
Taylor
,
S.
Berg
,
N. J.
Smith
,
R.
Kern
,
M.
Picus
,
S.
Hoyer
,
M. H.
van Kerkwijk
,
M.
Brett
,
A.
Haldane
,
J. F.
del Río
,
M.
Wiebe
,
P.
Peterson
,
P.
Gérard-Marchant
,
K.
Sheppard
,
T.
Reddy
,
W.
Weckesser
,
H.
Abbasi
,
C.
Gohlke
, and
T. E.
Oliphant
,
Nature
585
,
357
(
2020
).
107.
H. E.
Grecco
(
2023
). “
Pint-Pandas: Extend Pandas dataframe with physical quantities module
,” GitHub.
108.
B.
Dröge
,
V.
Holanda Rusu
,
K.
Hoste
,
C.
van Leeuwen
,
A.
O’Cais
, and
T.
Röblitz
,
Software: Pract. Exper.
53
,
176
(
2023
).
109.
A.
O’Cais
,
B.
Dröge
, and
K.
Hoste
(
2024
). “
GitHub action: EESSI, version 3.1.0
,” GitHub marketplace.