Chapter 1: A Practical Introduction to Martini 3 and its Application to Protein-Ligand Binding Simulations
-
Published:2023
Riccardo Alessandri, Sebastian Thallmair, Cristina Gil Herrero, Raúl Mera-Adasme, Siewert J. Marrink, Paulo C. T. Souza, "A Practical Introduction to Martini 3 and its Application to Protein-Ligand Binding Simulations", A Practical Guide to Recent Advances in Multiscale Modeling and Simulation of Biomolecules, Yong Wang, Ruhong Zhou
Download citation file:
Alessandri, R., Thallmair, S., Gil Herrero, C., Mera-Adasme, R., Marrink, S. J., and Souza, P. C. T., “A practical introduction to Martini 3 and its application to protein-ligand binding simulations,” in A Practical Guide to Recent Advances in Multiscale Modeling and Simulation of Biomolecules, edited by Y. Wang and R. Zhou (AIP Publishing, Melville, New York, 2023), pp. 1-1–1-34.
Martini 3 is the new version of a widely used coarse-grained (CG) model that have been extensively parameterized to reproduce experimental and thermodynamic data. Based on a building-block approach, the new version shows a better coverage of the chemical space and more accurate predictions of interactions and molecular packing in general. Given these improvements, the Martini 3 model allows new applications such as studies involving protein–ligand interactions. In this chapter, a summary of the key elements of the new Martini version is presented, followed by an example of a practical application: a simulation of caffeine binding to the buried pocket of the adenosine A2A receptor, which is part of the GPCR family. Formulated as a hands-on tutorial, this chapter contains guidelines to build CG models of important systems, such as small drug-like molecules, transmembrane proteins, and lipid membranes. Finally, the last sections contain an outlook of possible future developments and notes describing useful information, limitations, and tips about Martini.
1.1 Introduction
The Martini force field is one of the most popular coarse-grained (CG) models used in molecular dynamics (MD) simulations, with many applications in diverse fields, from structural biology (Abellón-Ruiz et al., 2017; Yen et al., 2018; and Liaci et al., 2021) and biophysics (Agostino et al., 2017; Vögele et al., 2018; and Faustino et al., 2020) to nanotechnology (Bruininks et al., 2020; Zhang et al., 2021; and Machado et al., 2022) and materials design (Liu et al., 2018; Vazquez-Salazar et al., 2020; and De Marco et al., 2021). One of the main characteristics of the Martini model is the use of a building-block approach. In this approach, small chemical fragments composed of two to four non-hydrogen atoms (but including associated hydrogens) are represented by CG particles, commonly called “beads.” The resulting simplification of the molecular systems is among the main reasons for the high computational performance of the model, which can be 2 to 3 orders of magnitude faster than atomistic force fields (see Note 1 for more details about Martini performance). Martini started to be developed around 2002 (Marrink and Mark, 2003a, 2003b), with the first version only used for the modeling of phospholipid systems (Marrink et al., 2004). Over the years, the force field was expanded in a second version, Martini 2, which included new models of not only new and more specialized lipids (López et al., 2013; Melo et al., 2015; and Gu et al., 2017), but also many other biomolecules such as proteins (Monticelli et al., 2008; and De Jong et al., 2013), carbohydrates (López et al., 2009, 2015), nucleic acids (Uusitalo et al., 2015, 2017), post-translational modifications (Atsmon-Raz and Tieleman, 2017; and Shivgan et al., 2020), and a variety of cofactors and metabolites (De Jong et al., 2015; and Sousa et al., 2021). Martini 2 models also expanded towards materials science applications as models for polymers, surfaces, and nanoparticles were developed (Alessandri et al., 2021). With the evolution of the model, limitations inherent in the coarse-graining approach (see Note 2) and also specific to Martini (see Note 3), in particular, the “stickiness” problem (Alessandri et al., 2019), became clear. The necessary improvements to the model in combination with a high demand for better coverage of chemical space have promoted the development of the new version of the model called Martini 3 (Souza et al., 2021a). In this chapter, we will focus on the main aspects of this new release of the Martini force field.
The Martini 3 model has a total of 7 chemical classes of beads: polar (P), intermediately polar (N), apolar (C), halo-compounds (X), monovalent ions (Q), divalent ions (D), and water (W) (Fig. 1.1). Except for W and D beads, all beads have subtypes that are distinguished by a number indicating the relative degree of polarity (from 1, low polarity, to a maximum of 6, high polarity). These subtypes allow for a more accurate representation of the chemical nature of the underlying atomistic structure. Furthermore, the beads can be distinguished by additional labels which mimic chemically specific interactions, such as hydrogen bonding (see Note 4 for details about other labels). All beads come in three different sizes, which are associated with the number of non-hydrogen atoms represented by the bead: regular (“R,” for 4 non-hydrogen atoms represented by 1 bead, or 4-1), small (“S,” 3-1), and tiny (“T,” 2-1) (Fig. 1.1). These assignments are reasonable for linear fragments with exceptions such as in the case of branched moieties (see Note 5).
Martini 3 employs non-bonded and bonded potentials typical for atomistic force fields (and also in the previous Martini versions), which makes the model easily implementable in modern MD programs like GROMACS (Abraham et al., 2015). Lennard–Jones (LJ) potentials are used to model the effective non-bonded interactions between all CG beads. Each pair interaction needs to be explicitly defined, with the interactions calibrated to reproduce thermodynamic data, such as liquid–liquid partitioning and miscibility. The sigma of the LJ potential defines the size of the beads, with a default size for the self-interaction of R, S, and T-beads of 0.47, 0.41, and 0.34 nm, respectively. Interactions involving special beads (such as W), cross-interactions between different sizes, and certain interactions between polar and non-polar beads may use different sigma values (Souza et al., 2021a). In addition to LJ potentials, the charged beads (Q and D) use Coulomb interactions. These electrostatic interactions take into account a default explicit screening with a relative permittivity of ϵr = 15. Both LJ and Coulomb potentials are shifted such that the potentials reach 0 kJ/mol for any distance greater than 1.1 nm, the cutoff distance (De Jong et al., 2016).
While non-bonded interactions follow a “top-down” approach (making use of experimental data) for calibration, bonded interactions are typically obtained via a “bottom-up” approach based on reference atomistic MD simulations. Simple bonded potentials, such as two-body harmonic potentials, three-body angular potentials, and four-body periodic dihedral angle potentials, are commonly used in Martini. The parameters of these potentials are fitted according to the distributions obtained from atomistic simulations mapped onto a pseudo-CG trajectory. In contrast to previous versions, Martini 3 presents a more consistent set of rules for the calculation of bond lengths and also for the selection of bead sizes, both aspects aiming at an optimal representation of molecular volume and shape of CG models (Souza et al., 2021a; and Alessandri et al., 2022). The combination of improved packing of the CG models, better coverage of the chemical space, and improved interactions opened new application avenues for Martini. One such avenue is the study of protein-small molecule binding. We recently demonstrated that Martini 3 can achieve millisecond sampling and high accuracy of protein–ligand interaction (Souza et al., 2020). Pockets and binding pathways can be predicted without the need of any a priori knowledge, with the approach being tested on a range of systems from the well-characterized T4 lysozyme to members of the GPCR, nuclear receptor, and kinase families (Souza et al., 2020).
In Secs. 1.2–1.4, we will describe a hands-on tutorial illustrating the application of Martini to protein–ligand binding. The tutorial will use one of our protein–ligand binding studies as a guide: the binding of caffeine to the adenosine A2A receptor, which is part of the GPCR family. This tutorial will first illustrate with concrete examples of how to build Martini CG models of small molecules and transmembrane proteins. In the final part of the text, these CG models will be embedded in a water-lipid bilayer system to be used in unbiased protein–ligand binding simulations. Although this material is based on GROMACS (Abraham et al., 2015) (versions 2018.X to 2020.X), similar approaches can be adapted using other software packages that support Martini such as NAMD (Phillips et al., 2020), LAMMPS (Jewett et al., 2021), ddcMD (Zhang et al., 2020), and openMM (Eastman et al., 2017).
1.2 Parameterization of a New Small Molecule
In this section, we will describe how to create a Martini 3 model for a new small molecule—in this case, a model for caffeine. We will discuss considerations that are generally relevant for creating small molecules within the Martini 3 framework. As such, this section constitutes a pragmatic description of the Martini 3 coarse-graining principles described in Refs. (Souza et al., 2021a) and (Alessandri et al., 2022). The full procedure and necessary files are available at https://github.com/ricalessandri/Martini3-small-molecules/. Here, we will discuss in detail the procedure and highlight key steps and commands.
1.2.1 Generation of atomistic reference data
We need an all-atom (AA) reference simulation to extract the bonded parameters for the CG model. There is no particular constraint on the AA model to be used. Note only that, ideally, all the hydrogens are needed to extract optimal bonded parameters in Martini 3, so an all-atom—instead of a united-atom—model is the ideal choice.
In this case, we use an OPLS-AA model as obtained from the LigParGen server (Jorgensen and Tirado-Rives, 2005; and Dodda et al., 2017a, 2017b). The model can be obtained from the server by providing the SMILES string of caffeine (i.e., Cn1cnc2n(C)c(=O)n(C)c(=O)c12). LigParGen's OPLS-AA/1.14∗CM1A-LBCC files in the GROMACS format for caffeine can also be found on the Martini 3 small molecule database (Alessandri et al., 2022). We solvate the caffeine AA model in TIP3P water (Jorgensen et al., 1998) and after energy minimizing and equilibrating the system we run an MD simulation of 50 ns. Note that caffeine is a fairly rigid small molecule; hence, this simulation time is more than sufficient to sample all the bonded degrees of freedom. In the case of more complex small molecules, a longer simulation time may be required for adequate sampling. Generating the atomistic reference data requires running a standard MD simulation with GROMACS (Abraham et al., 2015). The files necessary for the run along with a script with all the commands can be found at the GitHub repository: https://github.com/ricalessandri/Martini3-small-molecules/.
1.2.2 Definition of the coarse-grained mapping
Splitting the molecules in building blocks to be described by CG beads, i.e., CG mapping, is a cornerstone of coarse-graining. Defining a CG mapping within Martini 3 should follow the guidelines below. Moreover, the use of already available models (Alessandri et al., 2022) as a reference and starting point is another useful practice to develop Martini 3 mappings in a rapid and more consistent way.
Consider only non-hydrogen atoms to define the mapping.
Avoid dividing chemical functional groups (e.g., amide, carboxylate, etc.) between two beads.
Retain in the CG representation as much as possible (1) the symmetry and (2) the molecular volume and shape of the underlying AA structure.
Keep in mind that the default bead sizes for mapping linear molecular fragments are one regular (R) bead for 4 non-hydrogen atoms (4-1), one small (S) bead for three (3-1), and one tiny (T) bead for two (2-1).
Consider that R-beads are the most computationally-performant option; S-beads are useful to mimic the “bulkier” shape of aliphatic rings, while T-beads are particularly suited to represent atom-thick, conjugated rings.
Optimize the number of beads such that there is at most a mismatch of ±1 non-hydrogen atom every 10 non-hydrogen atoms. Some exceptions may be allowed if well-tested (e.g., mapping of thiophene or furan with three T-beads have a mismatch of −1 given that 5 non-hydrogen atoms are mapped onto 3 T-beads which are parameterized to represent 6 non-hydrogen atoms).
For fully branched fragments, use beads of one size smaller than the size determined based on the non-hydrogen atom count. The rationale behind this is that the central atom of a branched group is not exposed to the environment (i.e., it is buried), which reduces its influence on the interactions. For example, a neopentane group contains 5 non-hydrogen atoms but it can be effectively modeled with an R-bead (see also Note 5).
In the present case, as the structure of caffeine is mostly fully conjugated, T-beads are a good choice to ensure accurate packing-related properties, which are important for ligand binding recognition (Souza et al., 2020, 2021a; and Alessandri et al., 2022). Given a non-hydrogen atom count of 14, caffeine can be mapped to 7 T-beads, as shown in Fig. 1.2(a), where functional groups—such as carbonyl groups—have been intentionally kept together.
1.2.3 Coarse-grained mapping of the atomistic reference data
Using the mapping just defined [Fig. 1.2(a)], we now need to transform the AA trajectory to CG resolution. In order to do this, we create a GROMACS (AA-to-CG) index file where every index group stands for a CG bead and contains the corresponding mapped atom numbers.
The required index file can be generated in several ways. One way which may be useful to define and refine a mapping is to use the CGbuilder tool (CGbuilder). CGbuilder allows previewing and refining mappings via an intuitive GUI. The GUI requires the AA structure file (e.g., pdb or gro format) and allows to define the mapping by clicking on the atoms to add/remove them to/from a certain bead. A current limitation of CGbuilder is that atoms cannot contribute with a weight different from one to a certain bead, something which is at times necessary to keep the symmetry of the underlying AA structure in its Martini representation. In such cases, the index file must be subsequently refined by editing the file created by CGbuilder.
An important aspect to take into account when creating the CG mapping file is that in Martini 3 hydrogen atoms are taken into account to determine the relative position of the beads, in what we refer to as center-of-geometry (COG)-based mapping (Souza et al., 2021a; and Alessandri et al., 2022). This aspect constitutes a change with respect to the previous versions of Martini, which instead neglected the hydrogens also when deriving bonded parameters. COG-based mapping leads to Martini 3 models with better molecular (e.g., solvent accessible surface area) and bulk (e.g., mass densities) properties (Souza et al., 2021a; and Alessandri et al., 2022). Hence, COG-based mapping of AA structures taking into account the hydrogen atoms is the standard procedure for deriving bonded parameters in Martini 3. Accordingly, in the case of caffeine, the first two groups of the index file representing the first two beads will look as follows:
[ B0 ]19 7 21 20 6[ B1 ]9 8,
where B0 indicates the first bead and it maps 5 atoms—the N atom and the four atoms of the methyl group, as shown in Fig. 1.2(a); the second bead B1 instead describes only 2 atoms, the C and O of the carbonyl group [Fig. 1.2(a)].
With the index file in place, we can now convert the AA trajectory to CG resolution according to our mapping. First, we make the AA trajectory “whole”:
echo 0 | gmx trjconv -f ../1_AA-reference/3-run.xtc -o AA-traj.whole.xtc -s../1_AA-reference/3-run.tpr -pbc whole
We then create a COG-based topology file of the AA system (see Note 6):
gmx grompp -p system_COG.top -f ../1_AA-reference/run.mdp -c../1_AA-reference/2-eq.gro -o AA-COG.tpr
and then map the trajectory with the GROMACS tool gmx traj:
seq 0 6 | gmx traj -f AA-traj.whole.xtc -s AA-COG.tpr -oxtmapped.xtc -n mapping.ndx -ng 7 -com,
which will produce the CG-mapped trajectory mapped.xtc.
1.2.4 Generation of the initial coarse-grained model
To define the CG model of caffeine, a CG topology file needs to be generated. In GROMACS, such a topology is defined by creating an itp file. This file contains the definition of the CG bead types and properties as well as the bonded parameters that define how the beads are held together. A thorough description of the itp file format can be found in the GROMACS manual (GROMACS 2021.5). Here, we recap the basics and describe the terms necessary to create a CG model for caffeine.
The first entry in the itp file is the [moleculetype], which contains the molecule name and the number of non-bonded exclusions. In Martini, the standard number of exclusions is 1, meaning that non-bonded interactions between particles directly connected are excluded. Hence, in the case of caffeine, the first lines read
[moleculetype]; molname nrexcl . CAFF 1
However, for models of rigid rings, it is common to exclude all the intramolecular non-bonded interactions. In particular, virtual sites need to be excluded in relation to all neighboring real beads. To do that, we explicitly add an [exclusions] section, typically at the end of the itp file. Each line of this section indicates that the bead in the first column does not interact with the beads in the next columns of the same line:
[exclusions] 1 2 3 4 5 6 7 2 3 4 5 6 7 3 4 5 6 7 4 5 6 7 5 6 7 6 7
The next entry after [moleculetype] is [atoms], where all the beads that make up the molecule are defined. For each bead, we define a one-line entry containing bead number, bead type, residue number, residue name, bead name, charge group, charge, and mass. Bead numbers should start from one. In the case of small molecules like caffeine, the residue number and residue name will be the same for all beads. Bead names can be freely chosen. The charge and charge group are not relevant in the case of caffeine given its charge neutrality. The mass and bead type assignments will be described in more detail below.
[atoms];nr type resnr residue atom cgnr charge mass1 TN1 1 CAFF R1 1 0 632 TP1a 1 CAFF O2 2 0 03 TN1 1 CAFF R3 3 0 634 TP1a 1 CAFF O4 4 0 05 TC5 1 CAFF R5 5 0 06 TN5a 1 CAFF N6 6 0 637 TN1 1 CAFF R7 7 0 63
1.2.4.1 Bead types
Similar to atom types in AA force fields, in Martini, we must assign bead types to the different beads that compose a given molecule. The initial assignment can be done based on the underlying chemical building block, following the examples collected in Table 1.1 of Alessandri et al., 2022 [note that an up-to-date version of the table can be found at (https://github.com/ricalessandri/Martini3-small-molecules/blob/main/tutorials/building_block_table.pdf)]. It is important to note that, after the initial assignment, bead type choices can be subsequently fine-tuned based on, for example, comparison to experimental free energy of transfer data. This is the case for the beads TP1a, which represent carbonyl groups (see Sec. 2.8). The default assignment would be TN6a in the case of simple ketones and aldehyde groups. However, as the carbonyl groups in caffeine are conjugated in the aromatic heterocycle, their bead type can be subject to a deviation from the standard assignment as it was considered for nucleobases (Souza et al., 2021a).
Model . | ΔGOCO→W . | ΔGHD→W . | ΔGCLF→W . | |||
---|---|---|---|---|---|---|
Expt. . | CG . | Expt. . | CG . | Expt. . | CG . | |
Take 1: wrong bead assignment TN4a-TN1-TN4a-TN1-TC5-TN6a-TN1 | −0.4 | 5.8 | … | −12 | … | 6.2 |
Take 2: based on the reference table TN6a-TN1-TN6a-TN1-TC5-TN6a-TN1 | −0.4 | 1.2 | … | −17.9 | … | −2.2 |
Take 3: Analogy to nucleobases TP1a-TN1-TP1a-TN1-TC5-TN5a-TN1 | −0.4 | 1.6 | … | −19.1 | … | 1.0 |
Model . | ΔGOCO→W . | ΔGHD→W . | ΔGCLF→W . | |||
---|---|---|---|---|---|---|
Expt. . | CG . | Expt. . | CG . | Expt. . | CG . | |
Take 1: wrong bead assignment TN4a-TN1-TN4a-TN1-TC5-TN6a-TN1 | −0.4 | 5.8 | … | −12 | … | 6.2 |
Take 2: based on the reference table TN6a-TN1-TN6a-TN1-TC5-TN6a-TN1 | −0.4 | 1.2 | … | −17.9 | … | −2.2 |
Take 3: Analogy to nucleobases TP1a-TN1-TP1a-TN1-TC5-TN5a-TN1 | −0.4 | 1.6 | … | −19.1 | … | 1.0 |
1.2.4.2 Masses
When masses are not specified—as is usually the case in Martini—default masses of 72, 54, and 36 a.m.u. are used for R-, S-, and T-beads, respectively. However, when virtual sites are present, such as in the present case, masses should be specified to account for the mass-less virtual sites (see Sec. 2.5). In these cases, a pragmatic choice is to redistribute the mass of the whole molecule between the real beads (which is our option here). An alternative would be to use the real masses of the fragment mapped in the real beads combined with the real masses of the fragment mapped as a virtual site, possibly only redistributing the masses between the real beads used in the construction of the virtual site position.
The [moleculetype] and [atoms] entries are mandatory and make up a basic itp file. With these entries, we can generate a Gromacs binary topology file, CG.tpr, that will be necessary later on. To do this, one needs the basic topology file just defined for the CG model (system_CG.top), a file containing the simulation settings (which can be found in folder 4_initial-CG; let's say we pick martini_v2.x_new_run.mdp, although this is not important at this stage), and a coordinate file. The latter can be extracted from the atomistic trajectory in the following way:
seq 0 6 | gmx traj -f ../3_mapped/AA-traj.whole.xtc -s../3_mapped/AA-COG.tpr -n ../3_mapped/mapping.ndx -oxtmolecule.gro ng 7 -com -e 0
that the CG.tpr can then be generated with the following command:
gmx grompp -f martini_v2.x_new_run.mdp -c molecule.gro -psystem_CG.top -o CG.tpr –maxwarn 1
Further necessary entries in order to define a multiparticle model of a molecule are entries defining bonded terms such as [bonds], [constraints], [dihedrals], etc. We will go through these terms in Sec. 1.2.5.
1.2.5 Generation of bonded terms and target distributions
Next, parameters for the bonded interactions need to be obtained from the AA trajectory mapped to the CG resolution. In this subsection, we first describe how to settle on which bonded parameters we need, and then translate that into index files that are needed in order to extract the reference distributions from the CG-mapped trajectory.
1.2.5.1 The choice of bonded terms for the coarse-grained model
The bonds connecting the beads within the CG model of caffeine are expectedly very stiff, i.e., their distributions are very narrow. At the Martini scale, the fast stretching of these bonds becomes unimportant, and, hence, they can be replaced by constraints, i.e., those bond lengths are held fixed (Marrink et al., 2007; Souza et al., 2021a; and Alessandri et al., 2022). A straightforward but naïve way of constructing the model would be to interconnect all the beads with constraints. However, such a model would easily lead to numerical instabilities, as the constraint solver—we assume here the use of the LINCS (Hess et al., 1997) algorithm in GROMACS—cannot satisfy a growing number of connected constraints with a time step of 20 fs.
A way to circumvent the use of more than five interconnected constraints and, hence, build a numerically stable model is to use a so-called “hinge” construction (Melo et al., 2015; and Alessandri et al., 2022). In such construction, four beads are used to build a hinge, while other beads are constructed as virtual sites, i.e., particles the position of which is defined by their constructing particles. The use of virtual particles not only improves the models” numerical stabilities but also their performance (Alessandri et al., 2022). In the specific case of caffeine, beads number 1, 3, 6, and 7 can be connected via 5 constraints to construct the hinge, as shown in Fig. 1.2(b). This means having one such line in the itp:
[constraints]; i j funct length 1 3 1 XXX
for each constraint to be defined, namely constraints 1-3, 1-6, 3-6, 3-7, and 6-7. Here, XXX is a placeholder for the initial constraint length that will be obtained from the target atomistic distribution in the next step. An improper dihedral potential involving the 4 constructing beads needs to be added to the model to keep the beads in the same plane (e.g., dihedral 1-3-7-6 with an equilibrium value set to zero). The remaining three beads can be defined via virtual sites as follows. First, bead number 5 can be defined with GROMACS” [virtual_sites2], that is, its position can be constructed as a linear combination of the positions of beads 3 and 6:
[virtual_sites2]; sites positioned at 1-a; site from funct a 5 3 6 1 0.635
The factor a specifies the position of bead 5 on the connecting line between beads 3 and 6 (GROMACS 2021.5). The remaining two beads, bead number 2 and 4 [Fig. 1.2(b)], can be defined using the [virtual_sites3] construction as follows:
[virtual_sites3]Site from funct a b 2 5 1 3 1 0.640 0.605 4 5 3 7 1 0.650 0.600
In such construction, the site is positioned as a linear combination of the position of three particles (e.g., in the case of bead 2, these are beads 5, 1, and 3). Note that both [virtual_sites3] constructions use bead 5 as a constructing particle, which in turn is constructed as a virtual site itself. This dependency is possible because bead 5 is a “simpler” virtual site, i.e., it is constructed using a simpler construction than the construction in which it is used as constructing particle. Again, factors a and b specify the exact position of the virtual sites. More details on the virtual sites available in GROMACS can be found in the GROMACS manual (GROMACS 2021.5).
1.2.5.2 Index files and generation of the target distributions
Now that we have settled on the bonded terms, we can create index files for the bonds, angles, and dihedrals for which we want to plot the target distributions. For example, the file bonds.ndx will contain an entry [bondX] for each bond, each containing pairs of CG beads between which a bond (or constraint) exists:
[bond1] 1 3[bond2] 1 6...
and similarly for the dihedrals (with quartets of CG beads); note that, in general, an angles.ndx (with triples) would have to be created for the angles, but in the case of caffeine no angle potentials are needed. The following commands will then generate a bond length distribution for the first bond in the index (the bond connecting beads 1 and 3):
echo 0 | gmx distance -f ../3_mapped/mapped.xtc -n bonds.ndx -s ../4_initial-CG/CG.tpr -oall bond_0.xvg -xvg none
gmx analyze -f bond_0.xvg -dist distr_bond_0.xvg -xvg none -bw 0.001
similarly, for the first (and only) dihedral, we can issue the following commands:
echo 0 | gmx angle -type dihedral -f../3_mapped/mapped.xtc-n dihedrals.ndx -ov dih_0.xvg
gmx analyze -f dih_0.xvg -dist distr_dih_0.xvg -xvg none -bw1.0
1.2.6 Optimization of the coarse-grained parameters
We can now finalize the CG model by running a CG MD simulation of the caffeine CG model solvated in water after having energy-minimized, and equilibrated the system. The starting values for the bonded parameters can be obtained from the mean values of the distributions in the previous step. Further refinement of the bonded parameters of the CG model will be done by extracting bonded distributions from the CG simulations—using commands similar to those given in Sec. 2.5.2—and comparing them to the CG-mapped AA target distributions obtained in Sec. 1.2.5.
Representative distributions of well-fitted bonded parameters are shown in Fig. 1.2(c), where Martini distributions are shown in red and CG-mapped OPLS-AA in blue. It should be mentioned that to fine-tune the parameters for the virtual site constructions, one usually needs to compare distances between beads which are not explicitly connected via bonded potentials in the CG model. This is the case, for example, of the distributions 1-2 and 5-6 of [Fig. 1.2(c)].
1.2.7 Bartender: automated generation of coarse-grained bonded parameters
Bartender is a program for the automated determination of bonded parameters for Martini 3 (Bartender). As such, its use can replace the steps in Secs. 1.2.1, 1.2.3, and 1.2.6. Bartender takes advantage of the novel “GFN” semiempirical quantum-chemical methods developed by the Grimme group in recent years (Bannwarth et al., 2021). It employs one of these methods to perform a Born–Oppenheimer quantum mechanical MD (QM-MD) simulation. Subsequently, the bonded parameters are fitted to the distance, angle, and dihedral distributions from the QM-MD trajectory. The GFN methods are efficient enough to run dynamics for several nanoseconds, allowing a good sampling of the conformational space for most rigid small molecules like caffeine. For more challenging cases, Bartender includes replica exchange MD capabilities. In addition, Bartender can read a supplied trajectory in dcd or xtc format and omit the QM-MD step.
1.2.7.1 Input files
Bartender requires two input files. The first one is a structure file in pdb, gro, or xyz format with the atomistic representation of the molecule. The second input file, which is specific to Bartender, contains a description of the desired atom-to-bead mapping, and of the bonded parameters that are to be obtained. The file is divided into several sections separated by headers. This input file can be created using the python script write_bartender_inp.py included in the Bartender GitHub repository (https://github.com/rmera/bartender). A sample input for the caffeine molecule is given below:
BEADS1 19,6,21,7202 8,93 23,24,11,22,104 12,135 5146 18,3,47 15,1,17,16,2BONDS1,31,63,63,76,7IMPROPERS1,3,6,7
The first section starts with the header BEADS. It contains a list of the CG beads in the Martini representation of the molecule, followed by a space, and a list of the indexes for the atoms that the bead represents, separated by a comma. A simple line for the section would be:
2 8,9
which indicates that bead 2 represents atoms 8 and 9. It is possible that an atom needs to be split between beads. That can happen for various reasons, such as to preserve the symmetry of the underlying atomistic structure. Bartender allows to define a bead containing only a fraction of an atom. For instance, the following lines:
2 8/2,93 8/2,23,24,11,22,10
divide atom 8 equally between beads 2 and 3. The other sections of the inputs are all optional and define the particular bonded terms that are to be obtained by Bartender. Their headings are BONDS, ANGLES, DIHEDRALS, and IMPROPERS. Each term is described in its own line by simply specifying the relevant beads. For caffeine, no angles and dihedrals are necessary, only bonds and improper dihedrals. Virtual sites are still not implemented in Bartender, but they will be in the near future.
1.2.7.2 Running Bartender
When called, Bartender will perform a QM-MD for the given molecule and fit the requested bonded terms to their respective probability distributions in the obtained trajectory, using the Boltzmann distribution to convert frequencies into potential energies.
Bartender is a command-line program that can be executed as follows:
bartender [flags] geometry.xyz/pdb/gro bartender_input.inp
Omitting any flags would execute Bartender's default behavior, which is designed to nicely perform its purpose in most cases. However, the settings of Bartender can be modified using the available flags. The list of all available flags is detailed by calling Bartender as follows:
bartender -help
Some of the common flags are
-charge int | Sets the total charge of the molecule (default: 0) |
-method string | Sets the specific GFN method to be used (default: GFNFF) |
-dcdSave file.dcd | Saves the QM-MD trajectory in DCD format. |
-owntraj file.dcd/xtc | Uses an MD trajectory instead of performing a QM-MD simulation. |
-time int | The total QM-MD simulation time in ps (default: 1000) |
-solvent string | Sets the implicit solvent to be used (default: h2o, for water) |
-cpus int | The number of CPUs used for the QM-MD simulation. |
-charge int | Sets the total charge of the molecule (default: 0) |
-method string | Sets the specific GFN method to be used (default: GFNFF) |
-dcdSave file.dcd | Saves the QM-MD trajectory in DCD format. |
-owntraj file.dcd/xtc | Uses an MD trajectory instead of performing a QM-MD simulation. |
-time int | The total QM-MD simulation time in ps (default: 1000) |
-solvent string | Sets the implicit solvent to be used (default: h2o, for water) |
-cpus int | The number of CPUs used for the QM-MD simulation. |
For our example, we need to generate a xyz file from the LigParGen pdb file, as LigParGen produces files that do not follow the pdb standard and cannot be read by goChem. The conversion can be done either by hand with custom scripts or by using programs such as openbabel or VMD. Bartender is then excecuted as follows:
bartender CAFF.xyz caff_bartender.inp
1.2.7.3 Bartender output
The Bartender run produces several output files:
A GROMACS topology file, gmx out.itp, with the parameterization of the requested bonded terms.
A series of png files, each of which contains the plot of a fitted function for a bonded term, superimposed with the potential obtained from the Boltzmann inversion of distribution for that term in the MD.
A pdb file of the atomistic system, where the residue identifier for each atom contains the ID of the corresponding bead. The b-factor column of the pdb file is written in such a way as to group the atoms in a given bead by color when colored by b-factor, which is supported by viewers like PyMOL (Schrödinger, 2015) and VMD (Humphrey et al., 1996) [see Fig. 1.3(a)].
The gmx_out.itp file is the most important one. For each requested bonded term, a corresponding line is written containing the parameters from the QM-MD fit. Some of the bonded terms can be parameterized with several different functions in GROMACS. Of these, Bartender supports different functions for angle (bending) and dihedral terms. Parameters are determined by fitting all available functions for each term. All fits are then printed to the gmx_out.itp file. For each term, all but one of the available fits are commented out so as to be ignored by GROMACS. To choose the function to be used for each term, Bartender prints the RMSD for each fit to the gmx_out.itp file. In addition, for each fit, a PNG file is produced, which allows comparing the fitted function to the actual distribution from the QM-MD trajectory. In this study, only bonds and one improper dihedral are needed, and Bartender only supports harmonic potentials for them. Of course, it is possible to express the improper dihedral term as a regular dihedral, which would allow other functions to be tested.
In Figs. 1.3(b)–1.3(e), plots for some of the bonded terms are shown. Note that, for the three bonds shown, the fitting is generally good, though anharmonicities at the extreme points result in slight deviations of the fits. For the improper dihedral term, the fitting is also good except for an outlier close to 180 degrees. Because of the form of the potential, Bartender forces the equilibrium angle to 180 degrees, which appears to be a good approximation in this case. To account for deviations from planarity in the QM-MD simulation, a regular dihedral function could be employed alternatively. To finalize the caffeine model generated by Bartender, it is now necessary to add the constructions of the three virtual sites (see Sec. 1.2.5).
1.2.8 Validation and refinement of the coarse-grained model
We now describe how the model can be validated and refined and discuss some final considerations about creating models for small molecules with Martini 3.
1.2.8.1 Free energies of transfer
Experimental free energies of transfer are particularly important reference data for validating Martini CG models. In the case of caffeine, an experimental logP value is available for comparison (Hansch et al., 1995). Table 1.1 reports a comparison between this value and the computed free energy of transfer of three different models.
The first take represents the results obtained with a wrong bead assignment for the carbonyl groups. In take 2, we follow the reference table in (Alessandri et al., 2022) for the bead assignments and, hence, represent the carbonyl groups with TN6a beads. This modification is enough to get an improved prediction of the octanol-water free energy of transfer. In take 3, the model is changed considering its analogy with other nucleobases. The model still gets a reasonable octanol-water free energy of transfer, with some improvement of the strength and selectivity of hydrogen bonds with other nucleobases (Souza et al., 2021a). Both take 2 and 3 show good agreement between the Martini model and experiments, with an error ≤2 kJ mol−1. We note that Martini free energy values within kBT (2.479 kJ mol−1) are considered to be in very good agreement.
1.2.8.2 Molecular volume and shape
Looking into how well the Martini CG model represents the underlying atomistic molecular size and shape is a way to further refine the model. It is also good practice to validate the size and shape of the CG model whenever molecular packing is critical for certain applications, which is the case for the ligand-protein interactions described in this chapter. In fact, the use of COG-based bonded parameters is oriented to high-throughput applications where this procedure could be automated. However, such parameters may need refinements in some cases. In case the packing and/or mass density of a certain model is off, it is worthwhile to compare the molecular volume and shape of the CG model to the ones of the corresponding AA structure.
To this end, the GROMACS tool gmx sasa can be used to compute the solvent accessible surface area (SASA) and the Connolly surface for both the AA and CG models. In order to use this tool for the CG models, an appropriate vdwradii.dat, a file that contains the radius of atoms or beads and that is used by gmx sasa, needs to be provided. According to the naming provided in Sec. 1.2.4, the vdwradii.dat for the CG model will look like the following:
??? R1 0.191??? O2 0.191??? R3 0.191??? O4 0.191??? R5 0.191??? N6 0.191??? R7 0.191
Note that the file contains the atom names of the beads (second column), which in the case of caffeine, should be all set to the T-bead radius, 0.191 nm. R- and S-beads have a radius of 0.264 and 0.230 nm, respectively. Here, ??? in the first column indicate that any residue or molecule name could match with the atom names defined here. The user could use CAFF in case there are more molecules or residues to be defined in the vdwradii.dat file. This file should be copied to the location where the gmx sasa analysis is executed. We also recommend using more recent van der Waals radii (Rowland and Taylor, 1996) for the atomistic reference SASA calculations instead of the GROMACS default values given in the standard vdwradii.dat file.
Now, one can compute the AA SASA with the following command:
echo CAFF | gmx sasa -f ../3_mapped/AA-traj.whole.xtc -s../3_mapped/AA-COG.tpr -ndots 4800 -probe 0.191 -o SASA-AA.xvg
and similarly for the CG SASA. Note that the probe size is the size of a T-bead (the size does not affect the results particularly, as long as one compares SASA values obtained with the same probe size, naturally). The -ndots 4800 guarantees a high accuracy of the computed SASA value. For caffeine, we obtain an AA SASA value of 4.37 ± 0.01 nm2 and a CG SASA value of 4.34 ± 0.01 nm2, which are in excellent agreement (discrepancy of <1%).
One can further compare how the shape of the CG model compares to that of the AA model by calculating the Connolly surfaces of the models. This can also be done with the gmx sasa tool as follows (AA case):
echo CAFF | gmx sasa -f AA.gro -s ../3_mapped/AA-COG.tpr -otemp.xvg -probe 0.191 -ndots 240 -q surf-AA.pdb
Note that in this case, a lower accuracy (-ndots 240) is used to compute the Connolly surfaces (file surf-AA.pdb) in order to avoid excessively large surface files that can make visualization unnecessarily slow. The AA and CG surfaces can be visualized using VMD (Humphrey et al., 1996) with the following command:
vmd -m AA.gro surf-AA.pdb surf-CG.pdb
Snapshots are displayed in Fig. 1.2(d), which show that the CG model faithfully reproduces the molecular shape and volume of the underlying AA representation, in line with the obtained SASA values. Given the small differences in bond lengths obtained using Bartender, it is also expected that SASA would be in good agreement in this case.
1.3 Building a Protein Model in Martini 3
After successfully generating a CG model for the ligand caffeine, in this section, we will go through the steps required to generate a Martini 3 protein model. In Martini 3, the protein models contain two layers: the first layer contains the regular bonded terms of the protein backbone as well as the amino acid side chains and the second layer consists of the structure-preserving model typically required for proteins in a CG representation to maintain their higher-order structure (Kmiecik et al., 2016). It is important to note that in Martini 3, the two layers are designed in such a way that they are independent of each other. This is in contrast to Martini 2, where different backbone mappings were applied for the regular Martini 2 elastic network (EN) approach (Monticelli et al., 2008), the ElNeDyn (elastic network dynamics) approach (Periole et al., 2009), and the GōMartini approach (Poma et al., 2017). In this section, we will describe two options for the second layer, namely an EN model and a Gō-like model.
1.3.1 Preparation of the atomistic protein structure
We use an experimental X-ray crystal structure as a starting point to construct the Martini 3 protein model of the adenosine A2A receptor. First, download the structure with the pdb code 3RFM (Doré et al., 2011) from the protein data bank (PDB).
To generate the CG protein model and structure, we will use the Python program martinize2 (Martinize2 and vermouth). Its development is done on GitHub, and it is led by the Molecular Dynamics group at the University of Groningen. To prepare the protein structure, we first have to remove the water and additional molecules present. Often, detergents, ligands, and molecules added to facilitate the crystallization are present in experimental structures. Use the following command to clean the structure:
grep "∧ATOM" 3rfm.pdb > 3rfm_clean.pdb
The protein structure contains 8 mutations (A54L, T88A, R107A, K122A, L202A, L235A, V239A, S277A) compared to the wild type (Doré et al., 2011), which have to be reverted before starting the simulations. In addition, the extracellular loop 2 (ECL2) is missing in the structure 3RFM. One option to fix the mutations and the missing loop is to use the program Modeller (57). Alternatively, we could rely on another experimental X-ray crystal structure, which contains the correct wild-type sequence and ECL2 loop—the pdb structure 2YDO (Lebon et al., 2011). Currently, an alternative solution for fixing mutations and missing loops is simply to predict the A2A receptor structure based on the correct sequence using Alphafold (Jumper et al., 2021).
As a next step, we check the protonation state of the acidic and basic amino acids at pH 7.4 using the H++ server (http://newbiophysics.cs.vt.edu/H++/) (Anandakrishnan et al., 2012). We follow the predictions of the protonation state for all amino acids except Asp52. Because it is quite deeply buried in the protein and, moreover, experiments showed that there is a Na+ ion in close proximity, we keep Asp52 neutral to compensate for the nearby counter ion which neutralizes the charge.
1.3.2 Generating the Martini 3 protein model
With the complete atomistic protein structure at hand, we can now generate the Martini 3 protein model. The program martinize2 (Martinize2 and vermouth) converts the atomistic structure to a CG one, generates a protein model, and an overall system topology. This information is stored in the file format needed for the simulations with the program package GROMACS.
1.3.2.1 The elastic network model
The first option described here to maintain the higher-order protein structure is the EN model. It can be generated with martinize2 using the option -elastic. In the EN model, harmonic bonds are added between backbone (BB) beads, which are in a certain distance range in the reference structure and have a sequence distance of ≥3. BB beads with a shorter sequence distance are already interacting via bonds or angles. Also, EN bonds are added only between amino acids that belong to the same chain. The distance range for the EN can be specified with a lower (-el) and upper distance cutoff (-eu). They are usually chosen between 0.3–0.5 nm and 0.8–1.0 nm, respectively (Periole et al., 2009). Typically, a fixed force constant (-ef) of 500–1000 kJ mol−1 nm−2 is used for the harmonic bonds, but the force constant can also be chosen to be distance-dependent. To optimize the EN, the force constant as well as the distance cutoffs can be adapted in order to improve the comparison with atomistic simulations or experimental data (Periole et al., 2009).
Other options for the protein model which we need to specify are -scfix and –cys auto. The -scfix option adds side chain corrections based on the atomistic reference structure, which improves the side chain orientation. The option –cys auto enables martinize2 to automatically detect and add disulfide bridges. In addition, the path to the program dssp (https://github.com/cmbi/dssp) (Kabsch and Sander, 1983; and Touw et al., 2015) is required, which is used to determine the secondary structure of the protein.
martinize2 -f atomistic_complete.pdb -o A2AR_only.top -xA2AR_cg.pdb -dssp /path/to/dssp -p backbone -ff martini3001-elastic -ef 500.0 -el 0.5 -eu 0.8 -scfix -cy auto
In the case of the structure 3RFM, the mutations could be reversed with the additional flag –mutate of martinize2. For instance, the mutation A54L could be reverted with the flag –mutate A-LEU54:ALA. We get three new files from martinize2, namely A2AR_cg.pdb containing the CG structure, molecule_0.itp containing the protein model, and a rudimentary master topology file A2AR_only.top. The latter has to be modified before starting a simulation of the A2A receptor embedded in a lipid bilayer; details are given in Sec. 1.1.4. As changes in the protonation states are still not part of martinize2, you may need to manually change the protonation of residues in the molecule_0.itp file. An example would be to change the protonation state of Asp52. For the protein model, this requires adjustments of bead 118 in the following line
118 SQ5n 52 ASP SC1 118 -1.0
by changing the bead type from SQ5n to SP2 and the charge from −1.0 to 0.0.
118 SP2 52 ASP SC1 118 0.0 .
1.3.2.2 The Gō-like model
An alternative for the second layer of the Martini 3 protein model is to use a Gō-like model to maintain the higher-order protein structure. The combination of a Gō-like model with Martini—termed GōMartini—was introduced by Poma et al. (2017). Here, BB beads are connected on the basis of a contact map generated from an atomistic reference structure. Lennard–Jones potentials are used to describe the interactions between the connected BB beads. In an updated GōMartini version, virtual beads are placed at the BB bead position. They are connected by the Gō-like bonds implemented as regular non-bonded interactions, enabling them to benefit from the non-bonded cutoff scheme. The regular non-bonded interactions between the connected BB beads are excluded.
To generate a GōMartini model for your protein, you have to first use martinize2. The options –govs-include and –govs-moltype are required to generate an itp file as well as a CG structure.
martinize2 -f atomistic_complete.pdb -o A2AR_only.top -xA2AR_cg.pdb -dssp /path/to/dssp -p backbone -ff martini3001-govs-include -govs-moltype a2ar -scfix -cys auto
As described in the model with the elastic network, do not forget to revert possible mutations and define protonation states in your model. The next step is to obtain the contact map from the web-server http://info.ifpan.edu.pl/~rcsu/rcsu/index.html using the default setting for the radii and Fibonacci number (Wołek et al., 2015). As an alternative, the user can also use Gō Contact Map web-server (http://pomalab.ippt.pan.pl/GoContactMap/). Please note that the requirements for the pdb file uploaded to the web-server are quite strict. Therefore, please check carefully if your contact map is meaningful before using the contact map in the next step. In particular, the table listing the “Residue residue contacts” will be used.
For the next step, the script create_goVirt.py is required, which can be downloaded from http://cgmartini.nl. It requires the CG structure of the protein in the pdb format (-s), the number of CG beads in the protein excluding the virtual Gō beads (--Natoms), the number of missing residues in the atomistic structure (--missres), as well as the contact map file ( -f). The prefix of the generated files can be specified (--moltype) to distinguish different protein chains which are not connected by Gō-like bonds. Note that if the Gō-like model will be combined with a CG protein generated by martinize2, the same name must be specified as for the -govs-moltype flag of martinize2. In addition, the dissociation energy of the Lennard–Jones potentials (--go_eps) as well as a lower (--cutoff_short) and upper cutoff (--cutoff_long) for the distance between two connected backbone beads have to be specified. The latter are similar to the EN model.
./create_goVirt.py -s A2AR_cg.pdb -f 3RFM_complete.map --moltypea2ar --go_eps 12.0 --cutoff_short 0.3 --cutoff_long 1.1 --Natoms717
With the above command, create_goVirt.py produces four files beginning with a2ar_, which contain details of the GōMartini model:
a2ar_BB-part-def_VirtGoSites.itp | bead definitions of the virtual particles |
a2ar_go-table_VirtGoSites.itp | interaction table of the Gō-like bonds |
a2ar_exclusions_VirtGoSites.itp | exclusions |
a2ar_go4view_harm.itp | representing the Gō-like bonds for visualization |
a2ar_BB-part-def_VirtGoSites.itp | bead definitions of the virtual particles |
a2ar_go-table_VirtGoSites.itp | interaction table of the Gō-like bonds |
a2ar_exclusions_VirtGoSites.itp | exclusions |
a2ar_go4view_harm.itp | representing the Gō-like bonds for visualization |
It is important not to change the names of the files because they are included in several other files requiring the correct file names. Finally, two #include statements have to be added to the martini_v3.0.0.itp file. Note that the following two commands should be executed only once.
sed -i "s/\[ nonbond_params \]/\#ifdef GO_VIRT\n\#include \"BB-part-def_VirtGoSites.itp\"\n\#endif\n\-n\ [ nonbond_params \]/"martini_v3.0.0.itp
echo -e "\n#ifdef GO_VIRT \n#include \"go-table_VirtGoSites.itp\"\n#endif" >> martini_v3.0.0.itp
To activate the two #include statements, you need to define the variable GO_VIRT at the beginning of the master topology:
#define GO_VIRT
Make sure that it is included if you use the GōMartini model. If you use the Martini 3 force field without the GōMartini model in the future, you simply do not define GO_VIRT and the #include statements will be skipped.
1.3.2.3 Fine-tuning the binding pocket of the Martini 3 model
The core as well as the binding pocket of the A2A receptor are quite hydrated (Liu et al., 2012). To achieve a realistic behavior of the protein model, the hydration of the binding pocket as well as its hydrogen bonding network has to be taken into account during the model preparation.
A regular water bead W representing four water molecules is too large to be used in the binding pocket. Tiny water beads TW, which represent two water molecules, are much better suited for this task. To mimic the hydration of the binding pocket, place a TW bead in the binding pocket. Add the additional bead at the end of the coordinate file A2AR_cg.pdb. Use the coordinates of the SC1 bead of Asn181 as template and modify them slightly for instance by adding 0.1 nm. In the molecule_0.itp file, add a line at the end of the [atoms] section reading
718 TW 306 TW TW 718 0.0
in the case of the EN model. For the GōMartini model, the atom number has to be 1023 instead of 718. To prevent the TW bead from diffusing out of the pocket, add two harmonic bonds to the side chains of Asn181 and His250 with a force constant of 500 kJ/(mol · nm2) and an equilibrium distance of 0.28 nm. Since the water bead is not covalently bound to the two amino acids, specify bond type 6, which does not exclude the non-bonded interactions. The additional lines in the [bonds] section of the molecule_0.itp file with the EN model read
410 718 6 0.28 500.0571 718 6 0.28 500.0
To further stabilize the binding pocket, we add two harmonic bonds mimicking the water-mediated hydrogen bonds Thr88-Trp246 and Asn181-His250 (Liu et al., 2012) again using bond type 6:
197 562 6 0.39 500.0410 571 6 0.40 500.0
1.4 Protein–Ligand Binding with Conventional Simulations
Since we now have both the coordinates and topology of the Martini 3 protein, we can put the protein in a simulation cell and add lipids, water, and ions. We will use for this task insane (Wassenaar et al., 2015)
python2 insane.py -f A2AR_cg.pdb -o box.gro -p CG.top -pbc cubic-box 12,12,12 -u POPC -l POPC -salt 0.15 -sol W -d 0
Be aware that the current implementation of insane is still based on Python 2. The output file (CG.gro) contains NA+, CL-, and W residues, while CG.top is a prototype of the master topology file. You can remove the charge signs of NA and CL in the CG.top file. You should also replace “Protein” with molecule_0. You may get a list of molecules which is similar to
[ molecules ] | |
; name | number |
molecule_0 | 1 |
POPC | 205 |
POPC | 206 |
W | 8800 |
NA | 90 |
CL | 101 |
[ molecules ] | |
; name | number |
molecule_0 | 1 |
POPC | 205 |
POPC | 206 |
W | 8800 |
NA | 90 |
CL | 101 |
Don't forget to also include the Martini 3 force field files at the beginning of the CG.top.
#include "martini_v3.0.0.itp"#include "molecule_0.itp"#include "martini_v3.0.0_phospholipids_v1.itp"#include "martini_v3.0.0_solvents_v1.itp"#include "martini_v3.0.0_ions_v1.itp"
The first itp file contains information about non-bonded interactions between all types of Martini 3 beads. The other files are more specific and define the protein (molecule_0.itp, generated in Sec. 1.1.3) as well as standard Martini 3 molecules such as lipids, solvents, and ions. The last step is to add caffeine molecules to your simulation box. This can be done with the GROMACS tool gmx insert-molecules
gmx insert-molecules -f box.gro -nmol 10 -cicaffeine_CG.gro -o box_ready.gro
We added 10 molecules of caffeine via the flag –nmol. This corresponds to a concentration of 16·10−3 M. A representation of the simulation box is shown in Fig. 1.4.
At this point, almost all the steps for building your protein–ligand binding system are complete. You only need to add the caffeine itp file and the number of caffeine molecules in the section [molecules] in the topology file. We are now ready to perform the MD simulations and start studying the binding of caffeine to the A2A receptor. We suggest a minimum of 200 µs of sampling to be able to observe multiple binding events. This could efficiently be performed with 10 replicas of 20 µs. Do you see any binding events taking place? Does caffeine bind directly to the A2A receptor or does it first go to the membrane? In case you did not observe binding events, you may consider increasing the total simulation time or using more simulation replicas. Take a look at our published study for more details and to get more ideas for analysis you could perform on your trajectory (Souza et al., 2020).
1.5 Outlook
Considering the constant efforts in improving the Martini force field and the new possible applications involving protein–ligand binding and virtual drug design (Souza et al., 2020, 2021b), it seems critical to develop further the model and its associated tools in certain key directions, which are summarized below:
Ligand database and automated parametrization: One of the main limitations of the use of Martini in protein–ligand binding studies is the generation of accurate models for the ligands. Although there are well-defined protocols to generate these models (as described in this chapter), only automatic protocols will allow virtual screening of hundreds to millions of ligands, something that is already common in many in silico drug-design campaigns based on other models. The first step towards a fully automatic topology builder (ATB) in Martini 3 is the creation of a curated dataset of high-quality Martini 3 models. This effort is essential for the full calibration and benchmarks of any automatic protocol strategy. An initial set of this database was recently published (Alessandri et al., 2022), while MAD (Martini Database—https://mad.ibcp.fr/explore) (Hilpert et al., 2022) will become the standard platform to collect new models in the future. Although some ATBs have already been developed, more work is necessary, in particular to adapt and integrate these codes for Martini 3, which has an expanded set of beads covering a larger chemical space than the previous versions. Some of the already published ATB tools focus on generating bonded potentials from atomistic trajectories, such as Swarm-CG (Empereur-Mot et al., 2020) and PyCGtool (Graham et al., 2017). Bartender also fits in this category, but using a QM-MD strategy, which can be potentially advantageous for the parameterization of ligands not typically included (or accurate) in the chemical space of typical AA force fields. Other codes, like graph-based coarse-graining (Webb et al., 2019), cg_param (Potter et al., 2021), and autoMartini (Bereau and Kremer, 2015) addressed the challenge of bead assignment and mapping of a chemical structure to its (Martini 2) CG representation. Future developments should target a fully integrated code converting a SMILES string into an accurate Martini 3 model in terms of mapping, bead assignment, and bonded parameters. Additionally, similar protocols for automatic parameterization of titratable Martini models (Grünewald et al., 2020) should help with the challenge of capturing protonation changes in ligand binding to proteins.
Protein conformational flexibility: Although the representation of protein flexibility possibly improved with the combination of Martini 3 and Gō-like models (Poma et al., 2017; and Souza et al., 2019), there are still some important limitations blocking its full potential, in particular for certain problems involving protein-small molecule binding. While developing a Martini model capable of folding proteins from their sequence could be considered too ambitious, problems such as the discovery of cryptic pockets may be within the reach of the model. For instance, the current approach to generate contact maps of GōMartini is based on a unique reference structure. New developments using atomistic trajectories could result in higher quality contact maps and refinements in the depth of the Gō LJ potentials. Side chain dihedrals are currently added based on a reference folded structure [via side chain fixes (Herzog et al., 2016)], but new improvements could also make them more generic and useful to describe conformational transitions. Additionally, a more realistic description of the backbone, such as an explicit representation of peptide-group dipoles, would already be a significant improvement. For instance, modeling the realistic transition between secondary structure elements would be extremely beneficial. Such an idea has been used in other coarse-grained models for proteins (e.g., Sieradzan et al., 2022), and it has already been proposed to be incorporated in Martini (Marrink and Tieleman, 2013). Overall, we foresee many possible directions to improve Martini 3 protein models in the near future.
Generic backmapping codes: Even with improvements in the generation of accurate protein and ligand models, CG approaches such as Martini have some inherent limitations (see Note 2). Possibly any virtual ligand-screening strategy using Martini will need a refinement step based on atomistic models (Souza et al., 2021a), which fully depends on accurate protocols to reverse transform CG configurations into fine-grained structures at united-atom and all-atom resolutions. Different options of backmapping already exist, such as fragment-based [e.g., CG2AT (Stansfeld and Sansom, 2011; and Vickery and Stansfeld, 2021)], machine learning-based [e.g., GLIMPS (Louison et al., 2021)], and geometric approaches [e.g., backward (Wassenaar et al., 2014)]. Any of these strategies are currently limited to only a set of molecules, with the main challenge still not solved: a general tool to automatically transform any molecule between atomistic and CG resolutions, generating both coordinate and topology files. Such a development would be appreciated not only in protein–ligand binding studies but also for the overall modeling community using Martini models.
1.6 Notes
Computational performance of Martini: The low cost of MD simulations with Martini comes from a combination of factors. First of all, given the resolution of the beads, it is expected that a simulation box with Martini will have 4 to 10 times fewer particles than all-atom systems. Given the higher masses of the beads in combination with the use of constraints and virtual particles (typically employed in systems such as aromatic rings), Martini can run simulations with a time step of 20 fs. This time step gives a speed-up of 1 order of magnitude in relation to atomistic models, which instead typically require a time step of 2 fs. Additional performance gains come from the use of shifted Coulomb potentials, computationally cheaper than PME. A final important aspect to be considered is the effective timescale in CG simulations. The CG dynamics are faster than the AA dynamics because the CG potential landscape is much smoother as compared to an atomistic one. Estimates with neutral small molecules in Martini 3 indicate that diffusion coefficients are 2–3 times higher than experimental references (Souza et al., 2020). These values can be larger in highly charged systems such as ionic liquids (Vazquez-Salazar et al., 2020), deep eutectic solvents (Vainikka et al., 2021), and PIP-containing membrane (Borges-Araújo et al., 2022), given that the electrostatic interactions are underestimated in the non-polarizable Martini model. The combination of all of these factors indicates that Martini can reach speed-ups of 2–3 orders of magnitude as compared to all-atom simulations. Applications involving virtual screening could potentially get extra speed-up with coarse-graining in the chemical space, as recently demonstrated by Kanekal and Bereau (2019).
General limitations of coarse-grained models: The process of coarse-graining creates a number of limitations for these models, which are also the case for Martini 3 (Marrink and Tieleman, 2013; and Jarin et al., 2021). An important example is the entropy loss upon coarse-graining, which is necessarily compensated for by a reduced enthalpy term (Baron et al., 2007). As a consequence, the enthalpy/entropy balance of many processes can be biased when modeled at the CG level, which can also affect its temperature dependence. Another limitation is the lack of directionality of certain intermolecular interactions, such as hydrogen bonding or T-shaped interactions between aromatic compounds (Alessandri et al., 2022). This issue may be one of the reasons hindering the development of a foldable protein backbone model in Martini. An additional problem of CG models is the non-trivial interpretation of the timescale, which is caused by the non-homogeneously smoothened potential energy surface on which the CG dynamics take place (Marrink and Tieleman, 2013; and Jarin et al., 2021). Although for certain systems, the speed-up of Martini models in relation to experimental diffusion rates was quantified (Souza et al., 2020; Vazquez-Salazar et al., 2020; and Borges-Araújo et al., 2022), the interpretation of timescales should always be taken with caution. The Martini 3 publication presents additional information in this matter for the reader, in particular in the discussion section and supplementary information (Souza et al., 2021a).
The stickiness problem in Martini 2: Despite the success of the Martini force field, in the recent years, it became clear that the Martini CG approach had some inherent problems affecting the model. Several studies reported an apparent overstabilization of biomolecular interactions. In particular, high aggregation propensity was reported for soluble (Stark et al., 2013) and membrane proteins (Javanainen et al., 2017; and Majumder and Straub, 2021), as well as carbohydrates (Gu et al., 2017; Schmalhorst et al., 2017; and Shivgan et al., 2020). Known as the stickiness problem, many ad hoc solutions were proposed, including scaling of solute-solute or solute-solvent interactions. The usage of S-beads for the modeling of carbohydrate rings was also a strategy used in the re-parameterization of gangliosides (Gu et al., 2017). The underlying cause of the stickiness problem was finally elucidated in the work of Alessandri et al. (2019), showing that the problem was related to an imbalance between the small and regular bead types, in combination with the use of weak bonded force constants or bonds between beads that were too short in the description of biomolecules (Alessandri et al., 2019). New rules for the use of different bead sizes and for the calculation of bond lengths together with the full re-parameterization of the model were part of the main improvements implemented in Martini 3 (Souza et al., 2021a) to tackle the stickiness problem. Despite the fact that Martini 3 seems to show an overall improvement for the interactions between folded proteins and peptides (Lamprakis et al., 2021; and Souza et al., 2021a) and carbohydrates (Grünewald et al., 2022), some problems may persist in the case of intrinsically disordered proteins (Thomasen et al., 2022).
Labels in Martini 3: Specific subtypes, called “labels,” are an alternative and complementary solution to fine-tune CG models in Martini 3. The concept already existed in Martini 2, but it only considered the different hydrogen bonding capabilities of N and Q beads, keeping these beads with a similar degree of hydrophobicity. In Martini 3, there are many more label options that can be applied to all beads, slightly changing their behavior, which is translated into small changes in the interaction matrix. Hydrogen bonding, electron polarizability, positive/negative charge labels are chemically specific, which means that they are only applied to certain chemical types. Self-interaction and partial charge labels are more generic and can be applied on top of all beads of the organic block, including beads that have already been modified with some of the chemically specific labels. Labels can be used to promote selectivity in the interactions which are enhanced for smaller beads. For instance, hydrogen donor (“d”) and acceptor (“a”) increase by one interaction level the interaction of SP4d–SP4a in relation to SP4–SP4. On the other hand, TP4d–TP4a increased the TP4–TP4 interaction by three interaction levels. More details about labels are described in Sec. A4 of the supporting information of the Martini 3 publication (Souza et al., 2021a).
Mapping of branched fragments: Fully branched fragments (such as fragments with quaternary carbon atoms or tertiary amine groups) should usually use beads of smaller size as the central atom is not exposed to the environment which reduces its influence on the interactions. For instance, a neopentane group contains 5 non-hydrogen atoms but as it is fully branched can be safely modeled as an R-bead. The choice of the bead size for partially branched fragments of molecules is not straightforward; it is desirable to validate the choice by comparison to experimental data and/or in relation to other Martini 3 CG models. The main danger of using the “fully branching rule” for partially branched fragments is to reduce the self-interactions too much. Such a reduction can be especially problematic in polymers where a small deviation on the monomer scale may add up to a big deviation at the polymer scale.
COG-mapping of an atomistic trajectory: If one wants to use the GROMACS tool gmx traj to map the AA trajectory to CG resolution using the COG strategy [alternatives include, for example, using Python and the MDAnalysis library (Gowers et al., 2016)], a little workaround is needed. The gmx traj tool only allows center-of-mass (COM) mapping. Hence, we first need to create an AA GROMACS tpr file with all the atoms of the atomistic structure having the same mass. This is done by modifying the original topology file obtained from the LigParGen server (CAFF_LigParGen.itp) by changing the values under the eighth column of the [atoms] section to an equal value (e.g., 1.0). Let us call the new file so created CAFF_LigParGen_COG.itp. It is this file that can be used to create the GROMACS tpr file for the subsequent gmx traj command, where the -com flag of the gmx traj command will effectively act as a -cog flag.
Exceptions in bead assignment: In our experience, mapping of some functional groups, especially when done at higher resolutions (e.g., in aromatic rings), can vary based on the proximity of other functional groups. The rule of thumb is that such perturbations may require a ±1 shift in the degree of polarity of the bead in question with respect to the “standard” choice listed in the reference “building block table” (Martini3).
Mapping based on experience: Once again, when creating a Martini 3 model for a new small molecule, it is useful to consult already-developed models in order to develop models in a rapid and more consistent way. In particular for small molecules, many examples are available on the Martini 3 small molecule GitHub repo (Alessandri et al., 2022).
Additional validation targets: In general, besides free energies of transfer, other validation targets for the small molecule CG model may be considered depending on the application. In the case of protein–ligand interactions, as described, the validation of the molecular size and shape is also strongly encouraged. Together with the free energy of transfer comparison, it allows for fine-tuning and optimizing bead type choices and bonded parameters.
Acknowledgments
The authors would like to thank all researchers that contributed to the development of the new Martini 3 force field. Additionally, we would like to acknowledge all users of the molecular modeling community that tested the open-beta version of Martini 3 and sent us feedback. Without your input and criticism, it would not have been possible to finish this new version of Martini.