Coarse-grained (CG) molecular dynamics (MD) can simulate systems inaccessible to fine-grained (FG) MD simulations. A CG simulation decreases the degrees of freedom by mapping atoms from an FG representation into agglomerate CG particles. The FG to CG mapping is not unique. Research into systematic selection of these mappings is challenging due to their combinatorial growth with respect to the number of atoms in a molecule. Here we present a method of reducing the total count of mappings by imposing molecular topology and symmetry constraints. The count reduction is illustrated by considering all mappings for nearly 50 000 molecules. The resulting number of mapping operators is still large, so we introduce a novel hierarchical graphical approach which encodes multiple CG mapping operators. The encoding method is demonstrated for methanol and a 14-mer peptide. With the test cases, we show how the encoding can be used for automated selection of reasonable CG mapping operators.

The length scale of all-atom molecular dynamics (MD) simulations is fundamentally limited due to the large number of atoms in complex multiscale phenomena. One way to address this is through coarse-grained (CG) MD simulations using a lower resolution model which has fewer degrees of freedom than the fine-grained (FG) or all-atom model. CG simulation enables the study of larger length scale systems, such as multi-protein complexes,1–3 lipid bilayers,1,3,4 nucleic acids,3,5–8 and polymers.9–11 

A major step in defining a CG model is to determine how the FG model maps to the CG model. The choice of mapping from FG to CG is not unique and thus continues to be an active area of research.5,8,12,13 We refer to this mapping as a CG mapping operator. Past researchers have designed CG mapping operators through chemical intuition14 or with specific rules tailored for classes of molecules.9,15,16 In a recent study, the CG model has been defined by a probabilistic CG to FG mapping.17 The main obstacle to developing general and theoretically justified systematic approaches to find CG mapping operators is the combinatorial growth of possible CG mapping operators with respect to atom number. In this work, we show that the CG mapping operator count can be drastically reduced by imposing topology and symmetry restrictions. For example, one can exclude mappings that change the symmetry group when going from the FG to the CG system. Four restrictions were explored and are able to reduce the growth from super-exponential (≈enlnn)18 to 2b,19 where b′ is the number of chemically unique bonds and n is the number of atoms. Even with those restrictions, the number of CG mapping operators for a molecule is too large to permit rigorous study in all but the simplest molecules.

To mitigate the exponential growth of CG mapping operators after imposing constraints, we have developed an approach of compressing, or encoding, the operators into a compact hierarchical graph representation. We introduce a general algorithm that deterministically constructs a unique hierarchical graph for a given molecule. The hierarchical graph can then be used for automated selection of CG mappings.20 

The idea of encoding CG mapping operators into a hierarchical graph representation is inspired by recent work in computer video processing. Mapping atoms into CG particles is analogous to a method called video segmentation. Video segmentation is the merging of similar voxels (volumetric pixels) into supervoxels21 to facilitate easier analysis of a video. Both CG and video segmentation involve dimensionality reduction to remove extraneous details while retaining salient features of the high resolution representation. This work is motivated by the successful application of graph-based methods in video segmentation.22–24 

Specific work on CG mapping theory includes work by Zhang and Voth25–27 who designed CG mapping operators of biomacromolecules such as proteins, ribosomes, and actin filaments by preserving the “essential dynamics”28 of the simulation. They made the optimization of the CG operator tractable by limiting analysis to a subset of principal component analysis modes25 and using a potential energy function that does not require iterative optimization (an elastic-network model).26,27 Cao and Voth14 worked on theory for the centering functions, specifically comparing using center of mass (COM) versus center of charge. They found that center of charge gave better preservation of the inter-molecular structure in four example systems. Wagner et al.29 developed theory that can generalize centering functions to arbitrary thermodynamic quantities. Marrink et al.30 developed a hand-tuned mapping operator based on grouping four heavy atoms. Although this approach was empirical, their work showed that consistently using the same number of heavy atoms while grouping (fixed-size segmentation) is a valid simplification.

This paper is organized as follows. In Sec. II, we discuss different enumeration schemes and introduce the theory of hierarchical graphs encoding CG mapping operators. Section III contains details of FG and CG simulations. In Sec. IV, we provide examples to demonstrate the mapping operator enumeration schemes on 49 889 molecules. The section also includes the algorithm for hierarchical graph construction. We implement our hierarchical graph approach for methanol. We chose methanol as a test case because it is a commonly used benchmark in the CG literature.31–35 As a proof of concept, the capability of the hierarchical graph to encode CG mappings is tested and specific mappings are evaluated based on their ability to preserve radial distribution and velocity autocorrelation functions (VACFs). The usability of hierarchical graphs for automated CG mapping operator selection is also discussed. We test the scalability of our approach by implementing the theory for a 14-mer peptide.

A CG model is specified by a mapping operator, M, and a CG-potential U(R). M establishes the relationship between the FG coordinates, r, and the CG coordinates, R. Here, M is a linear operator represented as a matrix of dimension NR×Nr, where Nr and NR are the number of FG particles and number of CG particles, respectively. The coordinates of the ith CG particle can be written as Ri=j=1NrMijrj. Mij refers to an individual matrix element and rj represents the FG coordinates of atom j. A non-zero Mij value determines that the jth FG atom contributes to the ith CG particle.

A few simplifying assumptions can be made immediately. We assume the coordinate transformation is isotropic across all coordinate dimensions. To conserve mass between CG and FG models, each FG particle should be mapped to at least one CG particle. Thus each column in M must have at least one non-zero element. We will restrict our discussion to mapping operators that map each atom to only one CG particle. Therefore, each column in M has exactly one non-zero entry. This restriction is not always imposed for mapping FG to CG particles as seen in some previous studies.6,9,12,20,36 The entries of each row Mij should sum to unity (j=1NrMij=1,i) because the mass of a CG particle equals the sum of the masses of its constituent atoms. The subset si of columns with non-zero entries for the ith row represents the atoms that are grouped to form the ith CG particle. For solutions, CG mapping operators can be restricted to the solute and implicit solvent models37 so that this restriction does not preclude removing the solvent. The simplifications discussed above have reduced all possible mapping operators from selecting a NR×Nr matrix to selecting a partition of atoms from which the matrix can be built.

We need to represent molecules as graphs with atoms as nodes and covalent bonds as edges38 in order to continue reducing the number of mapping operators. Mapping atoms of a molecule to CG particles while meeting the previously mentioned criteria is equivalent to partitioning the vertex set, G, of the molecular graph. Thus, the exhaustive count of mapping operators for a molecule with n atoms is obtained by partitioning G into disjoint and exhaustive subsets. The total number of ways G can be partitioned is calculated by the nth Bell number,39Bn. Bn evaluates the number of ways n elements can be partitioned into non-empty subsets. It is given by the expression Bn=k=0n1n1kBk. Since one of the partitions given by Bn represents the FG representation, the exhaustive count for CG mapping operators (referred to as the count with Bell number) for a molecule with n atoms is Bn − 1. Due to its rapid growth rate, the mapping operator count given by Bell numbers becomes intractably large. The mapping operator count can be reduced by joining only those atoms which are connected by bonds. Naïvely, the number of mapping operators for a molecule with b bonds with this limitation is 2b − 1 since each bond can either be retained or removed. The FG representation has all the bonds retained, and to exclude it from our mapping operator count, we subtract 1 from 2b. This naïve count of mapping operators treats all bonds to be unique.

The assumption that all bonds in a molecule are unique is not always true. Molecules have topological symmetries40 as a whole, like ethane, or in parts, like the methyl group of methanol.40 Bonds in such symmetries are not unique and are equivalent bonds. These equivalent bonds can be defined as edge orbits of the automorphisms of the molecular graph.41 A graph automorphism, which is an isomorphism from a graph to itself, is an operator that permutes vertex labels without changing the graph. When a vertex is re-labeled, it forms an equivalence class with the vertex whose label it adopts. Sets of equivalent vertices are called vertex orbits. The same principle applies to edges, leading to edge orbits. Vertex orbits and edge orbits are our definition of symmetric atoms and bonds, respectively. Merging atoms that are joined by equivalent bonds results in duplicate mapping operators. For example, in water, the two O–H bonds are equivalent. The mapping operator obtained by removing one O–H bond and merging the corresponding O and H atoms into one bead will be the same mapping operator that we would get by removing the other O–H bond. Thus the two mapping operators are duplicates. A mapping operator count without duplicates due to equivalent bonds can be calculated with the stars and bars theorem of combinatorics.42 The theorem states that n identical elements can be distributed in k states in Ck1n+k1 ways. In our case, k is 2 since the bonds can be either retained or removed. Thus, the number of CG mapping operators becomes (iCk1mi+k1)1, which is the number of ways i groups of mi equivalent bonds can be distributed in k states. Note that we have subtracted one from the expression here to exclude the FG representation from the count. The mapping operator count can be further restricted by considering only those that preserve topological symmetries. To preserve symmetry, equivalent bonds are mandated to have uniform states (i.e., all retained or removed) while defining a CG mapping operator. For example, this would disallow removing one C–H bond in a methyl group without removing the other two. The total number of unique mapping operators preserving symmetry is 2b − 1, where b′ is the number of unique bonds (b′ ≤ b). It is possible to achieve similar reduction in mapping operator count when a different metric is used to join atoms, like coordination number or spatial proximity instead of the presence of a chemical bond.

We introduce two forms of hierarchical graphs: the mapping operator tree (MOT) and the mapping operator graph (MOG). Both show how atoms in a molecule can be grouped to form CG particles at different resolutions and encode multiple mapping operators including mixed-resolution models. An MOG is unique for a given molecule and encodes all the symmetry preserving mapping operators. MOTs encode a subset of the mapping operators and may be applied to practically select mapping operators, whereas MOGs are the theoretical results that are impractical for molecules larger than a few atoms. A hierarchical graph can be constructed for a molecule so that only symmetry preserving mapping operators are considered and thus serve as the basis for automated selection of mapping operators. We will use the notation {A0A1A2} to indicate that atoms A0 though A2 are joined to form a CG particle. Thus the single CG bead representation for methanol is denoted by {CH3OH}.

An illustrative MOT for methanol is shown in Fig. 1(a). For an MOT with h levels, Li where i ∈ [1, h], each node in Li is marked as Vsi, with s being the individual index. The root of the MOT (V11) corresponds to the lowest resolution CG representation, mapping one molecule to one CG bead. Subsequent levels of the tree show CG representations in increasing resolution. The highest level of the tree (Lh) contains the leaf nodes (Vsh) corresponding to atoms in a molecule. Union of Vsi for each Li comprises the complete atom set of the molecule. A CG mapping operator can be extracted from the MOT with a valid tree slice. A tree slice refers to a subset of nodes chosen from the MOT. It can be specified by a binary vector x with length equal to the total number of nodes (V) in the MOT. If a node is chosen in a slice, the corresponding variable in x is set to 1, else it is set to 0. A tree slice is valid only when it includes some representation of all atoms and includes only one node from each of the root-to-leaf paths. A path matrix P is constructed as a Np × V matrix with binary entries, with Np being the number of root-to-leaf paths. An element pij in P is 1 if the corresponding node falls under the path Pi. A valid tree slice x obeys Px=1Np, where 1Np denotes ones vector of length Np.22 Methods like uniform entropy slice (UES) flattening22 can be employed to identify a valid and optimal tree slice.

FIG. 1.

(a) Example of an MOT for methanol. (b) Example of an MOG for methanol. Both hierarchical graphs show different levels of granularity, the lowest being the FG representation. The edges show which particles are grouped together in a CG particle. Multiple CG mapping operators are encoded in the MOT and can be extracted with a valid slice. The methanol MOG is a modified version of the methanol MOT allowing multiple parents. The three methyl hydrogen atoms are represented in the MOG with only one hydrogen atom since they are equivalent. The MOG encodes all the methanol CG mapping operators that obey symmetry and uniform state restrictions.

FIG. 1.

(a) Example of an MOT for methanol. (b) Example of an MOG for methanol. Both hierarchical graphs show different levels of granularity, the lowest being the FG representation. The edges show which particles are grouped together in a CG particle. Multiple CG mapping operators are encoded in the MOT and can be extracted with a valid slice. The methanol MOG is a modified version of the methanol MOT allowing multiple parents. The three methyl hydrogen atoms are represented in the MOG with only one hydrogen atom since they are equivalent. The MOG encodes all the methanol CG mapping operators that obey symmetry and uniform state restrictions.

Close modal

An MOT cannot encode all mapping operators for most molecules and thus is not unique for a given molecule. This limitation arises because MOT construction pre-determines the order of bond removal. For instance, in Fig. 1(a), the C–H and the O–H bonds are removed prior to the C–O bond. Thus the methanol MOT fails to encode these mapping operators: {CH3O}H, H3{COH}, and H3{CO}H. The symmetry preserving mapping operators that can be extracted from the methanol MOT with valid slices are: {CH3OH}, {CH3}{OH}, {CH3}OH, and CH3{OH}. The MOT can be adjusted to encode all mapping operators by allowing multiple parents. This results in a mapping operator graph (MOG) that is no longer a tree. Figure 1(b) illustrates the MOG for methanol. It must be noted that only one H atom is included from the methyl group as all the three methyl H’s are chemically equivalent. Allowing multiple parents introduces multiple paths from the root to leaf nodes and thus the rows in the path matrix must be adjusted by using an OR operation on the paths leading up to the same leaf node to maintain the same equation for valid tree slices. The rule validating a slice still holds. All MOTs for a molecule can be derived from its corresponding unique MOG by removing nodes. Thus the MOG acts as an upper-bound and starting point for MOTs. The final selection of an MOT for specific application should be made with careful consideration since the available mapping operators to choose from will be less than what is encoded in an MOG. The governing principle that should dictate the choice of an MOT from the parent MOG is a subject of future research.

An FG simulation of methanol molecules was performed at a density of 0.778 g/cm3 using GROMACS-5.1.343,44 with the OPLS-AA force field45 and 1 fs time step. The NVT ensemble was maintained at 300 K with the Nosé-Hoover thermostat.46 The duration of the simulation was 100 ps. A short range cut-off distance of 12.2 Å was used for Van der Waals interactions, and smooth particle-mesh Ewald47 was used to treat long-range electrostatic interactions. A second trajectory was produced from the original trajectory by recalculating forces with a topology that excluded all bonded interactions. The recalculated trajectory was used to calculate exclusion forces which correspond to non-bonded interactions. Rühle and Junghans48 reported exclusion Force-matching (FM) to perform better than full FM in their study of liquid hexane.

1 ns CG-MD simulations were performed corresponding to different CG mappings for methanol using the GROMACS 2016.343 simulation engine and the VOTCA 1.5 package.49 Force-matching (FM)50–52 was employed to obtain tabulated CG potentials from the FG trajectory. GROMACS uses cubic spline interpolation for the tabulated values.44 The pair-wise potentials are given in Fig. 2. FM uses least-squares minimization to calculate CG forces. Cubic spline was used as the basis function to solve the least-squares minimization in VOTCA.49 Both FG and CG methanol simulations were conducted in a box (4.09 nm × 4.09 nm × 4.09 nm) with periodic boundaries. All bonds and angles in the CG simulations were constrained using the SHAKE algorithm.53 

FIG. 2.

Pair potentials for CG methanol simulations. The pair potentials used for seven symmetry preserving methanol mapping operators are shown. The potentials are derived from the fine-grain simulation using force-matching.

FIG. 2.

Pair potentials for CG methanol simulations. The pair potentials used for seven symmetry preserving methanol mapping operators are shown. The potentials are derived from the fine-grain simulation using force-matching.

Close modal

To demonstrate the feasibility of applying our hierarchical graph-based method to a larger system, a 200 atom peptide (VPKDGIVAREGSPA) was simulated for 20 ns using the AMBER-99SB-ILDN force field54 and a time step of 2 fs. The temperature was set at 300 K (NVT). The SPC/E water model55 was used to solvate the peptide. Counter ions and NaCl were added to achieve an ionic concentration of 100 mM. The peptide FG simulation was automated following the protocol mentioned in a recent work.56 Post-processing and analysis of all the simulation trajectories were completed using tools from GROMACS-2016.3 and MDAnalysis.57,58

As discussed under Sec. II, different constraints can be imposed to reduce the number of mapping operators. To quantify the extent of reduction in the number of mapping operators, we have implemented the four enumeration schemes that have successively fewer mapping operators: count with Bell number, naïve count, count without duplicate mapping operators, and count preserving symmetry. For example, methanol has 202 mapping operators using the Bell number. The naïve count, assuming all bonds in methanol to be unique, yields a total mapping operator count of 31. This number includes identical mapping operators and those which do not retain symmetry. Since the methyl hydrogen atoms in methanol are equivalent, the total number of mapping operators without duplicate mapping operators is 15. The number of mapping operators preserving symmetry is 7. The semi-log plot in Fig. 3 shows similar disparity in the order of magnitude of mapping operator counts for 49 889 other molecules when the four counting methods are used. These molecules, with the number of heavy atoms ranging from 3 to 9, were randomly taken from PubChem.59 The details of the selection process are included in the supplementary material. The PubChem database was chosen to ensure that the molecules we are considering have been synthesized. For all the test molecules, we constructed unique MOGs encoding all the symmetry preserving CG mapping operators. To illustrate the extent to which an MOG is useful in encoding the CG mapping operators within itself, we have included the number of nodes in the MOGs for all the test molecules in Fig. 3.

FIG. 3.

Mapping operator counts given by four counting methods for 49 889 molecules with heavy atoms in the range 3 to 9. The x-axis denotes the number of heavy atoms in the molecules. The y-axis is in the logarithmic scale. The figure also shows the upper bound of the node count of the hierarchical graph, called the mapping operator graph (MOG), which encodes multiple mapping operators. The solid lines denote the median values for counts; the upper and lower bounds enclosing the corresponding shaded region denote the upper and lower quartiles, respectively. With symmetry and uniform state restrictions, the number of CG mapping operators to be encoded in the hierarchical graph decreases significantly.

FIG. 3.

Mapping operator counts given by four counting methods for 49 889 molecules with heavy atoms in the range 3 to 9. The x-axis denotes the number of heavy atoms in the molecules. The y-axis is in the logarithmic scale. The figure also shows the upper bound of the node count of the hierarchical graph, called the mapping operator graph (MOG), which encodes multiple mapping operators. The solid lines denote the median values for counts; the upper and lower bounds enclosing the corresponding shaded region denote the upper and lower quartiles, respectively. With symmetry and uniform state restrictions, the number of CG mapping operators to be encoded in the hierarchical graph decreases significantly.

Close modal

In this section, we demonstrate the systematic construction of an MOG followed by the extraction of a valid CG mapping operator from the illustrative methanol MOG. MOGs can be deterministically and uniquely constructed for a given molecule following Algorithm 1 (see the supplementary material for information on the source code). We demonstrate how CG mapping operators can be extracted from an MOG using a corresponding path matrix, a slice vector, and the slice validation criterion. The path matrix (P) for the methanol MOG [Fig. 1(b)] is shown in Table I.

Algorithm 1.

MOG Construction: Nodes (Vg) and edges (Eg) of a molecular graph are taken as input arguments for the MOG function. Using automorphism, the equivalent node classes are identified. The quotient graph of the molecular graph (Vg, Eg) is built by partitioning the nodes according to atom equivalent classes. Nodes are added to the MOG only if they are not already present in the MOG. Addition of nodes to all but the first MOG level is followed by the addition of directed edges from the node to its children.

 
 
TABLE I.

Path matrix for methanol MOG.

PathV1V2V3V4V5V6V7V8V9V10
P7 
P8 
P9 
P10 
PathV1V2V3V4V5V6V7V8V9V10
P7 
P8 
P9 
P10 

An example slice selecting nodes v4 and v6 is denoted by the vector x=[0,0,0,1,0,1,0,0,0,0]. The slice is valid since it satisfies Px=1Np. The slice represents the two particle CG representation of methanol ({CH3}{OH}). An example of an invalid slice is the selection of nodes v4, v5, and v6 (x=[0,0,0,1,1,1,0,0,0,0]) where Px=[1,2,2,1]. The slice attempts to map each of nodes v8 and v9 in two CG particles simultaneously. The validation rule not only prevents encapsulating an atom in multiple CG particles but also ensures that a particular atom is chosen to have singular unambiguous resolution by invalidating the selection of a parent and its child simultaneously. The methanol MOG exhaustively encodes the seven mapping operators of methanol ({CH3OH}, {CH3}{OH}, {CH3}OH, CH3{OH},{CH3O}H, H3{COH}, and H3{CO}H) which obey symmetry restriction.

We demonstrate the use of MOT for automated CG mapping operator selection using methanol and a 14-mer peptide (VPKDGIVAREGSPA) as examples. The automated selection of CG mapping operators was performed using the UES method.22 

The UES method aims to balance the information content across CG particles. It entails calculating the entropy for each node of the MOT and finding the valid tree slice where the entropy difference between selected nodes is minimized,

xselected=argminVs,VtMOT|S(Vs)S(Vt)|xsxt.
(1)

In Eq. (1), S represents the entropy of specific nodes of the MOT and x refers to the individual element in the slice vector x. The definition of entropy60–66 used in UES can vary depending on the system.

In our methanol example, the negative of the order parameter67 was used as an approximation for entropy. All the nodes except the root node were allowed to be selected. The valid slice including all the leaf nodes is a trivial slice corresponding to the FG representation of the molecule; hence it was not considered. The valid slice with nodes 2 and 3 from methanol MOT was selected by minimizing Eq. (1). The chosen slice represents the two bead methanol CG model {CH3}{OH}. To compare the two bead model with other symmetry preserving mapping operators of methanol, we ran CG simulations of all seven symmetry preserving CG mapping operators of methanol. There is no agreed upon method to compare CG mapping operators. We chose a simple thermodynamic function and a dynamic function which have clear definitions in both the CG and the FG systems. The first is the center of mass (COM) radial distribution function (RDF) and the second is the velocity autocorrelation function (VACF). RDF is a structural property commonly used to compare CG models.2,14,29,68,69 VACF was chosen because it is relatively uncorrelated with RDF and provides a measure of how fluctuations are preserved after mapping. VACF relates to the self-diffusion coefficient and has been used for previous CG studies.70 The mean square differences between the target FG function and the corresponding function from CG simulations provide a quantitative comparison. The mean square differences for different mapping operators are normalized by dividing each element by the norm. Figure 4 compares the normalized mean square differences of CG COM-RDF and VACF of different CG mapping operators for methanol. The purpose of comparing different CG mapping operators encoded in the MOT and the MOG is to elucidate the applicability of the MOT for automated selection of a CG mapping operator that performs reasonably well.

FIG. 4.

Mean square differences of COM-RDFs and velocity autocorrelation functions (VACFs) between FG and seven different methanol CG mapping operators. The mapping operator obtained by applying UES, {CH3}{OH}, is highlighted in red. {CH3}{OH} has the least combined RDF and VACF mean square error and performs best among the four CG mapping operators encoded in the methanol MOT ({CH3OH}, {CH3}{OH}, {CH3}OH, and CH3{OH}), when RDF and VACF are considered to be the performance metrics. The other mapping operators to the right of the vertical dashed line are those which are encoded in the methanol MOG but not in the methanol MOT in Fig. 1(a).

FIG. 4.

Mean square differences of COM-RDFs and velocity autocorrelation functions (VACFs) between FG and seven different methanol CG mapping operators. The mapping operator obtained by applying UES, {CH3}{OH}, is highlighted in red. {CH3}{OH} has the least combined RDF and VACF mean square error and performs best among the four CG mapping operators encoded in the methanol MOT ({CH3OH}, {CH3}{OH}, {CH3}OH, and CH3{OH}), when RDF and VACF are considered to be the performance metrics. The other mapping operators to the right of the vertical dashed line are those which are encoded in the methanol MOG but not in the methanol MOT in Fig. 1(a).

Close modal

The three mapping operators with least mean square differences for RDF and VACF are included in Fig. S1 of the supplementary material. COM-RDF obtained from one bead ({CH3OH}) CG model shows least mean square deviation from the target FG COM-RDF followed by the H3{COH} and the {CH3}{OH} mapping operators. VACFs of {CH3}OH, H3{COH}, and {CH3}{OH} deviate least from the FG VACF. The two bead model {CH3}{OH}, chosen by the UES method, performs better than the three other mapping operators encoded in the methanol MOT: {CH3OH}, {CH3}OH, and CH3{OH}.

In previous studies comparing CG mapping schemes, the highest resolution CG mapping operator did not necessarily best reproduce the target FG RDF.71–73 Foley et al. hypothesized in their study that there might be optimal CG configurations which mirror FG behaviors better than others.72 The results obtained underscore the importance of systematically exploring CG mapping operators instead of solely relying on chemical intuition. Besides the symmetry preserving mapping operators which are discussed above, we have considered two asymmetric mapping operators and discussed their performances in the supplementary material with corresponding Figs. S2–S4.

In order to demonstrate that an MOT with UES can scale to larger macromolecules, we have applied the method to a 14-mer peptide (VPKDGIVAREGSPA). An MOT was built as shown in Fig. 5(a) by joining particles to one of their nearest neighbors at every level of the MOT. The nearest neighbors were calculated from the final frame of a 20 ns MD simulation of the peptide. The nearest neighbors can be identified by the light edges connecting them in Fig. 5(b). Notice that in contrast to the methanol example, nearest neighbors are used as the basis of joining atoms instead of chemical bonds. This allows joining non-bonded particles based on side-chain-side-chain interactions.

FIG. 5.

(a) An MOT construction for the peptide VPKDGIVAREGSPA. The extraction of a valid slice using the UES method is shown. The chosen nodes are colored black, the children of the selected nodes are shown as colored nodes, and the red central node is the root. The backbone amide nitrogen of the residues is identified with labels. The backbone amide nitrogen atoms are unevenly spread since the atoms are not arranged sequentially in the base layer to reduce edge crossing. (b) The mapping of the peptide atoms into CG particles is elucidated by using the same color coding as the MOT to indicate individual CG particles. Nearest neighbors of each atom are identified by connecting them with lighter edges. The nearest neighbors are defined based on spatial proximity of atoms in the three-dimensional structure of the peptide. Some atoms are grouped into one CG particle in spite of not being nearest neighbors in the FG molecular graph since CG particles gain all edges of their child particles in the MOT. This gaining of edges upon agglomeration leads to joining of disconnected particles at higher tree levels.

FIG. 5.

(a) An MOT construction for the peptide VPKDGIVAREGSPA. The extraction of a valid slice using the UES method is shown. The chosen nodes are colored black, the children of the selected nodes are shown as colored nodes, and the red central node is the root. The backbone amide nitrogen of the residues is identified with labels. The backbone amide nitrogen atoms are unevenly spread since the atoms are not arranged sequentially in the base layer to reduce edge crossing. (b) The mapping of the peptide atoms into CG particles is elucidated by using the same color coding as the MOT to indicate individual CG particles. Nearest neighbors of each atom are identified by connecting them with lighter edges. The nearest neighbors are defined based on spatial proximity of atoms in the three-dimensional structure of the peptide. Some atoms are grouped into one CG particle in spite of not being nearest neighbors in the FG molecular graph since CG particles gain all edges of their child particles in the MOT. This gaining of edges upon agglomeration leads to joining of disconnected particles at higher tree levels.

Close modal

To apply the UES method, a choice of entropy must be made for groups of atoms that compose a node in the MOT. We chose to compute entropy based on how similar relative velocities were within a group of atoms in the MD simulation. Vj, the entropy of node j, was calculated with the usual definition of entropy

S(Vj)=k=x,y,zrpk(r)lnpk(r),
(2)

where pk(r) is the probability distribution of COM relative velocities from the 20 ns MD trajectory and k represents the x, y, and z coordinates. COM relative velocities were calculated by removing COM translation and rotational motion from each group of atoms that make up a node in the MOT via alignment.74,75 The intuition of this choice was that nodes whose atoms and COM move similarly would have a low entropy and nodes whose atoms move in a variety of different directions relative to the COM motion would have high entropy. Velocities at every 4 fs were used to compute the probability distributions.

Equipped with a working definition of entropy, nodes from levels 6 to 12 in the MOT were allowed to be selected while applying UES. The choice of disallowing high MOT levels is a heuristic method to prevent consideration of overly large CG particles while speeding-up the UES computation. The optimal CG mapping operator obtained with these conditions is shown in Fig. 5(a) as the black nodes and in Fig. 5(b) as the color coded groups of atoms.

In the selected mapping operator, the 3rd residue, lysine (K3), is grouped under three CG particles with its entire side-chain mapped into a single CG particle. K3 exhibits the highest standard deviation of the atomic root mean square fluctuation (RMSF) values given in Fig. S5 of the supplementary material. The 6th residue, isoleucine, also has its side-chain grouped into an individual CG particle. Residues 11, 5, and 13, with lowest RMSF standard deviations, were assigned to one CG particle each. Overall, the automated mapping operator selection from the MOT yielded a reasonable CG mapping for the peptide without requiring explicit simulation of the ≈10120 possible mapping operators.

We have adapted a hierarchical graph method employed in video segmentation to encode several CG mapping operators. The combinatorial explosion in the number of CG mapping operators is avoided by systematically laying out general restrictions like preserving symmetry or only joining bonded atoms. An algorithm and an illustrative example have been provided to show the construction of a mapping operator graph (MOG) which is capable of encoding all symmetry preserving mapping operators of a given molecule. Examples were provided to demonstrate how symmetry preserving multi-resolution CG mapping operators can be captured within an MOG. We have also discussed a simpler version of the MOG, called the mapping operator tree (MOT), which can be used to automate the selection of CG mapping operators. Though an MOT does not encode all symmetry preserving mapping operators like the MOG, it still encodes many CG mapping operators and is more suited for practical use. The theory discussed in this paper provides a basis for automated selection of CG mapping operators by selecting valid tree slices. We have applied the UES method to the methanol MOT and the peptide MOT as examples of automated slice selection. The CG mapping operators for both methanol and the 14-mer peptide, obtained from the automated process, underline the potential of this method as a systematic approach for automated CG mapping operator selection.

The supplementary material contains information about the source code of the MOG building algorithm, the details of molecule selection from PubChem, the individual RDF/VACF plots, a discussion on asymmetric mapping operators, and the RMSF plot for peptide VPKDGIVAREGSPA.

This material is based upon work supported by the National Science Foundation under No. 1764415. We thank the Center for Integrated Research Computing at the University of Rochester for providing computational resources. We also thank Professor Martin McCullagh for providing helpful feedback. GromacsWrapper (Oliver Beckstein et al., https://github.com/Becksteinlab/GromacsWrapper doi:10.5281/zenodo.17901) was used in the automated peptide simulation protocol.

1.
A.
Morriss-Andrews
and
J. E.
Shea
,
J. Phys. Chem. Lett.
5
,
1899
(
2014
).
2.
K.
Kawaguchi
,
S.
Nakagawa
,
S.
Kinoshita
,
M.
Wada
,
H.
Saito
, and
H.
Nagao
,
Mol. Phys.
115
,
587
(
2017
).
3.
H. I.
Ingólfsson
,
C. A.
Lopez
,
J. J.
Uusitalo
,
D. H.
de Jong
,
S. M.
Gopal
,
X.
Periole
, and
S. J.
Marrink
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
4
,
225
(
2014
).
4.
C.
Arnarez
,
J. J.
Uusitalo
,
M. F.
Masman
,
H. I.
Ingólfsson
,
D. H.
De Jong
,
M. N.
Melo
,
X.
Periole
,
A. H.
De Vries
, and
S. J.
Marrink
,
J. Chem. Theory Comput.
11
,
260
(
2015
).
5.
S.
Riniker
,
J. R.
Allison
, and
W. F.
van Gunsteren
,
Phys. Chem. Chem. Phys.
14
,
12423
(
2012
).
6.
S.
Kmiecik
,
D.
Gront
,
M.
Kolinski
,
L.
Wieteska
,
A. E.
Dawid
, and
A.
Kolinski
,
Chem. Rev.
116
,
7898
(
2016
).
7.
P.
McCullagh
,
P. T.
Lake
, and
M.
McCullagh
,
J. Chem. Theory Comput.
12
,
4390
(
2016
).
8.
N. B.
Becker
and
R.
Everaers
,
Phys. Rev. E
76
,
021923
(
2007
).
9.
G.
Milano
and
F.
Müller-Plathe
,
J. Phys. Chem. B
109
,
18609
(
2005
).
10.
W.
Huang
,
T.
Mandal
, and
R. G.
Larson
,
Mol. Pharm.
14
,
733
(
2017
).
11.
A.
Aramoon
,
C.
Woodward
,
T. D.
Brietzman
, and
J. A.
El-Awady
,
J. Phys. Chem. B
120
,
9495
(
2016
).
12.
Y.
Li
,
B. C.
Abberton
,
M.
Kroger
, and
W. K.
Liu
,
Polymers
5
,
751
(
2013
).
13.
W. G.
Noid
,
J. Chem. Phys.
139
,
090901
(
2013
).
14.
Z.
Cao
and
G. A.
Voth
,
J. Chem. Phys.
143
,
243116
(
2015
).
15.
S.
Izvekov
,
M.
Parrinello
,
C. J.
Burnham
, and
G. A.
Voth
,
J. Chem. Phys.
120
,
10896
(
2004
).
16.
A.
Vishnyakov
,
D. S.
Talaga
, and
A. V.
Neimark
,
J. Phys. Chem. Lett.
3
,
3081
(
2012
).
17.
M.
Schöberl
,
N.
Zabaras
, and
P. S.
Koutsourelakis
,
J. Comput. Phys.
333
,
49
(
2017
).
18.
N. G.
de Bruijn
,
Asymptotic Methods in Analysis
(
Dover
,
1981
), pp.
102
109
.
19.
K. H.
Rosen
,
Handbook of Discrete and Combinatorial Mathematics
, 2nd ed. (
Chapman and Hall/CRC Press
,
2017
), pp.
92
93
.
20.
E.
Brini
,
E. A.
Algaer
,
P.
Ganguly
,
C.
Li
,
F.
Rodríguez-Ropero
, and
N. F. A. V. D.
Vegt
,
Soft Matter
9
,
2108
(
2013
).
21.
M.
Grundmann
,
V.
Kwatra
,
M.
Han
, and
I.
Essa
, in
Proceedings of IEEE International Conference on Computer Vision (ICCV)
(
IEEE
,
2010
), p.
2141
.
22.
C.
Xu
,
S.
Whitt
, and
J. J.
Corso
, in
Proceedings of the IEEE International Conference on Computer Vision
(
IEEE
,
2013
), p.
2240
.
23.
E.
Sharon
,
M.
Galun
,
D.
Sharon
,
R.
Basri
, and
A.
Brandt
,
Nature
442
,
810
(
2006
).
24.
C.
Xu
,
C.
Xiong
, and
J. J.
Corso
, in
European Conference on Computer Vision
(
SpringerLink
,
2012
), pp.
626
639
.
25.
Z.
Zhang
,
L.
Lu
,
W. G.
Noid
,
V.
Krishna
,
J.
Pfaendtner
, and
G. A.
Voth
,
Biophys. J.
95
,
5073
(
2008
).
26.
Z.
Zhang
,
J.
Pfaendtner
,
A.
Grafmüller
, and
G. A.
Voth
,
Biophys. J.
97
,
2327
(
2009
).
27.
Z.
Zhang
and
G. A.
Voth
,
J. Chem. Theory Comput.
6
,
2990
(
2010
).
28.
A.
Amadei
,
A. B. M.
Linssen
, and
H. J. C.
Berendsen
,
Proteins: Struct., Funct., Genet.
17
,
412
(
1993
).
29.
J. W.
Wagner
,
J. F.
Dama
,
A. E. P.
Durumeric
, and
G. A.
Voth
,
J. Chem. Phys.
145
,
044108
(
2016
).
30.
S. J.
Marrink
,
H. J.
Risselada
,
S.
Yefimov
,
D. P.
Tieleman
, and
A. H.
De Vries
,
J. Phys. Chem. B
111
,
7812
(
2007
).
31.
Y.
Wang
,
W. G.
Noid
,
P.
Liu
, and
G. A.
Voth
,
Phys. Chem. Chem. Phys.
11
,
2002
(
2009
).
32.
J. W.
Wagner
,
J. F.
Dama
, and
G. A.
Voth
,
J. Chem. Theory Comput.
11
,
3547
(
2015
).
33.
A.
Davtyan
,
G. A.
Voth
, and
H. C.
Andersen
,
J. Chem. Phys.
145
,
224107
(
2016
).
34.
P. G.
Lafond
and
S.
Izvekov
,
J. Chem. Theory Comput.
12
,
5737
(
2016
).
35.
P. a.
Golubkov
,
J. C.
Wu
, and
P.
Ren
,
Phys. Chem. Chem. Phys.
10
,
2050
(
2008
).
36.
D.
Fritz
,
V. A.
Harmandaris
,
K.
Kremer
, and
N. F. A.
Van Der Vegt
,
Macromolecules
42
,
7579
(
2009
).
37.
S.
Wang
and
R. G.
Larson
,
Macromolecules
48
,
7709
(
2015
).
38.
A. T.
Balaban
,
J. Chem. Inf. Comput. Sci.
25
,
334
(
1985
).
39.
E. T.
Bell
,
Ann. Math.
35
,
258
(
1934
).
40.
W.
Chen
,
J.
Huang
, and
M. K.
Gilson
,
J. Chem. Inf. Comput. Sci.
44
,
1301
(
2004
).
41.
K.
Balasubramanian
,
Int. J. Quantum Chem.
21
,
411
(
1982
).
42.
J.
Buhler
,
D.
Eisenbud
,
R.
Graham
, and
C.
Wright
,
Am. Math. Mon.
101
,
507
(
1994
).
43.
M. J.
Abraham
,
T.
Murtola
,
R.
Schulz
,
S.
Páll
,
J. C.
Smith
,
B.
Hess
, and
E.
Lindah
,
SoftwareX
1-2
,
19
(
2015
).
44.
D.
Van Der Spoel
,
E.
Lindahl
,
B.
Hess
,
G.
Groenhof
,
A. E.
Mark
,
H. J. C.
Berendsen
,
J. Comput. Chem.
26
,
1701
(
2005
).
45.
W. L.
Jorgensen
,
D. S.
Maxwell
, and
J.
Tirado-Rives
,
J. Am. Chem. Soc.
118
,
11225
(
1996
).
46.
D. J.
Evans
and
B. L.
Holian
,
J. Chem. Phys.
83
,
4069
(
1985
).
47.
U.
Essmann
,
L.
Perera
,
M. L.
Berkowitz
,
T.
Darden
,
H.
Lee
, and
L. G.
Pedersen
,
J. Chem. Phys.
103
,
8577
(
1995
).
48.
V.
Rühle
and
C.
Junghans
,
Macromol. Theory Simul.
20
,
472
(
2011
).
49.
V.
Rühle
,
C.
Junghans
,
A.
Lukyanov
,
K.
Kremer
, and
D.
Andrienko
,
J. Chem. Theory Comput.
5
,
3211
(
2009
).
50.
S.
Izvekov
and
G. A.
Voth
,
J. Phys. Chem. B
109
,
2469
(
2005
).
51.
W. G.
Noid
,
J. W.
Chu
,
G. S.
Ayton
, and
G. A.
Voth
,
J. Phys. Chem. B
111
,
4116
(
2007
).
52.
F.
Ercolessi
and
J. B.
Adams
,
Europhys. Lett.
26
,
583
(
1994
).
53.
J. P.
Ryckaert
,
G.
Ciccotti
, and
H. J.
Berendsen
,
J. Comput. Phys.
23
,
327
(
1977
).
54.
K.
Lindorff-Larsen
,
S.
Piana
,
K.
Palmo
,
P.
Maragakis
,
J. L.
Klepeis
,
R. O.
Dror
, and
D. E.
Shaw
,
Proteins
78
,
1950
(
2010
).
55.
H. J. C.
Berendsen
,
J. R.
Grigera
, and
T. P.
Straatsma
,
J. Phys. Chem.
91
,
6269
(
1987
).
56.
D. B.
Amirkulova
and
A. D.
White
,
J. Theor. Comput. Chem.
17
,
1840007
(
2018
).
57.
R. J.
Gowers
,
M.
Linke
,
J.
Barnoud
,
T. J. E.
Reddy
,
M. N.
Melo
,
S. L.
Seyler
,
J.
Domański
,
D. L.
Dotson
,
S.
Buchoux
,
I. M.
Kenney
, and
O.
Beckstein
, in
Proceedings of the 15th Python in Science Conference
(
SciPy
,
2016
), pp.
98
105
.
58.
N.
Michaud-Agrawal
,
E. J.
Denning
,
T. B.
Woolf
, and
O.
Beckstein
,
J. Comput. Chem.
32
,
2319
(
2011
).
59.
S.
Kim
,
P. A.
Thiessen
,
E. E.
Bolton
,
J.
Chen
,
G.
Fu
,
A.
Gindulyte
,
L.
Han
,
J.
He
,
S.
He
,
B. A.
Shoemaker
,
J.
Wang
,
B.
Yu
,
J.
Zhang
, and
S. H.
Bryant
,
Nucleic Acids Res.
44
,
D1202
(
2016
).
60.
L.
Benguigui
,
Eur. J. Phys.
34
,
303
(
2013
).
61.
A. M.
Scarfone
and
T.
Wada
,
Physica A
384
,
305
(
2007
).
62.
T.
Helvik
and
K.
Lindgren
,
J. Stat. Phys.
155
,
687
(
2014
).
63.
P.
Zupanovi
and
D.
Kui
,
J. Phys. Commun.
2
,
045002
(
2018
).
64.
P. O.
Amblard
and
O. J.
Michel
,
Entropy
15
,
113
(
2013
).
65.
R.
Luzzi
,
A. R.
Vasconcellos
, and
J. G.
Ramos
,
Braz. J. Phys.
28
,
97
(
1998
).
66.
C. E.
Shannon
,
Bell Syst. Techn. J.
27
,
379
(
1948
).
67.
S.
Pant
,
T.
Gera
, and
N.
Choudhury
,
J. Chem. Phys.
139
,
244505
(
2013
).
68.
T.
Murtola
,
A.
Bunker
,
I.
Vattulainen
,
M.
Deserno
, and
M.
Karttunen
,
Phys. Chem. Chem. Phys.
11
,
1869
(
2009
).
69.
J.
Sauter
and
A.
Grafmüller
,
J. Chem. Theory Comput.
13
,
223
(
2017
).
70.
E.
Hawlicka
,
G.
Pálinkás
, and
K.
Heinzinger
,
Chem. Phys. Lett.
154
,
255
(
1989
).
71.
S.
Markutsya
,
A.
Devarajan
,
J. Y.
Baluyut
,
T. L.
Windus
,
M. S.
Gordon
, and
M. H.
Lamm
,
J. Chem. Phys.
138
,
214108
(
2013
).
72.
T. T.
Foley
,
M. S.
Shell
, and
W. G.
Noid
,
J. Chem. Phys.
143
,
243104
(
2015
).
73.
M.
Dallavalle
and
N. F. A.
van der Vegt
,
Phys. Chem. Chem. Phys.
19
,
23034
(
2017
).
74.
P.
Liu
,
D. K.
Agrafiotis
, and
D. L.
Theobald
,
J. Comput. Chem.
31
,
1561
(
2009
).
75.
D. L.
Theobald
,
Acta Crystallogr. Sect. A: Found. Crystallogr.
61
,
478
(
2005
).

Supplementary Material