Coarse-grained (CG) molecular dynamics (MD) can simulate systems inaccessible to fine-grained (FG) MD simulations. A CG simulation decreases the degrees of freedom by mapping atoms from an FG representation into agglomerate CG particles. The FG to CG mapping is not unique. Research into systematic selection of these mappings is challenging due to their combinatorial growth with respect to the number of atoms in a molecule. Here we present a method of reducing the total count of mappings by imposing molecular topology and symmetry constraints. The count reduction is illustrated by considering all mappings for nearly 50 000 molecules. The resulting number of mapping operators is still large, so we introduce a novel hierarchical graphical approach which encodes multiple CG mapping operators. The encoding method is demonstrated for methanol and a 14-mer peptide. With the test cases, we show how the encoding can be used for automated selection of reasonable CG mapping operators.

## I. INTRODUCTION

The length scale of all-atom molecular dynamics (MD) simulations is fundamentally limited due to the large number of atoms in complex multiscale phenomena. One way to address this is through coarse-grained (CG) MD simulations using a lower resolution model which has fewer degrees of freedom than the fine-grained (FG) or all-atom model. CG simulation enables the study of larger length scale systems, such as multi-protein complexes,^{1–3} lipid bilayers,^{1,3,4} nucleic acids,^{3,5–8} and polymers.^{9–11}

A major step in defining a CG model is to determine how the FG model maps to the CG model. The choice of mapping from FG to CG is not unique and thus continues to be an active area of research.^{5,8,12,13} We refer to this mapping as a CG mapping operator. Past researchers have designed CG mapping operators through chemical intuition^{14} or with specific rules tailored for classes of molecules.^{9,15,16} In a recent study, the CG model has been defined by a probabilistic CG to FG mapping.^{17} The main obstacle to developing general and theoretically justified systematic approaches to find CG mapping operators is the combinatorial growth of possible CG mapping operators with respect to atom number. In this work, we show that the CG mapping operator count can be drastically reduced by imposing topology and symmetry restrictions. For example, one can exclude mappings that change the symmetry group when going from the FG to the CG system. Four restrictions were explored and are able to reduce the growth from super-exponential (≈*e*^{n} ^{ln} ^{n})^{18} to 2^{b′},^{19} where *b*′ is the number of chemically unique bonds and *n* is the number of atoms. Even with those restrictions, the number of CG mapping operators for a molecule is too large to permit rigorous study in all but the simplest molecules.

To mitigate the exponential growth of CG mapping operators after imposing constraints, we have developed an approach of compressing, or encoding, the operators into a compact hierarchical graph representation. We introduce a general algorithm that deterministically constructs a unique hierarchical graph for a given molecule. The hierarchical graph can then be used for automated selection of CG mappings.^{20}

The idea of encoding CG mapping operators into a hierarchical graph representation is inspired by recent work in computer video processing. Mapping atoms into CG particles is analogous to a method called video segmentation. Video segmentation is the merging of similar voxels (volumetric pixels) into supervoxels^{21} to facilitate easier analysis of a video. Both CG and video segmentation involve dimensionality reduction to remove extraneous details while retaining salient features of the high resolution representation. This work is motivated by the successful application of graph-based methods in video segmentation.^{22–24}

Specific work on CG mapping theory includes work by Zhang and Voth^{25–27} who designed CG mapping operators of biomacromolecules such as proteins, ribosomes, and actin filaments by preserving the “essential dynamics”^{28} of the simulation. They made the optimization of the CG operator tractable by limiting analysis to a subset of principal component analysis modes^{25} and using a potential energy function that does not require iterative optimization (an elastic-network model).^{26,27} Cao and Voth^{14} worked on theory for the centering functions, specifically comparing using center of mass (COM) versus center of charge. They found that center of charge gave better preservation of the inter-molecular structure in four example systems. Wagner *et al.*^{29} developed theory that can generalize centering functions to arbitrary thermodynamic quantities. Marrink *et al.*^{30} developed a hand-tuned mapping operator based on grouping four heavy atoms. Although this approach was empirical, their work showed that consistently using the same number of heavy atoms while grouping (fixed-size segmentation) is a valid simplification.

This paper is organized as follows. In Sec. II, we discuss different enumeration schemes and introduce the theory of hierarchical graphs encoding CG mapping operators. Section III contains details of FG and CG simulations. In Sec. IV, we provide examples to demonstrate the mapping operator enumeration schemes on 49 889 molecules. The section also includes the algorithm for hierarchical graph construction. We implement our hierarchical graph approach for methanol. We chose methanol as a test case because it is a commonly used benchmark in the CG literature.^{31–35} As a proof of concept, the capability of the hierarchical graph to encode CG mappings is tested and specific mappings are evaluated based on their ability to preserve radial distribution and velocity autocorrelation functions (VACFs). The usability of hierarchical graphs for automated CG mapping operator selection is also discussed. We test the scalability of our approach by implementing the theory for a 14-mer peptide.

## II. THEORY

A CG model is specified by a mapping operator, **M**, and a CG-potential U($R\u2192$). **M** establishes the relationship between the FG coordinates, $r\u2192$, and the CG coordinates, $R\u2192$. Here, **M** is a linear operator represented as a matrix of dimension $NR\u2192\xd7Nr\u2192$, where $Nr\u2192$ and $NR\u2192$ are the number of FG particles and number of CG particles, respectively. The coordinates of the *i*th CG particle can be written as $Ri\u2192=\u2211j=1Nr\u2192Mijrj\u2192$. *M*_{ij} refers to an individual matrix element and $rj\u2192$ represents the FG coordinates of atom *j*. A non-zero *M*_{ij} value determines that the *j*th FG atom contributes to the *i*th CG particle.

A few simplifying assumptions can be made immediately. We assume the coordinate transformation is isotropic across all coordinate dimensions. To conserve mass between CG and FG models, each FG particle should be mapped to at least one CG particle. Thus each column in **M** must have at least one non-zero element. We will restrict our discussion to mapping operators that map each atom to only one CG particle. Therefore, each column in **M** has exactly one non-zero entry. This restriction is not always imposed for mapping FG to CG particles as seen in some previous studies.^{6,9,12,20,36} The entries of each row *M*_{ij} should sum to unity ($\u2211j=1NrMij=1,\u2200i$) because the mass of a CG particle equals the sum of the masses of its constituent atoms. The subset *s*_{i} of columns with non-zero entries for the *i*th row represents the atoms that are grouped to form the *i*th CG particle. For solutions, CG mapping operators can be restricted to the solute and implicit solvent models^{37} so that this restriction does not preclude removing the solvent. The simplifications discussed above have reduced all possible mapping operators from selecting a $NR\u2192\xd7Nr\u2192$ matrix to selecting a partition of atoms from which the matrix can be built.

### A. Enumeration schemes for mapping operators

We need to represent molecules as graphs with atoms as nodes and covalent bonds as edges^{38} in order to continue reducing the number of mapping operators. Mapping atoms of a molecule to CG particles while meeting the previously mentioned criteria is equivalent to partitioning the vertex set, *G*, of the molecular graph. Thus, the exhaustive count of mapping operators for a molecule with *n* atoms is obtained by partitioning *G* into disjoint and exhaustive subsets. The total number of ways *G* can be partitioned is calculated by the *n*th Bell number,^{39} *B*_{n}. *B*_{n} evaluates the number of ways *n* elements can be partitioned into non-empty subsets. It is given by the expression $Bn=\u2211k=0n\u22121n\u22121kBk$. Since one of the partitions given by *B*_{n} represents the FG representation, the exhaustive count for CG mapping operators (referred to as the *count with Bell number*) for a molecule with *n* atoms is *B*_{n} − 1. Due to its rapid growth rate, the mapping operator count given by Bell numbers becomes intractably large. The mapping operator count can be reduced by joining only those atoms which are connected by bonds. Naïvely, the number of mapping operators for a molecule with *b* bonds with this limitation is 2^{b} − 1 since each bond can either be retained or removed. The FG representation has all the bonds retained, and to exclude it from our mapping operator count, we subtract 1 from 2^{b}. This *naïve count* of mapping operators treats all bonds to be unique.

The assumption that all bonds in a molecule are unique is not always true. Molecules have topological symmetries^{40} as a whole, like ethane, or in parts, like the methyl group of methanol.^{40} Bonds in such symmetries are not unique and are equivalent bonds. These equivalent bonds can be defined as edge orbits of the automorphisms of the molecular graph.^{41} A graph automorphism, which is an isomorphism from a graph to itself, is an operator that permutes vertex labels without changing the graph. When a vertex is re-labeled, it forms an equivalence class with the vertex whose label it adopts. Sets of equivalent vertices are called vertex orbits. The same principle applies to edges, leading to edge orbits. Vertex orbits and edge orbits are our definition of symmetric atoms and bonds, respectively. Merging atoms that are joined by equivalent bonds results in duplicate mapping operators. For example, in water, the two O–H bonds are equivalent. The mapping operator obtained by removing one O–H bond and merging the corresponding O and H atoms into one bead will be the same mapping operator that we would get by removing the other O–H bond. Thus the two mapping operators are duplicates. A *mapping operator count without duplicates* due to equivalent bonds can be calculated with the stars and bars theorem of combinatorics.^{42} The theorem states that *n* identical elements can be distributed in *k* states in $Ck\u22121n+k\u22121$ ways. In our case, *k* is 2 since the bonds can be either retained or removed. Thus, the number of CG mapping operators becomes $(\u2211iCk\u22121mi+k\u22121)\u22121$, which is the number of ways *i* groups of *m*_{i} equivalent bonds can be distributed in *k* states. Note that we have subtracted one from the expression here to exclude the FG representation from the count. The mapping operator count can be further restricted by considering only those that preserve topological symmetries. To preserve symmetry, equivalent bonds are mandated to have uniform states (i.e., all retained or removed) while defining a CG mapping operator. For example, this would disallow removing one C–H bond in a methyl group without removing the other two. The total number of unique *mapping operators preserving symmetry* is 2^{b′} − 1, where *b*′ is the number of unique bonds (*b*′ ≤ *b*). It is possible to achieve similar reduction in mapping operator count when a different metric is used to join atoms, like coordination number or spatial proximity instead of the presence of a chemical bond.

### B. Hierarchical graphs encoding mapping operators

We introduce two forms of hierarchical graphs: the mapping operator tree (MOT) and the mapping operator graph (MOG). Both show how atoms in a molecule can be grouped to form CG particles at different resolutions and encode multiple mapping operators including mixed-resolution models. An MOG is unique for a given molecule and encodes all the symmetry preserving mapping operators. MOTs encode a subset of the mapping operators and may be applied to practically select mapping operators, whereas MOGs are the theoretical results that are impractical for molecules larger than a few atoms. A hierarchical graph can be constructed for a molecule so that only symmetry preserving mapping operators are considered and thus serve as the basis for automated selection of mapping operators. We will use the notation {A_{0}A_{1}A_{2}} to indicate that atoms A_{0} though A_{2} are joined to form a CG particle. Thus the single CG bead representation for methanol is denoted by {CH_{3}OH}.

An illustrative MOT for methanol is shown in Fig. 1(a). For an MOT with *h* levels, *L*^{i} where *i* ∈ [1, *h*], each node in *L*^{i} is marked as $Vsi$, with *s* being the individual index. The root of the MOT ($V11$) corresponds to the lowest resolution CG representation, mapping one molecule to one CG bead. Subsequent levels of the tree show CG representations in increasing resolution. The highest level of the tree (*L*^{h}) contains the leaf nodes ($Vsh$) corresponding to atoms in a molecule. Union of $Vsi$ for each *L*^{i} comprises the complete atom set of the molecule. A CG mapping operator can be extracted from the MOT with a valid *tree slice*. A *tree slice* refers to a subset of nodes chosen from the MOT. It can be specified by a binary vector $x\u2192$ with length equal to the total number of nodes (*V*) in the MOT. If a node is chosen in a slice, the corresponding variable in $x\u2192$ is set to 1, else it is set to 0. A tree slice is valid only when it includes some representation of all atoms and includes only one node from each of the root-to-leaf paths. A path matrix $P$ is constructed as a *N*_{p} × *V* matrix with binary entries, with *N*_{p} being the number of root-to-leaf paths. An element *p*_{ij} in $P$ is 1 if the corresponding node falls under the path *P*_{i}. A valid tree slice $x\u2192$ obeys $Px\u2192=1\u2192Np$, where $1\u2192Np$ denotes ones vector of length *N*_{p}.^{22} Methods like uniform entropy slice (UES) flattening^{22} can be employed to identify a valid and optimal tree slice.

An MOT cannot encode all mapping operators for most molecules and thus is not unique for a given molecule. This limitation arises because MOT construction pre-determines the order of bond removal. For instance, in Fig. 1(a), the C–H and the O–H bonds are removed prior to the C–O bond. Thus the methanol MOT fails to encode these mapping operators: {CH_{3}O}H, H_{3}{COH}, and H_{3}{CO}H. The symmetry preserving mapping operators that can be extracted from the methanol MOT with valid slices are: {CH_{3}OH}, {CH_{3}}{OH}, {CH_{3}}OH, and CH_{3}{OH}. The MOT can be adjusted to encode all mapping operators by allowing multiple parents. This results in a mapping operator graph (MOG) that is no longer a tree. Figure 1(b) illustrates the MOG for methanol. It must be noted that only one H atom is included from the methyl group as all the three methyl H’s are chemically equivalent. Allowing multiple parents introduces multiple paths from the root to leaf nodes and thus the rows in the path matrix must be adjusted by using an OR operation on the paths leading up to the same leaf node to maintain the same equation for valid tree slices. The rule validating a slice still holds. All MOTs for a molecule can be derived from its corresponding unique MOG by removing nodes. Thus the MOG acts as an upper-bound and starting point for MOTs. The final selection of an MOT for specific application should be made with careful consideration since the available mapping operators to choose from will be less than what is encoded in an MOG. The governing principle that should dictate the choice of an MOT from the parent MOG is a subject of future research.

## III. SIMULATION METHODS

An FG simulation of methanol molecules was performed at a density of 0.778 g/cm^{3} using GROMACS-5.1.3^{43,44} with the OPLS-AA force field^{45} and 1 fs time step. The NVT ensemble was maintained at 300 K with the Nosé-Hoover thermostat.^{46} The duration of the simulation was 100 ps. A short range cut-off distance of 12.2 Å was used for Van der Waals interactions, and smooth particle-mesh Ewald^{47} was used to treat long-range electrostatic interactions. A second trajectory was produced from the original trajectory by recalculating forces with a topology that excluded all bonded interactions. The recalculated trajectory was used to calculate exclusion forces which correspond to non-bonded interactions. Rühle and Junghans^{48} reported exclusion Force-matching (FM) to perform better than full FM in their study of liquid hexane.

1 ns CG-MD simulations were performed corresponding to different CG mappings for methanol using the GROMACS 2016.3^{43} simulation engine and the VOTCA 1.5 package.^{49} Force-matching (FM)^{50–52} was employed to obtain tabulated CG potentials from the FG trajectory. GROMACS uses cubic spline interpolation for the tabulated values.^{44} The pair-wise potentials are given in Fig. 2. FM uses least-squares minimization to calculate CG forces. Cubic spline was used as the basis function to solve the least-squares minimization in VOTCA.^{49} Both FG and CG methanol simulations were conducted in a box (4.09 nm × 4.09 nm × 4.09 nm) with periodic boundaries. All bonds and angles in the CG simulations were constrained using the SHAKE algorithm.^{53}

To demonstrate the feasibility of applying our hierarchical graph-based method to a larger system, a 200 atom peptide (VPKDGIVAREGSPA) was simulated for 20 ns using the AMBER-99SB-ILDN force field^{54} and a time step of 2 fs. The temperature was set at 300 K (NVT). The SPC/E water model^{55} was used to solvate the peptide. Counter ions and NaCl were added to achieve an ionic concentration of 100 mM. The peptide FG simulation was automated following the protocol mentioned in a recent work.^{56} Post-processing and analysis of all the simulation trajectories were completed using tools from GROMACS-2016.3 and MDAnalysis.^{57,58}

## IV. RESULTS

### A. Mapping operator enumeration

As discussed under Sec. II, different constraints can be imposed to reduce the number of mapping operators. To quantify the extent of reduction in the number of mapping operators, we have implemented the four enumeration schemes that have successively fewer mapping operators: *count with Bell number, naïve count, count without duplicate mapping operators, and count preserving symmetry*. For example, methanol has 202 mapping operators using the Bell number. The naïve count, assuming all bonds in methanol to be unique, yields a total mapping operator count of 31. This number includes identical mapping operators and those which do not retain symmetry. Since the methyl hydrogen atoms in methanol are equivalent, the total number of mapping operators without duplicate mapping operators is 15. The number of mapping operators preserving symmetry is 7. The semi-log plot in Fig. 3 shows similar disparity in the order of magnitude of mapping operator counts for 49 889 other molecules when the four counting methods are used. These molecules, with the number of heavy atoms ranging from 3 to 9, were randomly taken from PubChem.^{59} The details of the selection process are included in the supplementary material. The PubChem database was chosen to ensure that the molecules we are considering have been synthesized. For all the test molecules, we constructed unique MOGs encoding all the symmetry preserving CG mapping operators. To illustrate the extent to which an MOG is useful in encoding the CG mapping operators within itself, we have included the number of nodes in the MOGs for all the test molecules in Fig. 3.

### B. MOG construction and extraction of mapping operators

In this section, we demonstrate the systematic construction of an MOG followed by the extraction of a valid CG mapping operator from the illustrative methanol MOG. MOGs can be deterministically and uniquely constructed for a given molecule following Algorithm 1 (see the supplementary material for information on the source code). We demonstrate how CG mapping operators can be extracted from an MOG using a corresponding path matrix, a slice vector, and the slice validation criterion. The path matrix ($P$) for the methanol MOG [Fig. 1(b)] is shown in Table I.

Path . | V_{1}
. | V_{2}
. | V_{3}
. | V_{4}
. | V_{5}
. | V_{6}
. | V_{7}
. | V_{8}
. | V_{9}
. | V_{10}
. |
---|---|---|---|---|---|---|---|---|---|---|

P_{7} | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |

P_{8} | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |

P_{9} | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |

P_{10} | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |

Path . | V_{1}
. | V_{2}
. | V_{3}
. | V_{4}
. | V_{5}
. | V_{6}
. | V_{7}
. | V_{8}
. | V_{9}
. | V_{10}
. |
---|---|---|---|---|---|---|---|---|---|---|

P_{7} | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |

P_{8} | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |

P_{9} | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |

P_{10} | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |

An example slice selecting nodes $v4$ and $v6$ is denoted by the vector $x\u2192=[0,0,0,1,0,1,0,0,0,0]$. The slice is valid since it satisfies $Px\u2192=1\u2192Np$. The slice represents the two particle CG representation of methanol ({CH_{3}}{OH}). An example of an invalid slice is the selection of nodes $v4$, $v5$, and $v6$ ($x\u2192=[0,0,0,1,1,1,0,0,0,0]$) where $Px\u2192=[1,2,2,1]$. The slice attempts to map each of nodes $v8$ and $v9$ in two CG particles simultaneously. The validation rule not only prevents encapsulating an atom in multiple CG particles but also ensures that a particular atom is chosen to have singular unambiguous resolution by invalidating the selection of a parent and its child simultaneously. The methanol MOG exhaustively encodes the seven mapping operators of methanol ({CH_{3}OH}, {CH_{3}}{OH}, {CH_{3}}OH, CH_{3}{OH},{CH_{3}O}H, H_{3}{COH}, and H_{3}{CO}H) which obey symmetry restriction.

### C. Automated mapping operator selection using MOT

We demonstrate the use of MOT for automated CG mapping operator selection using methanol and a 14-mer peptide (VPKDGIVAREGSPA) as examples. The automated selection of CG mapping operators was performed using the UES method.^{22}

The UES method aims to balance the information content across CG particles. It entails calculating the entropy for each node of the MOT and finding the valid tree slice where the entropy difference between selected nodes is minimized,

In Eq. (1), *S* represents the entropy of specific nodes of the MOT and *x* refers to the individual element in the slice vector $x\u2192$. The definition of entropy^{60–66} used in UES can vary depending on the system.

In our methanol example, the negative of the order parameter^{67} was used as an approximation for entropy. All the nodes except the root node were allowed to be selected. The valid slice including all the leaf nodes is a trivial slice corresponding to the FG representation of the molecule; hence it was not considered. The valid slice with nodes 2 and 3 from methanol MOT was selected by minimizing Eq. (1). The chosen slice represents the two bead methanol CG model {CH_{3}}{OH}. To compare the two bead model with other symmetry preserving mapping operators of methanol, we ran CG simulations of all seven symmetry preserving CG mapping operators of methanol. There is no agreed upon method to compare CG mapping operators. We chose a simple thermodynamic function and a dynamic function which have clear definitions in both the CG and the FG systems. The first is the center of mass (COM) radial distribution function (RDF) and the second is the velocity autocorrelation function (VACF). RDF is a structural property commonly used to compare CG models.^{2,14,29,68,69} VACF was chosen because it is relatively uncorrelated with RDF and provides a measure of how fluctuations are preserved after mapping. VACF relates to the self-diffusion coefficient and has been used for previous CG studies.^{70} The mean square differences between the target FG function and the corresponding function from CG simulations provide a quantitative comparison. The mean square differences for different mapping operators are normalized by dividing each element by the norm. Figure 4 compares the normalized mean square differences of CG COM-RDF and VACF of different CG mapping operators for methanol. The purpose of comparing different CG mapping operators encoded in the MOT and the MOG is to elucidate the applicability of the MOT for automated selection of a CG mapping operator that performs reasonably well.

The three mapping operators with least mean square differences for RDF and VACF are included in Fig. S1 of the supplementary material. COM-RDF obtained from one bead ({CH_{3}OH}) CG model shows least mean square deviation from the target FG COM-RDF followed by the H_{3}{COH} and the {CH_{3}}{OH} mapping operators. VACFs of {CH_{3}}OH, H_{3}{COH}, and {CH_{3}}{OH} deviate least from the FG VACF. The two bead model {CH_{3}}{OH}, chosen by the UES method, performs better than the three other mapping operators encoded in the methanol MOT: {CH_{3}OH}, {CH_{3}}OH, and CH_{3}{OH}.

In previous studies comparing CG mapping schemes, the highest resolution CG mapping operator did not necessarily best reproduce the target FG RDF.^{71–73} Foley *et al.* hypothesized in their study that there might be optimal CG configurations which mirror FG behaviors better than others.^{72} The results obtained underscore the importance of systematically exploring CG mapping operators instead of solely relying on chemical intuition. Besides the symmetry preserving mapping operators which are discussed above, we have considered two asymmetric mapping operators and discussed their performances in the supplementary material with corresponding Figs. S2–S4.

In order to demonstrate that an MOT with UES can scale to larger macromolecules, we have applied the method to a 14-mer peptide (VPKDGIVAREGSPA). An MOT was built as shown in Fig. 5(a) by joining particles to one of their nearest neighbors at every level of the MOT. The nearest neighbors were calculated from the final frame of a 20 ns MD simulation of the peptide. The nearest neighbors can be identified by the light edges connecting them in Fig. 5(b). Notice that in contrast to the methanol example, nearest neighbors are used as the basis of joining atoms instead of chemical bonds. This allows joining non-bonded particles based on side-chain-side-chain interactions.

To apply the UES method, a choice of entropy must be made for groups of atoms that compose a node in the MOT. We chose to compute entropy based on how similar relative velocities were within a group of atoms in the MD simulation. *V*_{j}, the entropy of node *j*, was calculated with the usual definition of entropy

where *p*_{k}(*r*) is the probability distribution of COM relative velocities from the 20 ns MD trajectory and *k* represents the *x*, *y*, and *z* coordinates. COM relative velocities were calculated by removing COM translation and rotational motion from each group of atoms that make up a node in the MOT via alignment.^{74,75} The intuition of this choice was that nodes whose atoms and COM move similarly would have a low entropy and nodes whose atoms move in a variety of different directions relative to the COM motion would have high entropy. Velocities at every 4 fs were used to compute the probability distributions.

Equipped with a working definition of entropy, nodes from levels 6 to 12 in the MOT were allowed to be selected while applying UES. The choice of disallowing high MOT levels is a heuristic method to prevent consideration of overly large CG particles while speeding-up the UES computation. The optimal CG mapping operator obtained with these conditions is shown in Fig. 5(a) as the black nodes and in Fig. 5(b) as the color coded groups of atoms.

In the selected mapping operator, the 3rd residue, lysine (K3), is grouped under three CG particles with its entire side-chain mapped into a single CG particle. K3 exhibits the highest standard deviation of the atomic root mean square fluctuation (RMSF) values given in Fig. S5 of the supplementary material. The 6th residue, isoleucine, also has its side-chain grouped into an individual CG particle. Residues 11, 5, and 13, with lowest RMSF standard deviations, were assigned to one CG particle each. Overall, the automated mapping operator selection from the MOT yielded a reasonable CG mapping for the peptide without requiring explicit simulation of the ≈10^{120} possible mapping operators.

## V. CONCLUSION

We have adapted a hierarchical graph method employed in video segmentation to encode several CG mapping operators. The combinatorial explosion in the number of CG mapping operators is avoided by systematically laying out general restrictions like preserving symmetry or only joining bonded atoms. An algorithm and an illustrative example have been provided to show the construction of a mapping operator graph (MOG) which is capable of encoding all symmetry preserving mapping operators of a given molecule. Examples were provided to demonstrate how symmetry preserving multi-resolution CG mapping operators can be captured within an MOG. We have also discussed a simpler version of the MOG, called the mapping operator tree (MOT), which can be used to automate the selection of CG mapping operators. Though an MOT does not encode all symmetry preserving mapping operators like the MOG, it still encodes many CG mapping operators and is more suited for practical use. The theory discussed in this paper provides a basis for automated selection of CG mapping operators by selecting valid tree slices. We have applied the UES method to the methanol MOT and the peptide MOT as examples of automated slice selection. The CG mapping operators for both methanol and the 14-mer peptide, obtained from the automated process, underline the potential of this method as a systematic approach for automated CG mapping operator selection.

## SUPPLEMENTARY MATERIAL

The supplementary material contains information about the source code of the MOG building algorithm, the details of molecule selection from PubChem, the individual RDF/VACF plots, a discussion on asymmetric mapping operators, and the RMSF plot for peptide VPKDGIVAREGSPA.

## ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under No. 1764415. We thank the Center for Integrated Research Computing at the University of Rochester for providing computational resources. We also thank Professor Martin McCullagh for providing helpful feedback. GromacsWrapper (Oliver Beckstein *et al.*, https://github.com/Becksteinlab/GromacsWrapper doi:10.5281/zenodo.17901) was used in the automated peptide simulation protocol.