Developing accurate coarse-grained (CG) models is critical for addressing long time and length scale phenomena with molecular simulations. Here, we distinguish and quantify two sources of error that are relevant to CG models in order to guide further methods development: “representability” errors, which result from the finite basis associated with the chosen functional form of the CG model and mapping operator, and “information” errors, which result from the limited kind and quantity of data supplied to the CG parameterization algorithm. We have performed a systematic investigation of these errors by generating all possible CG models of three liquids (butane, 1-butanol, and 1,3-propanediol) that conserve a set of chemically motivated locality and topology relationships. In turn, standard algorithms (iterative Boltzmann inversion, IBI, and multiscale coarse-graining, MSCG) were used to parameterize the models and the CG predictions were compared with atomistic results. For off-target properties, we observe a strong correlation between the accuracy and the resolution of the CG model, which suggests that the approximations represented by MSCG and IBI deteriorate with decreasing resolution. Conversely, on-target properties exhibit an extremely weak resolution dependence that suggests a limited role of representability errors in model accuracy. Taken together, these results suggest that simple CG models are capable of utilizing more information than is provided by standard parameterization algorithms, and that model accuracy can be improved by algorithm development rather than resorting to more complicated CG models.

## I. INTRODUCTION

Coarse-grained (CG) models are attractive for studying biological and soft material processes that occur on time and length scales that are too costly for direct atomistic simulation. By reducing the degrees of freedom that are directly modeled, CG equations-of-motion require fewer computational resources to integrate and CG models exhibit simplified configurational phase spaces that are easier to sample and interpret. Despite these advantages, both fundamental and practical problems are presented by the dimension-reduction associated with coarse graining. Among the fundamental problems are the limited accuracy of CG potentials for the retained degrees of freedom (i.e., the “representability” problem) and the limited information that is incorporated into standard parameterization algorithms (i.e., the “information” problem). These problems have typically been addressed together under the umbrella of the representability problem in the literature, although in practical applications they are distinct, and proper attribution has important consequences for where effort should be devoted to improve the reliability of CG models. Specifically, a representability-limited model can only be improved by using a more complex functional form for the interaction potentials, whereas an information-limited model can be improved by incorporating more information into the parameterization algorithm. Previous work has demonstrated that CG information-loss prohibits perfect thermodynamic state transferability,^{1} exact preservation of all macroscopic observables is impossible with reduced resolution,^{2,3} and thermodynamic expressions may become invalid in the CG representation.^{1,4} Under the dichotomy maintained here, most of these issues pertain to information limitations rather than representability limitations of CG models.

The practical question regarding these limitations is whether *effective* CG approximations are obtainable at different levels of resolution with existing algorithms.^{2} For chemical systems, the answer is assumed to be positive, but relatively little work has been done to systematically quantify the sources and effects of these CG errors.^{1,5–8} For instance, CG dimension reduction can (i) increase the required complexity of the CG potentials, (ii) reduce chemical transferability (i.e., accuracy of using the same CG potentials for different molecules and mixtures with similar underlying atoms), (iii) limit the accuracy and number of properties that can be simultaneously reproduced, and (iv) limit thermodynamic state transferability. Establishing the extent to which these tradeoffs occur due to representability-limitations vs information-limitations in practical scenarios is critical for guiding methods development and applying CG models.

In particle-based descriptions, CG representability is determined by the functional form of the interaction potentials and the mapping operator. Comparative studies of CG models with variable functional forms have been performed that show that added complexity can remediate some CG inaccuracies,^{9–12} while systematic studies of the mapping operator are scarce.^{13–15} The mapping operator is generally represented by a matrix that relates the positions and momenta of the atoms in the atomistic representation to pseudoparticles in the coarse-grained representation. Since CG models must be rederived for every unique mapping, the choice of mapping operator is typically fixed by chemical intuition without studying the impact of this choice. For example, common mapping operators for water combine 1–4 molecules into a single bead.^{16,17} Protein CG models sometimes use multiple operators, one for the backbone (2–4 atoms per bead) and others for specific side-chains (1–2 atoms per bead).^{16} Most CG models for nucleic acids separately combine each base, sugar, and phosphate into 1–3 beads.^{16} In the MARTINI scheme, 2–4 heavy atoms are beaded together and matched to one of 18 types classified by charge and polarity.^{18–20} Faced with this diversity of *ad hoc* mappings, several groups have attempted to develop systematic mapping procedures that conserve stereochemistry,^{21} topology,^{14} and symmetry.^{13} This diversity of mapping strategies begs several questions related to representability, including whether better mappings exist for a given CG resolution, if lower resolution models are possible without significantly reducing accuracy, and if CG models are robust to mapping choices.

The information limitations of CG models are determined by the kind and quantity of information that are utilized in the model parameterization. Bottom-up CG models are parameterized to reproduce the microscopic details of specific higher-resolution models (usually atomistic, AA). Although only a subset of information is utilized by typical parameterization algorithms, the compelling idea behind bottom-up CG modeling is that sufficient chemical specificity is conserved in the simpler CG models to enable these to be used in understanding the properties of the original and related AA systems. In contrast, top-down models are parameterized to reproduce macroscopic thermodynamic quantities such as density, surface tension, vapor pressure, partition energy, solvation energy, etc., which are usually obtained from experiments.^{22–27} Since *a priori* there is no expectation that a top-down model should map onto a particular chemical system,^{4} the information and representability limitations of these models are fundamentally different (and thus not addressed here). Bottom-up parameterization algorithms include: iterative Boltzmann inversion (IBI),^{28} inverse Monte Carlo,^{29} and Yvon-Born-Green theory,^{30} which use structural correlations as the target parameterization property; Multiscale Coarse-Graining (MSCG),^{31,32} which uses forces as the target property; and relative entropy,^{33} in which the log likelihood between the conformation sampling generated by the atomistic model and the coarse-grained model is maximized. The information limitations of a bottom-up CG model are thus determined in practice by the choice of parameterization algorithm and the atomistic information that it utilizes. To that end, some bottom-up coarse-grained methods that utilize multiple atomistic properties to parameterize potentials have also been developed. These coarse-grained models have shown improved performance by simultaneously capturing properties such as energy,^{34} entropy,^{35} pressure,^{5,36,37} and density fluctuations,^{38,39} in addition to the more common forces and structural information.

Here, we present the results of a systematic study of the accuracy tradeoffs associated with dimension reduction and mapping operator selection in bottom-up CG modeling. In general, these tradeoffs depend on the intrinsic dimensionality of the specific fine-grained system being modeled, as well as the choice of evaluated properties. For systems of practical complexity, such as liquids, mixtures, and soft materials, it is likely not possible to establish these trade-offs analytically; however, with modern simulation throughput we can empirically elucidate this tradeoff with systematic studies of mapping-related errors. To this end, we have systematically generated all possible coarse-grained models that conserve a set of chemically motivated locality and topology relationships for three liquids (pentane, 1-butanol, and 1,3-propanediol). In turn, standard algorithms (iterative Boltzmann inversion with pressure correction and multiscale coarse-graining) were used to parameterize the models, and CG predictions for the 54 total models were compared with AA results for on-target properties (i.e., properties that are directly utilized in the parameterizations) and off-target properties. Since on-target properties are directly supplied during parameterization, they exhibit no information limitations. Hence, the accuracy of on-target properties is diagnostic of representability limitations in the CG models. Likewise, a weak mapping dependence of the accuracy of on-target properties also indicates minimal representability limitations with respect to the mapping operator alone. In contrast, information limitations are evaluated based on the performance of the CG models in reproducing off-target properties. Since off-target properties are not included in the parameterization directly, their accuracy is limited by the amount of indirect information supplied by the on-target AA parameterization data. We observe poor reproduction of off-target properties and a weak mapping dependence of on-target properties, which leads us to conclude that the current CG models are *information* limited. In turn, this raises the possibility that with better parameterization algorithms it will be possible to develop CG models of comparable complexity to those presented here that reproduce additional fine-grained properties.

## II. METHODS

### A. Mapping operator generation

A resolution-based algorithm was developed for generating the mapping operators in this study. Here, resolution (*R*) is an integer corresponding to the number of nonhydrogen atoms (“heavy atoms”) to be combined into each CG bead. Mapping proceeds by (i) combining the indices of all hydrogen atoms with their bonded heavy atoms, (ii) identifying the molecular backbone based on the pair of heavy atoms with the largest graphical separation, then, starting at one end of the backbone, (iii) combining *R* consecutive heavy atoms (and their hydrogens) into each bead, until reaching the end of the backbone or running out of consecutive heavy atoms divisible by *R*, (iv) separately combining the remaining groups of connected heavy atoms (i.e., groups smaller than *R*) to complete the mapping, (v) incrementing the starting atom, and repeating [(iii)–(v)]. Mappings are generated in this way until *n* (remainder on dividing backbone length by resolution *R*) increments have been performed, in which case no new mappings will be obtained. The final mapping operators are obtained by converting these index-based mappings into matrices for calculating the center-of-mass of each group of atoms.

The mapping algorithm is depicted schematically in Fig. 1 for the case of pentane and *R* = 3. The algorithm has the feature that it will not combine heavy atoms into a bead if they do not form a connected subgraph [Fig. 1(b)]; this is consistent with local realism and also reduces the number of possible mappings. For cases where the number of heavy atoms in a molecule is not divisible by *R*, the algorithm will return mappings with mixed resolution. As described, the algorithm will always place these higher resolution beads at the edge of the molecular graph. For large macromolecules, this is physically desirable, since it ensures that monomer units are consistently mapped. In this framework, common “united-atom” CG models are obtained with *R* = 1 [Fig. 1(c)] and single-bead CG models correspond to *R* = 5 for all the studied liquids [Fig. 1(d)].

In the current study, we parameterized models for pentane, 1-butanol, and 1,3-propanediol based on all mappings for *R* = 1–5. These molecules were chosen because they have the same chain length but varying degree of anisotropic behavior depending on the presence of alcohol groups. For each molecule, there are a total of 9 mappings for the possible resolutions from 1 to 5. In a few cases, these mappings are symmetry equivalent, but are reported as independent models as an additional consistency check on our parameterization procedures. The structure of the mappings is similar across all molecules (Fig. 2), the only difference is that the end group beads may have a methyl or hydroxyl group depending on the molecule. CG beads that consist of the same number, topology, and types of atoms are assigned to be of the same CG bead type and are described by the same parameters in the models. In all the presented figures, the unique bead types for each mapping are indicated by color.

Although the molecules in this study are simple enough to enumerate these mappings by hand, the outlined algorithm is part of a more general procedure that will be described elsewhere. We note that additional modifications are required for molecular systems that contain rings and side-chains.

### B. Iterative Boltzmann inversion

Iterative Boltzmann Inversion (IBI) is an iterative method to generate CG potentials that reproduce the radial distribution functions [*g*(*r*)] of high-resolution, typically atomistic, models.^{28} The target *g*(*r*) is sampled by coarse-graining a previously generated atomistic trajectory at a fixed thermodynamic state. The initial guess for each pair potential is the potential of mean force obtained by Boltzmann inversion,

where *U*_{αβ}(*r*) is the pair potential between CG particles of type *α* and type *β*, *T* is the simulation temperature, *g*_{αβ}(*r*) is the pair-specific radial distribution function, *V*/*N*_{α}*N*_{β} corresponds to the density of pairs in an ideal gas, *δ* is the Dirac delta, *r*_{ij} is the distance between particles i and j, and dividing the summation by 4*πr*^{2} normalizes the density of pairs at each separation *r*. Pairs separated by less than four bonds are neglected in both the summation and denominator calculations. This guess potential is simultaneously corrected for both the structural property [i.e., *g*(*r*)] and simulation pressure using an iterative procedure according to

where *U*(*r*)^{k} is the potential at the kth iteration, the *α* and *β* subscripts are omitted for clarity, and the *r*-dependence is discretized using tabulated potentials. The first correction approaches zero as the radial distribution function obtained from the CG simulations [*g*(*r*)^{k−1}] approaches the target distribution [*g*(*r*)^{AA}]. The second correction goes to zero at the potential cutoff (*r*_{cut}) and is updated proportional to a weight parameter (*ω*) and *A*, which is calculated according to

In Eq. (4), *P*^{k} and *P*^{AA} correspond to the pressure in the CG simulation and AA simulation, respectively, *N* is the number of particles in the AA simulation, and *ρ* is the mass density of the CG simulation. *A* approaches zero as the pressure from the CG simulation approaches the target atomistic pressure.^{5} Both correction terms are averaged over n independent CG simulations performed at each iteration, denoted by the summation over index i in Eq. (3).

IBI is similarly implemented for bond, angle, and dihedral potentials. The corresponding initial guess and update equations are identical to Eqs. (1) and (3), with the pair radial distribution function [*g*_{αβ}(*r*)] being replaced by the distribution function [*P*_{γ}(*κ*)] of the corresponding interaction type, *γ*, and omission of the pressure correction term. The distribution functions for bonds, angles, and dihedrals are given by $Pb(r)=Hb(r)4\pi r2$, $Pa(\theta )=Ha(\theta )sin\u2061\theta $ and *P*_{d}(*ϕ*) = *H*_{d}(*ϕ*), respectively, where *H*_{γ}(*κ*) is the histogram of the quantity *κ* governing the interaction type *γ*. Additional IBI implementation details, including treatment of undersampled regions, cutoffs, bin sizes, and convergence behavior are included in the supplementary material.

### C. Multiscale coarse-graining

The Multiscale Coarse-graining (MSCG) framework was developed in Ref. 31, where the error between the bead forces in the atomistic representation (i.e., calculated by mapping the atomistic forces) and the coarse-grained representation (i.e., using the CG potentials on the mapped atomistic configurations) was proposed as a variational functional. It was shown that minimizing this functional results in equivalent phase distributions of the coarse-grained system when sampled from the atomistic or coarse-grained representations. The system of equations can be further linearized using spline potentials yielding

where $f\gamma i(\kappa )\u2207i\kappa ({rk,t})$ is the force in the CG representation on the ith site due to the *γ*_{i} type of interaction (e.g., pair potential, bond potential, etc.), and *κ* is the value of the variable associated with that interaction (e.g., pair radial distance, bond radius, etc., which is a function of the position of site i) in snapshot t.^{32,40} Summation over *γ*_{i} gives the net force on the site i, which is parameterized to equal the force obtained for that site in the atomistic representation ($Fi,tAA$). The function $f\gamma i(\kappa )$ is the negative of the derivative of the potential $U\gamma i(\kappa )$, with respect to the variable *κ*, and *∇*_{i}*κ*({r_{k, t}}) is the gradient of the variable *κ*, with respect to the position vector of site i. The potentials $U\gamma i(\kappa )$ take the form of spline polynomials, and their coefficients are obtained by solving this linear system of equations. To alleviate memory constraints, the atomistic trajectory is divided into blocks, the linear system of equations (5) is solved for each of the blocks, and the final forces are given by an average over forces obtained in each block.

In this study, MSCG was only used to parameterize nonbonded potentials, whereas the bonded potentials were obtained in all cases from IBI. Therefore, when we discuss errors in certain properties for IBI vs MSCG generated potentials, it should be interpreted as IBI nonbonded + IBI bonded vs MSCG nonbonded + IBI bonded. This hybrid method of using MSCG only for nonbonded potentials and using distributions for obtaining bonded potentials has been shown to perform better than using MSCG for all potentials.^{41} All MSCG models were parameterized using the Versatile Object-oriented Toolkit for Coarse-graining Applications (VOTCA) package.^{40} Additional details of the specific VOTCA settings used for the MSCG procedure are discussed in the supplementary material.

### D. Property calculation

Five properties—forces, *g*(*r*), self-diffusion coefficient (*D*), velocity autocorrelation function (VAC), and enthalpy of vaporization (Δ*H*_{v})—were calculated for each CG model and compared with the corresponding atomistic results. Here, force and *g*(*r*) are used as on-target properties (i.e., properties that are explicitly included in the model parameterizations), while *D*, VAC, and Δ*H*_{v} are off-target properties (i.e., properties that are not included in the model parameterizations). The mathematical expressions and numerical details involved in the computation of these properties and corresponding errors are presented in the supplementary material. In brief, for forces, the norm squared error is computed for all beads and averaged over all beads. For *g*(*r*) and VAC, the absolute error at each radial distance and time-difference, respectively, is computed and averaged. The *g*(*r*) error is additionally weighted by the exponential of the atomistic radial distribution function at each radial distance. For *D* and Δ*H*_{v}, the signed error is reported. In all cases, the errors are normalized with respect to atomistic representation, to obtain a normalized metric that is comparable across different mappings and molecules.

### E. Molecular dynamics simulations

LAMMPS was used to perform all molecular dynamics simulations.^{42} All atomistic reference simulations used a 1 fs integration time step, velocity-Verlet integration, and periodic boundary conditions. Long-range electrostatics were modeled using the particle-particle-particle-mesh (PPPM) algorithm,^{43} and Lennard-Jones interactions were truncated at 14 Å. All simulations were initialized from diffuse configurations containing at least 2000 atoms, using a cubic grid to place molecules in random orientations without overlap. The velocities were initialized from a uniform distribution obtained with a random seed value and scaled to give the correct kinetic energy. The simulations were first relaxed in the NVE ensemble with restrained atomic displacements of 0.1 Å per time step for 10 ps, followed by a 1 ns NPT equilibration at 298 K and 1 atm. The simulations were extended at 298 K and 1 atm for 2 ns in the NPT ensemble to obtain the average density. In the NPT simulations, the Nosé-Hoover thermostat and barostat were employed in the modified form proposed by Martyna, Tobias, and Klein, as implemented in LAMMPS,^{44} with a relaxation time constant of 0.1 ps and 1 ps for the thermostat and barostat, respectively. From these initial simulations, five independent NVT simulations were generated with velocities reinitialized as described above. These simulations were first relaxed in the NVE ensemble with restrained atomic displacements of 0.1 Å per time step for 10 ps, then the simulation box was rescaled over 10 ps to obtain the correct average density, followed by a 1 ns NVT equilibration at 298 K. The production data were obtained by extending these simulations for 10 ns. The thermodynamic data and coordinates were sampled at 0.1 ps and 1 ps for simulations in the NPT and NVT ensembles, respectively.

The coarse-grained simulations used to evaluate the structural and dynamic properties of each model were performed in the NVT ensemble. The initial configuration for these simulations was obtained by starting from a diffuse configuration of 1000 particles and equilibrating to the known atomistic density by the same NVT equilibration procedure as in the atomistic simulations. The production data for each property were obtained by then extending the simulations to appropriate lengths to obtain statistical convergence (see supplementary material for property specific trajectory lengths). Where independent trajectories were simulated, the velocities were reinitialized for each simulation, and an equilibration run of 1 ns was performed before starting the production run. All CG simulations used tabulated potentials and excluded long-range electrostatic interactions. Experimentation with models that included a long-range electrostatic component seemed to have no effect (not shown).

To provide consistent parameterizations of each liquid, all atomistic force-fields in this study were derived from quantum chemistry calculations using the topology automated force-field interactions methodology (TAFFI).^{45} In brief, all force field terms were parameterized on the basis of density functional theory (DFT) quantum chemistry calculations, using the B3LYP-D3/def2-TZVP level of theory computed via the Orca software package.^{46} The force fields were parameterized using the Optimized Potentials for Liquid Simulations (OPLS) force-field functional form,^{47} except that 1–4 pairwise interactions were excluded in the nonbonded interaction computation. Bond, angle, and dihedral force-field terms were derived from potential energy curves computed for the internal degrees of freedom for each molecule in vacuum, optimizing the other degrees of freedom as a function of the mode scan. The resulting energy curves were self-consistently fit to obtain the corresponding force-constant parameters and equilibrium displacement parameters in the force field. Partial charges were obtained from Charges from Electrostatic Potentials using a Grid-based (CHELPG) method calculations on molecular configurations sampled from the liquid, and Lennard-Jones parameters were parameterized on the basis of counterpoise-corrected interaction energies calculated for molecular pairs sampled from the liquid state.

The atomistic molecular dynamics simulations for pentane, 1-butanol, and 1,3-propanediol contained 143, 134, and 154 molecules (equivalent to at least 2000 atoms) and an average density of 0.65 g/cc, 0.81 g/cc, and 1.04 g/cc was obtained, respectively. Sample input files for all atomistic simulations are distributed with this paper. Additional details pertaining to the sampling and details of the property calculations are included in the supplementary material.

## III. RESULTS AND DISCUSSION

The CG simulation results are presented separately for on-target (i.e., properties that are explicitly included in the model parameterizations) and off-target (i.e., properties that are not included in the model parameterizations) properties. The performance of on-target properties with respect to mapping has significance for the representability limitations incurred by the choice of the mapping operator. The performance of off-target properties with respect to mapping has separate significance with respect to how the approximations represented by each parameterization algorithm are affected by the CG dimension reduction. We observe qualitatively different behavior for these two classes of properties, with implications for the representability and information limitations of CG models.

### A. On-target properties

Figure 3 presents the results for the average errors in bead forces for each CG model and the reference atomistic data. These values have been normalized to the average force in the atomistic representation to provide a measure of the relative error (see the supplementary material for additional details). The bead forces are directly parameterized in the MSCG algorithm via a least-squares minimization, whereas the forces are only indirectly included in the IBI algorithm through the potential of the mean force.

First, we note that IBI and MSCG exhibit very similar error residuals, despite the fact that IBI does not utilize the force information directly in the parameterization. Although the potentials derived from IBI and MSCG show significant differences in all cases (Fig. S1), comparable force residuals are obtained. Considered together, these observations imply that the force error optimization surface is nonconvex and relatively shallow with a broad manifold of CG potentials that are capable of approximately minimizing the residual. As a further illustration of this, even the Boltzmann inverted (BI) potentials, which are distinct from both IBI and MSCG potentials (Fig. S1), show comparable mean force errors to MSCG and IBI for lower resolutions.

Second, the force errors are remarkably large for all resolutions, models, and algorithms. We note that the error residuals for MSCG are rarely reported.^{32,48} In the current case, the residual errors across all models vary between 1 and 38 kcal^{2} mol^{−2} Å^{−2}, which are smaller or equivalent *χ*^{2} values than those reported in Ref. 32 for methanol and a model ionic liquid (∼21 and 163 kcal^{2} mol^{−2} Å^{−2}, respectively). For additional clarification, we also present the distribution of errors in the force magnitude in Fig. 4 for two illustrative cases. Comparison with the null (i.e., zeroed out CG potentials) confirms that IBI and MSCG are capturing some atomistic force information, but this primarily manifests in minimizing the extreme values of the training forces, as seen in the inset of Fig. 4. In particular, the null model yields comparable median force errors to either of the CG models, with a slightly higher mean force error. The relatively small improvement over the null model with respect to the mean force error for IBI and MSCG provides further evidence that the force minimization exhibits a shallow minimum. One interpretation of these high overall errors is that the chosen functional form for the CG models, specifically the use of isotropic pairwise interactions, leads to representability limitations in all cases. Perhaps, most surprising is the large errors for R = 1, which is the mapping that corresponds to typical united-atom force fields that have been broadly adopted in experimentally parameterized force fields. These errors suggest that even at the united atom level, reproduction of the atomistic forces is representation limited. As discussed below, errors in the molecular forces do not necessarily imply that the CG models do not reproduce other significant properties of the atomistic models; however, it does bear on the question of whether molecular forces supply sufficient information in practice to parameterize CG models.

Third, there is a consistent increase in the force error as the model resolution decreases. However, this trend is negligible in comparison with the high overall errors. This suggests that all the models are representability-limited due to the insufficiency of isotropic CG potentials in reproducing the atomistic forces, whereas the representability limitations associated with the mapping operator are small. Likewise, there is no effect on the force error upon the addition of anisotropic moieties (i.e., comparing pentane to 1-butanol to 1,3-propanediol). For a more expressive basis of force field functions, the mapping effects in each of these cases could be more significant.

The *g*(*r*) errors, averaged across all pair types and radial distances, are shown in Fig. 5 for IBI and MSCG. For these data, a probability weighted average has been utilized for averaging over the radial component to avoid poorly sampled regions from dominating the average. To better visualize the *g*(*r*) mismatch at each radial distance, the *g*(*r*) distributions for IBI and MSCG models are compared with target AA distributions for their respective minimum and maximum error cases in Fig. S2. *g*(*r*) is used as the target property to fit potentials in IBI but is not directly utilized in MSCG.

First, we note the qualitative difference between the IBI and MSCG results. IBI exhibits near perfect reproduction of the *g*(*r*) for all molecules and mappings, whereas MSCG exhibits worse overall reproduction in most cases. Moreover, we observe that the MSCG models show decreasing performance upon addition of anisotropic moieties (i.e., the errors for 1,3-propanediol and 1-butanol are greater than that of pentane). This comparison suggests a potential asymmetry in information content supplied by the atomistic *g*(*r*) and forces. We have observed that the IBI algorithm, which utilizes only *g*(*r*) information, also results in force errors that are comparable to MSCG. However, MSCG, which utilizes only force-information, has much higher *g*(*r*) errors than IBI. This asymmetry suggests that the atomistic *g*(*r*) is either more information rich or numerically expedient as a CG parameterization target.

Second, we stress the significance that IBI is capable of accurately reproducing the *g*(*r*) for all mappings and molecules. At the face value, this is simply evidence that IBI is doing what it purports to do. However, the absence of a mapping dependence also indicates that these isotropic CG models are not representability limited with respect to reproducing the *g*(*r*), whereas they were representability limited with respect to reproducing the atomic forces. Thus, representability limitations can strongly depend on the target properties of the CG model.

Third, we also note that in the cases where IBI and MSCG similarly reproduce the *g*(*r*) [e.g., comparing the *g*(*r*) results for the 1-butanol r5m0 mapping], they also exhibit significant differences in the associated potentials (Fig. 6). Obtaining similar *g*(*r*) from qualitatively different potentials suggests that the *g*(*r*) optimization surface, like the force optimization surface, is nonconvex and relatively shallow. We additionally report the potentials for IBI without pressure correction (IBI-NP) and IBI initialized from MSCG potential as the initial guess (IBI-MSCG) in Fig. 6. These additional models also exhibit different converged potentials and yet reproduce *g*(*r*). Several other groups have made a similar observation, in that although the Henderson theorem^{49} formally guarantees the uniqueness of the potential that produces an *g*(*r*) up to a constant, in practice the potential is underdetermined by the *g*(*r*) alone.^{5,7,50} The implication of a nonconvex *g*(*r*) optimization surface is that isotropic pairwise potentials are potentially capable of supporting more physical information.

### B. Off-target properties

The self-diffusion coefficient and enthalpy of vaporization were calculated for each CG model and compared with the atomistic simulations. These serve as off-target properties for both MSCG and IBI. By reproducing key target properties [i.e., forces and *g*(*r*)], MSCG and IBI potentially transfer physical information for additional system properties to the CG models. Our goal here is to quantify the extent of this information transfer as the model resolution decreases and to highlight the implications for the information limitations of the models.

The relative error in self-diffusion coefficients, averaged over all trajectories and normalized with respect to the atomistic coefficients, is shown for each mapping and molecule in Fig. 7. It is generally known that CG models parameterized with MSCG and IBI substantially overestimate diffusivity, and several strategies have been proposed to correct this.^{6,51} Here, we likewise observe a dramatic overestimation of the diffusivity for both MSCG and IBI. The two algorithms perform similarly in all cases with a small but consistent trend of higher accuracy for IBI. Izvekov and Voth presented a general argument for this diffusivity overestimation, whereby the missing degrees of freedom in the CG model manifest as a stochastic friction in the atomistic model that is missing from CG dynamics.^{51} Following this argument, it is unsurprising that the diffusivity errors also show a trend of increasing error with decreasing resolution. However, several quantitative aspects of this comparison are relevant to the accuracy of each CG approximation with respect to dimension reduction.

First, we note that the diffusivity errors show a strong increase with the number of anisotropic moieties in each molecule (i.e., the errors are ranked as 1,3-propanediol > 1-butanol > pentane). However, this trend is the reverse of the number of degrees of freedom that are lost during the CG procedure for each liquid (i.e., for the same mapping, the lost degrees of freedom are ranked as pentane > 1-butanol > 1,3-propanediol). Thus, the diffusivity overprediction is a nontrivial function of the chemical nature of the coarse-grained degrees of freedom, with those associated with more strongly interacting groups having a disparate impact on the CG dynamics. These results also suggest that there is no generally applicable rescaling of the CG dynamics based on resolution, since the associated speedup is a function of the specific degrees of freedom being coarse-grained.

Second, the diffusivity errors correlate more strongly with the number of CG beads in each model than with the resolution. For a fixed number of beads, a mapping effect is observed in some cases (e.g., for 1,3-propanediol, r3m0, which retains a high-resolution bead on both hydroxyl groups, performs better than r2m0 and r2m1, which do not retain a high-resolution hydroxyl bead), however, it is minute in comparison with the effect of changing the number of beads. This weak mapping dependence provides justification for the *ad hoc* mappings that are typical in the CG literature and indicates that the number of beads affects the accuracy of the CG approximation much more strongly than the specific mapping operator.

The relative error in the VAC, averaged over bead types and times less than 500 fs and normalized with respect to the atomistic values, is shown for each mapping and molecule in Fig. S3. Whereas the diffusivity corresponds to the long-timescale integral of the VAC, a comparison of the VAC errors indicates differences in the time-resolved dynamics between the CG models and atomistic simulations on the short timescale. The same qualitative error trends observed for diffusivity are observed for the VAC errors. Increasing errors are exhibited with respect to the increasing number of anisotropic groups in the molecule, and the correlation between accuracy and the number of beads is stronger than the effect of mapping changes for a fixed number of beads. The overall conclusion is that the short-timescale fidelity of the CG dynamics follows a similar dependence to the diffusivity.

The relative error in Δ*H*_{v} is presented in Fig. 8 for all molecules and mappings. The relative error is calculated as the difference between the CG and atomistic enthalpies of vaporization, normalized by the atomistic value, and averaged over multiple independent trajectories. We observe that IBI exhibits consistently smaller errors compared to MSCG, whereas both underestimate Δ*H*_{v}. As with diffusivity, several quantitative aspects of this comparison are relevant to the accuracy of each CG approximation with respect to dimension reduction.

First, we note that the reduced Δ*H*_{v} errors in IBI compared with MSCG can be attributed to the additional attractive interactions in IBI CG models resulting from pressure corrections. This is confirmed by comparison with the Δ*H*_{v} errors for IBI without pressure correction (IBI-NP), which are larger than IBI and, in some cases, MSCG. Thus, IBI with pressure correction is not only capable of reproducing two on-target properties [the *g*(*r*) and pressure], but also is able to capture additional physical information, which is reflected in an off-target property. This supports the underlying premise of bottom-up coarse-graining (i.e., that reproducing a subset of critical atomistic properties will simultaneously reproduce other off-target properties), but also suggests that the existing algorithm is likely underdetermined with respect to the target information supplied.

Second, we note that the Δ*H*_{v} error generally increases with respect to the number of anisotropic moieties in each molecule (i.e., the absolute error ranked as 1,3-propanediol > 1-butanol > pentane). For IBI, this dependence is weak, but can still be seen for intermediate mappings (e.g., comparing the 2 and 3 bead mappings across the liquids). Whereas for MSCG, the underestimation increases with the number of anisotropic moieties, while also exhibiting a stronger resolution dependence.

Finally, we note that the Δ*H*_{v} errors correlate more strongly with the number of beads, rather than the resolution of the model. However, the mapping dependence of errors is weaker for IBI models compared to the MSCG models, and no consistent trend is observable for IBI-NP. The Δ*H*_{v} error for IBI-NP is inversely correlated with the final pressure for the CG model (not shown), which explains the lack of mapping dependence and is also consistent with the interpretation that the pressure information supplied in IBI results in its low Δ*H*_{v} errors. The weak mapping dependence of IBI for enthalpy of vaporization [similar to *g*(*r*)] coupled with the correlation between the simulation pressure and Δ*H*_{v} error suggests that the IBI models are information limited, rather than representability limited.

## IV. CONCLUSIONS

We have systematically investigated the effect of dimension-reduction on the ability of CG models to reproduce the on-target and off-target properties of fine-grained atomistic systems. The weak dimension-dependence of the on-target property accuracy provides evidence that these models are information limited, rather than representability limited. We note that the question of representability limitations for off-target properties cannot be addressed here, because, by virtue of being off-target, these properties were not included in the objective function of the two parameterization algorithms (IBI and MSCG). Presumably at some point simple isotropic pairwise potentials will have to make compromises in the number and accuracy of atomistic properties that they can reproduce. However, until algorithms are developed that can flexibly incorporate multiple properties into CG parameterization, the general representability limitations of isotropic CG models will be left unanswered. In contrast, for on-target properties, we have observed that the reproduction of atomistic forces is deeply in the representability limited regime for all the models that we investigated, while *g*(*r*) reproduction is far from the representability limited regime. The former observation suggests that the atomistic forces are poorly conditioned in practice for CG model parameterization, whereas the latter observation suggests that pairwise potentials are capable of supporting more physical information than is provided by the *g*(*r*) alone.

The results presented here also demonstrate that the CG model accuracy for off-target properties is strongly sensitive to dimension reduction. One interpretation of these results is that the low-resolution models are representability limited, with the consequence that fewer properties can be reproduced in the low-resolution limit. As pointed out above, the representability limitations of off-target properties cannot be assessed by the current study. An alternative interpretation that may be useful in guiding further studies is that the IBI and MSCG approximations become less useful at low-resolutions for reproducing off-target properties. In this latter interpretation, the low-resolution models are information limited, rather than representability limited. This latter interpretation is supported by the nonconvex optimization surfaces of both *g*(*r*) and the atomistic force errors, which admit a potentially large range of parameterizations. Additionally, for IBI, the capability to reproduce the target *g*(*r*) is nearly unaffected by resolution, and the accuracy is uniformly high in all the cases. Thus, the IBI models are not representability limited for their target property and can potentially accommodate more information by including additional atomistic data into the parameterization algorithm. In the case of MSCG, the force errors are uniformly high in all cases, leading us to conclude that all the models are representability limited with respect to the force reproduction, while cautioning that these models do still reproduce off-target properties to a variable extent. Thus, while the MSCG models are representability limited for their target property, they too can potentially accommodate more information by including additional atomistic data into the parameterization algorithm.

Although atomistic molecular dynamics yields an abundance of potential information for CG models to reproduce, existing algorithms utilize only a small fraction of it during parameterization. The results presented here suggest that isotropic pair potentials are likely capable of supporting additional chemical details beyond reproducing the atomistic *g*(*r*), forces, and pressure. Moreover, we have observed that what have been previously classified as representability limitations are more likely information limitations that could be addressed with algorithm development and without increasing model complexity. In our forthcoming work, we will explore in additional detail the proper attribution of these errors.

## SUPPLEMENTARY MATERIAL

See the supplementary material for additional details on the parameterization procedures, property calculations, VAC results, and sample atomistic input files.

## ACKNOWLEDGMENTS

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. ACI-1548562. Simulations were performed on the Comet supercomputer at the University of California, San Diego, under the Allocation No. TG-CHE190014. The work performed by B.M.S. was made possible through the Air Force Office of Scientific Research (AFOSR) under support provided by the Organic Materials Chemistry Program (Grant No. FA9550-18-S-0003, Program Manager: Dr. Kenneth Caster).