Bottom-up coarse-grained (CG) modeling is an effective means of bypassing the limited spatiotemporal scales of conventional atomistic molecular dynamics while retaining essential information from the atomistic model. A central challenge in CG modeling is the trade-off between accuracy and efficiency, as the inclusion of often pivotal many-body interaction terms in the CG force-field renders simulation markedly slower than simple pairwise models. The Ultra Coarse-Graining (UCG) method incorporates many-body terms through discrete internal state variables that modulate the CG force-field according to, e.g., changes in local environment when substantial chemical heterogeneities exist. However, assigning optimal internal states systematically from atomistic simulation data, as well as the practical application of bottom-up UCG theory to biomolecular systems, remain open problems. We develop two synergistic methods to aid in the development of UCG models that can capture inhomogeneities in atomistic systems such as those induced by phase coexistence. The first method establishes the systematic construction of UCG force-fields from a relative entropy minimization principle, while the second method utilizes machine-learning to obtain optimal local order parameters for enhanced model efficiency and transferability. We apply these methods to a methanol liquid–vapor interface and the ripple phase of a 1,2-dipalmitoyl-sn-glycero-3-phosphocholine lipid bilayer and demonstrate that UCG modeling alone recapitulates aspects of phase coexistence that are otherwise not observed in CG modeling.
I. INTRODUCTION
Molecular dynamics (MD) enables the computational study of chemical and biomolecular systems at an atomistic resolution.1 However, the number of atoms capable of being feasibly simulated and the timescales that can be reached via MD constitute a severe limitation. Coarse-graining (CG) methods systematically reduce the number of degrees of freedom in an atomistic system through the construction of a CG model, extending both the spatial and temporal limits available through MD simulation.2–6 CG modeling approaches are generally considered either top-down, in which the CG force-field is hand-tuned to reproduce experimental data, such as in the Martini model,2 or bottom-up, in which atomistic MD simulation is used as reference data to inform model parameterization.5,6 In particular, bottom-up CG methods obtain an effective potential energy landscape to evolve the CG degrees of freedom through variational statements that directly connect the CG Hamiltonian to the atomistic Hamiltonian through statistical mechanics. Ideally, the CG Hamiltonian will reproduce the many-body potential of mean force (mbPMF).5,6
Bottom-up CG methods can generally be divided into two categories: force-matching based methods, including the Multi-Scale Coarse-graining7,8 (MS-CG), also referred to as Force-Matching (FM), and generalized Yvon Born Green methods,9 and structure-matching based methods, such as the relative entropy minimization10–12 (REM) and iterative Boltzmann inversion13 (IBI) methods. While the MS-CG and REM methods are conceptually related,14 and each method will reproduce the same mbPMF given an infinitely expressive basis set, in practice the optimal force-field according to each variational method will differ when a limited basis set is utilized. Through the development of CG models of a variety of systems, it has become increasingly apparent that the inclusion of multi-body terms in the CG force-field is often pivotal in providing a fully accurate representation of the statistical behavior of the CG model with respect to the underlying atomistic system.15,16 Such multi-body terms have included interactions that are physically interpretable, such as the Stillinger–Weber three-body interaction potential16 and local order parameter-based interactions,17 as well as interactions whose recapitulation of many-body terms is more opaque, including virtual particle-based methods18–20 and neural network force-fields.21–23 An alternative approach is to introduce discrete internal degrees of freedom, or “states,” which correspond to conformational changes that occur within the CG site, i.e., involving degrees of freedom that have been integrated out.24–26 This approach has been applied to various biomolecular systems,27–29 allowing for significant heterogeneity in model behavior, and has been applied to model, e.g., the HIV-1 virus capsid assembly process from more than a thousand proteins.30,31 However, the inclusion of these discrete degrees of freedom within the CG Hamiltonian can be largely ad hoc.
The Ultra-Coarse-Graining (UCG) method24–26 is a means of incorporating discrete state degrees of freedom in a CG force-field from a statistical mechanical standpoint. In UCG theory, some or all of the CG sites in a CG model possess a discrete latent variable referred to as an internal state. The probability of a CG site being in a particular state can be expressed conditionally as a function of either the atomistic or CG coordinates. For example, a CG protein model may possess internal states related to being folded or unfolded, and the probabilities of observing these states may be expressed as a function of the pertinent collective variables. Each CG site possesses a distinct interaction term for each internal state the site can be in. By introducing these internal state degrees of freedom, the CG configurational space can be effectively partitioned into separate regions for which different CG force-fields may be appropriate, enabling significantly enhanced model flexibility. Defining internal state degrees of freedom in this manner allows for an extended definition of thermodynamic consistency between the CG and all-atom (AA) models and the development of variational statements that can be employed to construct bottom-up UCG force-fields. Simulation of a UCG force-field can be conducted either via a kinetic Monte Carlo-like criterion for the internal state evolution,25 which dynamically switches states using kinetic information extracted from AA simulation, or the UCG rapid local equilibrium (UCG-RLE) method,26 which proceeds via direct simulation of an effective Hamiltonian whose forces are a conditioned average over the internal state degrees of freedom while assuming those internal states rapidly equilibrate for each motion (MD time step) of the CG beads or sites.
The UCG methodology has been applied in recapitulating hydrogen bonding in CG liquids,32 UCG modeling of interfaces,33 the conformational behavior of peptides,34 the behavior of actin filaments undergoing ATP hydrolysis,35,36 and the assembly of the HIV-1 virus capsid.30,31 However, in all cases considered so far, physical intuition was employed to construct UCG internal state definitions that were descriptive of the system inhomogeneities and computationally inexpensive for simulation. The successful application of UCG theory is contingent upon defining the internal states such that they correlate to the heterogeneities present in the atomistic system while also remaining practical for the CG simulation. Defining such states both rigorously and variationally, therefore, remains an unexplored but critically important area, and it is the central topic of this paper. In addition, the MS-CG formulation of UCG theory is currently the only bottom-up method that has been explored for variational optimization of UCG force-fields. Structure-based approaches to bottom-up CG often provide a complementary approach to force-based approaches when a limited basis set is employed,18 and hence the development of a structure-based approach to UCG is warranted. These two deficiencies in the current UCG theory currently preclude its general advancement in modeling of heterogeneous chemical and biomolecular systems.
In this work, we introduce two methods to address these challenges. Our first method, UCG-REM, establishes UCG theory from a relative entropy minimization principle, enabling structural correlation matching from a UCG perspective. Our second method, internal state regression (ISR) UCG, is a machine-learning method that, given a physically motivated order parameter, utilizes graph message passing to extract optimal internal state definitions that are locally defined and computationally inexpensive for simulation. The ISR-UCG method can be incorporated to assign internal states for systems wherein state definitions directly from the physically relevant order parameter are not computationally optimal either due to potentially global or anisotropic descriptions or increased computational burden. The ISR-UCG method instead employs logistic regression to assign an internal state definition according to the local environment of a CG bead in a spatially isotropic manner.
We demonstrate theoretically that the application of the ISR-UCG and UCG REM methods guarantees recapitulation of aspects of phase coexistence in CG models of chemical systems and provide two practical examples. We apply these two methods to a model of a liquid–vapor interface of methanol as a proof-of-concept example to illustrate the features of both methods and to contrast the behavior of structure and force-based UCG models alone. We then apply these methods to construct a highly CG, solvent-free model of the more challenging system of a ripple-phase bilayer of 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC) and demonstrate that UCG modeling is pivotal in recapitulating the multi-phase tail ordering.
II. THEORY AND METHODOLOGY
A. Relative entropy principle for ultra-coarse-grained modeling
B. Variational definition of internal states
Flow chart (left) depicting protocol for creation of a UCG model utilizing the ISR-UCG and UCG REM methods developed in this work, using a DPPC bilayer as an example. The hexatic order parameter is used to define regressed internal states (right, above) using the ISR-UCG method, which trains internal states according to cross-entropy minimization (right, middle). A UCG lipid model is trained via the UCG REM method (right, below).
Flow chart (left) depicting protocol for creation of a UCG model utilizing the ISR-UCG and UCG REM methods developed in this work, using a DPPC bilayer as an example. The hexatic order parameter is used to define regressed internal states (right, above) using the ISR-UCG method, which trains internal states according to cross-entropy minimization (right, middle). A UCG lipid model is trained via the UCG REM method (right, below).
We detail the protocols that were used in AA data collection, CG modeling, and CG simulation below.
III. APPLICATIONS
A. Methanol liquid/vapor interface
1. All-atom molecular dynamics
Simulation results were taken from Ref. 39. A total of 1728 methanol molecules parameterized using the OPLS/AA force-field40 were placed in a 5 × 5 × 10 nm3 slab geometry. Hydrogen atoms were constrained using LINCS, and electrostatics were calculated using the particle mesh Ewald summation method. A Nose–Hoover thermostat was employed to maintain a thermostat at 298.15 K with a coupling time of 0.5 ps. All-atom simulation was conducted for 5 000 000 steps utilizing a 1 fs time step. Positions and forces were recorded every 5 ps as training data for CG modeling.
2. MS-CG
All bottom-up CG modeling was carried out using the OpenMSCG software package.39 For both CG and UCG models, pairwise interactions were parameterized as third order B-splines with a cutoff of 1.2 nm.
3. REM
All CG MD simulations were conducted using LAMMPS.41 REM CG and UCG models were parameterized using fourth order B-splines. REM gradient descent for the CG and UCG models proceeded from parameters obtained via CG and UCG FM, respectively. A learning rate of 1.0 was employed, with the first and last three coefficients held fixed during optimization. A maximum absolute step size of 0.01 kcal/mol was used; all parameter updates were rescaled when this threshold was reached. Both CG and UCG models were trained for 100 epochs. The CG simulation at each iteration consisted of 50 000 time steps with the temperature maintained via a Nose–Hoover thermostat. A time step of 2 fs and 1 fs was employed for CG and UCG models during training, respectively. Production simulations for all CG models employed a 5 fs time step to capitalize on simulation efficiency.
4. ISR
Training was conducted using PyTorch Geometric.42,43 To learn a regressed set of internal states, a learning rate of 8 × 10−3 was used to optimize internal state parameters for 800 epochs with a batch size of 32. The training dataset consisted of 17 280 examples. The cutoff to define the connectivity matrix was 1.2 nm.
B. Ripple phase behavior of a DPPC lipid bilayer
1. All-atom molecular dynamics
The DPPC lipid bilayer consisted of 4608 DPPC lipids distributed symmetrically across each leaflet. The bilayer was solvated with ∼43 waters per lipid in 0.15M NaCl. The lipid bilayer system was constructed using CHARMM-GUI44–46 and simulated using GROMACS.47 The CHARMM36 force-field was employed to parameterize lipids,48 and the TIP3P force-field was used for the parameterization of solvent.49 Hydrogen atoms were constrained using LINCS, and the particle mesh Ewald summation method was employed for electrostatics calculations. The energy of the system was minimized to a force tolerance of 1000 kJ mol−1 nm−1, and constant NPT simulation was carried out using a Nosé–Hoover thermostat50 at 300 K and a semi-isotropic Parrinello–Rahman barostat51 at a pressure of 1 atm with coupling times of 1 and 5 ps, respectively. A time step of 2 fs was used throughout all simulations. After 90 ns of equilibration in the NPT ensemble, the bilayer was simulated in the constant NVT ensemble for 10 ns, which was then used as reference data for CG modeling. Positions and forces were recorded every 5 ps.
2. MS-CG
For both CG and UCG models, pairwise, bonding, and angular interactions were parametrized as third order B-splines with resolutions of 0.03 nm, 0.005 nm, and 0.5°, respectively. For nonbonding interactions, a cutoff of 2.0 nm was employed. Bayesian regularization was conducted for 250 iterations for both models.52
3. REM
Packmol was used to generate a configuration of randomly dispersed lipids, which was used as an initial condition for all simulations in REM.53 Constant NVT simulation was performed via a Langevin thermostat with a coupling constant of 10 and 0.25 ps for the REM and UCG REM models, respectively. A time step of 20.0 fs and 5.0 fs was used throughout training for CG and UCG models, respectively. For both models, the simulation for each REM iteration consisted of 50 000 time steps of equilibration followed by 50 000 time steps, which were used for REM gradient descent. Due to the large parameter space present in optimization, the RMSProp algorithm was employed for gradient descent.54 A learning rate of 8 × 10−3 was used throughout training, and a memory and decay factor of 0.9 and 0.999 were used, respectively. For both CG and UCG REM models, all bonding and angular interactions were held fixed as the potentials obtained from their respective FM models, and the nonbonding potentials obtained from each FM model were utilized as a starting point for REM optimization. Fourth order B-splines with a resolution of 0.03 and 0.1 nm were utilized to describe all nonbonding interactions for the REM and UCG REM models, respectively. Both CG and UCG models were trained for 100 epochs.
4. CG MD
Production statistics were obtained from CG simulations of each bilayer for 150 ns of CG MD simulation starting from a configuration of the mapped atomistic data. Simulation of spontaneous vesicle formulation was conducted via initial placement of lipids in a spherical droplet formation in a 50 × 50 × 50 nm3 box.
5. ISR
Optimization of internal states proceeded via gradient descent with the Adam optimizer and a learning rate of 8 × 10−3 for 200 epochs with a batch size of 10. The cutoff to define the connectivity matrix was 2.0 nm.
IV. RESULTS AND DISCUSSION
A. Methanol liquid/vapor interface
Interfaces are prevalent in molecular systems and are pertinent for various chemical processes, including chemical adsorption,55 emulsification,56 and liquid extraction.57 Examining the differing behavior of CG models of simple liquid/vapor interfaces provides quantitative insight into the capabilities of various CG methods in reproducing heterogeneity present in realistic atomistic systems, in this case due to variation in local density of the two-phase system. We develop four CG models (FM, REM, UCG FM, and UCG REM) of a methanol liquid–vapor interface and examine the variations in structural correlations and density profiles along the interfacial axis.
A center-of-mass CG bead mapping operator was employed to define the CG configurational space. As the density profile along the interfacial axis provides quantitative evidence for phase coexistence, we defined a reference internal state for UCG modeling of methanol according to the z-axis position of the CG methanol sites. Although this state definition accurately assigns phases to the methanol molecules, its inherently global property hinders the transferability of the resulting UCG models. Instead, a set of local internal states was learned through the ISR UCG method using the set of reference internal states as training data.
Hyperbolic tangent density fit for methanol liquid–vapor interface. Z-distance is centered by removing the mean z-distance of all methanol particles.
Hyperbolic tangent density fit for methanol liquid–vapor interface. Z-distance is centered by removing the mean z-distance of all methanol particles.
Atomistic representation of methanol interface (left), the CG mapping operator defines the CG representation of the methanol interface (middle), and the conditional state probability enables sampling of liquid (pink) and vapor (blue) states (right).
Atomistic representation of methanol interface (left), the CG mapping operator defines the CG representation of the methanol interface (middle), and the conditional state probability enables sampling of liquid (pink) and vapor (blue) states (right).
Figure 4 demonstrates the capabilities of each CG model in recapitulating interfacial behavior. It is clear from Fig. 4 that only the UCG REM model demonstrates the correct decay in local density in the vapor region, whereas the density of the FM and REM models significantly overestimates the density in the vapor region at and around the Gibbs dividing surface. The UCG FM model fails to capture interfacial behavior as indicated through its broader density profile. This suggests that the incorporation of state variables into the CG model directly enables the UCG model to recapitulate the multi-phase behavior present in the AA model.
Density of CG methanol sites along the interfacial axis for mapped AA and CG models.
Density of CG methanol sites along the interfacial axis for mapped AA and CG models.
As shown in Fig. 5, the structural correlations at the CG resolution are markedly different between the force and structure-based models. Both the FM and UCG FM models largely fail to capture the radial distribution function (RDF) of CG methanol. The REM and UCG REM RDFs are similar; however, the UCG REM model provides a more accurate representation of the RDF. Despite the REM and UCG REM models providing similar RDFs, the phase behavior of the two models is distinct. This disparity can be attributed to the tendency of the REM method to capture averages at the cost of significant fluctuations and is more evident through analysis of the structural correlations at the UCG resolution.
Radial distribution function of CG methanol sites for mapped AA and CG models. We note that the 3D RDF here does not decay to 1 due to the inherent anisotropy of the interfacial system.
Radial distribution function of CG methanol sites for mapped AA and CG models. We note that the 3D RDF here does not decay to 1 due to the inherent anisotropy of the interfacial system.
Distribution of occupation number for internal states corresponding to the liquid (left) and vapor phase (right) for mapped AA and CG models of the methanol interface. Note that since the total number of CG methanol sites is conserved, the two distributions are mirrored.
Distribution of occupation number for internal states corresponding to the liquid (left) and vapor phase (right) for mapped AA and CG models of the methanol interface. Note that since the total number of CG methanol sites is conserved, the two distributions are mirrored.
It is clear from Fig. 6 that the UCG REM model most accurately captures the occupation numbers for the liquid and vapor states. All other models significantly overestimate the vapor state as a consequence of failing to capture interfacial behavior. As the interfacial density profile becomes more uniform along the z-axis, the probability of being in the bulk liquid phase decreases due to decreased local density, which consequently assigns more CG sites to the vapor phase. This diminished density also produces the offset in magnitude of the RDF in Fig. 5 for the FM and UCG FM models. Nevertheless, the UCG REM model produces a slightly greater bulk density than the AA model, which is correlated with the bias toward occupancy of the liquid state as shown in Fig. 6. To remedy this, one-body “chemical potential” terms can be introduced to the UCG force-field (and their parameters subsequently optimized according to the REM framework) to penalize deviations in, e.g., average state occupation number.
Note that the average is taken over the UCG resolution, and consequently Nμ and ρν, which are not constant, cannot be removed from the average. When the number of states is equal to 1, i.e., as in standard CG MD, the resulting expression is either identical or proportional to the standard definition of the RDF, depending on the combinatoric normalization.
The UCG RDFs are shown in Fig. 7 for the mapped AA and CG models. The UCG REM model is clearly the most accurate in recapitulating the two-body correlations of each phase as well as the cross-correlation between phases. It is of particular note that both the REM and UCG REM models produce quantitatively similar CG methanol RDFs, as shown in Fig. 5, yet produce distinct RDFs at the UCG resolution. This can be understood as a result of error cancellation for the REM model, as the methanol CG RDF is a weighted sum over the state RDFs of Fig. 7. However, the UCG REM model produces a similar RDF to the mapped AA model for CG methanol directly because of its matching of correlation with the AA model across both phases. This underlying error in the REM model manifests in the interfacial density profile in Fig. 4, for which the REM and UCG REM models produce different results. This illustrates that judging the quality of a CG model via matching of two-body correlations alone produces a misleading assessment of model behavior. We additionally report the simulation efficiencies of the MeOH models in Table S1. Simulation efficiency was assessed via parallel simulation on 40 compute cores.
Radial distribution functions at the UCG resolution for mapped AA and CG models of the methanol liquid–vapor interface. Liquid–liquid (left), liquid–vapor (middle), and vapor–vapor (right) RDFs are shown. Due to the anisotropy in the interfacial system, the UCG RDFs possess different behaviors at long range.
Radial distribution functions at the UCG resolution for mapped AA and CG models of the methanol liquid–vapor interface. Liquid–liquid (left), liquid–vapor (middle), and vapor–vapor (right) RDFs are shown. Due to the anisotropy in the interfacial system, the UCG RDFs possess different behaviors at long range.
The potentials for all CG models are shown in Fig. 8. Both FM and REM models exhibit qualitatively similar potentials. For the bulk liquid interactions, the FM and REM potentials possess greater attractive wells than their UCG counterparts. It is apparent this attraction is somewhat repartitioned into the vapor interactions for the UCG models. Both UCG FM and REM models exhibit similar potentials for the liquid and liquid–vapor interactions; however, the potentials describing the vapor–vapor interaction are markedly different. The UCG REM vapor potential exhibits long-range repulsion and a greater short-range attraction. Further analysis on conducting UCG REM from a different starting point from UCG FM is warranted to assess whether these features are general attributes of UCG REM models or an artifact from the initial starting point.
Potentials for all CG models. Note that the FM and REM models have identical potentials across state designations, as these models do not consider internal state degrees of freedom.
Potentials for all CG models. Note that the FM and REM models have identical potentials across state designations, as these models do not consider internal state degrees of freedom.
B. Ripple phase behavior of a DPPC lipid bilayer
The ripple phase (Pβ) of lipid bilayers is an intermediate phase in which the fluid, liquid-disordered (Ld) phase coexists with the gel (Lβ) phase. The Pβ phase exhibits diverse tail packing behavior in which regions of Lβ phase lipids are connected via Ld phase membrane kinks marked by pronounced leaflet interdigitation in an undulating pattern.58 Due to the significant heterogeneity present, constructing a CG model of the Pβ of lipid bilayers remains an ongoing challenge. For example, the top-down Martini CG force-field (including Martini 3.0) is unable to form stable Pβ phases,59 while bottom-up CG models of DPPC have not so far fully recapitulated the structural heterogeneity present.18
The application of UCG theory to CG modeling of the Pβ phase, in this case of a DPPC bilayer, follows naturally from the identification of order parameters that distinguish between the Ld and Lβ phases. Indeed, similar work has been reported for bottom-up models in which internal state degrees of freedom were assigned to lipid tails.27 The Nelson–Halperin bond orientational order parameter quantifies the degree of hexatic packing in two-dimensional systems60 and can be applied to assess lipid tail packing behavior in lipid bilayers, as lipid bilayers are quasi-two-dimensional liquids.
(a) Mapping of atomistic DPPC to a six-site CG representation. (b) Spatial variation of hexatic ordering of T2 beads is shown for one leaflet of mapped AA DPPC bilayer (top); the hexatic order parameter is used to define internal states for UCG modeling (bottom).
(a) Mapping of atomistic DPPC to a six-site CG representation. (b) Spatial variation of hexatic ordering of T2 beads is shown for one leaflet of mapped AA DPPC bilayer (top); the hexatic order parameter is used to define internal states for UCG modeling (bottom).
The absolute value of the order parameter quantifies the degree of hexatic ordering and ranges from 0 to 1, with values of 1 indicating perfect hexatic ordering. The general mapping scheme and state definitions for DPPC are shown in Figs. 9(a) and 9(b), respectively.
The threshold and switching values were assigned as ψth = 0.5 and ψsw = 0.1. Logistic regression was conducted in the same manner as the methanol liquid–vapor interface, and the following values for the regressed internal state parameters were obtained: Rth = 0.39 nm, α = 0.62, wth = 1.52, and β = 0.46. The training profile of the logistic regression model across epochs is shown in Fig. S2. The accuracy of the logistic regression model was 62.6%. We observed sensitivity in the parameters obtained variationally with respect to their initial values, as in the interfacial MeOH example.
The results of bilayer simulations of all four models are shown in Fig. 10. Of the four CG DPPC models produced, only the REM and UCG REM models produced CG bilayer models. We further note that the REM model exhibits defects, i.e., tail protrusions into the solvent-exposed surface as well as hydrophilic head groups buried in the hydrophobic region. These deficiencies are not observed in the UCG REM model. In addition, we find that training stability is significantly greater for the UCG REM model than the REM model, i.e., a significantly greater percentage of UCG REM iterations enable stable bilayer simulation than REM iterations. We find a significant number of REM iterations exhibit membrane poration, which is in turn correlated with pronounced hexatic ordering, suggesting a delicate balance in the parameter space of tail interactions between ensuring overall membrane stability and capturing the largely structured pair correlations of lipid tails (see Fig. S3 of the supplementary material). This suggests that the training instability of REM models is related to the challenge in simultaneously capturing hexatic ordering as well as the hydrophobic force of self-assembly in a pairwise CG model. We report the simulation efficiency of the functional REM and UCG REM bilayer models in Table S2.
Simulations of (a) FM, (b) REM, (c) UCG FM, and (d) UCG REM models. Only REM and UCG REM models maintain bilayer configurations; however, defects in the REM model are present.
Simulations of (a) FM, (b) REM, (c) UCG FM, and (d) UCG REM models. Only REM and UCG REM models maintain bilayer configurations; however, defects in the REM model are present.
Log-plot of relative entropy error metric during training for (a) REM and (b) UCG REM models. REM model training exhibits oscillatory behavior, whereas the UCG REM model exhibits stable training.
Log-plot of relative entropy error metric during training for (a) REM and (b) UCG REM models. REM model training exhibits oscillatory behavior, whereas the UCG REM model exhibits stable training.
Hexatic ordering distributions of the UCG REM model. The probability distribution (a) for the hexatic order parameter for the mapped AA and UCG REM models is shown, as well as the spatial distribution for one leaflet of the UCG REM model (b).
Hexatic ordering distributions of the UCG REM model. The probability distribution (a) for the hexatic order parameter for the mapped AA and UCG REM models is shown, as well as the spatial distribution for one leaflet of the UCG REM model (b).
Furthermore, the UCG REM methodology is compatable with recently developed virtual site approaches, enabling better capture of solvent-mediated behavior.18,19 Alternatively, iterative force-matching,63,64 which has remained unexplored in UCG modeling, may prove useful in capturing these higher order-correlations. Iterative force-matching has exhibited the capability to capture two-as well three-body correlations and does not target two-body correlations alone as in REM, suggesting a fundamentally different approach to UCG modeling which may prove useful.
Finally, we investigated the transferability of the UCG REM model. As shown in Fig. 13, we find the UCG REM model is capable of spontaneous self-assembly into a vesicle at low density conditions, indicating its general understanding of amphipathic behavior. We note that vesicle UCG simulation is only possible due to the usage of the UCG ISR method, which allows the internal state definitions to be expressed via three-dimensional configurational information rather than two-dimensional information via the Nelson–Halperin bond orientational order parameter.
Spontaneous vesicle formation of UCG REM DPPC model. (a) Full vesicle and (b) cross section are shown.
Spontaneous vesicle formation of UCG REM DPPC model. (a) Full vesicle and (b) cross section are shown.
V. CONCLUSIONS
We have introduced in this work two methods for the development of UCG models for systems that exhibit marked heterogeneities such as phase co-existence. Both methods are founded on UCG principles, which partition the configurational space of CG systems according to a defined set of internal CG bead “states.” These internal states are defined probabilistically through association with a system order parameter(s), which ideally are correlated with the inhomogeneities present in the atomistic system. Our first method, ISR-UCG, can be utilized to regress an order parameter that describes the heterogeneous behavior into an order parameter that is computationally convenient and scalable for incorporation into a UCG force-field. The functional form and practical implementation of the ISR-UCG method utilize a graph network architecture, enabling significant flexibility in defining the regressed order parameter. Our second method, UCG-REM, provides the foundations for the development of a UCG force-field from a relative entropy minimization principle. As demonstrated for the systems investigated in this work, the UCG-REM method provides a particular advantage in capturing, e.g., phase coexistence in CG systems. So long as the discrete states are related to the phases observed, the correlation matching feature of REM methods will ensure some recapitulation of phase coexistence in the CG model.
While we have demonstrated the utility of these methods in capturing phase coexistence, we additionally expect the ISR-UCG and UCG-REM methods to be particularly useful in the development of CG models of biomolecules in a variety of contexts. For example, the ISR-UCG method can be used to define a local order parameter approximately describing various stable basins in a protein free energy landscape, e.g., a protein undergoing various degrees of folding, and the UCG-REM method can be utilized to develop a force-field capable of capturing features across these regions.
Future theoretical developments will focus on the expansion of the ISR-UCG and UCG-RLE methods beyond the binary state architecture employed in this work. This can readily be incorporated into the ISR-UCG method through adaptation of multinomial logistic regression and incorporated into the UCG simulation, as neither the UCG-REM method nor the UCG-RLE method are contingent on a binary state description. We expect this to be particularly fruitful in CG modeling of systems that display multiple free energy minima, as is frequently exhibited in proteins and other biomolecules.
While the ISR-UCG method presented enables computationally practical internal state definitions, it is limited by the necessity that a pertinent order parameter be known prior to UCG model development. It would instead be useful to employ the MS-CG and REM loss functions as a means of determining variationally optimal state definitions, thus tying the state definition search directly to the CG force-field optimization process. We note that the UCG framework, when applied to both the MS-CG and REM methods, admits a direct optimization of the state definitions via the MS-CG and REM loss functions. For example, optimization of UCG internal state parameters via the REM method would proceed through gradient descent as in Eq. (11). However, these model parameters influence the conditional state probability rather than the UCG potential energy, producing a different gradient than Eq. (13) for gradient descent. We intend to elaborate upon this proposed method for autonomous internal state discovery in future work.
SUPPLEMENTARY MATERIAL
See the supplementary material for additional information regarding ISR training for interfacial MeOH and DPPC bilayer. Simulation efficiencies of AA and CG models. Tail ordering of REM DPPC model. RDFs of UCG REM DPPC model. Order parameter spatial distribution for AA DPPC.
ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation (NSF Grant No. CHE-2102677). Simulations were performed using computing resources provided by the University of Chicago Research Computing Center (RCC).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Patrick G. Sahrmann: Conceptualization (equal). Gregory A. Voth: Conceptualization (equal); Funding acquisition (lead).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.