Bottom-up coarse-grained (CG) modeling is an effective means of bypassing the limited spatiotemporal scales of conventional atomistic molecular dynamics while retaining essential information from the atomistic model. A central challenge in CG modeling is the trade-off between accuracy and efficiency, as the inclusion of often pivotal many-body interaction terms in the CG force-field renders simulation markedly slower than simple pairwise models. The Ultra Coarse-Graining (UCG) method incorporates many-body terms through discrete internal state variables that modulate the CG force-field according to, e.g., changes in local environment when substantial chemical heterogeneities exist. However, assigning optimal internal states systematically from atomistic simulation data, as well as the practical application of bottom-up UCG theory to biomolecular systems, remain open problems. We develop two synergistic methods to aid in the development of UCG models that can capture inhomogeneities in atomistic systems such as those induced by phase coexistence. The first method establishes the systematic construction of UCG force-fields from a relative entropy minimization principle, while the second method utilizes machine-learning to obtain optimal local order parameters for enhanced model efficiency and transferability. We apply these methods to a methanol liquid–vapor interface and the ripple phase of a 1,2-dipalmitoyl-sn-glycero-3-phosphocholine lipid bilayer and demonstrate that UCG modeling alone recapitulates aspects of phase coexistence that are otherwise not observed in CG modeling.

Molecular dynamics (MD) enables the computational study of chemical and biomolecular systems at an atomistic resolution.1 However, the number of atoms capable of being feasibly simulated and the timescales that can be reached via MD constitute a severe limitation. Coarse-graining (CG) methods systematically reduce the number of degrees of freedom in an atomistic system through the construction of a CG model, extending both the spatial and temporal limits available through MD simulation.2–6 CG modeling approaches are generally considered either top-down, in which the CG force-field is hand-tuned to reproduce experimental data, such as in the Martini model,2 or bottom-up, in which atomistic MD simulation is used as reference data to inform model parameterization.5,6 In particular, bottom-up CG methods obtain an effective potential energy landscape to evolve the CG degrees of freedom through variational statements that directly connect the CG Hamiltonian to the atomistic Hamiltonian through statistical mechanics. Ideally, the CG Hamiltonian will reproduce the many-body potential of mean force (mbPMF).5,6

Bottom-up CG methods can generally be divided into two categories: force-matching based methods, including the Multi-Scale Coarse-graining7,8 (MS-CG), also referred to as Force-Matching (FM), and generalized Yvon Born Green methods,9 and structure-matching based methods, such as the relative entropy minimization10–12 (REM) and iterative Boltzmann inversion13 (IBI) methods. While the MS-CG and REM methods are conceptually related,14 and each method will reproduce the same mbPMF given an infinitely expressive basis set, in practice the optimal force-field according to each variational method will differ when a limited basis set is utilized. Through the development of CG models of a variety of systems, it has become increasingly apparent that the inclusion of multi-body terms in the CG force-field is often pivotal in providing a fully accurate representation of the statistical behavior of the CG model with respect to the underlying atomistic system.15,16 Such multi-body terms have included interactions that are physically interpretable, such as the Stillinger–Weber three-body interaction potential16 and local order parameter-based interactions,17 as well as interactions whose recapitulation of many-body terms is more opaque, including virtual particle-based methods18–20 and neural network force-fields.21–23 An alternative approach is to introduce discrete internal degrees of freedom, or “states,” which correspond to conformational changes that occur within the CG site, i.e., involving degrees of freedom that have been integrated out.24–26 This approach has been applied to various biomolecular systems,27–29 allowing for significant heterogeneity in model behavior, and has been applied to model, e.g., the HIV-1 virus capsid assembly process from more than a thousand proteins.30,31 However, the inclusion of these discrete degrees of freedom within the CG Hamiltonian can be largely ad hoc.

The Ultra-Coarse-Graining (UCG) method24–26 is a means of incorporating discrete state degrees of freedom in a CG force-field from a statistical mechanical standpoint. In UCG theory, some or all of the CG sites in a CG model possess a discrete latent variable referred to as an internal state. The probability of a CG site being in a particular state can be expressed conditionally as a function of either the atomistic or CG coordinates. For example, a CG protein model may possess internal states related to being folded or unfolded, and the probabilities of observing these states may be expressed as a function of the pertinent collective variables. Each CG site possesses a distinct interaction term for each internal state the site can be in. By introducing these internal state degrees of freedom, the CG configurational space can be effectively partitioned into separate regions for which different CG force-fields may be appropriate, enabling significantly enhanced model flexibility. Defining internal state degrees of freedom in this manner allows for an extended definition of thermodynamic consistency between the CG and all-atom (AA) models and the development of variational statements that can be employed to construct bottom-up UCG force-fields. Simulation of a UCG force-field can be conducted either via a kinetic Monte Carlo-like criterion for the internal state evolution,25 which dynamically switches states using kinetic information extracted from AA simulation, or the UCG rapid local equilibrium (UCG-RLE) method,26 which proceeds via direct simulation of an effective Hamiltonian whose forces are a conditioned average over the internal state degrees of freedom while assuming those internal states rapidly equilibrate for each motion (MD time step) of the CG beads or sites.

The UCG methodology has been applied in recapitulating hydrogen bonding in CG liquids,32 UCG modeling of interfaces,33 the conformational behavior of peptides,34 the behavior of actin filaments undergoing ATP hydrolysis,35,36 and the assembly of the HIV-1 virus capsid.30,31 However, in all cases considered so far, physical intuition was employed to construct UCG internal state definitions that were descriptive of the system inhomogeneities and computationally inexpensive for simulation. The successful application of UCG theory is contingent upon defining the internal states such that they correlate to the heterogeneities present in the atomistic system while also remaining practical for the CG simulation. Defining such states both rigorously and variationally, therefore, remains an unexplored but critically important area, and it is the central topic of this paper. In addition, the MS-CG formulation of UCG theory is currently the only bottom-up method that has been explored for variational optimization of UCG force-fields. Structure-based approaches to bottom-up CG often provide a complementary approach to force-based approaches when a limited basis set is employed,18 and hence the development of a structure-based approach to UCG is warranted. These two deficiencies in the current UCG theory currently preclude its general advancement in modeling of heterogeneous chemical and biomolecular systems.

In this work, we introduce two methods to address these challenges. Our first method, UCG-REM, establishes UCG theory from a relative entropy minimization principle, enabling structural correlation matching from a UCG perspective. Our second method, internal state regression (ISR) UCG, is a machine-learning method that, given a physically motivated order parameter, utilizes graph message passing to extract optimal internal state definitions that are locally defined and computationally inexpensive for simulation. The ISR-UCG method can be incorporated to assign internal states for systems wherein state definitions directly from the physically relevant order parameter are not computationally optimal either due to potentially global or anisotropic descriptions or increased computational burden. The ISR-UCG method instead employs logistic regression to assign an internal state definition according to the local environment of a CG bead in a spatially isotropic manner.

We demonstrate theoretically that the application of the ISR-UCG and UCG REM methods guarantees recapitulation of aspects of phase coexistence in CG models of chemical systems and provide two practical examples. We apply these two methods to a model of a liquid–vapor interface of methanol as a proof-of-concept example to illustrate the features of both methods and to contrast the behavior of structure and force-based UCG models alone. We then apply these methods to construct a highly CG, solvent-free model of the more challenging system of a ripple-phase bilayer of 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC) and demonstrate that UCG modeling is pivotal in recapitulating the multi-phase tail ordering.

We assume the equilibrium distribution of the atomistic configurational degrees of freedom, p̄rrn, is a Boltzmann distribution, i.e.,
(1)
where β = 1/kBT, urn is the atomistic potential energy, and z is the partition function of the atomistic ensemble.
Bottom-up CG modeling in our hands proceeds through defining a mapping operator, M, which subsequently implies the configurational distribution of the CG variables, pMrefRN, given an atomistic Boltzmann distribution, p̄rrn, through the following relation:8,
(2)
Equation (2) establishes thermodynamic consistency between the AA and CG models, as well as the CG effective potential, whose resulting Boltzmann distribution is pMrefRN, which defines the mbPMF. The forces of this mbPMF define a target for variational optimization of trial CG force-fields, which constitutes the MS-CG method.37 Similarly, sampling of potential energy derivatives enables optimization of a CG model via minimization of the relative entropy, or Kullback–Leibler divergence.11,12
The UCG theory posits additional internal state degrees of freedom whose probabilities of observation can be inferred from the configurational degrees of freedom.24 We first denote the number of CG beads that possess internal states as NΩ. Then a given state configuration for the entire system is Ω={ω1,ω2,ωNΩ}, where ωI is a discrete variable denoting the current state of CG site I. The number of states available for each CG site is permitted to differ; however, it is assumed that the number of states is finite. Furthermore, we will consider internal state definitions that do not change particle number and whose state definitions are dependent only on CG configurational information. We construct states such that the conditional probability of observing a state configuration given the configuration in the AA ensemble, prΩ|rn, is identical to the conditional probability of observing a state configuration given the configuration in the CG ensemble, pRΩ|Mrn. These constraints are summarized in the following equation:
(3)
Depending on the ensemble of focus, we will express the conditional probability of observing the states as dependent upon configurational information, rn, Mrn, and RN interchangeably; however, we emphasize that our previous requirements on internal states dictate only CG configurational information is relevant in determining their probabilities of observation. We can express the joint probability of the state degrees of freedom and atomistic configurational degrees of freedom as
(4)
We can then construct a UCG model consisting of a potential U with tunable parameters θ such that the Boltzmann distribution of the UCG model can be described as
(5)
where Z is the partition function of the UCG ensemble. We note that the UCG ensemble can similarly be expressed as
(6)
where we have defined a CG ensemble, p̄RRN;θ, which is the marginalization of the joint distribution over the state degrees of freedom. We define an effective Hamiltonian Umix, which describes the Boltzmann distribution of this CG ensemble
(7)
Importantly, we assume rapid local equilibration of internal states, enabling the construction of the CG potential Umix, which is related to the UCG model potential U through the following equation:26 
(8)
where we sum over all possible state configurations {Ω}. The previous equations establish thermodynamic consistency in the UCG theory and provide the grounds to develop the UCG MS-CG theory.24 
To proceed in developing a UCG REM theory, we construct a variational statement for optimizing the parameters θ from a relative entropy principle
(9)
where Smap is the mapping entropy, which is independent of the model parameters. However, by Eqs. (3) and (4), the relative entropy can be expressed solely in terms of the marginal distributions, i.e.,
(10)
Following the standard derivation of the relative entropy gradient for a model system that is Boltzmann distributed,12 we obtain an expression for the relative entropy gradient of a single model parameter,
(11)
The relationship between Umix and θ is not immediately calculable. However, it can be shown via Eq. (8) that the partial derivative of the effective Hamiltonian with respect to the parameter θ is related to the partial derivative of the UCG potential energy in the following manner:
(12)
Direct substitution of Eq. (12) into Eq. (11) then produces the following equation:
(13)
The relative entropy gradient is then seen here to be the difference of the potential energy derivative across the complete state and configurational space of atomistic and CG ensembles.
The second derivative of the relative entropy can similarly be calculated as
(14)
Note that if the potential energy is linear with respect to θ, then the Hessian is proportional to the variance of the (state-weighted) potential energy gradient and is therefore non-negative. This implies the relative entropy has a single global minimum when employing a UCG model with a parametrically linear potential energy, e.g., spline-based potentials.
Finally, we note that when the relative entropy is minimized with respect to a function, f, whose input is some subset of configurational degrees of freedom, R, the following relation holds:
(15)
Equation (15) is one of the central results of this paper and suggests that when a flexible enough basis set is employed to describe an interaction in the UCG model, the dual correlation is matched across state definitions. For example, if a UCG model of a molecular system that exists in two separate phases is constructed with a sufficiently flexible basis set describing pairwise interactions, then the radial distribution functions representative of both phases as well as the interface will be recapitulated.
We emphasize the form of Eqs. (13)(15) to suggest a numerical procedure for calculating the relative entropy gradient for a UCG model. Upon sampling p̄rrn, one can sample from prΩ|Mrn to “backmap” into the extended state space. Sampling of p̄R(RN;θ) from MD simulation can be accomplished via the UCG-RLE method, which recognizes that the effective Hamiltonian can be expressed as a free energy decomposition,26 
(16)
Upon sampling the Boltzmann distribution given the effective Hamiltonian of Eq. (16), one can backmap to the extended state space to obtain UCG statistics for the CG model necessary in the gradient and Hessian calculations of Eqs. (13) and (14), respectively. This algorithm is identical to the standard REM method, with the only addition being a backmapping procedure to obtain internal state statistics.
We define here a variational statement for the extraction of optimal internal states to be used in a UCG model based on some definition of reference states, Ωref={ωref,1,ωref,2,ωref,NΩref}. The probability of observing the reference internal state ωref,I for CG site I is defined with respect to an order parameter, mI(rn), which may more accurately describe the inhomogeneity present in the atomistic system but would be either computationally expensive to incorporate into the UCG force-field or confer transferability issues, e.g., the order parameter is a global property of the system. We assume a two-state system in this work; however, the methodology presented can be readily extended beyond a two-state description. The probabilities of the two reference internal state values per UCG bead (denoted 0 and 1) are then assumed to be of a form related to a type of sigmoid function, for example,
(17)
(18)
where mth and msw are parameters that control the threshold and steepness, respectively, of the switch function that discerns between the two states. Note that we permit the order parameter to be a function of the atomistic configuration and not merely of the mapped atomistic configuration. This allows for UCG modeling to retain essential behavior, which is described by configurational information that has been integrated out, e.g., retaining hydrogen bonding features in a CG model where the hydrogens have been CG mapped or otherwise dropped out.32 
We now define the functional form of the UCG state probabilities which will be optimized via regression upon the reference states and ultimately be incorporated into the UCG force field for numerical simulation. For a CG site, I, in the present case, the two internal state probabilities are
(19)
(20)
(21)
where NI is the set of CG neighbors of CG site I up to a defined cutoff. The UCG internal state probabilities here are interpreted as the result of a message-passing scheme in which each local neighbor J propagates a message, “Msg,” to site I, and the messages are pooled and fed as input to an “Update” function, which assigns the probability between the two states. By defining and optimizing the UCG internal state probabilities in this manner, the local environment of a CG site is utilized to “learn” the appropriate phase behavior such that simulation of the UCG model is computationally feasible.
The set of parameters that control the internal state probability, ϕ, are the targets for optimization. For the two-state system considered here, the probability of being in either state is Bernoulli-distributed, i.e., the probabilities for each state follow Eqs. (20) and (21). For Bernoulli-distributed random variables, the cross-entropy, LCE, can be utilized as a variational statement to obtain a set of optimal ϕ,
(22)
The protocol for obtaining a set of optimal parameters defining the internal state probability distributions is thus the following: (1) obtain a definition of internal states based on an order parameter of physical meaning, (2) assuming this order parameter is not ideal for numerical incorporation into the UCG force-field, assign a functional form to define the neighbor list to the “Msg” and “Update” functions of Eq. (19), and (3) sample from the original state distribution to obtain training data for minimization of the cross-entropy as defined in Eq. (22). In most practical applications, including those considered in this work, the “Update” function comprises a sigmoid function. Consequently, the minimization of the cross-entropy for the probabilities of internal states amounts to a logistic regression problem,38 and we, therefore, refer to the method of obtaining a set of parameters through the minimization of Eq. (22) as Internal State Regression for Ultra-Coarse-Graining (ISR-UCG). The workflow employed in this work is summarized in Fig. 1.
FIG. 1.

Flow chart (left) depicting protocol for creation of a UCG model utilizing the ISR-UCG and UCG REM methods developed in this work, using a DPPC bilayer as an example. The hexatic order parameter is used to define regressed internal states (right, above) using the ISR-UCG method, which trains internal states according to cross-entropy minimization (right, middle). A UCG lipid model is trained via the UCG REM method (right, below).

FIG. 1.

Flow chart (left) depicting protocol for creation of a UCG model utilizing the ISR-UCG and UCG REM methods developed in this work, using a DPPC bilayer as an example. The hexatic order parameter is used to define regressed internal states (right, above) using the ISR-UCG method, which trains internal states according to cross-entropy minimization (right, middle). A UCG lipid model is trained via the UCG REM method (right, below).

Close modal

We detail the protocols that were used in AA data collection, CG modeling, and CG simulation below.

1. All-atom molecular dynamics

Simulation results were taken from Ref. 39. A total of 1728 methanol molecules parameterized using the OPLS/AA force-field40 were placed in a 5 × 5 × 10 nm3 slab geometry. Hydrogen atoms were constrained using LINCS, and electrostatics were calculated using the particle mesh Ewald summation method. A Nose–Hoover thermostat was employed to maintain a thermostat at 298.15 K with a coupling time of 0.5 ps. All-atom simulation was conducted for 5 000 000 steps utilizing a 1 fs time step. Positions and forces were recorded every 5 ps as training data for CG modeling.

2. MS-CG

All bottom-up CG modeling was carried out using the OpenMSCG software package.39 For both CG and UCG models, pairwise interactions were parameterized as third order B-splines with a cutoff of 1.2 nm.

3. REM

All CG MD simulations were conducted using LAMMPS.41 REM CG and UCG models were parameterized using fourth order B-splines. REM gradient descent for the CG and UCG models proceeded from parameters obtained via CG and UCG FM, respectively. A learning rate of 1.0 was employed, with the first and last three coefficients held fixed during optimization. A maximum absolute step size of 0.01 kcal/mol was used; all parameter updates were rescaled when this threshold was reached. Both CG and UCG models were trained for 100 epochs. The CG simulation at each iteration consisted of 50 000 time steps with the temperature maintained via a Nose–Hoover thermostat. A time step of 2 fs and 1 fs was employed for CG and UCG models during training, respectively. Production simulations for all CG models employed a 5 fs time step to capitalize on simulation efficiency.

4. ISR

Training was conducted using PyTorch Geometric.42,43 To learn a regressed set of internal states, a learning rate of 8 × 10−3 was used to optimize internal state parameters for 800 epochs with a batch size of 32. The training dataset consisted of 17 280 examples. The cutoff to define the connectivity matrix was 1.2 nm.

1. All-atom molecular dynamics

The DPPC lipid bilayer consisted of 4608 DPPC lipids distributed symmetrically across each leaflet. The bilayer was solvated with ∼43 waters per lipid in 0.15M NaCl. The lipid bilayer system was constructed using CHARMM-GUI44–46 and simulated using GROMACS.47 The CHARMM36 force-field was employed to parameterize lipids,48 and the TIP3P force-field was used for the parameterization of solvent.49 Hydrogen atoms were constrained using LINCS, and the particle mesh Ewald summation method was employed for electrostatics calculations. The energy of the system was minimized to a force tolerance of 1000 kJ mol−1 nm−1, and constant NPT simulation was carried out using a Nosé–Hoover thermostat50 at 300 K and a semi-isotropic Parrinello–Rahman barostat51 at a pressure of 1 atm with coupling times of 1 and 5 ps, respectively. A time step of 2 fs was used throughout all simulations. After 90 ns of equilibration in the NPT ensemble, the bilayer was simulated in the constant NVT ensemble for 10 ns, which was then used as reference data for CG modeling. Positions and forces were recorded every 5 ps.

2. MS-CG

For both CG and UCG models, pairwise, bonding, and angular interactions were parametrized as third order B-splines with resolutions of 0.03 nm, 0.005 nm, and 0.5°, respectively. For nonbonding interactions, a cutoff of 2.0 nm was employed. Bayesian regularization was conducted for 250 iterations for both models.52 

3. REM

Packmol was used to generate a configuration of randomly dispersed lipids, which was used as an initial condition for all simulations in REM.53 Constant NVT simulation was performed via a Langevin thermostat with a coupling constant of 10 and 0.25 ps for the REM and UCG REM models, respectively. A time step of 20.0 fs and 5.0 fs was used throughout training for CG and UCG models, respectively. For both models, the simulation for each REM iteration consisted of 50 000 time steps of equilibration followed by 50 000 time steps, which were used for REM gradient descent. Due to the large parameter space present in optimization, the RMSProp algorithm was employed for gradient descent.54 A learning rate of 8 × 10−3 was used throughout training, and a memory and decay factor of 0.9 and 0.999 were used, respectively. For both CG and UCG REM models, all bonding and angular interactions were held fixed as the potentials obtained from their respective FM models, and the nonbonding potentials obtained from each FM model were utilized as a starting point for REM optimization. Fourth order B-splines with a resolution of 0.03 and 0.1 nm were utilized to describe all nonbonding interactions for the REM and UCG REM models, respectively. Both CG and UCG models were trained for 100 epochs.

4. CG MD

Production statistics were obtained from CG simulations of each bilayer for 150 ns of CG MD simulation starting from a configuration of the mapped atomistic data. Simulation of spontaneous vesicle formulation was conducted via initial placement of lipids in a spherical droplet formation in a 50 × 50 × 50 nm3 box.

5. ISR

Optimization of internal states proceeded via gradient descent with the Adam optimizer and a learning rate of 8 × 10−3 for 200 epochs with a batch size of 10. The cutoff to define the connectivity matrix was 2.0 nm.

Interfaces are prevalent in molecular systems and are pertinent for various chemical processes, including chemical adsorption,55 emulsification,56 and liquid extraction.57 Examining the differing behavior of CG models of simple liquid/vapor interfaces provides quantitative insight into the capabilities of various CG methods in reproducing heterogeneity present in realistic atomistic systems, in this case due to variation in local density of the two-phase system. We develop four CG models (FM, REM, UCG FM, and UCG REM) of a methanol liquid–vapor interface and examine the variations in structural correlations and density profiles along the interfacial axis.

A center-of-mass CG bead mapping operator was employed to define the CG configurational space. As the density profile along the interfacial axis provides quantitative evidence for phase coexistence, we defined a reference internal state for UCG modeling of methanol according to the z-axis position of the CG methanol sites. Although this state definition accurately assigns phases to the methanol molecules, its inherently global property hinders the transferability of the resulting UCG models. Instead, a set of local internal states was learned through the ISR UCG method using the set of reference internal states as training data.

To accomplish this, the density along the z-axis was fit as the sum of two hyperbolic tangent functions, as shown in the following equation:
(23)
The results of this fit are shown in Fig. 2. Once this function was fit, the individual inflection points z1 = −c/b and z2 = −g/f were identified and used to define the reference internal state probabilities in the following manner. Define the liquid state as state “0” and the vapor state as state “1,” then the internal state probabilities are given by
(24)
(25)
To define the regressed states, we assign the message-passing algorithm of Eq. (19) in the main text the following functional forms:
(26)
(27)
Logistic regression produced the following values for the parameters defining the regressed internal states: Rth = 0.24 nm, α = 2.66, wth = 9.58, and β = 0.13. We observed that the optimal parameter values obtained are somewhat dependent on their initial values during training, likely due to the nonlinearity of the model. In practice, a grid search over initial parameter values may be suitable for identifying optimal parameter sets. The training profile of the logistic regression model across epochs is shown in Fig. S1; the accuracy of the logistic regression model was 90.2%. The resulting mapping and state backmapping operations are summarized in Fig. 3.
FIG. 2.

Hyperbolic tangent density fit for methanol liquid–vapor interface. Z-distance is centered by removing the mean z-distance of all methanol particles.

FIG. 2.

Hyperbolic tangent density fit for methanol liquid–vapor interface. Z-distance is centered by removing the mean z-distance of all methanol particles.

Close modal
FIG. 3.

Atomistic representation of methanol interface (left), the CG mapping operator defines the CG representation of the methanol interface (middle), and the conditional state probability enables sampling of liquid (pink) and vapor (blue) states (right).

FIG. 3.

Atomistic representation of methanol interface (left), the CG mapping operator defines the CG representation of the methanol interface (middle), and the conditional state probability enables sampling of liquid (pink) and vapor (blue) states (right).

Close modal

Figure 4 demonstrates the capabilities of each CG model in recapitulating interfacial behavior. It is clear from Fig. 4 that only the UCG REM model demonstrates the correct decay in local density in the vapor region, whereas the density of the FM and REM models significantly overestimates the density in the vapor region at and around the Gibbs dividing surface. The UCG FM model fails to capture interfacial behavior as indicated through its broader density profile. This suggests that the incorporation of state variables into the CG model directly enables the UCG model to recapitulate the multi-phase behavior present in the AA model.

FIG. 4.

Density of CG methanol sites along the interfacial axis for mapped AA and CG models.

FIG. 4.

Density of CG methanol sites along the interfacial axis for mapped AA and CG models.

Close modal

As shown in Fig. 5, the structural correlations at the CG resolution are markedly different between the force and structure-based models. Both the FM and UCG FM models largely fail to capture the radial distribution function (RDF) of CG methanol. The REM and UCG REM RDFs are similar; however, the UCG REM model provides a more accurate representation of the RDF. Despite the REM and UCG REM models providing similar RDFs, the phase behavior of the two models is distinct. This disparity can be attributed to the tendency of the REM method to capture averages at the cost of significant fluctuations and is more evident through analysis of the structural correlations at the UCG resolution.

FIG. 5.

Radial distribution function of CG methanol sites for mapped AA and CG models. We note that the 3D RDF here does not decay to 1 due to the inherent anisotropy of the interfacial system.

FIG. 5.

Radial distribution function of CG methanol sites for mapped AA and CG models. We note that the 3D RDF here does not decay to 1 due to the inherent anisotropy of the interfacial system.

Close modal
Figure 6 analyzes the one-body correlations at the UCG resolution for all models, respectively. The one-body correlations constitute the average number of CG methanol sites, which can be considered as either in the liquid state or the vapor state. These averages are clearly related through the following equation:
(28)
where NL is the number of liquid state CG sites, NV is the number of vapor state CG sites, and N is the total number of CG methanol sites.
FIG. 6.

Distribution of occupation number for internal states corresponding to the liquid (left) and vapor phase (right) for mapped AA and CG models of the methanol interface. Note that since the total number of CG methanol sites is conserved, the two distributions are mirrored.

FIG. 6.

Distribution of occupation number for internal states corresponding to the liquid (left) and vapor phase (right) for mapped AA and CG models of the methanol interface. Note that since the total number of CG methanol sites is conserved, the two distributions are mirrored.

Close modal

It is clear from Fig. 6 that the UCG REM model most accurately captures the occupation numbers for the liquid and vapor states. All other models significantly overestimate the vapor state as a consequence of failing to capture interfacial behavior. As the interfacial density profile becomes more uniform along the z-axis, the probability of being in the bulk liquid phase decreases due to decreased local density, which consequently assigns more CG sites to the vapor phase. This diminished density also produces the offset in magnitude of the RDF in Fig. 5 for the FM and UCG FM models. Nevertheless, the UCG REM model produces a slightly greater bulk density than the AA model, which is correlated with the bias toward occupancy of the liquid state as shown in Fig. 6. To remedy this, one-body “chemical potential” terms can be introduced to the UCG force-field (and their parameters subsequently optimized according to the REM framework) to penalize deviations in, e.g., average state occupation number.

Further analysis at the UCG resolution requires defining an analogy to the RDF that incorporates internal state values. For two distinct states μ and ν that emanate from the same CG site definition, we define a UCG RDF as
(29)
where N is the total number of CG sites from which the states emanate, Nμ is the number of CG sites in state μ, ρν is the density of sites in state ν, ωI is the state of CG site I, ωJ is the state of CG site J, and rIJ is the distance between sites I and J.

Note that the average is taken over the UCG resolution, and consequently Nμ and ρν, which are not constant, cannot be removed from the average. When the number of states is equal to 1, i.e., as in standard CG MD, the resulting expression is either identical or proportional to the standard definition of the RDF, depending on the combinatoric normalization.

The UCG RDFs are shown in Fig. 7 for the mapped AA and CG models. The UCG REM model is clearly the most accurate in recapitulating the two-body correlations of each phase as well as the cross-correlation between phases. It is of particular note that both the REM and UCG REM models produce quantitatively similar CG methanol RDFs, as shown in Fig. 5, yet produce distinct RDFs at the UCG resolution. This can be understood as a result of error cancellation for the REM model, as the methanol CG RDF is a weighted sum over the state RDFs of Fig. 7. However, the UCG REM model produces a similar RDF to the mapped AA model for CG methanol directly because of its matching of correlation with the AA model across both phases. This underlying error in the REM model manifests in the interfacial density profile in Fig. 4, for which the REM and UCG REM models produce different results. This illustrates that judging the quality of a CG model via matching of two-body correlations alone produces a misleading assessment of model behavior. We additionally report the simulation efficiencies of the MeOH models in Table S1. Simulation efficiency was assessed via parallel simulation on 40 compute cores.

FIG. 7.

Radial distribution functions at the UCG resolution for mapped AA and CG models of the methanol liquid–vapor interface. Liquid–liquid (left), liquid–vapor (middle), and vapor–vapor (right) RDFs are shown. Due to the anisotropy in the interfacial system, the UCG RDFs possess different behaviors at long range.

FIG. 7.

Radial distribution functions at the UCG resolution for mapped AA and CG models of the methanol liquid–vapor interface. Liquid–liquid (left), liquid–vapor (middle), and vapor–vapor (right) RDFs are shown. Due to the anisotropy in the interfacial system, the UCG RDFs possess different behaviors at long range.

Close modal

The potentials for all CG models are shown in Fig. 8. Both FM and REM models exhibit qualitatively similar potentials. For the bulk liquid interactions, the FM and REM potentials possess greater attractive wells than their UCG counterparts. It is apparent this attraction is somewhat repartitioned into the vapor interactions for the UCG models. Both UCG FM and REM models exhibit similar potentials for the liquid and liquid–vapor interactions; however, the potentials describing the vapor–vapor interaction are markedly different. The UCG REM vapor potential exhibits long-range repulsion and a greater short-range attraction. Further analysis on conducting UCG REM from a different starting point from UCG FM is warranted to assess whether these features are general attributes of UCG REM models or an artifact from the initial starting point.

FIG. 8.

Potentials for all CG models. Note that the FM and REM models have identical potentials across state designations, as these models do not consider internal state degrees of freedom.

FIG. 8.

Potentials for all CG models. Note that the FM and REM models have identical potentials across state designations, as these models do not consider internal state degrees of freedom.

Close modal

The ripple phase (Pβ) of lipid bilayers is an intermediate phase in which the fluid, liquid-disordered (Ld) phase coexists with the gel (Lβ) phase. The Pβ phase exhibits diverse tail packing behavior in which regions of Lβ phase lipids are connected via Ld phase membrane kinks marked by pronounced leaflet interdigitation in an undulating pattern.58 Due to the significant heterogeneity present, constructing a CG model of the Pβ of lipid bilayers remains an ongoing challenge. For example, the top-down Martini CG force-field (including Martini 3.0) is unable to form stable Pβ phases,59 while bottom-up CG models of DPPC have not so far fully recapitulated the structural heterogeneity present.18 

The application of UCG theory to CG modeling of the Pβ phase, in this case of a DPPC bilayer, follows naturally from the identification of order parameters that distinguish between the Ld and Lβ phases. Indeed, similar work has been reported for bottom-up models in which internal state degrees of freedom were assigned to lipid tails.27 The Nelson–Halperin bond orientational order parameter quantifies the degree of hexatic packing in two-dimensional systems60 and can be applied to assess lipid tail packing behavior in lipid bilayers, as lipid bilayers are quasi-two-dimensional liquids.

UCG modeling of DPPC proceeded in the following manner. We utilize a solvent-free, six-site mapping for DPPC (Fig. 9) used in previous work,18 which consisted of a head group (HG) bead, a middle group bead (MG), two outer tail beads (T1), and two inner tail beads (T2). A binary set of reference internal states, which we refer to as the Ld and Lβ states, was proposed for the T2 bead of DPPC according to the Nelson–Halperin order parameter. To calculate the Nelson–Halperin order parameter for CG bead I, the six nearest T2 beads were identified and their coordinates projected to a plane of best fit via singular value decomposition.61,62 The order parameter for CG site I, ψ6I, is then calculated from the angle, θIJ, each distance vector makes with respect to the x-axis (we note that the choice of reference vector is arbitrary), via the following equation:
(30)
FIG. 9.

(a) Mapping of atomistic DPPC to a six-site CG representation. (b) Spatial variation of hexatic ordering of T2 beads is shown for one leaflet of mapped AA DPPC bilayer (top); the hexatic order parameter is used to define internal states for UCG modeling (bottom).

FIG. 9.

(a) Mapping of atomistic DPPC to a six-site CG representation. (b) Spatial variation of hexatic ordering of T2 beads is shown for one leaflet of mapped AA DPPC bilayer (top); the hexatic order parameter is used to define internal states for UCG modeling (bottom).

Close modal

The absolute value of the order parameter |ψ6| quantifies the degree of hexatic ordering and ranges from 0 to 1, with values of 1 indicating perfect hexatic ordering. The general mapping scheme and state definitions for DPPC are shown in Figs. 9(a) and 9(b), respectively.

The reference internal states for the T2 beads of DPPC were determined according to the Nelson–Halperin bond orientational order parameter, ψ6I. Denote the state “0” as the state assigned to the liquid-disordered phase and state “1” as the state assigned to the gel phase,
(31)
(32)

The threshold and switching values were assigned as ψth = 0.5 and ψsw = 0.1. Logistic regression was conducted in the same manner as the methanol liquid–vapor interface, and the following values for the regressed internal state parameters were obtained: Rth = 0.39 nm, α = 0.62, wth = 1.52, and β = 0.46. The training profile of the logistic regression model across epochs is shown in Fig. S2. The accuracy of the logistic regression model was 62.6%. We observed sensitivity in the parameters obtained variationally with respect to their initial values, as in the interfacial MeOH example.

The results of bilayer simulations of all four models are shown in Fig. 10. Of the four CG DPPC models produced, only the REM and UCG REM models produced CG bilayer models. We further note that the REM model exhibits defects, i.e., tail protrusions into the solvent-exposed surface as well as hydrophilic head groups buried in the hydrophobic region. These deficiencies are not observed in the UCG REM model. In addition, we find that training stability is significantly greater for the UCG REM model than the REM model, i.e., a significantly greater percentage of UCG REM iterations enable stable bilayer simulation than REM iterations. We find a significant number of REM iterations exhibit membrane poration, which is in turn correlated with pronounced hexatic ordering, suggesting a delicate balance in the parameter space of tail interactions between ensuring overall membrane stability and capturing the largely structured pair correlations of lipid tails (see Fig. S3 of the supplementary material). This suggests that the training instability of REM models is related to the challenge in simultaneously capturing hexatic ordering as well as the hydrophobic force of self-assembly in a pairwise CG model. We report the simulation efficiency of the functional REM and UCG REM bilayer models in Table S2.

FIG. 10.

Simulations of (a) FM, (b) REM, (c) UCG FM, and (d) UCG REM models. Only REM and UCG REM models maintain bilayer configurations; however, defects in the REM model are present.

FIG. 10.

Simulations of (a) FM, (b) REM, (c) UCG FM, and (d) UCG REM models. Only REM and UCG REM models maintain bilayer configurations; however, defects in the REM model are present.

Close modal
We quantitatively assessed the training stability of the REM and UCG REM models by defining the following error metric:
(33)
This unitless error metric, χS2, in the above-mentioned equation weights the squared-error loss of the potential energy derivatives for each parameter optimized by the variances of each potential energy derivative. The error metric as defined in Eq. (33) applies to the REM model, and its extension to a UCG REM model is straightforward. Figure 11 details the behavior of this error metric during training. We find that the UCG REM model exhibits a stable learning curve, whereas the REM model is highly oscillatory. These results are intriguing, as the UCG REM potential introduces more complexity than a standard Hamiltonian. This suggests that the UCG methodology is fundamentally different from a means of adding more parameters to naively enhance model flexibility and can instead be utilized to train a CG model when significant physical heterogeneities render training of more simplistic Hamiltonians unstable.
FIG. 11.

Log-plot of relative entropy error metric during training for (a) REM and (b) UCG REM models. REM model training exhibits oscillatory behavior, whereas the UCG REM model exhibits stable training.

FIG. 11.

Log-plot of relative entropy error metric during training for (a) REM and (b) UCG REM models. REM model training exhibits oscillatory behavior, whereas the UCG REM model exhibits stable training.

Close modal
Due to the deficiencies of all other models, we only include the UCG REM model for further structural comparison to the CG mapped AA model. The RDFs for UCG REM and mapped AA models are shown in Fig. S4 of the SI and are indicative that the two-body correlations are largely retained in the UCG REM model. The probability distribution as well as spatial distribution of the Nelson–Halperin bond-orientational order parameter for the UCG REM model are shown in Fig. 12. While non-uniform behavior is observed in the UCG REM model as shown in Fig. 12(a), the structural organization characteristic of the Pβ phase is still not observed, as evident in Fig. 12(b). This suggests that while the incorporation of UCG modeling has in some way addressed the necessity of multi-body interaction terms in the effective Hamiltonian to develop a stable REM model, a more complete recapitulation of Pβ phase behavior necessitates further inclusion of multi-body terms. We note that the ISR model developed to predict the internal states for DPPC displayed a markedly lower accuracy than the interfacial example, suggesting a more sophisticated ISR model could be employed to better correlate state definitions with changes in the relevant (in this case, hexatic) order parameter. In addition, the UCG theory has been minimally included in the CG modeling of DPPC in this example. Future avenues of exploration may include defining more than two internal states for the T2 beads, as well as incorporating internal states for the remaining HG, MG, and T1 beads to address these deficiencies. An alternative order parameter for reference state assignment for, e.g., the MG bead of the DPPC CG lipid, is the acyl chain order parameter, which quantifies the packing orientation of the lipid with respect to the bilayer normal. The acyl chain order parameter, P2, can be calculated via the following equation:
(34)
where θ is the angle a lipid bond vector makes with the bilayer normal. The acyl chain order parameter ranges from [−0.5, 1], where −0.5 indicates an orthogonal orientation with respect to the bilayer normal and 1 indicates a bond vector that is parallel with the bilayer normal. The spatial distribution of this order parameter for the HG-MG bond vector is provided in Fig. S5. Heterogeneity is observed in this order parameter, commensurate with the diversity in lipid packing behavior exhibited in the Pβ phase. We additionally note that the reference state assignment in the DPPC example utilized a threshold value of 0.5. A possible protocol to select an “ideal” threshold value may be to identify the threshold value that ensures the greatest change in structural environment between UCG states via, e.g., the Kullback–Leibler divergence in the UCG statewise RDFs.
FIG. 12.

Hexatic ordering distributions of the UCG REM model. The probability distribution (a) for the hexatic order parameter for the mapped AA and UCG REM models is shown, as well as the spatial distribution for one leaflet of the UCG REM model (b).

FIG. 12.

Hexatic ordering distributions of the UCG REM model. The probability distribution (a) for the hexatic order parameter for the mapped AA and UCG REM models is shown, as well as the spatial distribution for one leaflet of the UCG REM model (b).

Close modal

Furthermore, the UCG REM methodology is compatable with recently developed virtual site approaches, enabling better capture of solvent-mediated behavior.18,19 Alternatively, iterative force-matching,63,64 which has remained unexplored in UCG modeling, may prove useful in capturing these higher order-correlations. Iterative force-matching has exhibited the capability to capture two-as well three-body correlations and does not target two-body correlations alone as in REM, suggesting a fundamentally different approach to UCG modeling which may prove useful.

Finally, we investigated the transferability of the UCG REM model. As shown in Fig. 13, we find the UCG REM model is capable of spontaneous self-assembly into a vesicle at low density conditions, indicating its general understanding of amphipathic behavior. We note that vesicle UCG simulation is only possible due to the usage of the UCG ISR method, which allows the internal state definitions to be expressed via three-dimensional configurational information rather than two-dimensional information via the Nelson–Halperin bond orientational order parameter.

FIG. 13.

Spontaneous vesicle formation of UCG REM DPPC model. (a) Full vesicle and (b) cross section are shown.

FIG. 13.

Spontaneous vesicle formation of UCG REM DPPC model. (a) Full vesicle and (b) cross section are shown.

Close modal

We have introduced in this work two methods for the development of UCG models for systems that exhibit marked heterogeneities such as phase co-existence. Both methods are founded on UCG principles, which partition the configurational space of CG systems according to a defined set of internal CG bead “states.” These internal states are defined probabilistically through association with a system order parameter(s), which ideally are correlated with the inhomogeneities present in the atomistic system. Our first method, ISR-UCG, can be utilized to regress an order parameter that describes the heterogeneous behavior into an order parameter that is computationally convenient and scalable for incorporation into a UCG force-field. The functional form and practical implementation of the ISR-UCG method utilize a graph network architecture, enabling significant flexibility in defining the regressed order parameter. Our second method, UCG-REM, provides the foundations for the development of a UCG force-field from a relative entropy minimization principle. As demonstrated for the systems investigated in this work, the UCG-REM method provides a particular advantage in capturing, e.g., phase coexistence in CG systems. So long as the discrete states are related to the phases observed, the correlation matching feature of REM methods will ensure some recapitulation of phase coexistence in the CG model.

While we have demonstrated the utility of these methods in capturing phase coexistence, we additionally expect the ISR-UCG and UCG-REM methods to be particularly useful in the development of CG models of biomolecules in a variety of contexts. For example, the ISR-UCG method can be used to define a local order parameter approximately describing various stable basins in a protein free energy landscape, e.g., a protein undergoing various degrees of folding, and the UCG-REM method can be utilized to develop a force-field capable of capturing features across these regions.

Future theoretical developments will focus on the expansion of the ISR-UCG and UCG-RLE methods beyond the binary state architecture employed in this work. This can readily be incorporated into the ISR-UCG method through adaptation of multinomial logistic regression and incorporated into the UCG simulation, as neither the UCG-REM method nor the UCG-RLE method are contingent on a binary state description. We expect this to be particularly fruitful in CG modeling of systems that display multiple free energy minima, as is frequently exhibited in proteins and other biomolecules.

While the ISR-UCG method presented enables computationally practical internal state definitions, it is limited by the necessity that a pertinent order parameter be known prior to UCG model development. It would instead be useful to employ the MS-CG and REM loss functions as a means of determining variationally optimal state definitions, thus tying the state definition search directly to the CG force-field optimization process. We note that the UCG framework, when applied to both the MS-CG and REM methods, admits a direct optimization of the state definitions via the MS-CG and REM loss functions. For example, optimization of UCG internal state parameters via the REM method would proceed through gradient descent as in Eq. (11). However, these model parameters influence the conditional state probability rather than the UCG potential energy, producing a different gradient than Eq. (13) for gradient descent. We intend to elaborate upon this proposed method for autonomous internal state discovery in future work.

See the supplementary material for additional information regarding ISR training for interfacial MeOH and DPPC bilayer. Simulation efficiencies of AA and CG models. Tail ordering of REM DPPC model. RDFs of UCG REM DPPC model. Order parameter spatial distribution for AA DPPC.

This material is based upon work supported by the National Science Foundation (NSF Grant No. CHE-2102677). Simulations were performed using computing resources provided by the University of Chicago Research Computing Center (RCC).

The authors have no conflicts to disclose.

Patrick G. Sahrmann: Conceptualization (equal). Gregory A. Voth: Conceptualization (equal); Funding acquisition (lead).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1.
M.
Karplus
and
J. A.
McCammon
,
Nat. Struct. Biol.
9
,
646
(
2002
).
2.
S. J.
Marrink
,
H. J.
Risselada
,
S.
Yefimov
,
D. P.
Tieleman
, and
A. H.
de Vries
,
J. Phys. Chem. B
111
,
7812
(
2007
).
3.
M. G.
Saunders
and
G. A.
Voth
,
Annu. Rev. Biophys.
42
,
73
(
2013
).
4.
A. J.
Pak
and
G. A.
Voth
,
Curr. Opin. Struct. Biol.
52
,
119
(
2018
).
5.
J.
Jin
,
A. J.
Pak
,
A. E. P.
Durumeric
,
T. D.
Loose
, and
G. A.
Voth
,
J. Chem. Theory Comput.
18
,
5759
(
2022
).
6.
W. G.
Noid
,
J. Phys. Chem. B
127
,
4174
(
2023
).
7.
S.
Izvekov
and
G. A.
Voth
,
J. Phys. Chem. B
109
,
2469
(
2005
).
8.
W. G.
Noid
,
J. W.
Chu
,
G. S.
Ayton
,
V.
Krishna
,
S.
Izvekov
,
G. A.
Voth
,
A.
Das
, and
H. C.
Andersen
,
J. Chem. Phys.
128
,
244114
(
2008
).
9.
J. W.
Mullinax
and
W. G.
Noid
,
Phys. Rev. Lett.
103
,
198104
(
2009
).
10.
M. S.
Shell
,
J. Chem. Phys.
129
,
144108
(
2008
).
11.
A.
Chaimovich
and
M. S.
Shell
,
Phys. Rev. E
81
,
060104
(
2010
).
12.
A.
Chaimovich
and
M. S.
Shell
,
J. Chem. Phys.
134
,
094112
(
2011
).
14.
J. F.
Rudzinski
and
W. G.
Noid
,
J. Chem. Phys.
135
,
214101
(
2011
).
15.
J.
Wang
,
N.
Charron
,
B.
Husic
,
S.
Olsson
,
F.
Noe
, and
C.
Clementi
,
J. Chem. Phys.
154
,
164113
(
2021
).
16.
L.
Larini
,
L.
Lu
, and
G. A.
Voth
,
J. Chem. Phys.
132
,
164107
(
2010
).
17.
J. W.
Wagner
,
T.
Dannenhoffer-Lafage
,
J.
Jin
, and
G. A.
Voth
,
J. Chem. Phys.
147
,
044113
(
2017
).
18.
A. J.
Pak
,
T.
Dannenhoffer-Lafage
,
J. J.
Madsen
, and
G. A.
Voth
,
J. Chem. Theory Comput.
15
,
2087
(
2019
).
19.
P. G.
Sahrmann
,
T. D.
Loose
,
A. E. P.
Durumeric
, and
G. A.
Voth
,
J. Chem. Theory Comput.
19
,
4402
(
2023
).
20.
L. F.
Christians
,
E. V.
Halingstad
,
E.
Kram
,
E. M.
Okolovitch
, and
A. J.
Pak
,
J. Phys. Chem. B
128
,
1394
(
2024
).
21.
J.
Wang
,
S.
Olsson
,
C.
Wehmeyer
,
A.
Perez
,
N. E.
Charron
,
G.
de Fabritiis
,
F.
Noe
, and
C.
Clementi
,
ACS Cent. Sci.
5
,
755
(
2019
).
22.
B. E.
Husic
,
N. E.
Charron
,
D.
Lemm
,
J.
Wang
,
A.
Perez
,
M.
Majewski
,
A.
Kramer
,
Y.
Chen
,
S.
Olsson
,
G.
de Fabritiis
,
F.
Noe
, and
C.
Clementi
,
J. Chem. Phys.
153
,
194101
(
2020
).
23.
S.
Thaler
,
M.
Stupp
, and
J.
Zavadlav
,
J. Chem. Phys.
157
,
244103
(
2022
).
24.
J. F.
Dama
,
A. V.
Sinitskiy
,
M.
McCullagh
,
J.
Weare
,
B.
Roux
,
A. R.
Dinner
, and
G. A.
Voth
,
J. Chem. Theory Comput.
9
,
2466
(
2013
).
25.
A.
Davtyan
,
J. F.
Dama
,
A. V.
Sinitskiy
, and
G. A.
Voth
,
J. Chem. Theory Comput.
10
,
5265
(
2014
).
26.
J. F.
Dama
,
J.
Jin
, and
G. A.
Voth
,
J. Chem. Theory Comput.
13
,
1010
(
2017
).
27.
T.
Murtola
,
M.
Karttunen
, and
I.
Vattulainen
,
J. Chem. Phys.
131
,
055101
(
2009
).
28.
V. C.
Chappa
,
D. C.
Morse
,
A.
Zippelius
, and
M.
Muller
,
Phys. Rev. Lett.
109
,
148302
(
2012
).
29.
C. E.
Sing
and
A.
Alexander-Katz
,
Macromolecules
44
,
6962
(
2011
).
30.
J. M. A.
Grime
,
J. F.
Dama
,
B. K.
Ganser-Pornillos
,
C. L.
Woodward
,
G. J.
Jensen
,
M.
Yeager
, and
G. A.
Voth
,
Nat. Commun.
7
,
11568
(
2016
).
31.
M.
Gupta
,
A. J.
Pak
, and
G. A.
Voth
,
Sci. Adv.
9
,
eadd7434
(
2023
).
32.
J.
Jin
,
Y.
Han
, and
G. A.
Voth
,
J. Chem. Theory Comput.
14
,
6159
(
2018
).
33.
J.
Jin
and
G. A.
Voth
,
J. Chem. Theory Comput.
14
,
2180
(
2018
).
34.
J.
Jin
and
G. A.
Voth
,
J. Phys. Chem. Lett.
14
,
1354
(
2023
).
35.
H. H.
Katkar
,
A.
Davtyan
,
A. E. P.
Durumeric
,
G. M.
Hocky
,
A. C.
Schramm
,
E. M.
De La Cruz
, and
G. A.
Voth
,
Biophys. J.
115
,
1589
(
2018
).
36.
S.
Mani
,
H. H.
Katkar
, and
G. A.
Voth
,
J. Chem. Theory Comput.
17
,
1900
(
2021
).
37.
W. G.
Noid
,
P.
Liu
,
Y.
Wang
,
J. W.
Chu
,
G. S.
Ayton
,
S.
Izvekov
,
H. C.
Andersen
, and
G. A.
Voth
,
J. Chem. Phys.
128
,
244115
(
2008
).
38.
D.
Jurafsky
and
J. H.
Martin
,
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
(
Prentice Hall PTR
,
2000
).
39.
Y.
Peng
,
A. J.
Pak
,
A. E. P.
Durumeric
,
P. G.
Sahrmann
,
S.
Mani
,
J.
Jin
,
T. D.
Loose
,
J.
Beiter
, and
G. A.
Voth
,
J. Phys. Chem. B
127
,
8537
(
2023
).
40.
W. L.
Jorgensen
,
D. S.
Maxwell
, and
J.
Tirado-Rives
,
J. Am. Chem. Soc.
118
,
11225
(
1996
).
41.
S.
Plimpton
,
J. Comput. Phys.
117
,
1
(
1995
).
42.
M.
Fey
and
J. E.
Lenssen
, arXiv:1903.02428 (
2019
).
43.
A.
Paszke
,
S.
Gross
,
F.
Massa
,
A.
Lerer
,
J.
Bradbury
,
G.
Chanan
,
T.
Killeen
,
Z.
Lin
,
N.
Gimelshein
,
L.
Antiga
,
A.
Desmaison
,
A.
Köpf
,
E.
Yang
,
Z.
DeVito
,
M.
Raison
,
A.
Tejani
,
S.
Chilamkurthy
,
B.
Steiner
,
L.
Fang
,
J.
Bai
, and
S.
Chintala
, arXiv:1912.01703 (
2019
).
44.
S.
Jo
,
J. B.
Lim
,
J. B.
Klauda
, and
W.
Im
,
Biophys. J.
97
,
50
(
2009
).
45.
E. L.
Wu
,
X.
Cheng
,
S.
Jo
,
H.
Rui
,
K. C.
Song
,
E. M.
Dávila-Contreras
,
Y.
Qi
,
J.
Lee
,
V.
Monje-Galvan
,
R. M.
Venable
,
J. B.
Klauda
, and
W.
Im
,
J. Comput. Chem.
35
,
1997
(
2014
).
46.
J.
Lee
,
X.
Cheng
,
J. M.
Swails
,
M. S.
Yeom
,
P. K.
Eastman
,
J. A.
Lemkul
,
S.
Wei
,
J.
Buckner
,
J. C.
Jeong
,
Y.
Qi
,
S.
Jo
,
V. S.
Pande
,
D. A.
Case
,
C. L.
Brooks
III
,
A. D.
MacKerell
, Jr.
,
J. B.
Klauda
, and
W.
Im
,
J. Chem. Theory Comput.
12
,
405
(
2016
).
47.
M. J.
Abraham
,
T.
Murtola
,
R.
Schulz
,
S.
Páll
,
J. C.
Smith
,
B.
Hess
, and
E.
Lindahl
,
SoftwareX
1–2
,
19
(
2015
).
48.
J. B.
Klauda
,
R. M.
Venable
,
J. A.
Freites
,
J. W.
O’Connor
,
D. J.
Tobias
,
C.
Mondragon-Ramirez
,
I.
Vorobyov
,
A. D.
MacKerell
, Jr.
, and
R. W.
Pastor
,
J. Phys. Chem. B
114
,
7830
(
2010
).
49.
W. L.
Jorgensen
,
J.
Chandrasekhar
,
J. D.
Madura
,
R. W.
Impey
, and
M. L.
Klein
,
J. Chem. Phys.
79
,
926
(
1983
).
50.
51.
M.
Parrinello
and
A.
Rahman
,
J. Appl. Phys.
52
,
7182
(
1981
).
52.
P.
Liu
,
Q.
Shi
,
H.
Daume
III
, and
G. A.
Voth
,
J. Chem. Phys.
129
,
214114
(
2008
).
53.
L.
Martinez
,
R.
Andrade
,
E. G.
Birgin
, and
J. M.
Martinez
,
J. Comput. Chem.
30
,
2157
(
2009
).
54.
G.
Hinton
,
N.
Srivastava
, and
K.
Swersky
,
Coursera
264
(
1
),
2012a
(
2012
).
56.
K.
Bouchemal
,
S.
Briancon
,
E.
Perrier
, and
H.
Fessi
,
Int. J. Pharm.
280
,
241
(
2004
).
57.
S.
Riaño
and
K.
Binnemans
,
Green Chem.
17
,
2931
(
2015
).
58.
M.
Rappolt
and
G.
Rapp
,
Eur. Biophys. J.
24
,
381
(
1996
).
59.
P.
Sharma
,
R.
Desikan
, and
K. G.
Ayappa
,
J. Phys. Chem. B
125
,
6587
(
2021
).
60.
B. I.
Halperin
and
D. R.
Nelson
,
Phys. Rev. Lett.
41
,
121
(
1978
).
61.
G. A.
Pantelopulos
,
T.
Nagai
,
A.
Bandara
,
A.
Panahi
, and
J. E.
Straub
,
J. Chem. Phys.
147
,
095101
(
2017
).
62.
G. A.
Pantelopulos
and
J. E.
Straub
,
Biophys. J.
115
,
2167
(
2018
).
63.
H. M.
Cho
and
J. W.
Chu
,
J. Chem. Phys.
131
,
134107
(
2009
).
64.
L.
Lu
,
J. F.
Dama
, and
G. A.
Voth
,
J. Chem. Phys.
139
(
2013
).