Computer simulations can provide mechanistic insight into ionic liquids (ILs) and predict the properties of experimentally unrealized ion combinations. However, ILs suffer from a particularly large disparity in the time scales of atomistic and ensemble motion. Coarse-grained models are therefore used in place of costly all-atom simulations, accessing longer time scales and larger systems. Nevertheless, constructing the many-body potential of mean force that defines the structure and dynamics of a coarse-grained system can be complicated and computationally intensive. Machine learning shows great promise for the linked challenges of dimensionality reduction and learning the potential of mean force. To improve the coarse-graining of ILs, we present a neural network model trained on all-atom classical molecular dynamics simulations. The potential of mean force is expressed as two jointly trained neural network interatomic potentials that learn the coupled short-range and many-body long-range molecular interactions. These interatomic potentials treat temperature as an explicit input variable to capture its influence on the potential of mean force. The model reproduces structural quantities with high fidelity, outperforms the temperature-independent baseline at capturing dynamics, generalizes to unseen temperatures, and incurs low simulation cost.

## I. INTRODUCTION

Ionic liquids (ILs) are room-temperature liquids composed of molecular cations and anions. ILs have a wide variety of applications, including ion transport in batteries and catalysis.^{1,2} Because of the wide chemical diversity of molecular anions and cations, and their many possible combinations, a vast design space for ILs exists. Molecular dynamics simulations can characterize the structure and dynamics of ILs at the nanoscale and, to some degree, predict their properties before experimental realization. However, due to the complex intermolecular forces in ILs, classical force fields need to be fine-tuned to recover the correct kinetics and structure.^{3,4} The properties of ILs are governed by a variety of interaction forces, including weak and isotropic forces such as van der Waals, solvophobic, and dispersion forces,^{5,6} strong Coulombic forces, anisotropic hydrogen bonding,^{7} halogen bonding,^{8} and magnetic dipole, electron pair, and dipole–dipole interactions.^{9} Further complicating simulation is the sluggish motion of the ensemble. This leads to a large gap in the time scales of atomistic and ensemble motion, limiting the size of the molecular system and the time scales that can be accessed through simulation.

Coarse-graining (CG) can increase accessible simulation sizes by reducing the dimensionality of the full atomistic model to a pseudoparticle representation.^{10,11} A coarse-grained model can be described by a potential energy surface, the many-body potential of mean force (PMF). The PMF can be seen as the configurational free energy in a reduced phase space and is determined by the all-atom potential and the mapping operator from a full to a coarse-grained resolution.^{12}

However, constructing an accurate many-body PMF that can faithfully capture the structure, thermodynamics, and kinetics of the coarse-grained system is not an easy task. This is because reducing the representation and averaging out the fast atomic motions introduce complex many-body correlations in the PMF. Furthermore, the PMF is conditioned on the choice of mapping from the all-atom to the compressed model and on the temperature of the reference data. Typically, these mappings are chosen based on chemical intuition and trial-and-error, although there exist some methods for automatic and statistical mapping algorithms.^{13–15}

CG-IL models are particularly challenging because of the different transport properties of the two components, which further complicates the modeling of the many-body PMF.^{16–18} Many authors have developed CG-IL models, and while each has achieved some success, none has been completely satisfactory. For example, Refs. 19–22 studied the 1-butyl-3-methylimidazolium cation with different anions. In one case, idealized models recovered the structural and energetic properties, but their transport properties were unreasonably slow.^{19} The model was improved upon by Merlet *et al.*,^{21} and the transport properties were recovered to a higher degree, but with a loss of structural properties.^{21} Wang *et al.*^{20} described an effective force coarse-graining applied to an IL system and recovered the structural radial distribution function (RDF) properties over a range of temperatures, but they did not investigate dynamics. Reference 22 described Newton inversion and iterative Boltzmann inversion methods for coarse-graining ILs, and the thermodynamic properties were well recovered. However, the model underperformed with respect to self-diffusivity, both in proximity to experimental data and in temperature scaling.^{22} Moreover, ILs exist as liquids over a wide range of temperatures, and many of their desired applications involve temperature swings, but none of the CG approaches mentioned above are temperature transferable. Although energy renormalization has been applied^{23} for temperature-transferable CG focused on recovering the structure of ILs, work on temperature transferability is lacking.

Each of the methods above achieved success in many, but not all, aspects of coarse-graining. Moreover, each method required user expertise to create CG models that were not transferable to other ionic liquids. An alternative approach is to use machine learning (ML). ML approaches can achieve dimensionality reduction in a variety of domains^{15,24} and learn complex functions such as the CG many-body PMF.^{15,25–27} Neural coarse-graining has been shown to recover ensemble quantities of liquid-phase alkanes with high accuracy and without expert user input.^{15} Here, we expand on this bottom-up approach to learn the encoding function and the associated PMF for a 1-butyl-3-methylimidazolium tetrafluoroborate system, with two key extensions. First, we condition the PMF on temperature, achieving temperature-transferable dynamics at temperatures between two extremes. Second, to capture the complex many-body intermolecular forces acting in CG ILs, we utilize two separate neural network models to learn the coupled intra- and inter-molecular PMF. These two key features allow coarse-graining of more complex condensed-phase systems such as ILs and offer a reliable, automated alternative to expert-based IL coarse-graining.

## II. METHODS

### A. Auto-encoder coarse-graining

The general schematic of auto-encoder coarse graining is shown in Fig. 1. The model is an autoencoder, consisting of an encoder that maps the atomistic data to a coarse-grained latent representation and a decoder that maps the latent representation back to the atomistic representation. To train the model, we minimize the autoencoder loss, *L*_{AE}, which was introduced in Ref. 15 as
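a combination of reconstruction and force-regularization terms, plausibly of the form

$$\mathcal{L}_{\mathrm{AE}} \;=\; \frac{1}{N}\,\big\lVert D\big(E(x;\,g,\,\tau)\big) - x \big\rVert^{2} \;+\; \rho\,\big\lVert F_{\mathrm{inst}}\big(E(x)\big) \big\rVert^{2}. \tag{1}$$

(This reconstruction follows the term-by-term description below; the exact normalization and averaging conventions are given in Ref. 15.)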

The first term in Eq. (1) is a reconstruction loss, where *D* is the decoder mapping operator from the coarse-grained variables to the all-atom representation, *E* is the encoder mapping operator from the all-atom representation to the coarse-grained beads, and *x* denotes the observed atomistic samples from all-atom MD snapshots (the atomic number and Cartesian coordinates of every atom in the snapshot). *N* is the number of coarse-grained beads. *g* is sampled from a Gumbel distribution and is used to learn the categorical, discrete coarse-grained weights for each atom via Gumbel-softmax reparameterization.^{28} *τ* is a temperature-like variable that decreases over the course of training to achieve a discrete encoding in the coarse-grained mapping *E*. The second term is an instantaneous force regularization; minimizing it produces less noisy mapped forces (*F*_{inst}) and, therefore, a smoother coarse-grained energy landscape. *F*_{inst} is the instantaneous mapped force acting on the CG beads and is calculated from the encoding function and the atomic forces of a given all-atom MD frame, *F*_{inst}(*z*) = −**b**∇*V*(*x*). The reconstruction loss statistically minimizes the loss of geometric information incurred in passing through the latent space, and the force regularization minimizes the fluctuations in the mean force of the encoded space. By choosing the CG mapping that minimizes force fluctuations, a smoother energy landscape is achieved and more intuitive mappings are obtained. A weight parameter *ρ* is introduced as a heuristic to balance the two terms.
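As a concrete illustration of the Gumbel-softmax encoding, the following NumPy sketch draws relaxed categorical assignment weights and maps atomic coordinates onto CG beads; the sizes, seed, and normalization are illustrative, not the paper's implementation.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Differentiable relaxation of a categorical sample (Gumbel-softmax).

    logits: (n_atoms, n_beads) unnormalized assignment scores.
    tau: relaxation temperature; as tau -> 0, rows approach one-hot vectors.
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y -= y.max(axis=1, keepdims=True)                      # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))                   # 10 atoms, 3 candidate beads
E_soft = gumbel_softmax(logits, tau=5.0, rng=rng)   # soft assignments (early training)
E_hard = gumbel_softmax(logits, tau=0.05, rng=rng)  # nearly one-hot (late training)

# Encoding step: each CG bead position is a weighted average of atom positions.
x = rng.normal(size=(10, 3))                        # atomic coordinates
w = E_hard / E_hard.sum(axis=0, keepdims=True)      # normalize weights per bead
z = w.T @ x                                         # (n_beads, 3) CG coordinates
```

In training, *τ* is annealed downward so that the learned mapping approaches a discrete atom-to-bead assignment.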

### B. Temperature-dependent graph neural network potentials

Once a mapping *E* was chosen, a matching PMF was fitted with a graph convolutional neural network based on the SchNet model.^{29} In the SchNet model, the geometry is transformed into a set of atomic fingerprints (the “convolution” component), and a fully connected network transforms the fingerprints into an output (the “readout” component). The convolution blocks consist of a message step and an update step that systematically gather information from neighboring atoms. A neighbor is defined as an atom within a fixed distance cutoff (discussed further below). Defining *v* as the index of an atom and *N*(*v*) as its set of neighbors, the *t*th graph convolution updates the atomic feature vector *h*_{v} by aggregating “messages” from the connected atoms *u* ∈ *N*(*v*) and their edge features *e*_{uv} through^{30}
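the generic message-passing scheme

$$m_{v}^{t+1} = \sum_{u \in N(v)} M_{t}\big(h_{v}^{t},\, h_{u}^{t},\, e_{uv}\big), \qquad h_{v}^{t+1} = U_{t}\big(h_{v}^{t},\, m_{v}^{t+1}\big), \tag{2}$$

where $M_t$ and $U_t$ denote the message and update functions of the $t$th convolution (the exact form follows Ref. 30).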

The initial feature vector *h*^{0} is generated from a learnable embedding of atomic numbers. By performing the convolution operations several times, a many-body correlation function can be parameterized to represent the many-body potential of mean force of the coarse-grained ionic liquid complex. In the case of SchNet, the update function is simply a sum over the atomic embeddings. The message function is parameterized by
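what is plausibly a concatenate-and-transform construction,

$$M_{t}\big(h_{v}^{t},\, h_{u}^{t},\, e_{uv}\big) = \mathrm{MLP}\big(h_{u}^{t} \circ e_{uv}\big), \tag{3}$$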

where MLP denotes a multi-layer perceptron and ○ is the concatenation operation. The readout consists of a sum over MLPs acting on each of the *N* atomic embeddings at the final (*T*th) convolution,
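which plausibly reads

$$E = \sum_{v=1}^{N} \mathrm{MLP}\big(h_{v}^{T}\big). \tag{4}$$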

Further details of the original model can be found in Ref. 29.

Figure 2 shows the neural network architecture for fitting a coarse-grained potential. The model is based on the SchNet architecture, but it performs graph convolutions at two scales through two separate networks [as defined in Eq. (2)]. Specifically, an intramolecular embedding (*h*_{v,intra}) is calculated by performing graph convolutions only among atoms belonging to the same molecule and within an intramolecular distance threshold (*d*_{intra}). A second, intermolecular embedding (*h*_{v,inter}) is calculated by performing graph convolutions only among atoms not belonging to the same molecule and within an intermolecular distance threshold (*d*_{inter}). *d*_{intra} and *d*_{inter} are hyperparameters chosen through hyperparameter optimization, typically with *d*_{inter} > *d*_{intra}. The parallel graph convolution passes create two embeddings for each CG bead. These embeddings are summed, and the resulting single fingerprint is fed into a multi-layer perceptron to parameterize the atomic energy. The per-atom energies are summed into a total energy, and its derivative is fitted through stochastic force matching. The purpose of this restrictive design is to differentiate atomic environments after coarse-graining: the convolution procedure can distinguish a CG bead's neighbors by intermolecular versus intramolecular interactions. This is necessary because intramolecular and intermolecular distances in CG systems are much closer to each other than in all-atom simulations, yet the two interaction types have distinct and complicated functional forms. For example, covalently linked beads in the cation interact through different mechanisms than non-bonded ion pairs, regardless of distance. Even point-charge interactions become complex anisotropic functions in the PMF landscape, yet they act over distances comparable to intramolecular interactions.
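The dual-scale neighbor construction can be sketched as follows; `dual_edges` and the toy coordinates are hypothetical stand-ins for the actual implementation, which would feed the two edge lists to the intra- and inter-molecular convolution networks.

```python
import numpy as np

def dual_edges(pos, mol_id, d_intra, d_inter):
    """Split directed CG bead pairs into intra- and inter-molecular edge lists.

    pos: (n, 3) bead coordinates; mol_id: (n,) molecule index of each bead.
    Intramolecular edges: same molecule and r < d_intra (intra network).
    Intermolecular edges: different molecules and r < d_inter (inter network).
    """
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    same_mol = mol_id[:, None] == mol_id[None, :]
    off_diag = ~np.eye(len(pos), dtype=bool)
    intra = np.argwhere(same_mol & off_diag & (r < d_intra))
    inter = np.argwhere(~same_mol & (r < d_inter))
    return intra, inter

# Two toy "molecules" of two beads each, 3 length units apart.
pos = np.array([[0.0, 0, 0], [1.0, 0, 0], [4.0, 0, 0], [5.0, 0, 0]])
mol_id = np.array([0, 0, 1, 1])
intra, inter = dual_edges(pos, mol_id, d_intra=2.0, d_inter=6.0)
```

Because *d*_{inter} > *d*_{intra}, the intermolecular network sees a wider environment, while covalently linked beads are handled exclusively by the intramolecular network.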

A classical bond prior with fixed bond lengths between the cation pseudoparticles achieves better learning of the intramolecular interactions,
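plausibly of the standard harmonic form

$$V_{\mathrm{bond}} = \sum_{(i,j)} k\,\big(\lVert \mathbf{r}_{i} - \mathbf{r}_{j} \rVert - r_{0}\big)^{2},$$

with the sum over bonded bead pairs and $r_{0}$ the fixed bond length ($r_{0}$ is introduced here for illustration),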

where *k* is a spring constant hyperparameter, and *r*_{i}, *r*_{j} are the positions of neighboring particles. Furthermore, an excluded volume prior was added to obtain better intermolecular interactions,
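plausibly of the inverse-power form

$$V_{\mathrm{ex}} = \sum_{i<j} \left(\frac{\sigma_{\mathrm{ex}}}{r_{ij}}\right)^{n_{\mathrm{ex}}},$$

where $r_{ij}$ is the distance between non-bonded beads $i$ and $j$,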

with *σ*_{ex} and *n*_{ex} hyperparameters related to the excluded volume.

We further included temperature as a neural network input to parameterize a temperature-dependent potential of mean force. We reasoned that temperature induces fluctuations and biases in the mean forces and thereby affects structural correlations. To this end, we transformed the inverse temperature 1/*T* into an embedding to be combined with the convolution operations, thereby modifying the structural representation. In particular, we generated a thermodynamic embedding *γ* according to
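a linear map of the inverse temperature, plausibly

$$\gamma = \tilde{M}\,\frac{1}{T},$$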

where $\tilde{M}$ is a linear matrix of size 1 × *N*_{basis} that maps the temperature dependence into a vector of the same size as the atomic embedding and *T* is the thermostat temperature of the all-atom NPT simulation. After the convolution layers, the intramolecular embedding was summed with the intermolecular embedding, and the dot product of the resulting atomic embedding with the thermodynamic embedding was taken. A shallow, two-layer MLP was used as the readout layer to map the individual embeddings into atomic energies as in Eq. (4).
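A minimal NumPy sketch of how the thermodynamic embedding could condition the readout; the random weights, the toy MLP, and the elementwise product (standing in for the dot-product coupling described above) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis = 8

# Stand-in for the learnable 1 x n_basis map from 1/T to the embedding space.
M = rng.normal(size=(1, n_basis))

def thermo_embedding(T):
    """gamma = M (1/T): one embedding vector per simulation temperature."""
    return (M * (1.0 / T)).ravel()

def readout(h_intra, h_inter, T, mlp):
    """Sum the two convolution branches, couple with gamma, sum atomic energies."""
    gamma = thermo_embedding(T)
    h = (h_intra + h_inter) * gamma        # temperature-conditioned fingerprints
    return sum(mlp(h_v) for h_v in h)      # total CG energy

mlp = lambda v: float(np.tanh(v).sum())    # toy stand-in for the two-layer MLP
h_intra = rng.normal(size=(5, n_basis))    # embeddings for 5 CG beads
h_inter = rng.normal(size=(5, n_basis))
E300 = readout(h_intra, h_inter, 300.0, mlp)
E500 = readout(h_intra, h_inter, 500.0, mlp)
```

The same bead fingerprints thus map to different energies at different thermostat temperatures, which is what makes the fitted PMF temperature dependent.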

### C. Stochastic force matching

The force-matching approach for parameterizing force fields was proposed in Ref. 10 to reproduce structural correlation functions. Given an atomistic potential energy function *V*(*x*) with partition function *Z*, the probability distribution of atomistic configurations is
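the Boltzmann distribution,

$$P(x) = \frac{e^{-V(x)/k_{B}T}}{Z}, \qquad Z = \int e^{-V(x)/k_{B}T}\,\mathrm{d}x.$$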

The distribution function of the coarse-grained variables, *P*_{CG}(*z*), defines the corresponding many-body potential of mean force *A*(*z*) as the negative log-likelihood of that distribution. It is a free energy obtained by marginalizing over the micro-states that are mapped to the coarse-grained coordinates,
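namely

$$P_{\mathrm{CG}}(z) = \int \delta\big(\mathbf{b}x - z\big)\,P(x)\,\mathrm{d}x, \qquad A(z) = -k_{B}T \ln P_{\mathrm{CG}}(z).$$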

The mean force of the coarse-grained variables is the average of instantaneous forces conditioned on the mapping **b***x* = *z*,^{31,32}
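that is,

$$F(z) = \big\langle F_{\mathrm{inst}} \big\rangle_{\mathbf{b}x = z} = -\nabla_{z} A(z),$$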

where **b** corresponds to the mapping assignment from the all-atom to the coarse-grained representation. Force matching requires the minimization of the loss function,
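which plausibly takes the mean-force-matching form

$$L_{\mathrm{FM}}(\theta) = \mathbb{E}_{z}\,\big\lVert F(z) + \nabla_{z} V_{\mathrm{CG}}(z;\,\theta) \big\rVert^{2},$$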

where *θ* are the parameters in *V*_{CG} and ∇*V*_{CG} represents the “coarse-grained forces” that can be obtained from automatic differentiation. As proposed by Zhang *et al.*, if $\mathbb{E}_{z}[F_{\mathrm{inst}}] = F(z)$ is assumed, the loss function can be reformulated as the following minimization target:
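$$L_{\mathrm{inst}}(\theta) = \mathbb{E}_{x}\,\big\lVert F_{\mathrm{inst}}\big(E(x)\big) + \nabla_{z} V_{\mathrm{CG}}\big(E(x);\,\theta\big) \big\rVert^{2},$$

where the expectation now runs over all-atom configurations $x$ (a reconstruction consistent with $L_{\mathrm{FM}}$ above).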

This minimization target has also been used with stochastic gradient descent in several works.^{15,26} Instead of matching mean forces that would need to be obtained from constrained dynamics, the model minimizes *L*_{inst} with respect to *V*_{CG}(*z*) and *E*(*x*), providing a data-driven objective for optimizing coarse-grained force fields.
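The instantaneous force-matching objective can be illustrated with a toy analytic pair potential standing in for the neural network (whose forces would instead come from automatic differentiation); all names and parameters here are hypothetical.

```python
import numpy as np

def cg_forces(z, theta):
    """Forces of a toy pairwise CG potential V = sum_ij k (r_ij - r0)^2."""
    k, r0 = theta
    f = np.zeros_like(z)
    for i in range(len(z)):
        for j in range(len(z)):
            if i == j:
                continue
            d = z[i] - z[j]
            r = np.linalg.norm(d)
            f[i] += -2.0 * k * (r - r0) * d / r   # -dV/dz_i for the (i, j) pair
    return f

def l_inst(z_batch, f_inst_batch, theta):
    """L_inst: mean squared deviation between mapped and model forces."""
    return float(np.mean([
        np.sum((f_inst - cg_forces(z, theta)) ** 2)
        for z, f_inst in zip(z_batch, f_inst_batch)
    ]))

# Synthetic check: data generated by theta* = (1.0, 1.5) is matched exactly by
# theta*, and a perturbed parameter set gives a larger loss.
rng = np.random.default_rng(0)
z_batch = rng.normal(size=(16, 4, 3))                  # 16 frames, 4 CG beads
f_batch = [cg_forces(z, (1.0, 1.5)) for z in z_batch]  # "instantaneous" forces
loss_true = l_inst(z_batch, f_batch, (1.0, 1.5))
loss_off = l_inst(z_batch, f_batch, (0.5, 1.5))
```

In the actual model, `theta` would be the network weights, and the loss would be minimized with stochastic gradient descent over mini-batches of mapped frames.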

## III. RESULTS

The model was trained on molecular dynamics simulations performed with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) engine,^{33} with a publicly available custom force field for ionic liquids.^{3,4} The simulation box contained 300 ion pairs and was created with PACKMOL.^{34} A 1 ns NPT simulation with a Nosé–Hoover thermostat and barostat was run to equilibrate the box, followed by a 10 ns simulation performed at each temperature, repeated three times with different randomly initialized velocities to obtain broader phase-space sampling. A subsample of the dataset was used for training: out of the 30 000 time steps, 6000 were evenly sampled and used further. The model was then trained on data from the 300 K, 350 K, 450 K, and 500 K simulations, with hyperparameters chosen using Bayesian hyperparameter optimization as detailed in the supplementary material. New CG molecular dynamics simulations with the trained models were run from 300 K to 500 K in 50 K increments, thus including simulations at 400 K, a temperature outside the training data. The new CG simulations were run in the NVT ensemble with the Nosé–Hoover thermostat at a time step of 1 fs. The structure and dynamics of the CG simulations were then compared with the all-atom ground truth.

In keeping with convention in the field, we used one bead to represent the anion. The choice of the number and mapping of beads representing the cation is less settled. Here, we tested two representations of 1-butyl-3-methylimidazolium, through either two or four pseudoparticles (CG3 and CG5, respectively, counting the anion bead).

Figure 1(a) shows the mapping and average reconstruction of 1-butyl-3-methylimidazolium. As expected, the four-bead model led to good reconstruction, since it captures the orientation of the alkyl chain. The assignment learned by the auto-encoder structure agrees with prior intuition: one bead for the side methyl group and two ring atoms, one bead for the two following carbon atoms in the ring, one bead for the other imidazolium nitrogen and two connected linear carbons (Fig. 6), and a final bead for the terminal ethyl group. Because the model only learns to reconstruct average positions, rotationally equivalent atoms were typically reconstructed near their center of gyration.

Figure 3 shows the radial distribution functions (RDFs) of the CG5 model (four-bead cation plus tetrafluoroborate anion). The ground truth reference data are the result of applying the CG mapping to held-out all-atom frames. The RDFs are in close agreement with the ground truth data. Both the cation–anion and anion–anion interactions are reproduced to a high degree. The cation–cation interactions and the positions of all peaks are also accurately recovered, but the peak intensities show some deviations from the ground truth. The difference between the second and third cation–cation peaks decreases further at higher temperatures.

The RDFs at 400 K shown in Fig. 3(b) demonstrate that the model produces stable dynamics at temperatures outside the training set and recovers structural properties with accuracy similar to that at the reference temperatures. Bond distance distribution comparisons are given in the supplementary material.

Transport properties such as self-diffusivity are highly relevant to the use of ILs as ionic conductors, and effective CG simulations should recover the dynamic properties as closely as possible. Experiments have shown that the self-diffusivity of the 1-butyl-3-methylimidazolium tetrafluoroborate ionic liquid exhibits Arrhenius behavior.^{17,18} Figure 4(a) shows the mean-squared displacement calculated by the Einstein relation at 300 K,
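namely

$$\mathrm{MSD}(t) = \big\langle \lvert \mathbf{r}(t) - \mathbf{r}(0) \rvert^{2} \big\rangle, \qquad D = \lim_{t \to \infty} \frac{\mathrm{MSD}(t)}{6t}.$$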

The displacement of the CG T-NFF evolves about an order of magnitude faster than in the ground truth all-atom system. This is because the CG model averages out the small, fast movements of hydrogen atoms, reducing the overall friction of the system and speeding up the dynamics; further discussion of the change in dynamics is given in the supplementary material. Figure 4(b) shows the self-diffusivity across the temperature range from 300 K to 500 K, including out-of-sample simulations at 400 K. Both the ground truth and the T-NFF show similar relations between the cation and the anion. At lower temperatures, the cation has strictly faster dynamics than the anion, but at higher temperatures, this difference shrinks. The T-NFF shows the same relation at lower temperatures, but at higher temperatures, the difference shrinks faster than for the ground truth.
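The mean-squared displacement and Einstein-relation diffusivity can be computed as in the following sketch, validated here on a synthetic 3D random walk; a production analysis would also average over multiple time origins.

```python
import numpy as np

def msd(traj):
    """Mean-squared displacement from the t = 0 origin.

    traj: (n_steps, n_particles, 3) unwrapped coordinates.
    """
    disp = traj - traj[0]
    return (disp ** 2).sum(axis=-1).mean(axis=-1)

def self_diffusivity(traj, dt):
    """Einstein relation: D = slope of MSD(t) / 6, fit on the late-time tail."""
    m = msd(traj)
    t = np.arange(len(m)) * dt
    tail = slice(len(m) // 2, None)        # skip the short-time regime
    slope = np.polyfit(t[tail], m[tail], 1)[0]
    return slope / 6.0

# Ideal random walk with per-axis step variance s2: MSD = 3 s2 t, so D = s2 / 2.
rng = np.random.default_rng(0)
steps = rng.normal(scale=0.1, size=(2000, 200, 3))   # s2 = 0.01
traj = np.cumsum(steps, axis=0)
D = self_diffusivity(traj, dt=1.0)                   # expected near 0.005
```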

Figure 5(b) shows the Arrhenius-type relation for the coarse-grained anion and cation. The Pearson correlation coefficient for the linear fit is −0.981 for cations and −0.982 for anions. The cation and anion curves intersect in the high-temperature range, as in the ground truth simulation (Table I), although the intersection occurs at a higher temperature.
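The Arrhenius analysis behind such activation energies can be sketched as follows: fit ln *D* against 1/*T* and read *E*_{a} off the slope. The synthetic diffusivities below use a known *E*_{a} only to check the fit; they are not data from this work.

```python
import numpy as np

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1

def arrhenius_fit(T, D):
    """Fit ln D = ln D0 - Ea/(R T); return (Ea in kJ/mol, Pearson r)."""
    x = 1.0 / np.asarray(T)
    y = np.log(np.asarray(D))
    slope, _ = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return -slope * R, r

T = np.array([300.0, 350.0, 400.0, 450.0, 500.0])
Ea_true = 20.7                           # kJ/mol, for the synthetic check
D = 1e-7 * np.exp(-Ea_true / (R * T))    # ideal Arrhenius diffusivities
Ea_fit, r = arrhenius_fit(T, D)
```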

**TABLE I.** Activation energies *E*_{a} (kJ mol^{−1}) for self-diffusion.

| Molecule | GT | T-NFF | NFF | CG3 |
| --- | --- | --- | --- | --- |
| Cation | 20.7 ± 1.5 | 14.3 ± 1.6 | 10.4 ± 0.6 | 9.4 ± 0.7 |
| Anion | 21.6 ± 2.9 | 16.7 ± 1.8 | 12.2 ± 0.5 | 8.7 ± 0.6 |

Figure 5 assesses the impact of the temperature embedding by comparing the outcome of CG simulations from a temperature-independent model (NFF) with the temperature-dependent T-NFF results. Both models were trained on the same data from all-atom simulations at 300 K, 350 K, 450 K, and 500 K and have the same hyperparameters. Figure 5 shows the RDFs and Arrhenius relations of the ground truth and the models. The two models reproduce the structural properties to a similar degree, and the Arrhenius relation of the system is similarly linear. However, as seen from the slopes in Fig. 5(b) and the activation energies in Table I, the temperature-averaged NFF has not learned the temperature effect on dynamics as well as its counterpart and produces a much lower activation energy in addition to accelerated dynamics. It also underestimates the temperature at which the cation and anion diffusivities cross.

Figure 6 compares the performance of the CG3 model (two-bead cation) to that of the CG5 model. Figure 6(a) shows that both the three- and five-bead models recover the structural properties at a temperature not included in the training data. Nevertheless, significant differences appear in the dynamic properties derived from the Arrhenius relation. Both models recover the linear relation, including the 400 K out-of-sample trajectory, but the CG3 model grossly underestimates the activation energy (Table I).

## IV. DISCUSSION

We have introduced a novel extension of graph convolutional neural networks that learns many-body PMFs by separating short-range intramolecular and longer-range many-body intermolecular terms, and incorporates a temperature-transferable embedding to learn the temperature dependence of a many-body PMF. The model was trained on all-atom simulations of an ionic liquid system, 1-butyl-3-methylimidazolium tetrafluoroborate, whose PMF is known to be difficult to construct. The structural properties are reproduced to a high degree. Furthermore, the Arrhenius relation for self-diffusivity is recovered by the neural force field, albeit with faster dynamics. The fast dynamics arise from the averaging of fast motions in the system, which, in turn, decreases the overall friction of the system. If the dynamics cannot be reconstructed exactly, it is preferable that the activation energy for diffusion remain constant from all-atom to CG simulations, so that only a simple scaling of the dynamics is observed. Other strategies based on spectral matching^{25} or differentiable simulations^{35} could be utilized to further condition the encoding function and PMF to reproduce system dynamics.

Results from a single graph convolutional neural network with a single interaction block for both inter- and intra-molecular interaction have not been included in this work for comparison with the novel dual graph convolutional neural network. This is because no stable simulations could be performed with the simpler model. No choice of hyperparameters led to a model that could run coarse-grained IL molecular dynamics simulations without the system collapsing on itself. This is likely because in a coarse-grained model, the distances between the beads of a single molecule are on a much larger scale than those of atoms in the same molecule. These intramolecular distances are then closer to intermolecular distances, and as such it becomes harder for a single interaction layer to separate them and learn the structural and dynamic properties of the system. This highlights the key contribution of the dual graph convolutions introduced in this work.

A comparison in performance was made between the temperature-dependent neural force field (T-NFF) and the neural force field without the temperature embedding layer (NFF) (see Fig. 5). Both models overestimated the dynamics equally at lower temperatures, but at higher temperatures, the NFF was significantly slower. The T-NFF showed a more homogeneous acceleration of dynamics across the temperature range and, as such, reproduced the Arrhenius-like temperature dependence much better, largely scaling the overall kinetics by a constant factor of 3–6 at all temperatures. Interestingly, the T-NFF model faithfully captured the loss of linearity in the Arrhenius plot at the lowest temperature, which is present in the all-atom simulation and may be related to the system becoming trapped in a glassy state. Previous works on coarse-grained ILs, such as those of Roy *et al.*^{19} and Moradzadeh *et al.*,^{36} heavily underestimated dynamics. The model of Roy *et al.* was improved by Merlet *et al.* to attain agreement in diffusivity,^{21} although at the cost of the accuracy of structural properties.

Different cation representations were also compared. Both the three- and five-bead models recovered the structural properties to a similar degree, but the five-bead model led to better learning of the temperature dependence of the dynamics, despite the overestimated self-diffusion rate in both cases. The model described in this work can accurately learn the many-body PMF of a CG model for a complicated system, is accurate and transferable over a range of temperatures, requires little user input, and can be applied to other condensed-phase systems, enabling easier, more accurate, and more generalizable CG simulations of complex condensed-phase systems.

## SUPPLEMENTARY MATERIAL

See the supplementary material for a description of the training details and hyperparameter selection, a further description of the parameters of Eq. (1), bond distance distribution comparisons for the five-pseudoparticle model, and RDF and bond distance distribution comparisons for the three-pseudoparticle model. Mean-squared displacement comparisons obtained by shifting the timescale or temperature are also provided.

## DATA AVAILABILITY

The computer code and the all-atom trajectories used to train the models are available at https://github.com/learningmatter-mit/T-NFF.

## ACKNOWLEDGMENTS

The authors thank the MIT Energy Initiative (MITEI), Toyota Research Institute, MIT-IBM, and DARPA (Award No. HR00111920025) for financial support.