The phase behavior of complex fluids is a challenging problem for molecular simulations. Supervised machine learning (ML) methods have shown potential for identifying the phase boundaries of lattice models. In this work, we extend these ML methods to continuous-space systems. We propose a convolutional neural network model that takes grid-interpolated coordinates of molecules as input and optimizes the search for phase transitions over different filter sizes. We test the method on the phase diagrams of two off-lattice models, namely, the Widom–Rowlinson model and a symmetric freely jointed polymer blend, for which results are available from standard molecular simulation techniques. The ML results are in good agreement with previous simulation studies, with the added advantage that there is no critical slowing down. We find that identifying intermediate structures near a phase transition and including them in the training set is important for obtaining the phase boundary near the critical point. The method is general, easy to implement, and could find wide application in studies of the phase behavior of complex fluids.

## I. INTRODUCTION

The phase behavior of complex mixtures is important in many applications, and understanding it from a molecular perspective is of fundamental importance. Molecular simulation of complex fluids has become feasible, even at the atomistic level, making the elucidation of the phase behavior of these systems a realistic goal. An example is the phase behavior of polymers in ionic liquids, whose phase diagrams display several interesting features that would be worth studying with atomistic models.

Computer simulations for the phase behavior of complex mixtures are challenging. Traditional methods using the Gibbs ensemble^{1,2} require the insertion and deletion of molecules, which is difficult in a dense system. The thermodynamic integration method of Kofke,^{3} which is very successful in single component systems, requires the insertion/deletion of one of the components in a binary mixture. Many systems of interest, such as polymers in deep eutectic solvents, have a minimum of four components. We had recently introduced a method,^{4} which we refer to as the interface method, where the phase diagram is obtained from the simulation of a single cell with an interface. The interfaces move during the simulation, however, and a spatial concentration autocorrelation function was used to (spatially) align the instantaneous concentration profiles from different snapshots. The method is successful for several model systems but fails near the critical point or when the surface tension is low enough that the interface does not maintain its integrity. Other methods for estimating the phase diagram are therefore of interest.

Machine learning (ML) methods have shown promise in predicting the phase behavior of lattice models. Two types of methods have been used for phase transitions: unsupervised and supervised. Unsupervised methods attempt to reveal patterns in the data, for example, using a principal component analysis. As a phase transition is approached, the data separate into different categories, i.e., show a “clustering,” thus allowing for an identification of phases. Supervised methods train a model, for example, a neural network, on known phases. When unknown phases are interrogated, the model predicts the probability that a configuration belongs to one of the learned phases, thus allowing one to determine the phase boundary.

Supervised machine learning using convolutional neural networks (CNNs) has been successfully used to study the phase behavior of lattice systems. In this work, we extend these approaches to off-lattice systems and test their performance against two model systems.

Convolutional neural networks (CNNs) have been used to predict the phase diagram of a variety of lattice Hamiltonians.^{5} The essential idea is that different phases are composed of distinct patterns that can be learned by a CNN. CNNs provide accurate predictions for the phase diagram of a number of lattice models, including the ferromagnetic Ising model,^{5,6} Hubbard model,^{7,8} XY model,^{9} and Anderson model.^{10,11} This is true even if some of the training set data are intentionally mislabeled.^{12}

There have been studies of off-lattice systems using machine learning.^{13–23} Many of these studies focus on unsupervised methods to determine important collective variables and order parameters. Walters *et al.*^{13} used neural networks to study defects in liquid crystals. They discretized space and obtained the defect density for each type in each cell from molecular simulations, which they used in a neural network model for defect recognition. Jadrich *et al.*^{15,16} used unsupervised machine learning methods to study the phase behavior of a variety of systems. A key aspect is the construction of relevant features that are important for a given phase transition.

In this paper, we develop a supervised ML method using CNNs to study the phase behavior of off-lattice systems. Our method is complementary to the unsupervised methods described above. We map the off-lattice system onto a grid and then use ML methods that have been successful for lattice models. We test the model by computing the phase diagram of two models, the Widom–Rowlinson (WR) mixture and a symmetric polymer blend (PB). For the WR mixture, we use two training sets, a phase separated and a mixed system at the same mole fraction, and find that this suffices to accurately predict the phase diagram of the mixture. For the polymer blend, we find that an additional training set is necessary for accurate predictions, and with that modification, the method is accurate for the phase diagram.

## II. SIMULATION MODELS AND METHODS

We consider two molecular models in this work, the Widom–Rowlinson (WR) mixture and symmetric polymer blend, for which results from conventional simulations are available.

The WR mixture is a binary (A/B) mixture where particles of like species do not interact, and there is a hard sphere repulsion between particles of species A and B, with a hard sphere diameter *σ*. As the total number density is increased, the system undergoes a liquid–liquid phase separation; by symmetry, the critical mole fraction is 0.5. At each density and concentration, initial configurations are prepared with the particles randomly distributed in a cubic cell. The system is evolved via single particle Monte Carlo (MC) moves at a constant number of molecules *N*, volume *V*, and temperature *T* (which in this system does not play a role). The total number of attempted moves is ∼7 × 10^{5} MC steps per particle. We investigate mole fractions, *x*_{A} = *N*_{A}/*N* = 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.4863, and 0.5, where *N*_{A} is the number of particles of species A, and 51 number densities ranging from *ρσ*^{3} = 0.5 to 1.0. We study system sizes of *N* = 1024, 2048, 3072, 4096, and 8192. We generate 100 independent configurations at each state point.
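To make the WR move rule concrete, the following is a minimal sketch of a single-particle MC sweep in which only A–B overlaps are forbidden; the function names and the step size `delta` are illustrative choices, not the production code used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def overlaps(pos, others, L, sigma=1.0):
    """True if pos lies within sigma of any point in others (cubic PBC)."""
    if len(others) == 0:
        return False
    d = others - pos
    d -= L * np.round(d / L)          # minimum-image convention
    return bool(np.any(np.sum(d * d, axis=1) < sigma ** 2))

def mc_sweep(A, B, L, delta=0.3):
    """One sweep of single-particle displacements for the WR mixture.

    Like-species overlaps are allowed; a trial move is rejected only if
    the particle would overlap a particle of the opposite species.
    """
    for species, other in ((A, B), (B, A)):
        for i in range(len(species)):
            trial = (species[i] + rng.uniform(-delta, delta, 3)) % L
            if not overlaps(trial, other, L):
                species[i] = trial
    return A, B
```

Because acceptance never requires particle insertion or deletion, the sweep remains efficient at high density, which is the regime where Gibbs-ensemble moves fail.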

In the symmetric polymer blend (PB) model, each component consists of 32 freely jointed beads connected by harmonic springs. All beads interact via a Lennard-Jones (LJ) potential, which is cut and shifted at a distance of 2.5 *σ*, where *σ* is the collision diameter and is the same for all beads. The LJ well depths are *ε*_{AA} = *ε*_{BB} = *ε* and *ε*_{AB} = 0.9*ε*. This system displays an upper critical solution temperature, with a liquid–liquid phase transition as the temperature is lowered. By symmetry, the critical mole fraction is *x*_{A} = 0.5.

We study systems consisting of 64, 96, or 128 molecules in a cubic cell. Initial configurations are generated by randomly inserting polymers on a cubic lattice at large volumes. The system is evolved using molecular dynamics simulations in the *NPT* ensemble (*P* is the pressure) using the GROMACS 5.1.2 software package,^{24} with a Nosé–Hoover thermostat, a Parrinello–Rahman barostat with pressure *P* = 1*ε*/*σ*^{3}, and a leap-frog algorithm with a reduced time step of 0.004 $\sigma\sqrt{m/\epsilon}$, where *m* is the bead mass. The system is equilibrated until the density reaches a steady value, which takes ∼3 × 10^{6} time steps. We generate 30 trajectories at each state point using simulations in the *NVT* ensemble, averaged over 10^{8} time steps. We study mole fractions of *x*_{A} = 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, and 0.5 for at least 20 reduced temperatures *k*_{B}*T*/*ε* ranging from 0.6 to 4.0 (the system freezes below *k*_{B}*T*/*ε* = 0.6).
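For orientation, a hypothetical GROMACS `.mdp` fragment consistent with this setup might look like the following. This is an illustrative sketch, not the authors' actual input file; the coupling times `tau_t` and `tau_p` are assumed values, and all numbers follow the reduced-unit conventions above.

```ini
; illustrative .mdp sketch (hypothetical; reduced units assumed)
integrator   = md                 ; leap-frog integrator
dt           = 0.004              ; reduced time step
tcoupl       = nose-hoover
tc-grps      = System
tau_t        = 1.0                ; coupling time (assumed)
ref_t        = 1.0                ; set per state point
pcoupl       = parrinello-rahman
tau_p        = 5.0                ; coupling time (assumed)
ref_p        = 1.0                ; P = 1 eps/sigma^3
rvdw         = 2.5                ; LJ cutoff
vdw-modifier = potential-shift    ; cut-and-shifted LJ
```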

## III. CONVOLUTIONAL NEURAL NETWORK

The essence of the ML approach is to use pattern recognition to distinguish between mixed and phase separated states. We describe the ML with the Widom–Rowlinson mixture as an example, and this is summarized in Fig. 1. The input is configurations from equilibrium simulations, which undergo pre-processing in order to convert off-lattice particle positions into a grid using an interpolation scheme. A matrix of interpolated densities is then input into two convolutional neural network (CNN) layers, which are then converted (“flattened”) into a one-dimensional vector. Two linear classifier layers are used to predict the output, which for the WR model is either 0 (mixed) or 1 (phase separated). This neural network is trained on many mixed and phase separated configurations. Other configurations give an output between 0 and 1 and, therefore, a probability that the configuration is phase separated, which is used to estimate the phase transition density. We then optimize the CNN filter size to determine the best prediction for the transition density. Finally, finite size scaling (FSS) is used to determine the infinite system transition density. Details of this procedure are given below.

The first step is to convert molecular simulation data, in terms of positions of the particles, into a color map. We assign particle positions to values on a grid via linear interpolation with the SciPy interpolation library. For the WR model, we assign a value of 1, 0, and 0.5 for species A, B, and vacant space, respectively. For the polymer blend, we use 1, −1, and 0 for species A, B, and vacant space, respectively. To increase the number of coordinates, we produce virtual coordinates using an augmentation method that generates a number of datasets by flipping, translating, and rotating images. This is a standard way to increase coordinate information in ML approaches, even though the relative positions of the atoms are identical in all virtual coordinates. We make 100–400 augmented images per original image for the training dataset and ten augmented images per original image for the prediction data in the case of the polymer blend.
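A minimal sketch of this pre-processing for the WR model is given below, using `scipy.interpolate.griddata` for the linear interpolation and simple symmetry operations for augmentation. The function names are illustrative, and the use of `griddata` with a 0.5 fill value for grid points outside the interpolation hull is our assumption about one reasonable implementation, not a transcription of the authors' code.

```python
import numpy as np
from scipy.interpolate import griddata

def to_colormap(pos_A, pos_B, L, ng):
    """Interpolate WR particle positions onto an ng^3 grid.

    Species A maps to 1, species B to 0; grid points that cannot be
    interpolated (outside the point cloud) default to 0.5 (vacant).
    """
    pts = np.vstack([pos_A, pos_B])
    vals = np.concatenate([np.ones(len(pos_A)), np.zeros(len(pos_B))])
    axes = np.linspace(0.0, L, ng, endpoint=False)
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    return griddata(pts, vals, (gx, gy, gz), method="linear", fill_value=0.5)

def augment(grid, rng):
    """One random symmetry copy: axis flips, a 90-degree rotation,
    and a periodic translation (np.roll), all of which leave the
    relative particle positions unchanged."""
    for ax in range(3):
        if rng.random() < 0.5:
            grid = np.flip(grid, axis=ax)
    grid = np.rot90(grid, rng.integers(4), axes=(0, 1))
    return np.roll(grid, rng.integers(grid.shape[0]), axis=2)
```

Since every augmentation is a permutation of grid elements, the augmented images contain exactly the same density values as the original, only rearranged.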

The output of the pre-processing stage is a vector, *S*_{i}, of dimension *n*_{g} × *n*_{g} × *n*_{g}, where *n*_{g} is the number of grid points on each axis, chosen to be an integer close to $N^{1/3}$, and *i* is the index describing a particular snapshot from the molecular simulations. Each element of *S*_{i} has a value between 0 and 1 for the WR model and between −1 and 1 for the PB model.

The ML model consists of two CNN layers and two linear classifier layers. The CNN layers have *n*_{f} = 16 filters. The first CNN layer loads input data with periodic padding of dimension 1 × 1 × 1. The filter size *n*_{s} of the first layer is optimized as described shortly and is in the range 2 ≤ *n*_{s} ≤ *n*_{g}/2. The filter size of the second layer is kept fixed at 2. In the first CNN layer, the *j*th feature map contains a set of *n*_{s} × *n*_{s} × *n*_{s} weights and a single bias *b*_{j}. The output of the second CNN layer is flattened into a single vector and fed into the two linear classifier layers that have *n*_{p} = 256 weights and a scalar-valued bias. The outputs of the CNN layers are passed through a rectified linear unit (ReLU) activation function, $f(x) = \max(0, x)$. The linear classifier layers use the normalized exponential function (Softmax) to predict an output. We implement the networks with the Keras library using the TensorFlow backend.^{25}
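The forward pass of this architecture can be sketched in plain NumPy as follows. This is a pedagogical simplification of the Keras model, not the implementation used here: it uses tiny sizes, and each second-layer filter acts on a single feature map, whereas a real `Conv3D` layer mixes all input channels.

```python
import numpy as np

def periodic_pad(g, p=1):
    """1x1x1 periodic padding via wrap-around."""
    return np.pad(g, p, mode="wrap")

def conv3d_relu(g, w, b):
    """Valid 3D convolution with one cubic filter, followed by ReLU."""
    ns = w.shape[0]
    n = g.shape[0] - ns + 1
    out = np.empty((n, n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i, j, k] = np.sum(g[i:i+ns, j:j+ns, k:k+ns] * w) + b
    return np.maximum(out, 0.0)          # f(x) = max(0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(grid, f1, b1, f2, b2, W1, d1, W2, d2):
    """Two CNN layers -> flatten -> two linear classifier layers."""
    maps = [conv3d_relu(periodic_pad(grid), w, b) for w, b in zip(f1, b1)]
    maps = [conv3d_relu(m, w, b) for m, w, b in zip(maps, f2, b2)]  # size-2 filters
    x = np.concatenate([m.ravel() for m in maps])                   # flatten
    h = np.maximum(W1 @ x + d1, 0.0)
    return softmax(W2 @ h + d2)          # class probabilities
```

The final softmax output is what is interpreted below as the probability that a configuration belongs to each training phase.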

The weights and biases of the ML model are optimized in the training process, but the hyper-parameters, *n*_{s}, *n*_{f}, and *n*_{p}, have to be specified *a priori*. The filter size *n*_{s} determines the correlation or patterning range of small spatial features in the color map. Small values of *n*_{s} = 2 or 3 are often used in studies of the phase behavior of simple models.^{5,6,8,9,26} While small filter sizes might be appropriate in systems with short-ranged correlations, systems with longer-ranged correlations might require larger filter sizes.^{16} The choice of *n*_{s} has an impact on the predicted transition density (or temperature). Figure 2 depicts the predicted transition density of the WR model for *x*_{A} = 0.125 as a function of *n*_{s}. Too small a filter size probably misses longer ranged correlations, and too large a filter size increases variance error. We therefore choose *n*_{s} at the minimum in the curve in Fig. 2 for the WR model. For the PB model, the transition temperature shows a maximum with respect to *n*_{s}, and we choose the value of *n*_{s} for which it is maximum. We do not optimize the parameters *n*_{f} and *n*_{p}.

The supervised machine learning algorithm has two steps: training and prediction. For the training of the CNN, we use known phases. For the WR model and *x*_{A} < 0.375 of the PB model, these are a phase separated and a mixed phase, for which the output values are 1 and 0, respectively. For *x*_{A} ≥ 0.375 of the PB model, we use three training phases, the three-dimensional percolating channel (3C) phase, the lamellar phase, and a disordered phase, with outputs 2, 1, and 0, respectively. The choices for the training set are discussed in detail shortly. Each color map, *S*_{i}, in the training set has a known output value, e.g., 0 or 1 in the WR model or 0, 1, or 2 in the PB model. The CNN is trained to minimize a loss function, the standard cross entropy function. The weights and biases of the CNN model are optimized to minimize the loss function through back-propagation with the Adam algorithm.^{27,28} We use 30 epochs through the entire dataset, with batches of 200 images for the WR model and 50 images for the PB model.
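The standard categorical cross entropy minimized here can be written compactly; the NumPy helper below is a sketch of the loss (names are illustrative), the batched counterpart of what Keras computes internally.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax of a batch of logit vectors."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean categorical cross entropy, -<log p(correct class)>,
    for integer class labels (0/1 for WR; 0/1/2 for the PB model)."""
    p = softmax(np.asarray(logits, float))
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels])))
```

During training, back-propagation adjusts the weights and biases to push this quantity toward zero on the labeled configurations.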

Once the ML model has been trained, it can make predictions for unknown configurations. Since all weights and biases in the model are known from the training process, each input configuration yields an output, which is the probability of observing one of the training phases. The probability of phase separation, *P*, is plotted in Fig. 3 for the WR model. The transition point is identified as the density where *P* = 0.5, which gives the finite size transition density *ρ*_{b}(*N*).
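Locating the *P* = 0.5 crossing from a scan over state points reduces to a linear interpolation between the two bracketing densities; a minimal sketch (assuming *P* increases with density and the scan brackets 0.5) is:

```python
import numpy as np

def transition_density(rho, P):
    """Density at which the phase-separation probability crosses 0.5,
    by linear interpolation between the two bracketing state points.
    Assumes P is (roughly) increasing in rho with P[0] < 0.5 <= P[-1].
    """
    rho, P = np.asarray(rho, float), np.asarray(P, float)
    i = int(np.argmax(P >= 0.5))          # first point at or above 0.5
    x0, x1, y0, y1 = rho[i - 1], rho[i], P[i - 1], P[i]
    return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
```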

We use finite size scaling (FSS) to obtain the infinite system phase transition point from the finite systems studied in this work.^{29–31} We find that plotting the transition point for a given *N*, i.e., *ρ*_{b}(*N*) vs *N*^{−1/(3*ν*)}, where *ν* = 0.63012, allows us to extrapolate to the infinite *N* limit. This is depicted in Fig. 4 for the WR model.
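The extrapolation itself is a linear fit whose intercept is the infinite-system transition point; a short sketch (function name illustrative):

```python
import numpy as np

NU = 0.63012  # 3D Ising correlation-length exponent used in the text

def fss_extrapolate(N, rho_b, nu=NU):
    """Fit rho_b(N) vs N^(-1/(3 nu)) with a straight line; the
    intercept (x -> 0, i.e., N -> infinity) is the bulk transition."""
    x = np.asarray(N, float) ** (-1.0 / (3.0 * nu))
    slope, intercept = np.polyfit(x, np.asarray(rho_b, float), 1)
    return intercept
```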

A key aspect of the process is the selection of the training set. Since we perform the equilibrium simulations for various states, an examination of these configurations provides insight into how the training sets must be chosen. In the case of the WR model, the phase separated system displays different morphologies, e.g., sphere, cylinder, or lamellar, at different mole fractions. This behavior has been previously observed in simulations of the vapor–liquid phase behavior of Lennard-Jones particles^{32} and polyelectrolytes in poor solvents.^{33} We therefore use a separate training set for each mole fraction: with the mole fraction fixed, we train on configurations at two densities, one mixed and one phase separated, *ρσ*^{3} = 0.5 and 1.0, for all mole fractions.

In the case of the PB model, there are intermediate structures in the simulations at different temperatures at a certain range of mole fractions of polymer. For 0.375 ≤ *x*_{A} ≤ 0.5, for example, there is a lamellar phase at low temperatures, and there are several other structures, e.g., perforated lamellar, two dimensional channels, and three dimensional channels, as the temperature is increased, before the disordered phase is observed. If these structures are not learned, the ML method will not be able to distinguish them from the disordered phase. Therefore, in addition to the disordered and phase separated states, we include a third structure in the training set, with coordinates corresponding to the three dimensional percolating channel (3C) phase.

The procedure is modified as follows: We first use three training sets with *n*_{s} = 2, with an output of 0 for the mixed phase, 1 for the lamellar phase, and 2 for the 3C phase. Using these, we find the temperature *T*_{int,max} where the probability of the 3C phase is maximum. We then repeat the procedure with all images at *T*_{int,max} and at *k*_{B}*T*_{mix}/*ε* = 4.0 as the training sets of the separated and mixed phases, respectively. With these two training sets, we find the optimum filter size *n*_{s}. Once the transition temperature is obtained, we extrapolate to infinite systems using finite size scaling.
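The first stage of this two-stage procedure amounts to picking the temperature where the predicted 3C probability peaks; a trivial sketch (helper name illustrative):

```python
import numpy as np

def t_int_max(T, P3C):
    """Temperature at which the predicted 3C-phase probability peaks;
    configurations at this temperature become the 'separated' training set."""
    return float(np.asarray(T)[int(np.argmax(P3C))])
```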

For *x*_{A} < 0.375, the PB model shows either sphere or cylinder phases in the separated state, and therefore (as in the WR model), we train on the images at *k*_{B}*T*/*ε* = 4.0 and at the lowest temperature where the cell length is close to an integer multiple of *σ*, i.e., *k*_{B}*T*/*ε* = 0.8, 1.0, and 0.6 for *N* = 64, 96, and 128, respectively.

## IV. RESULTS AND DISCUSSION

The bias of the first CNN layer provides a way to visualize the phase separation. Figure 5 shows the color map for various values of the bias in the first layer. Note that these biases are obtained from the fitting procedure and are not used further beyond that. There are roughly two types of CNN filters: those that show a spatial gradient of weights and those that show a random weight distribution, in good agreement with previous studies of lattice models.^{9} Interestingly, when the filters are sorted in order of bias, those with two large domains of weights have a high bias. This is because when the weight distribution of a filter matches the distribution of particles in the cell, the convolution has a high score, which is corrected by adding the bias. A high positive value is likely for a system in a phase separated state. The values of the bias thus provide insight into the nature of the configuration. Similar results are seen for the polymer blend.

The ML predictions for the phase behavior of the WR model are in good agreement with previous simulation results.^{34,35} Figure 6 compares the phase diagram obtained from the ML method with previous Monte Carlo simulation results. The agreement is very good over the entire range of concentrations. Note that the ML method has no disadvantage near the critical point, where standard Monte Carlo methods are challenged. For the WR model, the ML approach is not as efficient as the Monte Carlo methods. However, this is a challenging test of the method because the phase separation arises from an entropic driving force.

In the polymer blend model, the incorporation of a third training set proves important because the original phase separated training set has a lamellar morphology, whereas several other phase separated morphologies, much closer to the mixed state than to the lamellar state, appear as the temperature is raised. These morphologies are clearly seen in the simulations of the phase separated states at intermediate temperatures (below the critical point). It is therefore important to include a third training set that is phase separated but does not have a lamellar morphology.

We therefore include a third training set, the three-dimensional channel (3C) phase. Figure 7 depicts the probability of the three phases with all three training sets (lamellar, 3C, and mixed). In this figure, the temperature at which *P*_{3C} is maximum is *k*_{B}*T*_{3C,max}/*ε* = 1.9. We then choose the two training sets to be the 3C and mixed phases and proceed with the evaluation of the phase transition temperature.

Incorporating the 3C training set makes a significant difference to the prediction of the transition temperature and makes the calculations consistent with those at lower mole fractions. Figure 8 compares the probability of a mixed phase as a function of temperature for *x*_{A} = 0.4375 and *N* = 64 for two schemes: when the lamellar and mixed phase are the training sets and when the 3C and mixed phase are the training sets. The predictions for the finite size transition temperature are *k*_{B}*T*/*ε* = 1.75 for the former and *k*_{B}*T*/*ε* = 2.56 for the latter, which is more consistent with the data at lower mole fractions.

The ML method is in good agreement with previous simulations for the phase diagram of polymer blends. Figure 9 compares the results of this work to that of Gromov and de Pablo^{36} and Jung and Yethiraj.^{4} The good agreement highlights the utility of the method as well as the importance of using appropriate training sets in the procedure.

## V. CONCLUSIONS

We present a supervised machine learning framework to obtain the phase diagram of off-lattice models. The coordinates of the particles in the simulation are assigned to a grid and input into a convolutional neural network (CNN). The filter size in the first CNN layer is optimized to obtain the phase diagram. The results for the Widom–Rowlinson mixture and symmetric polymer blend are in excellent agreement with previous simulations using Monte Carlo methods. A key aspect is the selection of training sets for the CNN. For the WR model, we use two training sets, i.e., phase separated and mixed, at a constant mole fraction. For the PB model, it is important to use a third training set to characterize structures that are formed as the transition point is reached, which are different from both the mixed and lamellar phase separated states. It is essential to include, in the training set, a structure that is closer to the mixed state.

The method should be generally applicable to both macro-phase and micro-phase separation. For example, it might be a useful way to map out the phase diagram of systems that show lyotropic liquid crystalline phases, such as lamellar, gyroid, and hexagonal, from simulations of systems of modest size.

The method has some drawbacks. For one, the results can be sensitive to the choice of hyper-parameters. We optimized one of the parameters, and an investigation of the importance of the other parameters and that of other ML architectures might be interesting. Automated ML (AutoML) techniques that tune deep neural networks, such as Cloud AutoML by Google and AutoKeras by DATA Lab at Texas A&M University,^{37} are also interesting directions. The second drawback is that long simulations both for the training and prediction phases are necessary.

Simulations of the phase diagram of mixtures require the insertion of molecules, which becomes essentially impossible for complex fluids at high density. Methods that require the simulation of an interface fail near the critical point. In principle, the ML method suffers from neither of these fundamental disadvantages: it requires only standard simulations of the mixed and separated phases and can be applied to any system. It is therefore likely to find application where other methods are not feasible.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## ACKNOWLEDGMENTS

This work was supported by the U.S. Department of Energy, Basic Energy Sciences (Contract No. DE-SC0017877). We acknowledge computational resources and assistance of the UW-Madison Center for High Throughput Computing (CHTC). Molecular simulations were primarily performed at the Oak Ridge Leadership Computing Facility through the Director’s Discretion Program under Project No. CHM132.