The functionality of many polymeric materials depends on their glass transition temperatures (*T*_{g}). In computer simulations, *T*_{g} is often calculated from the gradual change in macroscopic properties. Precise determination of this change depends on the fitting protocols. We previously proposed a robust data-driven approach to determine *T*_{g} from the molecular dynamics simulation data of a coarse-grained semiflexible polymer model. In contrast to the global macroscopic properties, our method relies on high-resolution microscopic details. Here, we demonstrate the generality of our approach by using various dimensionality reduction and clustering methods and apply it to an atomistic model of acrylic polymers. Our study reveals the explicit contribution of the side chain and backbone residues in influencing the determination of the glass transition temperature.

## I. INTRODUCTION

Most polymeric materials in application are used in their glassy state. Therefore, functionality and the applicability of these materials depend on the glass transition temperature, *T*_{g}. This necessitates an accurate prediction of *T*_{g} to better control the properties of the material for specific applications. The static properties, such as radial distribution function and structure factor, do not show any notable change around *T*_{g}; in contrast, the dynamic properties, such as viscosity and relaxation time, substantially increase in a super-Arrhenius manner.^{1–4} Experimentally, *T*_{g} of polymer melts is calculated from the change in heat capacity using differential scanning calorimetry (DSC),^{5} thermal expansion coefficient using thermomechanical analysis (TMA),^{6} or viscoelastic behavior of polymers using dynamic mechanical analysis (DMC).^{7} However, the experimentally relevant cooling rates are inaccessible in computer simulations, and this leads to ambiguity while connecting computational models of polymeric materials to experimentally observed properties. Computationally, *T*_{g} is determined from changes in the macroscopic properties such as specific volume, density, or energy. The reliable predictions of *T*_{g} are challenging, particularly for systems where these changes occur gradually.^{8–12} Recent studies have shown attempts to define *T*_{g} based on the changes in molecular structures of polymeric materials. For example, by quantifying the changes in specific dihedral angles and transitions between states defined by those angles.^{10,13} Baker *et al.*^{14} have observed that the glass transition temperatures for polymer melts with different chain lengths can be collapsed on a master curve using the local intra-chain conformational dynamics with packing. Machine learning (ML) methods hold great promise to automatize the determination of structural properties of the glassy systems that can reflect the changes in *T*_{g} from molecular dynamics (MD) simulation data, but their application for polymeric materials is limited.^{10,15,16} Iwaoka and Takano^{15} applied principal component analysis (PCA)^{17} to Cartesian coordinates of short polymer chains in a melt. They showed a change in eigenvalue distributions before and after glass transition and connected this change to approximate conformational entropy differences.

Recently, we proposed a new unsupervised machine-learning approach to estimate *T*_{g} from molecular dynamics simulation trajectories.^{18} Our data-driven methodology takes into account the structural information of individual chains and makes use of high-resolution information obtained from molecular dynamics simulations. By combining principal component analysis^{17} and a density-based clustering algorithm (DBSCAN),^{19} we identified *T*_{g} at the asymptotic limit even from relatively short-time trajectories. The PCA captured the change in nature of the fluctuations in the system: from conformational fluctuation above *T*_{g} to localized rearrangements below *T*_{g}. Our method was introduced on a coarse-grained model of polymer melts containing weakly semiflexible polymer chains.^{20} In this work, we extend it and show the generality of our approach by using nonlinear dimensionality reduction, agglomerative hierarchical clustering^{21} and apply it to other glass-forming liquids.

Here, we analyze the simulation trajectories of all-atom acrylic polymers^{22} found in acrylic paints: poly(methyl methacrylate) (PMMA), poly(ethyl acrylate) (PEA), and poly(n-butyl acrylate) (PnBA); see Fig. 1. The glass transition temperatures for acrylic polymers are close to room temperature, i.e., 333–387 K^{23–26} for PMMA, 249 K^{27} or 231 K^{28} for PEA, and 223 K^{25} for PnBA. This proximity to room temperature is relevant for both, the stability of a painting and the applicability during the painting process. However, it also means that the paints can suffer from crack formation at low temperatures while becoming sticky at high temperatures. Consequently, the temperature has a significant impact on the degradation of acrylics. As our analysis relies on the microscopic observables, we explicitly investigate the role of the side chain and backbone atoms in determining the glass transition temperature. Extending our approach to another nonlinear multi-dimensional scaling approach enables us to apply it to a very small number of observations per temperature, which is the case for simulation data obtained with, e.g., continuous cooling protocols.^{9} Here, we employ agglomerative clustering that requires minimum prior knowledge about the system and hyperparameter tuning. Overall, with various dimensionality reduction and clustering techniques, the generality of our approach is tested for these acrylic polymer melts.

## II. METHODS AND SIMULATION DETAILS

### A. Simulation details and data preparation

#### 1. Atomistic simulations

Molecular dynamics simulations of bulk homopolymers (PMMA, PEA, and PnBA) were presented in Ref. 22. They were performed using the general Amber force field^{29} with Gromacs 2019 software.^{30,31} The isotactic 15-mer polymer chains were constructed with AmberTools,^{32} and *n*_{ch} = 100 polymer chains were placed randomly in a box with initial dimensions 9 × 9 × 9 nm^{3} using Packmol.^{33} Each system was minimized for 1000 steps by using the steepest descent algorithm. Following the minimization, we equilibrated for 200 ps using the NVT (constant number of particles, volume, and temperature) at 600 K. We further equilibrated the system with NPT (constant number of particles, pressure, and temperature) ensemble for 10 ns to maintain temperature (600 K) and pressure (1 bar), respectively. After this short equilibration at 600 K, the temperature was decreased to 100 K using a cooling rate of 20 K/ns to calculate the glass transition temperature and obtain correct polymer density at each temperature. The coordinates of the system at temperature intervals of 50 K were saved and further equilibrated for 10 ns using NPT ensemble to study the temperature-dependent properties of the polymers. Atomic coordinates were saved every 100 ps for the trajectory analysis resulting in 100 frames per temperature. Further information on the simulation details is given in Ref. 22.

#### 2. Input data

To identify the liquid-to-glass transition from the simulation trajectories, we extract information from each individual chain independently. As possible descriptors, we use sets of internal pairwise distances. To retain the information about conformational fluctuations of individual polymer chains of 15-mers, we choose three different sets of descriptors—all pairwise distances between (a) the backbone C-atoms (C2, C3) (**bb-bb**), (b) side chain C-atoms (C5, C6 for PMMA; C5, C7 for PEA; C5, C9 for PnBA) (**sc-sc**), and (c) the backbone C-atoms (C2) and the end side chain C-atoms (C6 for PMMA; C7 for PEA; and C9 for PnBA) (**bb-sc**). A snapshot of selected carbon atoms is given in Fig. 1. The yellow beads correspond to the backbone atoms (C2, C3), and blue beads, to the side chain atoms (C5, C6/C7/C9) for each monomer.

We extract this information from 100 snapshots of each NPT run (we get similar results using NVT simulations) of simulation trajectories at 11 temperatures in the range 100–600 K for each chain independently. In this way, for each chain in the melt, we construct the data matrix $Xch\u2208RM\xd7L$, where *ch* = 1, …, *n*_{ch} is a chain index, *n*_{ch} is a number of chains in the melt, *M* is the number of simulation snapshots/frames, and *L* is the number of descriptors: all pairwise distances between side chains C-atoms (**sc-sc**), backbone C-atoms (**bb-bb**), and backbone side chain (**bb-sc**) separately (for our systems here, the length of descriptors *L* = 435 for all three cases).

### B. Data-driven identification of *T*_{g}

#### 1. Dimensionality reduction

For both methods I and II, we first perform a dimensionality reduction once the possible sets of the internal descriptors are defined (as described in Sec. II). In the original paper,^{18} we used (PCA),^{17} which identifies linearly uncorrelated subsets of the data space containing most of the variance of the original data. PCA has been successfully used to characterize different physical phenomena, i.e., the phase transition in conserved Ising spin systems,^{34,35} secondary structure prediction of proteins,^{36} shape fluctuation in DNA,^{37} and glass transition temperature prediction in polymer melts.^{15,18} Here, PCA is applied on the internal pairwise distances’ matrix **X**_{ch} of a randomly selected single chain. Before applying PCA, **X**_{ch} is standardized column wise, i.e., it is mean free, and standard deviation is equal to one. First, the covariance matrix $Cch=XchTXch\u2208RL\xd7L$ is calculated. Then, the pairs of eigenvalues *λ*_{ch,i} and eigenvectors **v**_{ch,i} for *i* = 1, 2, 3, …, min (*L*, *M*) are found for *C*_{ch} and sorted in the decreasing order of *λ*_{ch,i}. The original data **X**_{ch} are then projected $X\u0303ch,i=Xchvch,i$ to the new orthogonal basis of principal components (PCs) formed by *P* leading eigenvectors **v**_{ch,i}, where *i* = 1, …, *P* and *P* ≤ min (*L*, *M*) is the reduced number of dimensions. For the index notations of the above equations, see Ref. 18. The number of leading principal components (*P*) to project the input data on is usually chosen to retain a specific amount of variance in the input data reflected by the magnitude of the respective eigenvalues (e.g., see Fig. S4).

^{38,39}—that was successfully applied for the analysis of protein data.

^{40,41}It is a variant of multi-dimensional scaling methods

^{42}in which a dimensionality reduction is performed by minimizing the loss function between the distances (or other metrics) between data in original high- and target low-dimensional spaces. The key feature of cc_analysis is that it minimizes the differences between Pearson correlation coefficients

^{43}of pairs of high-dimensional datasets and the scalar product of the low-dimensional vectors representing them [see Eq. (1)] projecting the data into a unit sphere,

*r*

_{ij}is the correlation coefficient between configurations

*X*

_{i}and

*X*

_{j}in the high-dimensional space (the rows of the matrix

**X**

_{ch}), $Xi\u2208RL$,

*i*,

*j*= 1, …,

*M*;

**x**

_{i}·

**x**

_{j}denotes the dot product of the unit vectors

**x**

_{i}and

**x**

_{j}representing the data in the low-dimensional space $X\u0303ch$. Importantly, similar to PCA, for deciding on the number of required dimensions for low-dimensional projection of cc_analysis, one can use the gap in the eigenvalues of the $XchXchT\u2208RM\xd7M$ matrix (please note the difference to the

*C*

_{ch}of the PCA covariance matrix, which is calculated column wise but will have the same eigenvalues and eigenvectors up to normalization). The dependence of the results on the number of dimensions for Method I using both dimensionality reduction methods is discussed in the results sections.

As a result of dimensionality reduction, each data point in the latent projection corresponds to the configuration of one chain at a given time and temperature.

#### 2. Clustering

For Method I, after the dimensionality reduction using linear or nonlinear methods, we perform clustering on the new lower-dimensional projected space $X\u0303ch\u2208RM\xd7P$ for each polymer chain. In our previous paper,^{18} we used a density-based spatial clustering of applications with noise (DBSCAN),^{19} which groups together the data points that are close based on two hyperparameters: the Euclidean distance to create a neighborhood and the minimum number of points to form a dense region. The clustering performance depends on the careful choice of the hyperparameters and can produce any number of clusters. In this manuscript, instead of DBSCAN, we use agglomerative clustering.^{21} For this clustering method, the number of expected clusters is the main hyperparameter. Since our dataset consists of either glassy or liquid states, our natural choice for the number of clusters to find is 2. As other two parameters, we used Euclidean distance as the metric and Ward linkage.^{44} We use agglomerative clustering both on PCA and cc_analysis space to observe the change from liquid to glassy states. As the result of the clustering, each chain *ch* at each simulation frame and temperature will get a cluster index (ID) *n*_{i}, where *n*_{i} is an integer (i.e., *n*_{i} ∈ {0, 1, …, *n*_{cluster} − 1}, *n*_{cluster} = 2 for agglomerative clustering in this work) for *i* = 1, 2, …, *n*_{ch}*M*, *n*_{ch} is the number of chains and *M* is the number of frames. Subsequently, the matrix $X\u0303ch\u2208RM\xd7P$ will be transformed to a vector $X\u0303ch\u2208{ni|ni=0\u2009or\u20091}M$. In this work, *n*_{i} = 0 corresponds to the glassy state, whereas *n*_{i} = 1 corresponds to the liquid state.

#### 3. Method I

*ch*,

*ch*= 1, …,

*n*

_{ch}. We perform the dimensionality reduction on

**X**

_{ch}(linear or nonlinear) obtaining a new reduced projection space $X\u0303ch$. We cluster this new space assigning a cluster index

*n*

_{i}to each configuration of the chain

*ch*and obtaining a vector of cluster indices $X\u0303ch$. Then, we repeat dimensionality reduction with subsequent clustering on each chain present in the system for a total of 100 chains. This implies that each chain will be represented as a vector of 1100 cluster IDs (each 100 elements of this vector corresponds to the chain’s configuration at the same temperature). Then, we calculate average cluster indices $ID\u0304(T)$ as a mean over cluster IDs of all chains that were simulated at the same temperature. At each temperature

*T*, the average cluster index $ID\u0304(T)$ is given as

*P*(

*n*

_{i},

*T*) is the probability distribution of cluster IDs for all

*n*

_{ch}chains over all simulation frames at each

*T*.

*s*and

*d*are the fitting parameters and

*C*is the gap between the two states at

*T*≫

*T*

_{g}and

*T*≪

*T*

_{g}, respectively. The inflexion point [a point in which

*g*″(

*T*) = 0] defines

*T*

_{g}and is estimated as

*T*

_{g}=

*T*

_{inflextion}=

*d*/

*s*.

#### 4. Long time approximation with method I

With the aim to extrapolate glass transition temperature at the long observation time limits, we calculate the average cluster indices for different observation time windows Δ*t* with equal intervals. We choose *k*_{t} different observation time windows Δ*t* (*k*_{t} = 9 with Δ*t* from 2 to 10 ns in this work). For each Δ*t*, we have used Δ*t*/*t*_{lag} consecutive frames with *t*_{lag} = 100 ps. We fit the average cluster ID values using Eq. (3) and calculate the inflexion points at different Δ*t*. Then, we observe the change in *T*_{g}(Δ*t*) as the inverse of Δ*t*. At relatively larger time windows, it follows a linear behavior and an extrapolation of this linear behavior to 1/Δ*t* → 0, i.e., infinite observation time, allows us to estimate an asymptotic limit of *T*_{g} [*T*_{g}(Δ*t* → *∞*)] from a relatively short trajectory length.

#### 5. Method II

*λ*

_{ch,i}(

*T*), where

*ch*is a chain index,

*T*is the temperature at which we performed PCA, and

*i*is the eigenvalue index (they all sorted in descending order and

*i*= 1 correspond to the highest eigenvalue),

*i*= 1, …,

*L*. Additionally, we calculate the participation ratio (PR) over the set of eigenvalues defined at each temperature as

*λ*

_{ch,i}(

*T*) are eigenvalues of PCA sorted in descending order. PR can be viewed as the effective dimensionality of the data. It reflects the decay rate of eigenvalues: faster decay results in a smaller PR compared with a slower decay rate.

*ch*= 1, …,

*n*

_{ch}chains in the melt. In Ref. 18, we observed that

*T*

_{g}can also be determined from the change in monotonic behavior of the first eigenvalue of PCA or the participation ratio (PR) averaged over all chains as a function of temperature:

## III. RESULTS

### A. Method I: Analysis of combined temperatures

#### 1. Results with PCA

We start our analysis for each of the three systems (PMMA, PEA, and PnBA) by performing PCA on a randomly selected single chain of the homopolymer using the input descriptors (**sc-sc**, **bb-bb**, and **bb-sc**) over the 10 ns classical MD simulations concatenated for all temperatures. In Fig. 3, we show the projection of the data onto two leading principal components (PCs) for **sc-sc** descriptors. The projections in the new PCA space can be viewed as linear combinations of all input descriptors. Each point in the plot corresponds to the chain’s conformation at a given temperature at each time. The amount of variance explained by eigenvectors for PCA analysis is given in supplementary material, Fig. S4). Even in two-dimensional projection, we observe that for all three homopolymers, the low-temperature states are concentrated in a small region on the PCA projection (Fig. 3). We clustered the data using agglomerative clustering, as described in Sec. II B. In the inset of Fig. 3, we plot the same projection colored based on obtained cluster indices. The agglomerative clustering groups the low temperature (*T* < *T*_{g}) part into one cluster with cluster ID = 0 (maroon color) and the liquid state to cluster with ID = 1 (gray color).

To obtain a general estimate of the temperature at which the separation between the liquid and the glassy states (characterized by a change in cluster IDs) occurs, we perform PCA for each chain separately, followed by agglomerative clustering and averaging over obtained cluster IDs as a function of temperature as given in Fig. 4(a). To quantify the jump, we use Eq. (3) for fitting the data (detail in Sec. II B) and define the inflexion points as the glass transition temperatures. Figure S2 shows the results do not change after applying DBSCAN instead of agglomerative clustering to the same data as in Fig. 4(a). Comparing the obtained results for the three systems considered here we observe that *T*_{g} decreases with increasing the side chain length of the polymer, which is in good agreement with the previous analysis^{22} and shown in Fig. 3. The methods of predicting *T*_{g} based on the macroscopic properties such as density or volume are sensitive to the fitting protocols as the transition around *T*_{g} is not sharp. For example, when the same data were fitted with a different fitting range, the estimated *T*_{g} for PMMA was lower^{22} compared with the value we report in Fig. 4(c) or a similar model in Ref. 10. A different choice of the fitting range leads to around 50 K shift in *T*_{g} value for PnBA system, see Fig. S1, suggesting that *T*_{g} is highly affected by the bilinear fitting uncertainties.

In Fig. 4(b), we plot *T*_{g} values obtained for different input descriptors **bb-bb**, **sc-sc**, and **bb-sc** as a function of the number dimensions the data were projected on. The estimated *T*_{g} values averaged over different clustering results for each of the descriptors are given in Table I. We do not see notable differences in *T*_{g} values when we compare side chain and backbone contributions. Since in Method I the fluctuations of all the temperatures are scaled together, the dominant contribution in fluctuation is expected to come from the high-temperature states suggesting that both the side chain and backbone fluctuate more at high temperatures compared with their low-temperature states. However, the scenario changes completely when we treat each temperature separately in Method II. This will be discussed later in Sec. III B.

. | MMA . | EA . | NBA . |
---|---|---|---|

Specific volume fittinga | 478 ± 8,^{22} 526,^{10} 524 ± 6b | 416 ± 8^{22} | 334 ± 14^{22} |

PCAc (sc-sc) | 511 ± 4 | 447 ± 4 | 421 ± 6 |

PCA (bb-bb) | 513 ± 5 | 453 ± 4 | 425 ± 5 |

PCA (bb-sc) | 507 ± 4 | 450 ± 4 | 414 ± 5 |

cc_analysis (sc-sc) | 521 ± 2 | 457 ± 1 | 422 ± 3 |

cc_analysis (bb-bb) | 537 ± 1 | 465 ± 1 | 444 ± 2 |

cc_analysis (bb-sc) | 520 ± 2 | 457 ± 1 | 421 ± 3 |

Δt → ∞ | 506 ± 5 | 430 ± 5 | 402 ± 2 |

. | MMA . | EA . | NBA . |
---|---|---|---|

Specific volume fittinga | 478 ± 8,^{22} 526,^{10} 524 ± 6b | 416 ± 8^{22} | 334 ± 14^{22} |

PCAc (sc-sc) | 511 ± 4 | 447 ± 4 | 421 ± 6 |

PCA (bb-bb) | 513 ± 5 | 453 ± 4 | 425 ± 5 |

PCA (bb-sc) | 507 ± 4 | 450 ± 4 | 414 ± 5 |

cc_analysis (sc-sc) | 521 ± 2 | 457 ± 1 | 422 ± 3 |

cc_analysis (bb-bb) | 537 ± 1 | 465 ± 1 | 444 ± 2 |

cc_analysis (bb-sc) | 520 ± 2 | 457 ± 1 | 421 ± 3 |

Δt → ∞ | 506 ± 5 | 430 ± 5 | 402 ± 2 |

It is important to note here, as was already discussed in Ref. 22, the experimental *T*_{g} values for the considered polymers are lower than our prediction due to the differences in the cooling rates accessible to MD simulations and experiments (experiential rates are much lower). Several studies have attempted to link those cooling rates^{46} and proposed an adjustment to *T*_{g} values for acrylic polymers calculated from MD simulations.^{22} Our results agree well with the previously calculated *T*_{g} values (Table I) from simulations of acrylic polymers or coarse-grained bead-spring polymer model.^{18} Overall, our result also shows a good qualitative agreement with the experimental findings, where *T*_{g} values decrease with increasing side chain length.

One of the limitations of all-atom MD simulation is the short time that can be accessed due to the large number of atoms. This presents a challenge when calculating properties, such as glass transition temperatures, because the dynamics of the polymer melt are slow. Therefore, a long equilibration (hundreds of nanoseconds) is necessary. Here, the simulation data we have are only 10 ns for each temperature, which is considered short for equilibration of macroscopic properties, such as specific volume (Fig. 3). In Sec. II, we provide a method to extrapolate *T*_{g} to long time limits from a relatively short trajectory. We perform PCA followed by clustering at nine different observation time windows Δ*t* from 2 to 10 ns with 1 ns interval and interpolate the data using Eq. (3). The inflexion point, i.e., *T*_{g}, is now calculated for different lengths of time, Δ*t*, along the trajectory. Taking into account, this finite-time effect, we found a linear dependency of *T*_{g} and Δ*t* for the bead-spring entangled polymer model that allows estimating an asymptotic limit of *T*_{g} from a relatively short trajectory length. We extended this approach to acrylic polymers. In Fig. 5, we plot the inflection points at different Δ*t* values and find linear-type behavior at larger observation times. Although the fitting uncertainty in the all-atom system is relatively high compared with the CG model, one can still extrapolate the *T*_{g} values in the asymptotic limit within some error bars of the fitting. The obtained values are *T*_{g} ≈ 506 ± 5(*K*) for MMA, 430 ± 5(*K*) for EA, and 402 ± 2(*K*); however, one requires longer simulation runs for reliable linear fitting. We can observe the linear tendency to the lower *T*_{g} values with an increase in simulation time, nonetheless, obtained estimates are within the cooling step range.

#### 2. Results with cc_analysis

In this section, we use cc_analysis for dimensionality reduction instead of PCA to identify the glass transition. We apply it with the same input descriptors (**sc-sc**, **bb-bb**, and **bb-sc**). The projections are shown in Figs. 6(a)–6(c) for the three systems for **sc-sc** input descriptor. The state separation results are clearer than the PCA projections [Figs. 3(a)–3(c)]. To quantify the jump in cluster ID, we use the same procedure with Eq. (3) for fitting the data and observe that inflection points, i.e., the glass transition temperatures, decrease on increasing the side chain lengths of the polymer melt. The data are tabulated in Table I. cc_analysis results are less sensitive to the choice of dimensions [Fig. 6(e)] compared with PCA [Fig. 4(b)]. We observe that the backbone descriptor systematically overestimates the *T*_{g} value. Moreover, it performs equally well with only a few observations per temperature [Fig. 6(f)].

### B. Method II: Analysis of individual temperatures

In this section, we use Method II as described in Sec. II B where we perform PCA on single polymer chains at different temperatures independently and use averaged eigenvalues [Eq. (5)] and the participation ratio [Eq. (6)] as described in Sec. II B to determine *T*_{g} [see schematic Fig. 2(b)]. With this method, the information on single chain conformations from different temperatures is not accessible to the dimensionality reduction method, rather we investigate whether the data-driven protocol itself can distinguish different temperature inputs. Examples of resulting projections with **sc-sc** input descriptors for three of the systems are shown in Fig. S5 in supplementary material. We observe a change from the completely random distribution of points in the projection at high temperatures to a more “clustered” projection around the glass transition temperature. Similar behavior is also observed for the bead-spring polymer model near *T*_{g}.^{18} Below *T*_{g}, only small random fluctuations dominate the behavior of the chain, and as a result, the projections look scattered. Here, we would like to emphasize that due to the standardization of the data, those fluctuations at different temperatures have the same magnitude, resulting in visually similar projections below and above *T*_{g} in Fig. S5.

The magnitude of the PCA eigenvalues can be used to quantify the observed behavior. In general, for independently projected data, this magnitude does not have a uniform value, but in our case, all distances are standardized. As a result, we could average the first eigenvalue, $\lambda 1\u0304(T)$, across all projections [see Fig. 7(a)]. As more general criteria, we plot the participation ratio, $PR\u0304(T)$, for the three systems in Fig. 7(b). We observe a non-monotonic behavior for all the systems on lowering the temperature. We argue that the change in $PR\u0304(T)$ [or $\lambda 1\u0304(T)$] is connected with a change in nature of the fluctuations in the system: from local configurational rearrangements above *T*_{g} to only local rearrangements along the chain below *T*_{g}. As a result, more dimensions are required for the random motion description below *T*_{g}. With this analysis, we also confirm that the point at which the sudden change occurs (i.e., *T*_{g}) shifts toward lower values on increasing side chain length (Fig. 7).

Previously, we showed that the results obtained with Method I did not depend on the choice of descriptors, i.e., the atoms chosen for the analysis. However, the backbone diffuses slower than the side chains (Fig. S3); hence, different descriptors can play a role in this method where each temperature is treated independently. Therefore, we repeat the same analysis using Method II with the **bb-bb** input descriptors for all three systems. Remarkably, the projections do not show “clustering” around *T*_{g} (Fig. S6) as well as there are no clear changes in $\lambda 1\u0304(T)$ and $PR\u0304(T)$ (Fig. 8). Thus, our result suggests that the side chains play an important role in determining *T*_{g} compared with the backbone contribution. Such difference is more prominent for PnBA system that has the longest side chain. As mentioned in Sec. II B, eigenvalues of cc_analysis and PCA are the same (up to normalization); hence, we show results of Method II only with PCA.

In our combined temperature analysis, we do not see notable differences in *T*_{g} values when we compare side chain and backbone contribution. Since fluctuations of all the temperatures are scaled together in the combined temperature analysis, the dominant contribution is expected to come from the high-temperature states. On the contrary, here, each temperature is considered separately, resulting in the explicit role of the side chain and backbone contribution to *T*_{g}.

## IV. DISCUSSION

Both methods have their advantages and drawbacks that we would like to summarize shortly, but at the same time, they complement each other. In contrast to Method II, Method I is performed on concatenated data from all temperatures; as a result, there are no eigenvalues/eigenvectors associated with a specific temperature.

Method I requires following the same chain over all temperatures, meaning it would not allow us to compare the data of the same systems with independently generated configurations from different simulation runs (as typically done for the glass-forming liquids simulations), whereas Method II does not have this limitation as each chain is analyzed independently at each temperature. One more important consequence of such an independent application is that one can test the simulation results after each cooling step. After observing the non-monotonic behavior in $\lambda 1\u0304(T)$ or $PR\u0304(T)$, no further simulations at lower temperatures are required, in contrast to Method I or the conventional bilinear fitting. However, Method II requires relatively long NVT/NPT simulation runs at each temperature for a certain time and cannot be applied to continuous cooling simulations in contrast to Method I, which in combination with, e.g., cc_analysis, can be used having only one snapshot per temperature.

Note that we performed PCA on a single polymer chain, followed by taking an average over all chains in the system. Performing PCA on 100 chains combined, we only observe the same Gaussian-like distribution within fluctuations, resulting from different chains, which is essentially independent of the temperatures [Fig. S7(a)]. This result is similar to the observation that the radius of gyration (*R*_{g}) or end-to-end distance (*R*_{e}) distributions over all the chains are independent of temperature^{22} [Figs. S7(b) and S7(c)].

The value of *T*_{g} from Method II can be defined only within the cooling step range (e.g., between 400 and 450 K for PEA system), whereas Method I allows for more precise prediction as well as extrapolations to the long time limits. To check the influence of the cooling step range, we performed additional simulations with the smaller temperature gap (25 K instead of 50 K) for PnBA system and obtained the same *T*_{g} (Fig. S8).

Naturally, both methods are sensitive to the input descriptors (as shown in Fig. 8) as well as parameters [but in this case, we show that the results are robust for a big range of parameters, e.g., Fig. 4(b)]. Compared with the projections shown in Ref. 18, the separation between glassy and liquid states in two-dimensional projection (Fig. 3) is not as clear for the considered system. In the general case, it can be either due to the analysis routine (choice of descriptors, dimensionality of the projection, dimensionality reduction algorithms, and clustering parameters) or due to internal properties of the considered polymers. In the latter case, the choice of the clustering method, which takes the number of clusters as input parameters, might not be optimal and require a deeper understanding of the considered system. Nonetheless, for considered systems, both clustering algorithms with completely different input hyperparameters (agglomerative and DBSCAN) show a sharp change in average cluster indices around *T*_{g} (see Fig. S2).

Our analysis shows that the internal degrees of freedom in the polymer chains help to employ the intra-monomer distances within a single polymer chain as an input descriptor. In the future, we plan to check our analysis with different sets of input descriptors (including those that can account for intermolecular interactions explicitly) such as local bond orientation order parameters,^{47} softness,^{48} dihedral angles,^{10} local chemical environment descriptors,^{49,50} etc.

## V. CONCLUSIONS

In summary, we extend and validate our recently developed data-driven approach to determine the glass transition temperature from all-atom and coarse-grained molecular dynamics simulation data. Our approach utilizes high-resolution microscopic details available from simulations and considers conformational fluctuations of polymer melts over time at the level of individual chains. Here, we apply it to all-atom simulations of acrylic homopolymers of different side chain lengths. Our result qualitatively agrees well with those of other experimental studies where *T*_{g} values decrease with the growing length of the polymer side chains. By using different dimensionality reduction methods and clustering algorithms, we show the generality of our approach, which can be applied to a wide variety of systems ranging from coarse-grained polymer models to all-atom systems and, therefore, is robust. Finally, we provide a way to quantify the role of the side chain and backbone in determining the glass transition.

This method could be applied to other systems with “soft” glass transitions such as organic light-emitting diodes where precise prediction of *T*_{g} is debatable.

## SUPPLEMENTARY MATERIAL

The supplementary material contains plots for determining glass transition temperatures from the bilinear fits with different fitting ranges (Fig. S1); glass transition temperature determined using another clustering method (DBSCAN) (Fig. S2); comparison of the diffusion values for center of mass, backbone atoms, and side chain atoms of the polymer chains (Fig. S3); fitting parameters of g(T) used to calculate Tg shown in Fig. 4(a) (Table S1); explained variance ratio for PCA (Fig. S4); temperature dependent PCA projections of one selected chain with pairwise distances between all side chains (Fig. S5) and backbone (Fig. S6) atoms; PCA projections of all chains (Fig. S7); and glass transition temperatures obtained with our methods for the simulations with additional cooling steps (Fig. S8).

## ACKNOWLEDGMENTS

We acknowledge open-source packages MDTraj,^{51} Numpy,^{52} Matplotlib,^{53} and Scikit-learn^{54} used in this work. A.I. and K.K. are supported by the European Union Horizon 2020 APACHE (Active and Intelligent Packaging Materials and Display Cases as a Tool for Preventive Conservation of Cultural Heritage), under Horizon 2020 Research and Innovation Program Grant Agreement No. AMD-814496-10. A.B. thanks Rituparno Mandal for discussion. We acknowledge Sharon Volpe and Denis Andrienko for critical reading of our manuscript.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Atreyee Banerjee**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Aysenur Iscen**: Data curation (lead); Formal analysis (supporting); Visualization (supporting); Writing – original draft (equal); Writing – review & editing (equal). **Kurt Kremer**: Conceptualization (supporting); Supervision (lead); Writing – review & editing (equal). **Oleksandra Kukharenko**: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

Raw simulation data are available at https://doi.org/10.17617/3.4JHOMW. Derived data supporting the findings of this study and all relevant calculations are available at https://gitlab.mpcdf.mpg.de/banerjeea/acrylic_polymers.

## REFERENCES

*Dynamics of Polymeric Liquids, Vol. 1: Fluid Mechanics*

*Dynamic Mechanical Analysis (DMA)*

^{+}I

^{−}polymer electrolytes: A combined simulation and experimental study

*Introduction to HPC with MPI for Data Science*

*n*-butyl acrylate) and gradient poly(

*n*-butyl acrylate-co-methyl methacrylate)