Recent developments in machine learning have enabled accurate predictions of the dynamics of slow structural relaxation in glass-forming systems. However, existing machine learning models for these tasks are mostly designed such that they learn a single dynamic quantity and relate it to the structural features of glassy liquids. In this study, we propose a graph neural network model, “BOnd TArgeting Network,” that learns relative motion between neighboring pairs of particles, in addition to the self-motion of particles. By relating the structural features to these two different dynamical variables, the model autonomously acquires the ability to discern how the self motion of particles undergoing slow relaxation is affected by different dynamical processes, strain fluctuations and particle rearrangements, and thus can predict with high precision how slow structural relaxation develops in space and time.
When a liquid is rapidly cooled, it freezes into a glass, which retains a random amorphous structure. This phenomenon, referred to as the glass transition, is ubiquitous, but its origin still remains to be clarified. The elucidation of this origin has been an important research topic for decades.1,2 Along with the drastic slowdown of atomic motion accompanying the glass transition, localized domains of particles that rearrange more preferentially than others grow up.3–6 Identifying the structural origin of this phenomenon has long been a central problem in the field. Structural analyses of local geometric orders7–9 and of real-space normal modes of vibrations derived from static structures10–14 have accumulated evidence for structure–dynamics correspondence. However, the random, featureless structures of glasses still prevent us from fully understanding the origin and predicting the dynamics based on the static structure.
Recently, machine learning methodologies have enabled the accurate determination of structure–dynamics correspondence in glassy systems.15–23 Among them, graph neural networks (GNNs),24 a type of deep learning algorithm that operates on a graph, have been found to be suitable.20 GNNs had already proven to be useful for making predictions of material properties by directly representing structures of molecules or atomic clusters as nodes and edges of the graph.25–28 Bapst et al.20 applied this ability of GNNs to the task of predicting particle mobility in a glass-forming liquid from its static structure and showed that GNNs outperform other machine learning methodologies. This high accuracy in the predictions of glassy dynamics has enabled the development of inverse modeling as a further application.29 It has also led to the development of a cost-effective linear-regression (LR) model that can overcome the high computational cost of GNNs but still achieves equivalent precision in its prediction.21,22
In the study by Bapst et al.,20 the prediction accuracy of GNNs was found to exceed not only that of other machine learning methods but also that of normal mode analysis at all temperatures accessible by simulations and across all temporal scales up to the structural relaxation time. Normal mode analysis is a method to predict the spatial distribution of short-time vibrations. Because short-time vibrations (soft modes) are intimately linked to the slow α relaxations,9,30–34 normal mode analysis yields a strong baseline of physics-based prediction of glassy dynamics from a static structure.10,35 Nevertheless, the prediction accuracy of GNNs becomes relatively low in the case of fast β dynamics compared to that of slow α relaxation.20,22 We may therefore conjecture that the dynamics is made difficult to predict owing to elastic vibrations, which are erased after coarse-graining the short-time dynamics. This decline in prediction accuracy can be rectified, as shown by a recent study,36 if we encode short-time vibrations obtained from the actual simulations. However, there are still no methodologies to make predictions solely from a static structure by bypassing the challenges caused by such transient fluctuations.
In this article, we propose a GNN model named “BOnd TArgeting Network (BOTAN)” that realizes highly precise predictions of the entire glassy relaxation from a static structure. BOTAN can learn a target characteristic quantity assigned on edges corresponding to pairs of close-by particles, in addition to the previous target quantity assigned on nodes. BOTAN must be positioned as a straightforward but strong extension of a GNN used in the previous study by Bapst et al.,20 which we term as “node-targeting GNN” (NT-GNN). In this study, such a feature of BOTAN is exploited to characterize and learn relative motion: It allows itself to characterize structural rearrangements being unaffected by elastic fluctuations as well as the local degree of local elastic strains. We show that this modeling allows for highly improved predictions of complex dynamics of slow structural relaxations of glassy systems.
II. MACHINE LEARNING MODEL
In this section, we explain the neural network architecture of BOTAN. Similar to NT-GNN, it is based on the interaction network37,38 consisting of an encoder–process–decoder architecture, where features of nodes and edges in a graph are mutually computed by exchanging messages between a pair of two-layer multilayer perceptrons (MLPs) assigned for nodes and edges. This architecture is suitable for extracting intricate relationships from simulation data computed with high precision. Because the network treats node and edge features equivalently, it is possible to regress the network toward the target quantities on edges as well as nodes39 by decoding edge embeddings using an MLP.
III. DATASET DETAILS
For each of these three target temperatures, 500 independent particle configurations are generated as follows: After keeping the temperature at T = 5.0 for the time lapse of th = 104, the system is cooled rapidly (in tcool = 20) to the target temperature. The temperature is subsequently kept constant by using the Nosé–Hoover thermostat until the system reaches a steady state up to the time scale of 40τα. The final particle configuration thus obtained is used as the initial configuration at t = 0 in the production run for generating the dataset.
From the 500 initial configurations, we introduce the isoconfigurational ensemble10,43 by running 32 separate microcanonical (NVE) simulations with each starting from the same configuration; however, the initial velocities are randomly given from the Maxwell–Boltzmann distribution at the target temperature. These simulations are conducted approximately up to 30τα, where the particle configurations are sampled at logarithmically increasing time points, as summarized in Table I. Among the 500 initial configurations, 400 configurations (12 800 runs) are used for training and the other 100 (3200 runs) are used for testing.
|Tagging .||1 .||2 .||3 .||4 .||5 .||6 .||7 .||8 .||9 .||10 .|
|T = 0.44||0.13||1.30||13.0||130||412||1300||4120||13 000||41 200||130 000|
|0.50||0.108||0.434||2.17||6.86||21.7||68.6||217||686||2 170||6 860|
|Tagging .||1 .||2 .||3 .||4 .||5 .||6 .||7 .||8 .||9 .||10 .|
|T = 0.44||0.13||1.30||13.0||130||412||1300||4120||13 000||41 200||130 000|
|0.50||0.108||0.434||2.17||6.86||21.7||68.6||217||686||2 170||6 860|
IV. PROCEDURES OF MACHINE LEARNING
A graph for training or prediction test is constructed based on the particle configuration wherein the particles are represented by their nodes. Edges are assigned to pairs of particles i and j within a distance threshold value rij(t = 0) ≤ re so that the neighborhood relation can be represented. We choose the threshold length re = 2.0σAA that has the best predictive performance regarding particle propensity.20 The constructed graphs are then fed into the model by encoding particle types (A or B) into the nodes and 3D relative positions between particles (rij = ri − rj) into the edges. Specifically, for input feature vectors and (=rij) , the first layers of the node and edge encoders [Eqs. (1) and (2)] are implemented by linear transformations and , respectively, where d is the dimension of the hidden layers (64 in our case). For the node feature, type A and B are converted into real numbers 0 and 1 beforehand, and both the input features on nodes and edges are treated as real number with 32-bit single-floating point precision.
The learning setups and hyperparameters in the present study are similar to those used by Bapst et al.20 The training dataset is learned to minimize the loss function, wherein the neural network is optimized using the Adam algorithm without norm regularization, with a learning rate of 10−4 and a standard implementation in PyTorch. The data loaded for training are augmented by applying random rotations of the simulation box to the particle positions, as all the target quantities for learning are scalar variables that are rotationally invariant.
In all these training scenarios, all the particles (all the particle pairs) are learned together, regardless of the particle types. No minibatching of graph inputs are applied in the training, except that the minibatch size of the graph input is fixed at 5 in the training of NT-GNN (p = 1). Although Bapst et al. applied no minibatching in their previous study,20 this difference does not significantly change the results presented in this article. The training with NT-GNN required ∼2 h when using one NVIDIA A100 Tensor Core GPU (40 GB SXM). A similar amount of time is required for training BOTAN, except that there may exist a substantial overhead arising from computation of interparticle distances, for which the actual elapsed time depends on the implementation.
In fact, many training epochs are required if we start the training of BOTAN with random weight parameters. The reason is that, as will be discussed in Sec. V A, there is a simple relation between the initial distance rij(t = 0) and its change after time when the interval time t is short. Such a relation defines a “metabasin” to which many weight parameters are largely optimized. Therefore, in this study, we start all the training of BOTAN (p < 1) with weight parameters that are adjusted by pretraining on edges at a low temperature T = 0.44. This pretraining is performed for 2000 epochs with the pair-distance changes, by keeping p = 0 and using trajectory data at a short time t = 13, which results in a sufficient convergence of the loss functions, as shown in Fig. 3. Such pretraining allows the main part of the training scenario to be efficient enough to finish within 1000 epochs.
A. Predicting pair-distance changes as an edge feature
The high prediction accuracy in the short-time region originates from a simple relation between the pair-distance change and initial distance rij(t = 0). Figure 6 shows a probability distribution function map as functions of these quantities for the closest neighbor pairs of type A particles in the training dataset at time points ranging from t = 13 to 13 000 at T = 0.44. In the short-time region (t ≲ 0.1τα = 412), a simple anticorrelation exists, wherein the pair-distance change remains one order of magnitude smaller than the particle radii for most of the neighbor pairs. This quantity, therefore, characterizes local strains induced by the elastic vibrations. The anticorrelation gradually disappears but persists up to t = τα as the time increases, making the relation between the two less trivial. The anticorrelation determines the metabasin in the landscape of the loss function in the vast weight-parameter space. The number of training epochs required to optimize BOTAN drastically increases as the anticorrelation gradually disappears for a longer time t, as the loss landscape becomes more complex to make the metabasin less “noticeable.” Therefore, we pretrained BOTAN over the pair-distance changes at T = 0.44 after t = 13, then we used the weight parameters for training with varying time. As the metabasins in the parameter space are likely to be located close to each other between the different time points, the usage of pretrained models would allow us to concentrate on optimizing the weight parameters around the metabasins and thus reduce the number of training epochs required for the main part of training.
B. Predicting particle propensity for motion as a node feature
Because BOTAN has a pair of decoders that respectively computes output features for nodes and edges, it is possible to train BOTAN simultaneously with target quantities for both, particle propensity and pair-distance changes . This is realized by setting the loss function as a weighted sum of -norm losses for and . The model is then trained with other hyperparameters remaining unchanged.
We next assess the prediction accuracy of BOTAN in comparison with NT-GNN. Because particles of types A and B are expected to exhibit different diffusivity over a long period, the prediction accuracy is evaluated in terms of the Pearson correlation coefficient using only the data for type A particles. Figure 8 shows how accurately BOTAN and the NT-GNN predict the particle propensity for three different temperatures, T = (a) 0.44, (b) 0.50, and (c) 0.64. The trend of time dependence for NT-GNN is in agreement with the previous results:20 In the shorter time before reaching the plateau region (t < 0.2), the Pearson correlation coefficient assumes high values, then falls below 0.5, and afterward gradually increases to reach its peak at around the α-relaxation time. Conversely, BOTAN outperforms NT-GNN in its predictive accuracy over the entire temporal range and for all temperatures under investigation.
To this end, we note that the pair-distance change in Eq. (11) has its own significance despite the extreme simplicity of the definition. Because the pair-distance changes characterize relative motion, they can capture structural rearrangements. This feature is distinct from standard quantities defined on the basis of particle displacements that is affected by nonlocal phononic fluctuations and thus enables the model to better distinguish the rearranging hotspot separating nonlocal effects. Similar ideas have also been implemented in the method of counting the replacement of neighboring pairs,10,49,50 also termed “bond breakage.”5,51,52 Furthermore, the pair-distance changes can characterize the extent to which each pair tends to change owing to the local shear deformation around rearranging cores; this aspect of structural changes has long been addressed using the concept of “shear transformation zones.”47,53–55 By learning how strains of non-rearranging regions are distributed and how rearranging hot spots are localized, BOTAN autonomously fixes the errors in the prediction of particle displacements and overall acquires unprecedented predictive ability regarding the glassy dynamics.
C. Comparison with prediction accuracy of other quantities by NT-GNN
In Fig. 8, Pearson correlation coefficients between predictions of NT-GNN and the actual values of neighbor-relative particle propensities, and , are shown; furthermore, in (a), this correlation coefficient is also plotted for the prediction made by BOTAN. The correlation coefficient is evaluated only for type A particles, as a proximity between the predicted and ground truth values. The prediction accuracy of NT-GNN improves over these quantities, especially in the time regions shorter than 0.1τα. This clearly implies that spatially extended static fluctuations cause decline in the prediction accuracy of by NT-GNN in the short time and that this decline in the prediction accuracy is recovered because, in these quantities, the effect of these spatially extended static fluctuations is removed to a certain extent.
In conclusion, we have introduced a GNN model that realizes faithful predictions of glassy dynamics over the entire temporal range, with accuracy greatly improved from previous models, including the GNN model by Bapst et al.20 Typical deep learning models, including both convolutional neural networks and GNNs, are designed so that they can characterize local features; thus, they are not extremely good at capturing nonlocal characteristics. The key to the present improvement is that the model learns relative motion between neighbor pairs to relate it to two-body structural correlation in addition to the self-motion of the particles. In this way, the GNN model bypasses direct learning of nonlocal motion itself and acquires the ability to autonomously “interpret” how particle motion is affected by different dynamical effects nonlocal strain fluctuations and local particle rearrangements. As a consequence, BOTAN reconciles the differences between fast β and slow α relaxation dynamics and achieves state-of-the-art ability in the task of predicting the glassy dynamics from the static structure.
In the present study, particle propensity for motion defined on self-displacement is chosen as the dynamical quantity of interest, in line with many previous studies on structure–dynamics correspondence in glasses.7,9,10,20,21 It was shown that the GNN model proposed by Bapst et al. (NT-GNN) can be reduced to a linear-regression (LR) model once properly defined local structural parameters are averaged over neighbor shells, wherein predictive abilities are essentially the same between the LR model and NT-GNN.21 Our results have, somewhat contrarily, shown that such local structural parameters are not the only descriptor for self-displacement because they are under the influence of strain fluctuations. Recently, new neural network models that successfully improve the predictive ability on glassy dynamics by introducing different ideas have emerged. For example, a recently proposed machine learning framework, GlassMLP, has raised the predictive ability by introducing the physics-informed descriptors’ input and bottleneck layers;58 a couple of recent studies also seem to have improved the predictive ability of GNNs by introducing rotational-equivariant structural descriptors59 or by additionally introducing self-attention layers into GNNs.60 As the reasons for the improved prediction are likely to be different from each other, it may be possible to develop a more advanced framework by combining these features, which is left as an open task in the future. Given that a much higher prediction ability is realized, such models may also become useful for finding the “reaction coordinates” of glassy dynamics, along which the fluctuations should be enhanced for efficient sampling of molecular trajectories. Extensions to the recently discussed machine learning-aided sampling technique may be interesting.61,62
We thank T. Kawasaki for critical input regarding this research. We also thank Y. Asahi, J. J. Molina, and F. Landes for helpful discussions and comments. H. Shiba and T. Shimokawabe were supported by JSPS KAKENHI Grant No. JP19H05662 and by “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” and “High Performance Computing Infrastructure” in Japan (Project No. jh220052). M. Hanai and T. Suzumura were supported by “Advanced Research Infrastructure for Materials and Nanotechnology” from MEXT, Japan. Training and testing are performed using NVIDIA A100 Tensor Core GPUs (40 GB SXM) on Wisteria/BDEC-01 Aquarius subsystem at Information Technology Center, University of Tokyo. For generation of the dataset for training and testing, usage of the following computational resources is acknowledged: Oakforest-PACS at the Joint Center for Advanced High Performance Computing (JCAHPC), Oakbridge-CX at Information Technology Center, University of Tokyo, Supercomputer System B at Institute for Solid State Physics, University of Tokyo, and MASAMUNE-IMR at Institute for Materials Research, Tohoku University.
Conflict of Interest
The authors have no conflicts to disclose.
Hayato Shiba: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Resources (equal); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Masatoshi Hanai: Formal analysis (supporting); Methodology (supporting); Software (supporting); Writing – original draft (supporting). Toyotaro Suzumura: Software (supporting); Supervision (supporting); Writing – review & editing (supporting). Takashi Shimokawabe: Conceptualization (supporting); Funding acquisition (lead); Investigation (supporting); Project administration (lead); Resources (equal); Software (supporting); Writing – review & editing (supporting).
The source code used for training and evaluation is openly available in GitHub repository at https://github.com/h3-Open-BDEC/pyg_botan. This code is an implementation of BOTAN using PyTorch Geometric.63 The dataset of simulation trajectories for evaluations is also available via the hyperlink indicated in the same GitHub repository for three years after the publication of this paper.
APPENDIX A: EFFECT OF THE ITERATION NUMBER IN MESSAGE PASSING
In the model architecture of BOTAN, a pair of two-layer MLPs, characterizing the features of nodes and edges, are mutually updated. Each edge is updated based on the embedding of the nodes it is connecting, in addition to its own previous embedding. Each node is updated based on the sum of incoming edges, in addition to the embedding of the nodes. This is how message passing between the nodes is realized in the present GNNs. Therefore, the number of repeat cycles determines how far away structural information can be incorporated into a node (or into an edge). In Fig. 9, the dependence of the Pearson correlation coefficient on this iteration number (denoted as n in Sec. II) is shown for t = 130 and 4120 at T = 0.44, where the training is performed over 1000 epochs as well. The predictive ability increases as a function of the number of iterations n, but the increase is much slower when n < 4.
APPENDIX B: OPTIMIZATION DETAILS RELATED TO TRAINING ON EDGES
In the simultaneous training on nodes and edges [via both and ], a value of 0.4 is chosen for the hyperparameter p in Eq. (12) that determines the training weight between nodes and edges. This choice of the parameter is justified by an ablation experiment, in which BOTAN is trained for various values of the hyperparameter p without using the pretrained model. In Fig. 10, we show Pearson correlation coefficients ρ computed using predicted values for and , after being trained for 3000 epochs at t = 130 and T = 0.44. The result suggests that 0.2 ≤ p ≤ 0.6 may be a requirement that yields an optimal condition for the simultaneous training.