Many molecular systems and physical phenomena are controlled by local fluctuations and microscopic dynamical rearrangements of the constitutive interacting units that are often difficult to detect. This is the case, for example, of phase transitions, phase equilibria, nucleation events, and defect propagation, to mention a few. A detailed comprehension of local atomic environments and of their dynamic rearrangements is essential to understand such phenomena and also to draw structure–property relationships useful to unveil how to control complex molecular systems. Considerable progress in the development of advanced structural descriptors [e.g., Smooth Overlap of Atomic Position (SOAP), etc.] has certainly enhanced the representation of atomic-scale simulations data. However, despite such efforts, local dynamic environment rearrangements still remain difficult to elucidate. Here, exploiting the structurally rich description of atomic environments of SOAP and building on the concept of time-dependent local variations, we developed a SOAP-based descriptor, *Time*SOAP (*τ*SOAP), which essentially tracks time variations in local SOAP environments surrounding each molecule (i.e., each SOAP center) along ensemble trajectories. We demonstrate how analysis of the time-series *τ*SOAP data and of their time derivatives allows us to detect dynamic domains and track instantaneous changes of local atomic arrangements (i.e., local fluctuations) in a variety of molecular systems. The approach is simple and general, and we expect that it will help shed light on a variety of complex dynamical phenomena.

## I. INTRODUCTION

Structure–property relationships, at the heart of modern materials science, are hard to elucidate in complex molecular systems. Multi-scale and many-body interactions among all the atoms make it challenging, yet inspiring, to reconstruct the macroscopic behavior of such systems from their underlying atomic structure.^{1,2} Ranging from materials with an intrinsically dynamic character, such as soft supramolecular architectures, to common crystal lattices, a thorough knowledge of atomic arrangements, including their structural and dynamic evolution, is required to unlock tangled material responses and features.^{3–7} In crystalline solids, for instance, the materials’ plasticity and viscosity and their microstructural evolution are dictated by the energy and kinetics of defects^{8} or structural imperfections.^{9,10} Furthermore, atomically disordered domains, such as surfaces, grain boundaries, and heterogeneous interfaces, have been widely recognized to be linked to transport, mechanical, electronic, and optical properties.^{11–14} Shedding light on the intimate connection between complex atomic arrangements and material dynamic properties would clearly pave the way for novel design rules and optimization of molecular systems for tailored behaviors.^{15,16} However, in most practical cases, this remains far from being trivial.

In recent years, advances in data availability and computational power have enabled the development of valuable tools for gaining a deeper understanding of the chemical–physical phenomena occurring in materials.^{17,18} In particular, molecular dynamics (MD) simulations have been playing an increasingly significant role in the exploration of materials, serving as a large source of potential information.^{19–27} The use of MD simulations to elucidate structure–property relationships substantially embeds a two-step protocol: (i) the translation of MD trajectories into a numerical representation of atomic neighborhood environments, resulting in highly detailed and high-dimensional data, known as *fingerprints* or *descriptors*, and (ii) the extrapolation of meaningful information from the large volumes of generated datasets. Regarding the latter step, Machine Learning (ML) algorithms have often exhibited promising advantages in regard to handling large and complex sets of data, thereby receiving increased interest.^{28–33} However, a low-dimensional representation facilitating the navigation and identification of hidden patterns and features would be desirable.

Within this framework, methods for adequately characterizing complex atomic arrangements from MD simulations have been remarkably expanded. Over the last decades, many *descriptors* relying on the use of order parameters or mathematical quantities have been proposed.^{1} Low-dimensional descriptors based on the use of order parameters often allow gaining very accurate information, though being dependent on *a priori* knowledge about system features. However, methods relying on the use of structural environments (i.e., order parameters), such as coordination analysis, bond order analysis,^{34} bond angle analysis (BAA),^{35} common neighbor analysis, (CNA)^{36} adaptive CNA (A-CNA),^{37} and Voronoi analysis, generally struggle to identify different local coordination environments when the geometric symmetry is lost or exhibit a short-range nature (e.g., in crystalline systems close to the melting temperature). On the other hand, coupling more mathematically sophisticated descriptors to ML approaches enables effective characterization of systems by exploiting the rich and high-dimensional datasets provided by MD simulations^{38–40} apart from being less dependent on *a priori* knowledge. Nonetheless, advanced mathematical descriptors, such as the Behler–Parrinello symmetry functions (BP),^{41} Chebyshev polynomial representations (CPR),^{42} adaptive generalizable neighborhood informed (AGNI) features,^{43,44} smooth overlap of atomic position (SOAP),^{45} and atomic cluster expansion (ACE),^{46} generally based on the representation of atomic environments, can efficiently capture local structural properties but are less efficient in providing information on microscopic dynamic events occurring in the studied systems.

Among such abstract mathematical descriptors, SOAP was recently found very efficient for characterizing of a wide range of systems,^{47–50} including soft, disordered, and complex assemblies.^{51–54} Despite being strongly connected to the structural features of local environments, the SOAP fingerprint coupled with unsupervised clustering approaches and statistical analyses has been recently used to also reconstruct the dynamics of complex systems, such as metal surfaces,^{55} metal nanoparticles,^{56} soft supramolecular polymers,^{51,57} self-assembled micelles,^{52} and complex hierarchical superlattices, to cite a few.^{54,58} Since SOAP descriptors are typically high-dimensional, both linear and nonlinear dimensionality reduction (DR) approaches are often employed for facilitating both analyses and data visualization.^{59–62} However, DR represents the fundamental roadblock because it inherently leads to a loss of information, resulting in challenging characterization of systems where ordered and disordered domains coexist in dynamic exchange and equilibrium. In addition, beyond a few valuable techniques,^{63,64} the time evolution of structural changes, including rare fluctuations, still remains weakly explored by simply classifying datasets with unsupervised and sophisticated ML tools.

Time-dependent descriptors offer a different approach. For example, a recently developed descriptor—Local Environments and Neighbors Shuffling (LENS)^{65}—monitors how much the microscopic surrounding of each molecular unit changes over time in terms of neighbor individuals/identities along an MD trajectory. LENS allows the identification of dynamic domains and detection of local fluctuations in a variety of systems tracking events of addition/subtraction of neighbors within a certain cutoff over time. However, LENS does not contain structural information on, e.g., the relative position or arrangements of neighbors inside the cutoff sphere. In this way, it does not capture, e.g., local structural rearrangement, adjustment, or rattling. A time-dependent descriptor capable of retaining rich structural information and of efficiently monitoring structural changes over time would be desirable.

Building on such a concept, here we report a time-dependent descriptor, *Time*SOAP (*τ*SOAP), that essentially exploits the structurally rich description of molecular/atomic environments guaranteed by the SOAP vectors and measures to what extent the SOAP power spectra of each unit within a complex molecular system change over time. An ML-based analysis of the time-series *τ*SOAP data allows us to robustly and efficiently detect, e.g., structural transitions, phase transitions, and the coexistence of phases in a variety of systems with rich and diverse intrinsic dynamics. It is worth noting that the time derivative of *τ*SOAP also provides sharp signals identifying local fluctuations, highlighting local and rare events that may be overlooked with other approaches. The paper is organized as follows: In Sec. II (Methods), we present our *τ*SOAP and *τ*SOAP-based descriptors and the coupled ML-based workflow. In Sec. III, we discuss the results obtained by performing our *τ*SOAP analysis on various systems characterized by solid/liquid coexisting phases, solid-like and fluid-like behaviors, respectively. Our tests indicate that *τ*SOAP analyses are flexible and robust and can shed light on complex molecular/atomic systems with nontrivial multilayered dynamics, providing insights that are difficult to attain with other approaches.

## II. METHODS

### A. SOAP as a descriptor of atomic environments

Recently, data-driven approaches capturing the structural complexity of materials from equilibrium MD trajectories have been proposed. A generic MD trajectory is represented by an ordered list of N atomic coordinates **R**(t) in the 3D space at each simulation time step, where N is the number of particles in the system. In order to characterize complex atomic arrangements, descriptors of atomic neighborhood environments have been widely employed. By associating a *feature* vector to each **R**(t), the descriptors enable passing from the 3D *coordinate* space to an S-dimensional *feature* space. Importantly, these representations are required (1) to be permutationally, translationally, and rotationally invariant—in order for physically equivalent configurations to be recognized as such—and (2) to smoothly vary with small changes in atomic positions. Among many developed descriptors, we adopt the Smooth Overlap of Atomic Position (SOAP) to examine our sample of materials ranging from crystalline to soft and liquid states. SOAP is a state-of-art, high-dimensional representation of atomic environments, and it has recently provided valuable insights into both material properties and structural features.^{49,57,66,67}

*j*, located at a distance

**r**=

**r**

_{ij}from the

*i*-th center, a Gaussian function is associated.

*σ*is the distribution width of each Gaussian. The environment related to each center

*i*incorporates information up to a fixed cutoff,

*rcut*, where the function

*f*

_{rcut}smoothly goes to 0. Then, by expanding Eq. (1) in the basis of orthonormal radial functions

*R*

_{n}(

*r*) and spherical harmonics

*Y*

_{l,m}($r\u0302$), the corresponding SOAP power spectrum is calculated. For the

*i*-th center, it can be expressed as

*i*-th center. The parameters

*n*and

*n*′ range from 1 to

*nmax*, while the index

*l*runs from 1 to

*lmax*. From the values of

*nmax*and

*lmax*, it is possible to derive the dimension S of the full SOAP feature vector, which can be written as

*i*-th center, which includes all the contributions from Eq. (2). Here, we used the in-house code SOAPify

^{68}to compute the SOAP vectors, with

*nmax*,

*lmax*= 8, and different

*rcut*values depending on the characteristics of the considered system (see the supplementary material, Table S1). From the 3D coordinate vector corresponding to each MD simulation time, we calculate the SOAP vector

**p**

_{i}for a selected set {

*i*} of centers (referred to as SOAP centers). In summary, we obtain a dataset containing S-dimensional SOAP vectors describing the structural arrangements related to the {

*i*} selected sites at each sampled configuration. Since these SOAP vectors encode information about the atomic environments surrounding each center, SOAP is referred to as a “local” descriptor.

*K*

^{SOAP}(

*i*,

*j*) goes from 0 for no overlapping to 1 for completely superimposed vectors. Furthermore, from Eq. (4), a metric referred to as “SOAP distance” between two environments can be defined as follows:

Importantly, **p**_{i} and **p**_{j} describe the local environments related to two *different* SOAP *centers*. Besides the SOAP kernel, this distance representation provides a bounded similarity measure between two local environments, indicating how their local densities match in the S-dimensional feature space.

### B. Tracking dynamical SOAP variations with *Time*SOAP

The output dataset containing the S-dimensional SOAP vectors is typically high-dimensional, and although rich in information on the atomic/molecular arrangements, it requires a crucial preprocessing step to both facilitate the interpretation of the results and effectively identify relevant molecular patterns. For this reason, after estimating **p**_{i} [Eq. (3)] for the whole set of SOAP centers {*i*} at each sampled configuration of the MD trajectory, a SOAP-based pattern recognition procedure typically relies on two successive key phases: (1) use of dimensionality reduction (DR) of SOAP spectra by means of, for instance, Principal Component Analysis^{69,70} (PCA), and (2) employment of unsupervised clustering techniques for the identification of molecular motifs. Despite providing some information on a wide range of molecular systems, this approach presents a few key shortcomings: (i) Since the time information is not emphasized, insights on consequential transition events as well as the temporal persistence of the individual molecular configurations is not retained, thus hindering a detailed comprehension of the rate of change of every individual molecular configuration; (ii) on such low-dimensional SOAP-based dataset, some poorly populated configurations may remain undetected by (e.g., density-based) unsupervised clustering approaches; and (iii) low-dimensional embedding of atom-density representations can fail in faithfully preserving valuable information, such that a high number of principal components would be desirable.^{71} This makes detecting local fluctuations and rare events typically awkward with such approaches.

*d*

^{SOAP}(

*i*,

*j*) introduced above, we present a new SOAP-based fingerprint, named “

*Time*SOAP (

*τ*SOAP),” which quantifies the local environment variation, over time, of each individual SOAP center

*i*. Indicating by

*λ*

_{i}the variable form of

*τ*SOAP, its instantaneous value is defined as

In contrast to Eq. (5), here both $pit$ and $pit+\Delta t$ describe the local environments related to the *same unit* (i.e., the *i*-th SOAP center) but at *different simulation times*, *t* and *t* + Δ*t*, respectively. Thus, $\lambda it+\Delta t$ measures how similar the *i*-th SOAP vector calculated at time *t* is to that calculated at the next sampled time step (*t* + Δ*t*). We analyze consecutive frames, that is, adjacent points, where Δ*t* represents the MD sampling time step (different for the various systems, see Molecular Dynamics Simulations for more details). As a result, *τ*SOAP evaluates how the *i*-th local environment changes, in terms of SOAP descriptor, at every consecutive time interval Δ*t*. We thus obtain *λ*_{i}(*t*), a *τ*SOAP signal over time for each individual in the selected set {*i*}, thereby allowing to track the evolution of each SOAP constituent unit center along the trajectory.

*τ*SOAP signal. Using the NumPy

^{72}Python package, we have

By computing it along the MD trajectory, we get $\lambda \u0307i(t)$. What $\lambda \u0307i$ represents is the *rate* of local environment changes over time for the *i*-th SOAP center. This allows us to highlight the relevant dynamic phenomena occurring along the trajectory, notably discriminating between local environments characterized by a constant variation and those exhibiting an increasing/decreasing variation.

To increase the signal-to-noise ratio (S/N), both *λ*_{i}(*t*) and $\lambda \u0307i(t)$ are preprocessed by employing the Savitzky–Golay^{73} filter from the SciPy^{74} python package, thus obtaining smoothed signals. A common polynomial order parameter of p = 2 is used for each signal *λ*_{i}(*t*), while different time windows are chosen depending on the analyzed system, in order to reach a compromise between an acceptable S/N value and a sufficiently smaller window compared to the length of signal (see the supplementary material, Fig. S1, for details). After having chosen the time window for *λ*_{i}(*t*), to adequately smooth its first derivative $\lambda \u0307i(t)$, we keep the same time window and use two applications of the filter (following a general rule: for the n-th derivative, use at least n + 1 applications of the filter).

### C. Dynamic domains detection

After increasing S/N, an ML-based analysis is performed on *λ*_{i}(*t*) data to detect relevant dynamics domains in each system. As a clustering method, we opted to use the KMeans algorithm from the Scipy python package,^{75,76} since it was demonstrated to be robust and capable of providing a good trade-off between clustering quality and computational cost.^{65} Nonetheless, it is worth noting that the analysis approach is versatile, and other clustering methods could be used. KMeans requires the number K of clusters to be created in the process to be predetermined. Here, with the aim of guaranteeing a wide variety of micro-cluster dynamics regardless of the analyzed system, we start anyway from *K* = 10 clusters. On the basis of the exchange probability matrix and the dendrogram associated with cluster interconnections, we then hierarchically merged the K clusters *a posteriori*. The exchange probability matrix contains, indeed, the percentage probability of a unit *i* belonging to a given cluster to persist in that cluster or to jump to another cluster in the sampling time step Δ*t*; from this, by means of an “average” linkage method, we built the associated dendrogram connecting the dynamic domains that have a high probability of exchanging elements. Ultimately, to establish the cutoff point of the dendrogram, we used the Elbow Curve Method as an indicative guideline for selecting the optimal number of clusters *k* (see the supplementary material, Fig. S2). However, for completeness, all the steps leading from the starting *K* = 10 clusters to the final *k* clusters are reported in the supplementary material, Figs. S3 and S4.

On the other hand, the domain recognition on $\lambda \u0307i(t)$ data has been performed via a different approach. On obtaining the $\lambda \u0307i$ distribution and the associated Kernel Density Estimate (KDE) for each system, we divide the KDE in deciles and consider the first and the tenth deciles to detect units significantly falling far from the mean local environment variation rate. The first decile and the tenth decile capture units with rapidly decreasing and increasing, respectively, local environment changes. This provides a clear distinction between domains moving toward more dynamic and those moving toward less dynamic configurations.

### D. Molecular dynamics (MD) simulations

We test our *τ*SOAP analysis on MD trajectories obtained for different systems with nontrivial dynamics: a water–ice interface system in correspondence of the transition temperature, a gold nanoparticle at 200 K, a copper surface at 700 K, and DPPC lipid bilayer where liquid and gel phases coexist at 293 K of temperature.

The atomistic ice/liquid water phase coexistence at the solid/liquid transition temperature is obtained by employing the direct coexistence technique^{77,78} using the GROMACS software.^{79} In order to model both the ice and the liquid water phase, the TIP4P/ICE water model^{80} is used. The direct coexistence technique is based on the idea of placing in contact two or more phases (in this case, the phase of ice *I*_{h} and the liquid water phase) in the same simulation box and at constant pressure. Since the energy is constant at *T* = 268 K, while the system melts at *T* = 269 K,^{81} we set the temperature at *T* = 268 K and keep it constant by means of the v-rescale thermostat with a relaxation time of *t* = 0.2 ps.

To get the initial configuration of ice *I*_{h}, the *Genice* tool proposed in the work of Matsumoto *et al.*^{82} is used, which generates a hydrogen-disordered lattice with zero net polarization satisfying the Bernal–Fowler rules. The solid lattice is equilibrated by performing a 10 ns-long anisotropic *NPT* simulation at ambient pressure (1 atm). The c-rescale barostat^{83} is used with a time constant of *t* = 20 ps and compressibility of 9.1 × 10^{−6} bar^{−1}. On the other hand, the liquid phase is obtained from the same initial configuration of ice *I*_{h} but performing a *NVT* simulation at *T* = 400 K in order to quickly melt the ice slab. Then, a 10 ns-long simulation at *T* = 268 K is performed to equilibrate the liquid phase, using the c-rescale barostat in semi-isotropic conditions and at a compressibility of 4.5 × 10^{−5} bar^{−1}. Since the initial ice slab is composed of 1024 water molecules, both the solid and liquid phases have the same number of molecules and box dimensions. The two phases are put in contact and, then, the system is equilibrated for *t* = 10 ns employing the c-rescale pressure coupling at ambient pressure with the water compressibility (4.5 × 10^{−5} bar^{−1}). The production *NPT* is carried out in semi-isotropic conditions, applying the pressure only in the direction perpendicular to the ice/water interface, thus reproducing the strictly correct ensemble for the liquid–solid equilibrium simulation by the direct coexistence technique.^{84} Finally, a 100 ns-long production run is performed, with a sampling time interval of Δ*t* = 0.1 ns.

The second case study analyzed in this work is an icosahedral gold nanoparticle (Au-NP) composed of 309 atoms. The parameterization of the model is performed according to the Gupta potential.^{85} The Au-NP system is simulated for *t* = 2 *µ*s at *T* = 200 K sampling every Δ*t* = 0.1 ns using the LAMMPS software.^{86} The details are described in Ref. 56.

The third system, the atomistic model of copper FCC surface Cu(210), is composed of 2304 Cu atoms and simulated at *T* = 700 K. A Neural Network potential built by means of the DeepMD platform^{87} is employed to perform Deep-potential MD simulations of the Cu(210) surface with the LAMMPS software,^{86} as reported in Ref. 55. The MD trajectory is conducted for 150 ns using a sampling time interval of Δ*t* = 0.3 ns.

Finally, the last case study is a DPPC lipid bilayer composed of 1152 lipids simulated at *T* = 293 K. As detailed in Ref. 53, DPPC lipids are simulated and parameterized in explicit water by using the Martini2.2^{88} Coarse-Grained (CG) force field. The CG-MD simulation is performed for *t* = 1 *µ*s and sampled every 0.1 ns with the GROMACS software.^{79} However, in our analysis, we use the last 500 ns of MD trajectory.

## III. RESULTS AND DISCUSSION

Herein, we use the descriptor *τ*SOAP to elucidate the dynamics of atomic/molecular structural environments, which are often key determinants of global material performances. In order to show the whole picture of dynamic information that can be extracted from *τ*SOAP signals, we analyze MD trajectories of different systems exhibiting various structures and nontrivial behaviors, thus indicating the transferability of such an approach to a wide range of materials. In particular, we first focus on ice/liquid water coexistence at the solid/liquid transition temperature, where structural and dynamic properties continuously alternate from solid-like to liquid-like character.^{89} We also carry out our analysis on systems revealing solid-like dynamics, such as metal nanoparticles and surfaces well below the melting point. Ultimately, a fluid-like soft system is included by testing our approach on a lipid bilayer below the gel-to-liquid transition temperature.

### A. Into the dynamics of ice/liquid water phase coexistence via *τ*SOAP signal

We start testing *τ*SOAP on a system where crystalline ice and liquid water coexist at the solid/liquid transition temperature, while exhibiting a dynamic equilibrium between solid-like and liquid-like regimes. We analyze a simulation box, in periodic boundary conditions, having 1024 hexagonal ice (*Ih*) molecules in contact with 1024 liquid water molecules [see Fig. 1(a)] at *T* = 268 K. We consider 1001 consecutive frames sampled every Δ*t* = 0.1 ns along 100 ns of an MD trajectory. As a first step, we compute the SOAP vectors for the oxygen atoms of each water molecule (2048 TIP4P/ICE water molecules) along all frames of the trajectory (see Sec. II for details). Before illustrating the *τ*SOAP analysis, we start by briefly discussing the results obtained via, e.g., a SOAP + PCA pattern recognition procedure—widely used for studying molecular systems—on such ice/liquid water system, here presented as a first case study. Figure 1(b) shows the results of this analysis, which detects, from the SOAP-based dataset, two main clusters, corresponding to the ice and liquid water domains (in green and gray). It is worth noting that DR (via PCA) to a three-dimensional subspace already allows to capture >90% of the cumulative variance of the SOAP-based dataset in this case (see Fig. S5a). A systematic analysis on the effect of increasing the dimensionality provided essentially the same results, demonstrating how two main SOAP domains are detected (ice and liquid water) independently on the number of PCs used and of the identified clusters (see also Fig. S5).

After computing SOAP vectors, *τ*SOAP signals are estimated by capturing the variations of local SOAP environments in Δ*t* = 0.1 ns [see Eq. (6)]. Figure 1(c) reports, on the left, the resulting *λ*_{i}(*t*) time profiles related to each of the 2048 oxygen atoms, while, on the right, it shows the ice/liquid water MD snapshots at *t* = 59 ns, *t* = 63 ns, and *t* = 65 ns. Notably, three distinct *λ*_{i}(*t*) profiles are highlighted in Fig. 1(c), left: (i) the black signal oscillating around *λ*_{i} = 0.2; (ii) the cyan signal lying in the highest *λ*_{i} region; and (iii) the crimson signal, which rapidly passes, at *t* ∼ 60 ns, from low to high *λ*_{i} values. The oxygen atoms related to the latter three *λ*_{i}(*t*) profiles are instead depicted on the MD snapshots in Fig. 1(c), right, with the respective color code, i.e., black, cyan, and crimson. The visualization of these selected atoms clearly shows that the black and cyan oxygen units belong to the ice and liquid water phase, respectively, regardless of the displayed time steps. On the other hand, the identified crimson oxygen represents an atom involved in the ice/liquid water transition occurring at *t* ∼ 60 ns. By lightening the behavior of such atoms, we attempt to emphasize the potential meaning provided by *τ*SOAP descriptor on the single unit dynamics: Following the time variation of atomic structural environments, *τ*SOAP allows both to distinguish atoms belonging to different phase states and to capture those undergoing phase transitions.

In order to systematically detect the complete scenario of distinct dynamic behaviors in our water system, an ML-based analysis is carried out on all *τ*SOAP signals. The results of the clustering investigation, performed via the KMeans algorithm, are shown in Fig. 1(d). The final four identified clusters (gray, crimson blue, and cyan) are displayed both on the time series of the *λ*_{i}(*t*) data [Fig. 1(d): left] and on the *λ*_{i}(*t*) data distribution reported with the correlated KDE [Fig. 1(d): right]. As already pointed out, the four different dynamic domains identify those water molecules undergoing specific transitions, i.e., instantaneously changing their local structural environments. In particular, the gray domain is dominated by oxygen units that are characterized by low *λ*_{i} values along the complete trajectory, i.e., by a weak variability of their local atomic environments. On the other hand, oxygen atoms showcasing high changes of their structural atomic distributions, and hence high values of *λ*_{i}, belong to the blue cluster or cyan domain. Oxygen units that, instead, tend to reveal medium values of *λ*_{i}—because of their transition from one *λ*_{i} regime to the other one—are classified into a distinct crimson cluster. It is interesting to note how, differently from the SOAP + PCA pattern recognition procedure shown in Fig. 1(b), an analysis of the time-series *τ*SOAP data reveals this third dynamically different environment—i.e., the ice/liquid water interface, which gets lost in SOAP + PCA-based analyses due to its reduced (negligible) statistical weight (see the supplementary material, Fig. S5 for SOAP + PCA-based analyses with increased number of clusters). Ultimately, the cyan domain is detected as a different cluster of units with higher local environmental changes. The graphical representation of such clusters is shown in Fig. 1(f) through an MD snapshot (see the supplementary material, Movie S1). Not surprisingly, the gray cluster, characterized by the lowest *τ*SOAP signal, corresponds to the ice phase; the blue domain, characterized by 0.4 $<\lambda i\u2264$ 0.54, is mainly correlated with the liquid phase; and the crimson one, including oxygen atoms with 0.2 $<\lambda i<$ 0.4, is instead located at the solid/liquid interface. Finally, the cyan cluster (*λ*_{i} > 0.54), although sited in the same region of the liquid phase (blue cluster), is identified as presenting a different dynamic behavior. In the considered ice/liquid water system, the exchange probabilities among the final four clusters are displayed in the matrix in Fig. 1(e): Although the oxygen atoms exhibit a probability higher than $\u223c94%$ to persist in a given cluster in the sampling time step Δ*t* (probabilities on the matrix diagonal), no negligible transient events occur between red–blue and cyan–blue clusters, demonstrating that a percentage of oxygen population is involved in instantaneous transitions among dynamic domains (out of diagonal probabilities).

After detecting the main dynamics clusters based on *τ*SOAP signals, *λ*_{i}(*t*), we carry out a further domain recognition analysis based on $\lambda \u0307i(t)$, that is, the instantaneous *rate* of local environment variations *λ*_{i}(*t*). The key information that can be gathered from the time derivative of *λ*_{i}(*t*) is pointed out in Fig. 2(a), where an explicative example is reported. Here, both *λ*_{i}(*t*) and $\lambda \u0307i(t)$ time profiles are associated with the same oxygen atom *i*: In gray, *λ*_{i}(*t*) shows the atom undergoing phase transition at *t* ∼ 60 ns when the time signal significantly and rapidly passes from the low to the high *λ*_{i} value region; in green, the first time derivative of the gray profile exhibits a peak corresponding to phase transition while fluctuating around zero in both the initial and final stages of the trajectory. Clearly, $\lambda \u0307i(t)$ tracks a notable signal leading up to a substantial dynamic change in the system. The first time derivative, indeed, offers a neat discrimination between small oscillations of *λ*_{i}(*t*)—which are intrinsic to the constituent units, independently from the proper dynamic domain—and large fluctuations driving significant changes in the atomic structure. Notably, $\lambda \u0307i(t)$ also provides a detailed comprehension of the *directionality* of the local environment variations, i.e., on the evolution of the material structures. While the presence of a peak, i.e., of a large fluctuation in $\lambda \u0307i(t)$ profile, suggests that a relevant event is occurring in that time interval, the sign of such fluctuations points out the evolution of a structural environment: A positive sign indicates that the atom is undergoing a local reconfiguration toward a more dynamic domain; a negative sign means that a local environment reconfiguration toward a more static domain is occurring.

Figure 2(b) shows, on the left, the time profiles of $\lambda \u0307i(t)$ related to each of the 2048 oxygen atoms and, on the right, the KDE of the $\lambda \u0307i$ data distribution. We color in blue and orange the domains corresponding to the first and the tenth decile, respectively, while we merge all the other deciles in a single white cluster. It is worth noting that the KDE distribution has a peak approximately corresponding to $\lambda \u0307i=0$, indicating that the local environment variations, *λ*_{i}(*t*), are, on average, constant. On the other hand, atoms that significantly increase or decrease, frame by frame, their local environmental changes are captured by positive (in the orange region) or negative peaks (in the blue region), respectively. In Fig. 2(c), we visualize these three different domains (blue, orange, and white) on a few snapshots along the MD trajectory, thereby showing that the positive and negative peaks allow characterizing melting and freezing phenomena occurring within small solid-like and liquid-like regions. In the first snapshot of Fig. 2(c), we represent a small portion of oxygen solid-like atoms (in orange) located at the ice/liquid water interface and exhibiting positive *τ*SOAP (*λ*_{i}) variations (i.e., undergoing rearrangements toward more dynamic configurations). Accordingly, in the second snapshot, those rings appear as broken, thus proving a melting-type process. In the third snapshot of Fig. 2(c), we report, instead, an example of oxygen units presenting negative *τ*SOAP (*λ*_{i}) variations (blue cluster), thus evolving toward more static configurations at the solid–liquid interface. Indeed, as shown in the fourth snapshot, an ordered ring structure forms, thus reproducing a typical freezing phenomenon. Ultimately, Fig. 2(d) shows a further detail potentially revealed by our analysis. In particular, water molecules exhibiting a high positive rate of change of their local SOAP environment (high $\lambda \u0307i$) turned out to be also associated with ice molecules that, at the interface with liquid water, undergo transitions out of the typical hexagonal packing, i.e., forming interface ice defects [Fig. 2(d)].^{90} In summary, besides capturing the local atomic rearrangements and characterizing their evolution, $\lambda \u0307i(t)$ seems to be useful for defect detection purposes also.

The previous results suggest how the use of *τ*SOAP descriptor and its first time derivative is a possible strategy to unveil microscopic phenomena occurring at the ice/water interface in dynamic equilibrium. In particular, by reliably detecting local fluctuations along with rearrangements and their evolution, the time variations of structural atomic environments show considerable potential for tracking crystallization or melting processes from MD trajectories.^{95} In order to outline the main features of *τ*SOAP and the differences with respect to other analysis approaches often used to study the dynamics, we also compared *τ*SOAP with a time-lagged Independent Component Analysis (tICA),^{91,92} a DR approach used to process high-dimensional input data by retaining valuable temporal information (see the supplementary material, Fig. S6). Concerning the study case of ice–liquid water transition, we projected the high-dimensional SOAP space on its highest-autocorrelation linear tICA subspace. The results in Figure S6 demonstrate how tICA essentially finds two main environments, corresponding to the ice and water domains. However, similar to a classical SOAP + PCA analysis [see Fig. 1(b)], such SOAP + tICA DR approaches do not recognize the ice/water interface as a separate environment, nor does it capture the local individual transitions as done by *τ*SOAP (Figs. 1 and 2). This shows how such standard pattern recognition approaches (e.g., PCA or tICA coupled with clustering analyses) can effectively detect dynamic domains with dominant statistical weight, while sparse and local fluctuations get typically lost due to their negligible statistical occurrence. In this sense, *τ*SOAP has the advantage to preserve any change in local structural environments, from the slowest to the fastest visited along the studied trajectories, thereby avoiding specific screening of structural variations.

### B. Application to discrete solid-like dynamics

As completely different test cases, we test our approach on systems revealing solid-like dynamics. We discuss the results of our analysis applied on MD trajectories of (i) a 309-atom icosahedral gold nanoparticle, denoted as Au-NP, at 200 K [Fig. 3(a)], and (ii) a copper Cu(210) FCC surface at 700 K [Fig. 4(a)].

Regarding case (i), we analyze 20 000 consecutive frames of a 2-*µ*s long MD trajectory sampled every Δ*t* = 0.1 ns at *T* = 200 K. It is well known that metal nanoparticles may exhibit nontrivial dynamics at room and at even sufficiently lower temperatures. Despite the reduced atomic motion and, accordingly, the more stabilized ideal icosahedron architecture, some local fluctuations and atomic rearrangements can be observed in the Au-NP even at *T* = 200 K. *τ*SOAP signals in Fig. 3(b), indeed, present a sudden increase after $\u223c0.1\mu $s, demonstrating that some atoms are experiencing intense instantaneous local environment variations. Our cluster analysis on *λ*_{i}(*t*) recognizes five main dynamic domains whose transition probabilities are reported in Fig. 3(c). This transition matrix proves a negligible attitude of the gold atoms to transfer to diverse dynamic domains, while they prefer to remain in their own cluster with probabilities higher than 98.4%. The MD snapshots in Fig. 3(d) show that these clusters identify distinct dynamical behaviors and structural domains (see also the supplementary material, Movie S2). First, the cluster analysis is able to accurately distinguish the inner core of the Au-NP (in gray): a more static region characterized—not surprisingly—by low *λ*_{i}(*t*) values along the whole simulation, from an interface region (in crimson) between the inner core and the outermost layer. Second, such a clustering approach sharply separates the surface of the Au-NP into two coexisting regions (pink and blue) related to different characteristic *λ*_{i}(*t*). While the pink face turned out to be more static, the blue domain reliably detects the portion of the surface where a fracture formation may occur, breaking down the symmetry (Fig. 3(d), second MD snapshots on the right). Interestingly, *τ*SOAP also identifies some local events such as the formation of concave ”rosettes” (a vertex, having five neighbors in an ideal icosahedron, penetrates inside the NP surface, thus passing to six neighbors). In Fig. 3(d) (third snapshot on the right), two rosettes can be observed as belonging to a more dynamic cluster—significantly varying their local environments—(in blue), while the associated vertices are identified as more stable (in crimson).

Furthermore, the estimation of $\lambda \u0307i$ [Fig. 3(e)] provides interesting details on the dynamic evolution of the system. A rather large percentage of Au atoms is characterized by constant variations of their surrounding environment (*λ*_{i}(*t*) = const, and $\lambda \u0307i\u223c0$). Rare and sharp fluctuations are anyhow clear in the *λ*_{i}(*t*) profile. To qualitatively illustrate some of these $\lambda \u0307$ peaks, five MD snapshots presenting different predominant domains are shown Fig. 3(f). In the first and second snapshots, the domain characterized by positive $\lambda \u0307i$ (in orange) prevails, suggesting that the atoms belonging to that cluster are collectively involved in a significant increase in the instantaneous local environment variations. Indeed, this predicts the symmetry breaking shown in the second snapshot. However, the prevalence of $\lambda \u0307i\u223c0$ represented by the white domain, along with a balance between positive (orange) and negative (blue) peaks, establishes a dynamic equilibrium leading to no relevant events along several trajectory frames (one example is presented in the third snapshot). In the last two snapshots, instead, a significant collective decrease of the instantaneous local variations (negative $\lambda \u0307i$) emerges (prevalence of blue domain in the fourth snapshot), thus predicting the evolution of the associated atoms toward more static environments (shown in the final snapshot). Ultimately, the information on the directionality of local rearrangements is also highlighted in Fig. 3(g): While positive $\lambda \u0307i$ values (in orange) mark a vertex evolving toward a less stable configuration where a missing atom appears, negative $\lambda \u0307i$ (in blue) predict rearrangement of the structure toward a stable rosette-like configuration.

For case (ii), we use 502 consecutive frames of 150 ns-long MD simulation of a Cu(210) surface composed of 2304 Cu atoms [Fig. 4(a)] sampled every Δ*t* = 0.3 ns at *T* = 700 K. Although metals tend to be traditionally considered as hard matter, it is now well known that their constituent surface atoms may exhibit nontrivial dynamics, undergoing rearrangements well below the melting temperature.^{55,93} Our clustering procedure applied on *τ*SOAP profiles identifies three main domains related to Cu atoms exhibiting very competing behaviors [Fig. 4(b)]: one dense and more static cluster in gray along with two less populated but more dynamic domains in red and cyan. The exchange probability matrix in Fig. 4(c) points out that the transient events among diverse domains mainly engage Cu atoms belonging to the red and cyan clusters. Figure 4(d) graphically represents the identified clusters at two explicative time steps, *t* = 85.8 ns and *t* = 92.7 ns: Not surprisingly, the gray domain corresponds to the crystalline bulk of the Cu(210) surface, reasonably detected by our analysis as the most static with small local environment variations (low *λ*_{i}(*t*) values); on the other hand, the surface atoms are identified as more dynamic clusters, thereby including all *λ*_{i}(*t*) > 0.07. However, two subsurface regions are recognized by KMeans: in crimson, a domain characterized by 0.07 < *λ*_{i}(*t*) ≤ 0.16, and in cyan, a cluster with the highest local environment variations. The two MD snapshots in Fig. 4(d) show significant correspondence between that more static surface region (crimson) and more stable surface atomic arrangements with increased coordination (see also the supplementary material, Movie S3).

The Cu(210) domain characterization based on $\lambda \u0307i$ confirms the effectiveness of this analysis in providing some key information on the time evolution of the material structure. Figure 4(e) highlights, also in this case, that the average rate of the local environment variations is null, i.e., most of the Cu atoms in Cu(210) show a steady-state behavior of *λ*_{i}(*t*). In addition, the cluster representation in Fig. 4(f) suggests that the domain with $\lambda \u0307i\u223c0$ essentially corresponds to the ice crystalline bulk (in white). On the other hand, most of the surface atoms are highly dynamic; consequently, a balance between domains with positive (orange) or negative (blue) $\lambda \u0307i$ is established over time. Consistent with the test cases discussed above, this dynamic balance indicates that no substantial reconfiguration toward more stable/dynamic arrangements is occurring. Nevertheless, some interesting cluster details are worth noting in Fig. 4(f): The snapshot on the top, corresponding to *t* = 25.8 ns, exhibits a portion of the surface with a clear predominance of atoms evolving toward more static configurations (in blue); the zoom onto that portion clarifies its stability in the two successive sampled times (*t* = 26.1 and *t* = 26.4). On the contrary, when the same surface region is characterized, after some frames, by a prevalence of positive $\lambda \u0307i$ (in orange in the snapshot on the bottom at *t* = 29.1 ns), the associated atoms experience an evident rearrangement as highlighted by the green atoms onto the zoom. Again on this system, $\lambda \u0307i$ was revealed to be useful in predicting the evolution toward more static/more dynamic configurations.

### C. Phase coexistence in soft dynamical systems

As a final test case, we apply our *τ*SOAP-based dynamic domain recognition on a soft system characterized by a two-phase coexistence: gel and liquid. Specifically, we analyze the last 500 ns of 1 *μ*s-long CG-MD simulation of a DPPC lipid bilayer composed of 1152 self-assembled DPPC lipids [see Fig. 5(a)] at *T* = 293 K, thus considering the last 5001 consecutive frames (Δ*t* = 0.1 ns). Although the gel-to-liquid transition temperature of a DPPC membrane is at $\u223c315$ K, here we investigate the dynamics of the lipid bilayer at a slightly lower temperature, thereby avoiding addressing the critical dynamics issues occurring at the transition temperature.

Our clustering analysis, displayed in Fig. 5(b) on *λ*_{i}(*t*) profiles and on the related KDE, identifies two main dynamic domains: one colored in crimson including *λ*_{i} < 2.8 and the other one in cyan, containing the highest values of *τ*SOAP fingerprints. Figure 5(c) reports, instead, the interconversion matrix between the two clusters. Beyond a small probability (3.6%) to transient from cyan to red cluster, the lipids manifest relatively high stability to preserve, along the complete trajectory, a specific local environment variation (*λ*_{i}), typical of each individual dynamics cluster. The graphical representation of the lipid bilayer in Fig. 5(d) suggests a close link between the dynamic domains and the phase states: The crimson cluster characterized by small *λ*_{i} is indeed associated with a more static, gel, phase, while the cyan domain, with higher local environment variations, is connected to a more dynamic—liquid—phase. Although the ability of SOAP to distinguish environments characterized by diverse structural features is known,^{53,57} this case demonstrates how *τ*SOAP is able to clearly detect the nucleation and emergence of distinct dynamic domains in intrinsically disordered systems, in a very agile and efficient way. This offers an additional proof of the versatility and robustness of this descriptor.

Further analysis on $\lambda \u0307i(t)$, reported in Fig. 5(e), reveals a predominance of $\lambda \u0307i\u223c0$ (white cluster) along with a balance between positive (orange) and negative (blue) peaks over time. We recall indeed that a null time derivative of *λ*_{i}(*t*) represents the behavior of those units exhibiting a constant variation of their local environment, with some statistical oscillations classified in the orange and blue domains. Within such a resulting scenario, the proposed analysis predicts gel–liquid phase coexistence in dynamic equilibrium, as shown in the MD snapshots in Fig. 5(f). In other words, the lipids are not evolving toward a more static/dynamic configuration, whereas each remains in its proper dynamic domain.

## IV. CONCLUSIONS

Investigating the dynamics of individual units in many atomic/molecular systems is essential to understand the behavior of complex molecular systems, and their physical and chemical properties, collective transitions as well as to design next-generation materials and molecular systems with desirable dynamical behaviors.^{94} However, because of the complexity of local structural environments along with their dynamics in such systems, a general approach is still lacking. Although faithful representations of atomic neighborhood environments—such as the SOAP descriptor—are available and widely employed, here we want to draw attention to the time evolution of these structures, which is typically overlooked in molecular motif recognition procedures.^{95}

In this work, we propose an alternative perspective allowing us to track the dynamical changes in atomic structural environments of the interacting subunits, thus enhancing the detection of dynamic domains and emerging phenomena. Building upon the SOAP descriptor, we implement *τ*SOAP, a new fingerprint that quantifies the variations of local SOAP environments surrounding each constituent unit along its MD trajectory. *τ*SOAP, indeed, retains the time information from high-dimensional SOAP vectors, thereby aiming at emphasizing the importance of consequential events for reconstructing dynamics and detecting rare fluctuations. Coupled to an ML-based analysis, we demonstrate the potentiality of such an approach to identify domains with different structural and dynamical behaviors. Ranging from an ice/liquid water system where solid-like and fluid-like domains coexist in dynamic equilibrium, to solid-like materials, and soft matter presenting gel and liquid coexisting phases, we prove that our analysis reliably addresses phase transitions, rare dynamic events, and coexisting phases. Moreover, by estimating the first time derivative of *τ*SOAP signal, we gain further information on the direction of the local structural changes. Indeed, besides detecting local rearrangements, the first time derivative of *τ*SOAP enables the characterization of their evolution toward either strongly or weakly dynamic environments. Finally, we can envisage that descriptors like *τ*SOAP, and its first time derivative, may be also useful in enhanced sampling methods, providing degrees of freedom along which structural variations or transitions within the system can be accelerated.

Nonetheless, *τ*SOAP-based investigation presents a few limitations. Although *τ*SOAP signal tracks the evolution of each constituent unit along the whole MD trajectory, thus providing time history data, the coupled ML-based approach relies on the instantaneous values of local environment changes, without performing time-series clustering for identifying dynamic domains. Importantly, time-series clustering and classification based on the frequency/duration of local environment variations could have a striking advantage in regard to discriminating fluctuations leading up to significant structural changes in the system. Notably, by including in our ML-based framework the first time derivative of *τ*SOAP, we start providing further insights on predicting the evolution of local changes and specifically how selected environments reconstruct or evolve in time. In summary, our approach turned out to be robust and versatile to capture fluctuating environments from SOAP spectra in a variety of systems by means of a completely agnostic and data-driven analysis.

## SUPPLEMENTARY MATERIAL

The supplementary material contains details about the length of MD simulation trajectories and SOAP vector parameters; the Elbow Curve Method profiles for the identification of the optimal number of final clusters; KMeans clustering analysis on the τSOAP data starting from K = 10 clusters with their relative transition probabilities and the associated dendrograms; SOAP + PCA-based analyses related to TIP4P/ICE ice/liquid water system; SOAP + tICA-based analyses related to TIP4P/ICE ice/liquid water system. This material is provided as a PDF file. Complete details of all molecular models used for the simulations, and of the simulation parameters (e.g., input files), as well as the complete TimeSOAP analysis code, are also available at https://doi.org/10.5281/zenodo.7962820.

## ACKNOWLEDGMENTS

G.M.P. acknowledges the support received by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 818776—DYNAPOL) and by the Swiss National Science Foundation (SNSF Grant No. IZLIZ2_183336).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

G.M.P. conceived this research and supervised the work. C.C. developed the descriptor and implemented the analysis. C.C., M.C. D.R., and A.C. performed the simulations and the analyses. All authors analyzed and discussed the results. C.C., A.C., and G.M.P. wrote the manuscript.

**Cristina Caruso**: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (lead); Writing – original draft (lead). **Annalisa Cardellini**: Conceptualization (equal); Investigation (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal). **Martina Crippa**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (supporting). **Daniele Rapetti**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal). **Giovanni M. Pavan**: Conceptualization (lead); Funding acquisition (lead); Investigation (equal); Resources (lead); Supervision (lead); Writing – review & editing (equal).

## DATA AVAILABILITY

Complete details of all molecular models used for the simulations, and of the simulation parameters (e.g., input files), as well as the complete *Time*SOAP analysis code, are available at https://doi.org/10.5281/zenodo.7962820. Further details on the analyses are provided in the supplementary material.

## REFERENCES

*Introduction to Lattice Dynamics*

*ab initio*calculations for supramolecular complexes: Symmetry-adapted perturbation theory with many-body dispersion