Recent advances in electron, scanning probe, optical, and chemical imaging and spectroscopy yield bespoke data sets containing information on the structure and functionality of complex systems. In many cases, these data sets are underpinned by low-dimensional, simple representations encoding the factors of variability within the data. Representation learning methods seek to discover these factors of variability and, ideally, to connect them with relevant physical mechanisms. In general, however, identifying the latent variables that correspond to actual physical mechanisms is extremely complex. Here, we present an empirical study of an approach based on conditioning the data on known (continuous) physical parameters and systematically compare it with the previously introduced approach based on invariant variational autoencoders. The conditional variational autoencoder (cVAE) approach does not rely on the existence of invariant transforms and hence allows for much greater flexibility and applicability. Interestingly, the cVAE allows for limited extrapolation outside of the original domain of the conditional variable. However, this extrapolation is limited compared to cases where the true physical mechanisms are known and the physical factor of variability can be fully disentangled. We further show that introducing known conditioning simplifies the latent distribution if the conditioning vector is correlated with the factor of variability in the data, thus allowing us to separate relevant physical factors. We first demonstrate this approach using 1D and 2D examples on a synthetic data set and then extend it to the analysis of experimental data on ferroelectric domain dynamics visualized via piezoresponse force microscopy.
INTRODUCTION
The tremendous success of the physical sciences over the last two hundred years has been largely predicated on the search for and discovery of physical mechanisms, meaning simple laws and factors that can explain observations. The paradigmatic example, as eloquently summarized by Wigner in his oft-cited opinion,1 is the discovery of Newton's laws. Similarly, numerous studies of celestial objects since the times of ancient Egypt and Sumer led to the constant improvement of models describing planetary motion and resulted in the well-known Keplerian model; further advances, including Newton's law of universal gravitation and Einstein's general theory of relativity, explained the planets' elliptical motion and the irregular motion of Mercury, respectively. This pattern is still followed in most scientific fields today, with experimental observations used to derive correlative relationships that, in turn, underpin the emergence of physical models. These are often linked to symbolic regression, where the simplicity and elegance of a mathematical law are considered a strong indicator that the correct physical model has been identified. Overall, the greatest advantage of known symbolic or computational models (e.g., the lattice Hamiltonian in condensed matter physics2,3 or force fields in molecular dynamics4) is their capability to extrapolate outside of the original measurement domain, predicting the effect of parameter changes and generally allowing for interventional and counterfactual studies.5 For example, Newton's laws allow predicting the trajectories of man-made objects, whereas modern calculation methods allow exploring properties and functionalities of not-yet-realized molecules and materials.6,7
The rapid development of deep learning8,9 methods over the last decade has provided a powerful new tool for physical research capable of building correlative relationships between multidimensional objects. While early applications have relied on purely correlative models, the developments over the last several years include the introduction of physical constraints and symmetries in the neural networks, making the interpolations consistent with prescribed physical models.10,11 Similarly, the advancement in symbolic regression methods has allowed for the discovery of physical laws from observational data, first implemented in the framework of genetic algorithms12 and subsequently extended toward deep learning symbolic regression methods,10,13 physics-enhanced neural networks,14–16 Koopman operator based methods,17–19 and Bayesian methods.20 Much of this effort relied on the presence of robust physical descriptors, such as planetary coordinates in astronomy or atomic nuclei in electron microscopy studies.
However, in many cases, what is accessible to observation are complex data sets representing static or dynamic fields, as exemplified by video data, atomic evolution movies in electron microscopy, and dynamic materials studies with scanning tunneling microscopy (STM) and scanning probe microscopy. In these cases, the presence of simple underlying physical mechanisms can also be postulated. For example, the contrast in STM is determined by the underlying atomic structure and the associated spatial distribution of electronic densities, where the relationship between the two is defined by quantum mechanics. Similarly, the observed distribution of electromechanical activity on the surfaces of ferroelectric and ionic materials visualized by piezoresponse force microscopy (PFM) is determined by local variations in material functionalities. Effects of the image formation mechanisms are often non-negligible, sometimes masking or even inverting the measured parameters,21 and so must also be incorporated. Correspondingly, machine learning methods capable of physical discoveries from such data are of interest,22–24 including interpolating within and (à la Wigner) extrapolating outside of the original measurement domain.
Especially as experimental data sets continue to grow from a manageable handful to thousands of frames25 and hundreds of millions of pixels or voxels,26 recent advances in generative statistical models such as simple and variational autoencoders (VAEs)27–29 offer a pathway for addressing these problems. The general premise of the autoencoder is that the observational data set can be encoded via a small (compared to the dimensionality of the input space) number of latent variables, where the relationship between the latent vector and the data object is defined by the encoder and decoder networks. A multitude of available studies have illustrated that VAEs allow for the disentanglement of the latent representation, generally referring to the behavior where variability along a selected latent variable corresponds to easily identifiable trends in the data.22,23,30–36 Naturally, this poses the challenge of whether latent variables can be identified with specific physical mechanisms, or be predefined or controlled.37 Finally, of particular interest is whether generative models such as VAEs can be used to extrapolate outside the original distribution.
Here, we explore the introduction of known physical mechanisms via conditional variational autoencoders (cVAEs), using conditioning based on known (continuous) descriptors. We use the known or hypothesized physical factors as the condition and explore the unknown factors that can be reflected in the resultant latent distributions. As such, the latent distributions can reveal the discovered factors. For example, in the study of ferroelectric domain wall dynamics using piezoresponse force microscopy, it is easy to quantify the distance of domain wall motion; in contrast, it is challenging to quantify factors such as the domain wall shape and the effects of surrounding elements (e.g., other interacting walls, defects, strain conditions) on wall motion. In this case, the distance of domain wall motion can serve as a known physical factor, and the other factors can then be encoded in latent variables and shown in the latent distribution. If these unknown factors are well disentangled in the latent variables, we expect the latent distribution to be simple (e.g., showing well-separated clusters); otherwise, a complex latent distribution potentially indicates complex unknown factors. We further analyze the correlation between latent variables and ground truth properties and explore the potential of the cVAE approach to extrapolate outside of the original range of conditioning parameters. This approach is illustrated for model systems with known factors of variability and further extended to experimental PFM data of ferroelectric domain dynamics.
RESULTS
cVAE model
Application of cVAE in 1D model data
VAE analysis of the 1D synthetic peak data. (a) An example of a 1D peak with labeled ground truth parameters, where μ is the peak shift, σ is the peak width (full width at half maximum), and the peak maximum is the amplitude (A). (b) 25 randomly sampled examples out of 3000 in the synthetic peak data. (c) Simple VAE analysis of the peak data shown as latent space colored by ground truth parameters, where it is observed that all peak parameters are encoded into latent variables. (d) Shift-invariant VAE analysis of the peak data shown as latent space colored by ground truth parameters, where there is no correlation between peak shift μ and latent variables because the peak shift is encoded into the shift variable as shown in Fig. S2(d) in the supplementary material.
The simple VAE analysis of this data set is shown in Fig. 1(c). Here, the latent distributions of the data are plotted with a color overlay corresponding to the ground truth labels. The latter are not available to the algorithm and hence allow identification of the latent variables in terms of the data set parameters. Of course, the data have three factors of variability and the latent space is two dimensional; thus, we do not expect full separation of the factors of variability. Still, examining the results clearly illustrates that the z1 variable is largely associated with the peak shift μ, while the variability in the z2 direction represents the joint effect of the amplitude A and width σ.
A similar analysis using shift-invariant VAE encodes the data by separating the shift into a dedicated shift variable and the rest of the information into standard latent variables. Examination of the data in Fig. 1(d) illustrates that the variability associated with the peak shift μ has now disappeared, whereas the variability in latent space represents the collective effect of amplitude and width. These findings are consistent with our previous work and illustrate the capability of physically defined invariances to disentangle the corresponding factors from real data.42,43 These invariances are encoded in the coordinate transform of the invariant VAE (iVAE) framework, as described in depth in earlier works.23,32,33,44–47 A systematic discussion of these iVAEs can be found in our previous work.24
Next, we apply the cVAE approach to the same data set. In this case, the VAE receives as input the data set of shape N × D (where N is the number of 1D spectra and D is the length of each spectrum) and a conditional vector of shape N × 1 describing a known continuous parameter. It is important to note that, unlike in iVAE models, there is no coordinate transform as part of the model architecture, and the conditioning vector can represent any known salient feature or features. However, using the same physical parameters as for the toy model allows for direct physical comparisons.
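For concreteness, the following is a minimal sketch of such a cVAE in PyTorch, assuming the architecture described in the Methods section (two fully connected layers of 128 neurons with tanh activations, a two-dimensional latent space, and the condition concatenated with the sampled latent vector before the decoder). The paper's actual analyses use the pyroVED package, so all class and variable names here are illustrative rather than the authors' code.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Minimal cVAE sketch: fully connected encoder/decoder with tanh activations."""

    def __init__(self, data_dim, latent_dim=2, c_dim=1, hidden_dim=128):
        super().__init__()
        # Encoder; conditioning the encoder on c follows the standard cVAE
        # formulation (the Methods section only specifies the decoder side).
        self.encoder = nn.Sequential(
            nn.Linear(data_dim + c_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder receives the latent sample concatenated with the condition.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + c_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, data_dim),
        )

    def forward(self, x, c):
        h = self.encoder(torch.cat([x, c], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(torch.cat([z, c], dim=-1)), mu, logvar
```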
The cVAE analysis conditioned on the peak shift is shown in Fig. 2(a). Examination of the ground truth labels illustrates that z2 is still associated with the peak amplitude A and width σ, but not with the peak shift μ. In this manner, the cVAE and shift-invariant VAE lead to comparable outcomes. We then further explore conditioning on a pair of variables, namely, μ and σ. In this case, the latent manifold is still 1D, but now z2 is clearly associated with the peak amplitude A. This is a demonstrable improvement over the shift-invariant VAE analysis, which did not allow for the separation of the two ground truth factors. Finally, conditioning on all three variables results in the complete collapse of the representation, and the data manifold is now zero dimensional.
cVAE analysis of the 1D synthetic peak data. (a) Learned latent manifold of cVAE conditioned on the (known) peak shift μ. In this case, there is no correlation between the peak shift and the latent variables, whereas both the peak width σ and peak amplitude A are correlated with latent variable z2. (b) Learned latent manifold of cVAE conditioned on the peak shift μ and peak width σ. There is still no correlation between peak shift, peak width, and latent variables, while the peak amplitude A is now correlated with latent variable z2. (c) Learned latent manifold of cVAE conditioned on the peak shift μ, peak width σ, and peak amplitude A, where the latent manifold is completely collapsed because all peak parameters are added as conditions. (d) The correlation between z2 and the ground truth peak amplitude A for different cVAE conditionings, showing that conditioning simplifies the analysis; in particular, in the second plot, when conditioning on peak shift μ and width σ (two of the three data set variabilities), the cVAE successfully encodes the peak amplitude A (the third variability) into z2, evidenced by the linear correlation between z2 and the ground truth peak amplitude A. Notably, increasing the latent dimension of the simple VAE does not allow the peak amplitude to be encoded into the latent variables, as shown in Figs. S3 and S4 in the supplementary material. Furthermore, when the peak amplitude A is added as a condition, this correlation disappears, as shown in the third plot. More examples assessing latent variables vs ground truths of cVAE are shown in Figs. S5–S7 in the supplementary material.
This simple 1D example demonstrates that conditioning on known factors of variability allows us to simplify the representations of the data and partially control the physical meaning of the remaining latent variables. As shown in Fig. 2(d), when conditioning on peak shift μ and width σ (two of the three variabilities of the data set), the cVAE clearly encodes the peak amplitude A (the third variability) into z2. The associated changes in the dimensionality of the latent manifold thereby allow the number of intrinsic factors of variability in a data set to be explored.
To continue the discussion, we also note that the trained cVAE model can be used to synthesize data with preserved latent traits, allowing for interpolation and extrapolation along the conditioning parameter. This behavior is illustrated in Fig. 3, which shows three latent manifolds produced by conditioning the trained cVAE's decoder on different peak shifts. For example, Fig. 3(a) shows the manifold produced by conditioning the decoder on a peak shift of −2, resulting in all peaks in Fig. 3(a) being shifted to the left of center, while other peak parameters (e.g., peak width) still vary as expected. Similarly, the manifolds in Figs. 3(b) and 3(c) are conditioned on peak shifts of 0 and 2, respectively. Consequently, peaks in Fig. 3(b) are located at the center, and peaks in Fig. 3(c) are shifted to the right-hand side.
Latent manifold learned by cVAE conditioned on a peak shift. (a)–(c) Latent manifold plots under various peak shift conditions. Note that each curve in the manifold shows a constant peak shift corresponding to the chosen condition (−2, 0, 2).
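The conditional generation behind Fig. 3 can be sketched as follows, assuming the ConditionalVAE from the earlier sketch; the latent grid bounds and the input length are illustrative placeholders, and in practice the trained weights would be loaded first.

```python
import torch

model = ConditionalVAE(data_dim=64)  # illustrative input length; load trained weights in practice
model.eval()
with torch.no_grad():
    zs = torch.linspace(-2.0, 2.0, 5)
    grid = torch.cartesian_prod(zs, zs)        # 25 points spanning (z1, z2)
    for peak_shift in (-2.0, 0.0, 2.0):        # the conditions used in Fig. 3
        c = torch.full((grid.shape[0], 1), peak_shift)
        curves = model.decoder(torch.cat([grid, c], dim=-1))
        # each row of `curves` is a generated spectrum at the chosen peak shift
```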
Application of cVAE in 2D model data
We further expand this approach to 2D objects. Here, we use the previously developed cards data set48 that contains four classical card suits [as shown in Fig. 4(a)] augmented by rotations, translations, and shear. This data set allows for readily identifiable discrete classes as well as interesting degeneracies (e.g., rotated and deformed diamonds can be identical). The card data set used here includes 4000 cards (1000 per card suit) with various distortions, including random rotations in the range of [−30°, 30°] (we note that the rotation input in the Jupyter notebook is [0, 30°], which performs both clockwise and anticlockwise rotations), shifts in the range of [−4, 4] pixels, and shear in the range of [−0.002, 0.002]. Figure 4(b) shows some example card images in the data set.
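A minimal sketch of this augmentation is shown below, assuming SciPy-based image transforms; the parameter ranges follow the text, but the specific functions, seed, and helper name are illustrative rather than the generator used in Ref. 48.

```python
import numpy as np
from scipy.ndimage import affine_transform, rotate, shift

rng = np.random.default_rng(0)

def augment(card):
    """Apply a random rotation, shift, and shear to one card image (2D array)."""
    angle = rng.uniform(-30, 30)            # rotation angle in degrees
    dy, dx = rng.uniform(-4, 4, size=2)     # shifts in pixels
    s = rng.uniform(-0.002, 0.002)          # shear coefficient
    out = rotate(card, angle, reshape=False, order=1)
    out = shift(out, (dy, dx), order=1)
    out = affine_transform(out, np.array([[1.0, s], [0.0, 1.0]]), order=1)
    return out, (angle, dx, dy, s)          # keep ground truth labels for coloring/conditioning
```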
Simple VAE, rVAE, shift-VAE, and cVAE analyses of the 2D cards data set. (a) The source card images. (b) Examples of the generated cards data with different shifts, rotations, and shears. (c) Latent space of the simple VAE analysis, where clear correlations between (1) latent variables and class, (2) latent variables and x-translation, and (3) latent variables and y-translation are observed. (d) Latent space of the rVAE analysis, where there is no obvious correlation between the latent variables and rotation angle because the rotation angle is encoded into the rotation latent variable. (e) Latent space of the shift-VAE analysis, where there is no obvious correlation between the latent variables and x-/y-translation because the translations are encoded into the translation latent variables. (f) Latent space of the cVAE analysis conditioned on rotation and x-translation, where there is no obvious correlation between the latent variables and rotation angle or between the latent variables and x-translation; in addition, (1) four classes are very obvious in the latent space and (2) an obvious correlation between the latent variables and y-translation is seen; these indicate that the cVAE simplifies the disentanglement of class and y-translation given known factors of variability, i.e., rotation and x-translation. The plots of the discovered latent variables vs ground truth parameters for the VAE, rVAE, shift-VAE, and cVAE analyses in (c)–(f) are shown in Figs. S8–S11 in the supplementary material, respectively. More analyses of cVAE with other conditions are shown in Figs. S12–S14 in the supplementary material.
We apply the simple VAE, rotationally invariant VAE (rVAE), shift-invariant VAE, and cVAE to this card data set. The simple VAE analysis with the ground truth labels is illustrated in Fig. 4(c). Note that, due to the close similarity between different cards after various rotations and deformations, the VAE fails to cluster the data set based on class variability; rather, class-specific clusters form complex interpenetrating distributions in the latent space, with the classes forming connected manifolds. That said, translations in the x and y directions show clear alignment with chosen directions in the latent space.
Shift-invariant VAE and rVAE allow us to separate translation and rotation into dedicated latent variables, with the rest of the information encoded into the standard latent variables. Figure 4(d) shows the rVAE analysis of the cards data set. The rVAE reveals better performance for clustering the card images based on class variability when the card rotation is separated into a rotation variable, though there is still significant interpenetration between some classes. Here, z1 is associated with both x-translation and y-translation. As expected, the rotation is not associated with z1 and z2 because it is encoded into the rotation variable [as shown in Fig. S9(b) in the supplementary material]. Notably, there is also no correlation between rotation and the simple VAE latent variables, but this is because of the competing tendencies of representation disentanglement when several physical factors of variability compete for representation by latent variables. Figure 4(e) shows even better performance in clustering the card images into four classes when implementing the shift-VAE. As expected, there is no correlation between translations and the standard latent variables, as the translations are encoded into the translation variables. In this case, the rotation is associated with the standard latent variables z1 and z2.
Figure 4(f) shows results from the cVAE analysis, where the data set was conditioned on both the rotation and the x-translation. In this case, the latent space distribution clearly shows four unique clusters corresponding to the individual cards, with the variation between different classes associated with a selected direction in the latent space. The label distributions corresponding to the rotations and x-translation, i.e., to the variables on which conditioning has been performed, are featureless. At the same time, the translation in the y direction, i.e., the only remaining factor of variability, becomes clearly associated with another direction in the latent space. Additional examples of cVAE analyses can be found in Figs. S12–S14 in the supplementary material, while the provided Jupyter notebook allows further analyses to be explored. This illustrates that cVAE analysis with known physical parameters in the training session (e.g., rotation and x-translation) improves the disentanglement of the latent representation of unknown physical parameters (e.g., y-translation).
It is also important to note that the behaviors of the latent representations of the data provide insight into physically relevant factors of variation within the data (i.e., classes, rotations, and translations). For the invariant VAEs (rVAE and shift-invariant VAE), the introduction of invariances leads to the simplification of the latent distributions, which become controlled by the remaining (discrete or continuous) factors of variability; we do not discuss rVAE and shift-VAE in more detail here and direct the reader to our previous works.22,23,36 For the cVAE, introducing known conditioning simplifies the latent distribution if the conditioning vector is correlated with the factor of variability in the data, thus allowing us to separate relevant physical factors and experimental artifacts [e.g., scan distortions in experimental scanning probe microscopy (SPM) data].
Similar to the 1D example, we further explore the potential of the cVAE not just to interpolate but also to extrapolate along the conditioning variables. Figure 5 demonstrates card images generated by a trained cVAE with rotation and x-translation as conditions. The training data include 4000 cards with random rotations in the range of [−30°, 30°] and shifts in the range of [−15, 15]. The extrapolation was performed by passing specified conditions to the cVAE decoder (e.g., x-translation = 20, rotation = 60°). The extrapolated card images shown in Fig. 5 are consistent with the specified conditions: the card images in each sub-image shift from the left-hand side to the right-hand side of the field of view as the x-translation condition changes from −20 to 20, and the tilt of the cards changes in a counterclockwise manner as the rotation condition changes from −60° to 60°. The chosen example also demonstrates the variability of decoded objects within the chosen regions of latent space. The extrapolation process is also available in the provided Jupyter notebooks, allowing readers to explore it.
Learned latent manifolds of cVAE analysis of 2D cards data with rotation angle and x-translation as conditions. Each latent manifold shows constant x-translation and rotation corresponding to the defined conditions.
In Fig. 6, we summarize the interpolation and extrapolation performance via the mean squared error (MSE) and the structural similarity index (SSIM)49 between generated card images and the corresponding ground truth card images. The SSIM, first introduced in 2004,49 is widely used as a metric of the similarity between two given images from an image formation point of view. We analyzed three models: (i) cVAE with rotation as a condition, (ii) cVAE with x-translation and rotation as conditions, and (iii) cVAE with x-translation and y-translation as conditions, and explored the interpolation and extrapolation performance of each. Generally, interpolation performs extremely well, with almost perfect reconstruction within the training region. Figure 6 suggests that the cVAE allows for limited extrapolation on the conditioning parameters. For the data set conditioned on the x-translation and rotation, images were reconstructed well outside the original training region, with the SSIM relative to the ground truth illustrating a clear matching pattern. Interestingly, the regions of good matching have a complex structure, where some directions in parameter space are associated with good extrapolation, whereas in other parts of parameter space, the errors accumulate. This is unsurprising given the various local symmetries among the four initial cards. Similar behavior is observed for conditioning on x-translation and y-translation.
Interpolation and extrapolation performance of cVAE shown as the mean squared error and structural similarity index (SSIM) between the cVAE-generated card data and the ground truth card data as a function of defined conditions. (a) Performance of cVAE conditioned on rotation. (b) Performance of cVAE conditioned on x-translation and rotation. (c) Performance of cVAE conditioned on x-translation and y-translation.
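The metrics themselves can be computed with standard tooling; below is a sketch assuming scikit-image, where `generated` and `ground_truth` are hypothetical placeholders for the image arrays behind Fig. 6.

```python
from skimage.metrics import mean_squared_error, structural_similarity

def score(generated, ground_truth):
    """Return (MSE, SSIM) for one generated image against its ground truth."""
    mse = mean_squared_error(ground_truth, generated)
    ssim = structural_similarity(
        ground_truth, generated,
        data_range=ground_truth.max() - ground_truth.min(),
    )
    return mse, ssim  # low MSE and SSIM near 1 indicate good reconstruction
```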
Application of cVAE in experimental data
With this thorough understanding of VAE-based approaches, we extend this analysis to experimental data on ferroelectric domain switching. Previously, we demonstrated the use of rVAE to explore the ferroelectric domain switching pathway and domain wall dynamics. When applying rVAE to consecutive PFM images revealing the ferroelectric domain switching process, the polarization switching mechanism can be visualized in the latent space.23 When applying rVAE to stacked ferroelectric and ferroelastic domain wall images (generated from numerous continuously acquired PFM images during domain switching via electric field poling), it disentangles the factors affecting ferroelectric domain wall dynamics. This includes how the distribution of ferroelastic domain walls affects the dynamics of ferroelectric domain walls, offering insights into the intrinsic mechanisms of ferroelectric polarization switching and hence approaches to engineer devices with more stable domains, faster-switching domains, or lower-energy switching.46 In particular, we probed the ferroelectric domain wall pinning mechanisms by translating the latent space to physical descriptors.46 These analyses were enabled by the rotational invariances inherent to rVAE, but the physical interpretation of the latent variables was based exclusively on the analysis of the latent spaces. Here, we expand this analysis toward elucidation of the relevant latent mechanisms when the input data are conditioned on a priori known physical descriptors.
As a model system, we explore the ferroelectric polarization switching dynamics in a 150 nm thick lead zirconate titanate (PZT) thin film grown on a SrTiO3 (001) substrate by pulsed laser deposition (PLD), with a heteroepitaxial intermediate conducting oxide electrode (SrRuO3, SRO).50,51 We explore the domain switching dynamics as a function of time using PFM by applying a constant tip bias that just surpasses the coercive field. Consecutive PFM images (Fig. S15 in the supplementary material) show the ferroelectric switching from the (001) to the (00−1) state. Consequently, domain switching can be excited and observed at the same time.52,53 These PFM data were used in our earlier publication;23 here, we reuse them to demonstrate the application of cVAE.
In the cVAE analysis, we introduced a time delay (dt). That is, the domain wall location is determined by a Canny filter54 at time t, and sub-images centered at that location are created at times t and t + dt. This leads to a comparison of domains at times t and t + dt within the sub-image data sets, and hence the domain switching and wall dynamics are encoded as a function of time. Figure 7(a) shows a comparison of domains at dt = 0 and dt = 5; in the first image (dt = 0), the domain wall is located at the center, while in the second image (dt = 5), the domain wall has moved away from the center. Figure 7(b) shows example sub-images used for the cVAE analysis.
Simple VAE and cVAE analyses of experimental PFM data with conditions. (a) Examples of PFM image patches (window size = 30) with different time delays. (b) Examples of the generated PFM image patches. The color in the PFM image represents the polarization magnitude. (c)–(e) VAE analyses: (c) latent space of VAE analysis without conditions, colored by switch degree and time delay, where a correlation between the latent variables and both switch degree and time delay is observed; (d) latent space of cVAE with switch degree as a condition; in this case, no correlation between the ground truth parameters (switch degree and time delay) and latent variables is observed; (e) latent space of cVAE with time delay as a condition; in this case, the correlations between the ground truth parameters (switch degree and time delay) and latent variables are modified but do not disappear. Adding time delay as a condition is thus not fully effective, probably because of the overlap of information contained in the time delay and the switch degree, as seen in (a); that is, a larger time delay generally corresponds to a larger switch degree. The latent variables vs ground truth parameter plots of these VAE analyses are shown in Fig. S16 in the supplementary material.
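The time-delayed sub-image construction described above can be sketched as follows, assuming scikit-image's Canny filter and a (T, H, W) stack of PFM frames; the window size matches the text, while the function and array names are illustrative assumptions.

```python
import numpy as np
from skimage.feature import canny

def extract_pairs(frames, t, dt, window=30):
    """frames: (T, H, W) stack of PFM images; returns (t, t + dt) patch pairs."""
    half = window // 2
    wall = canny(frames[t])                  # boolean map of domain-wall pixels at time t
    ys, xs = np.nonzero(wall)
    pairs = []
    for y, x in zip(ys, xs):
        # keep only wall locations whose window fits inside the image
        if half <= y < frames.shape[1] - half and half <= x < frames.shape[2] - half:
            patch_t = frames[t, y - half:y + half, x - half:x + half]
            patch_dt = frames[t + dt, y - half:y + half, x - half:x + half]
            pairs.append((patch_t, patch_dt))
    return pairs
```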
In the cVAE analysis, we used the switch degree and time delay as conditions. The switch degree represents the ratio of the yellow (switched) domain area to the blue (unswitched) domain area, and the time delay is explained above. Shown in Fig. 7(c) is a simple VAE analysis without conditions, where the latent variables are colored by the ground truth values. Just as with the 1D and 2D model systems explored above, the color gradient reveals that both switch degree and time delay are encoded into the latent variables. Figures 7(d) and 7(e) then show cVAE analyses conditioned on switch degree and time delay, respectively. Figure 7(d) confirms that the switch degree is featureless in the latent space when it is used as a condition, indicating that this prior physical knowledge effectively informs the cVAE analysis. However, the time delay is still visible as a correlation within the latent space when it is used as a condition [Fig. 7(e)]. Note that conditioning on switch degree renders not only the switch degree but also the time delay featureless in the latent space. This is possibly because of the intimate connection between switch degree and time delay, as shown in Fig. 7(a); when a time delay is added (e.g., dt = 5), the switch degree also changes simultaneously.
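For illustration, such a switch-degree condition could be computed as sketched below, assuming the patches are signed PFM values where switched pixels are positive; the zero threshold is a hypothetical placeholder, not the authors' exact preprocessing.

```python
import numpy as np

def switch_degree(patch, threshold=0.0):
    """Ratio of switched ("yellow") to unswitched ("blue") area in one patch."""
    switched = patch > threshold              # assumed binarization of switched pixels
    unswitched = ~switched
    return switched.sum() / max(unswitched.sum(), 1)  # guard against division by zero
```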
We also explored the reconstruction of PFM image patches by the cVAE with predefined parameters. Shown in Fig. 8 are the analyses of two cVAEs conditioned on time delay and switch degree, respectively. The cVAE manifolds show the reconstruction of PFM image patches under different conditions; more reconstructions by the cVAE are shown in Fig. S17 in the supplementary material. Such cVAE manifolds as a function of defined conditions also allow us to extrapolate into the future, as demonstrated in the final column of Fig. 8. The analyses in Figs. 7 and 8 are extended to PFM domain wall images generated from the raw PFM images via application of a Canny filter (results are shown in Figs. S18–S21 in the supplementary material).
Latent manifolds with predefined conditions of cVAE analyses of experimental PFM data based on image patches with window size = 20. (a) Latent space distribution and latent manifold of cVAE analysis with time delay as a condition. (b) Latent space distribution and latent manifold of cVAE analysis with switch degree as a condition.
CONCLUSION
In conclusion, we demonstrate the use of conditional variational autoencoders (cVAEs) to explore physical information by conditioning on a priori known physical parameters, and we compare the cVAE with the previous invariant VAE (iVAE) approach. Given that the cVAE does not rely on a specific invariant transform, it allows for much greater flexibility. We showed the application of this approach using model 1D spectral and 2D image data sets, revealing that the conditioned parameters become featureless in the latent representation. We then extended this approach to experimental PFM data on ferroelectric domain switching and domain wall dynamics. While the latent distribution of the experimental data set shows more complexity, we argue that cVAE-based physics discovery can be performed in iterative and hypothesis-testing modes, yielding simple, low-dimensional latent distributions when the relevant physical factors are correctly identified.
METHODS
PZT thin films
The PZT film is grown on a SrTiO3 (001) substrate by pulsed laser deposition (PLD), with an intermediate conducting oxide electrode (SRO). The deposition is conducted at 650 °C under 100 mTorr oxygen partial pressure, after which the samples are cooled to room temperature.
Data analysis
For cVAE training, we used a simple cVAE neural network architecture for the encoder and decoder; both consist of two fully connected layers with 128 neurons each (i.e., hidden_dim_e = [128, 128], hidden_dim_d = [128, 128]), activated by the hyperbolic tangent function. The number of layers and neurons can be increased depending on the complexity of the input features. The available activations also include ReLU, leaky ReLU, softplus, and GELU. The latent dimension of the cVAE in this work is two (i.e., latent_dim = 2), and the conditioning dimension is one, two, or three (e.g., c_dim = 1), depending on the available conditions for each analysis. The loss function is the sum of the reconstruction loss and the Kullback–Leibler loss. The conditioning vector is concatenated with the vector sampled from the latent space before being passed into the decoder neural network. The training procedure is the same as for an autoencoder, with an additional "condition" term passed to the model. Training for 500 epochs was performed in Google Colab using a T4 GPU. The detailed methodologies of the analysis are also shown in Jupyter notebooks available from https://github.com/yongtaoliu/Physics-cVAE, which include all data and model parameters for the results presented in this manuscript, allowing readers to reproduce the investigation. These notebooks also allow readers to adapt them for their own investigations.
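For concreteness, a minimal hand-rolled training loop matching this description might look as follows; it assumes the ConditionalVAE sketch from the Results section and a DataLoader yielding (data, condition) batches, with the optimizer choice and learning rate as assumptions (the actual training used pyroVED rather than this loop).

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer/lr are assumptions
for epoch in range(500):                     # 500 epochs, as described above
    for x, c in loader:                      # loader yields (data, condition) batches
        recon, mu, logvar = model(x, c)
        # loss = reconstruction term + Kullback-Leibler divergence term
        recon_loss = F.mse_loss(recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = recon_loss + kl
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```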
The details of shift-VAE and rVAE architectures are described elsewhere.22,23,36
VAEs were implemented using the open-source package pyroVED from https://pyroved.readthedocs.io/en/latest/models.html.
SUPPLEMENTARY MATERIAL
See the supplementary material for figures of VAE, rVAE, and cVAE analyses of 1D spectrum, 2D card, and PFM data.
ACKNOWLEDGMENTS
This work (ML analysis) was supported (Y.L., S.V.K., and M.A.Z.) by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences as part of the Energy Frontier Research Centers program: CSSAS—The Center for the Science of Synthesis Across Scales under Award No. DE-SC0019288. This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
AUTHOR DECLARATIONS
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
Yongtao Liu: Formal analysis (lead); Investigation (lead); Writing – original draft (lead); Writing – review & editing (lead). Bryan D. Huey: Resources (supporting). Maxim A. Ziatdinov: Methodology (lead). Sergei V. Kalinin: Supervision (lead); Writing – original draft (lead); Writing – review & editing (lead).
DATA AVAILABILITY
The interactive Jupyter notebooks that reproduce this paper's results are available at https://git.io/JD28J.