As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the time scale of the latent space while inferring a reduced coordinate, which assists in finding slow processes in accordance with the variational approach to conformational dynamics. We provide evidence that the VDE framework [Hernández et al., Phys. Rev. E 97, 062412 (2018)], which uses such an autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate compared with related loss functions. We thus recommend leveraging the autocorrelation of the latent space when training neural network models of biomolecular simulation data to better represent slow processes.
The Variational Auto-Encoder (VAE) framework,1 a neural network architecture for dimensionality reduction, is increasingly used for analyzing simulation data from biophysical systems2,3 and for inferring collective variables for enhanced sampling simulations.4–6 In the process of developing auto-encoder-based models for simulation data, several modifications to the original VAE loss function have been proposed to better suit the analysis of time-series data, and a more thorough analysis of the effect of these modifications on the modeled latent space is needed.
Consider a trajectory $x_t$ that we wish to encode in a reduced-dimensionality latent space $z_t$. The traditional VAE framework learns a latent coordinate by iteratively (1) mapping the input coordinate $x_t$ to a latent space coordinate $z_t$ using the encoding network, $q_\phi(z_t|x_t)$, and (2) generating a reconstruction of the original coordinate, $\hat{x}_t$, using the decoding network, $p_\theta(\hat{x}_t|z_t)$. The standard VAE loss function [Eq. (1)] trains both networks concurrently and comprises two terms. A reconstruction loss (termed $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ in this work, denoting reconstruction as an identity operation) quantifies how well the VAE reconstructs the data by minimizing the mean squared distance between the original data and the reconstructed data. A KL-divergence loss ($\mathcal{L}_{\mathrm{KL}}$) between the encoded distribution, $q_\phi(z_t|x_t)$, and a prior on the latent encoding, $P(z)$, imposes a penalty on the complexity of the latent coordinate. This discourages the model from deterministically encoding each data point to a unique value and instead encodes a distribution in which neighboring points in the latent coordinate are encouraged to be correlated,

$$\mathcal{L}_{\mathrm{VAE}} = \mathcal{L}_{\mathrm{recon}:\mathbb{1}} + \mathcal{L}_{\mathrm{KL}} = \big\langle \lVert x_t - \hat{x}_t \rVert^2 \big\rangle + D_{\mathrm{KL}}\big(q_\phi(z_t \mid x_t) \,\Vert\, P(z)\big). \quad (1)$$
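For concreteness, a minimal PyTorch sketch of the loss in Eq. (1) is given below; the encoder is assumed to output the mean and log-variance of a Gaussian posterior, an interface chosen here purely for illustration rather than taken from any published implementation.

import torch

def vae_loss(x_t, x_t_hat, mu, log_var):
    """Standard VAE loss of Eq. (1): reconstruction plus KL divergence to a unit Gaussian prior."""
    # L_recon:1 -- mean squared distance between the input and its reconstruction
    recon = torch.mean(torch.sum((x_t - x_t_hat) ** 2, dim=-1))
    # L_KL -- closed form for D_KL( N(mu, sigma^2) || N(0, 1) )
    kl = -0.5 * torch.mean(torch.sum(1.0 + log_var - mu ** 2 - log_var.exp(), dim=-1))
    return recon + kl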
This standard framework can be augmented to better encode time-series data. We recently introduced the Variational Dynamics Encoder (VDE),2,5 which presented two modifications to the original VAE: (1) we incorporated a term to maximize the autocorrelation of the latent space, and (2) our decoder network was structured as a propagator, trained to reconstruct coordinates at some lag time in the future instead of reconstructing the input itself.
A VAE can optimize the latent coordinate's time scale. Unlike other VAE-based methods for dimensionality reduction in biophysical simulation, our previous work2 incorporates the autocorrelation (abbreviated AC) of the latent coordinate in the loss function, which encourages the model to find a maximally correlated latent coordinate,

$$\mathcal{L}_{\mathrm{AC}} = -\frac{\big\langle (z_t - \bar{z})\,(z_{t+\tau} - \bar{z}) \big\rangle}{\sigma_z^2}, \quad (2)$$
where $\bar{z}$ is the batch mean of the encoded latent variable $z_t$ and $\sigma_z$ is the batch standard deviation of $z_t$. This autocorrelation loss is motivated by the variational approach to conformational dynamics,7 which states that, in the limit of infinite sampling, no approximated dynamical process can be slower than the true slowest process. Thus, process time scales can serve as a measure of model quality: a model with slower processes is a better model of the system dynamics. This principle has been widely employed in evaluating and parametrizing Markov state models.8
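A minimal sketch of the autocorrelation loss in Eq. (2), assuming the encoder has already produced batches of scalar latent values separated by the lag time $\tau$ (the function name and interface are illustrative, not the published implementation):

import torch

def autocorrelation_loss(z_t, z_t_tau, eps=1e-8):
    """Negative latent autocorrelation of Eq. (2); minimizing it maximizes the autocorrelation."""
    z_mean = z_t.mean()      # batch mean of the encoded latent variable
    z_std = z_t.std() + eps  # batch standard deviation (eps guards against division by zero)
    ac = torch.mean((z_t - z_mean) * (z_t_tau - z_mean)) / z_std ** 2
    return -ac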
We can view the latent space of a VAE as a model that, through the expressivity afforded by neural networks, has the potential to identify the slowest measurable process. We can approximate the time scale of the latent space by measuring the autocorrelation $\rho$ of points in the latent space at a lag time $\tau$. Furthermore, the autocorrelation of the latent space is directly related to the sum of the eigenvalues of the system's propagator.9 By including the autocorrelation of the latent space in our loss function, we directly optimize the quality of our model in representing long time scales, concurrently with using the VAE framework to perform variational inference to infer the latent coordinate $z_t$.
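For instance, if one further assumes that the latent process decays approximately as a single exponential (an assumption introduced here only for illustration, following standard practice in Markov state model analysis), the measured autocorrelation at lag $\tau$ maps onto an implied time scale,

$$\rho(\tau) \approx e^{-\tau / t_{\mathrm{implied}}} \;\;\Longrightarrow\;\; t_{\mathrm{implied}} = -\frac{\tau}{\ln \rho(\tau)},$$

so that a more autocorrelated latent coordinate corresponds to a longer implied time scale.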
A VAE may be structured as a propagator. Our second modification to the original VAE framework structured the network as a propagator rather than an auto-encoder, a strategy used in other frameworks as well.3 Instead of training to reconstruct the coordinate space given a data point at time $t$, we trained on reconstructing the coordinate space at time $t + \tau$, where $\tau$ is a user-selected lag time. In this sense, our VDE network aims to approximate the propagator of the system, $\hat{P}$, an operator that, given a distribution $f(x_t)$, is able to generate $f(x_{t+\tau})$. The VDE reconstruction loss is written as

$$\mathcal{L}_{\mathrm{recon}:\hat{P}} = \big\langle \lVert x_{t+\tau} - \hat{x}_{t+\tau} \rVert^2 \big\rangle. \quad (3)$$
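The sketch below illustrates this time-lagged reconstruction target in the same hypothetical PyTorch setting as above; the pairing helper and tensor shapes are assumptions made for illustration.

import torch

def make_lagged_pairs(traj, lag):
    """Pair each frame x_t with x_{t+lag}; traj is assumed to have shape (n_frames, n_features)."""
    return traj[:-lag], traj[lag:]

def propagator_reconstruction_loss(x_t_tau, x_t_tau_hat):
    """Time-lagged reconstruction loss of Eq. (3): mean squared distance to the frame one lag time ahead."""
    return torch.mean(torch.sum((x_t_tau - x_t_tau_hat) ** 2, dim=-1))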
The loss function used in the VDE sums a KL-divergence loss term, analogous to a traditional VAE model, and the above autocorrelation and reconstruction losses,

$$\mathcal{L}_{\mathrm{VDE}} = \mathcal{L}_{\mathrm{recon}:\hat{P}} + \mathcal{L}_{\mathrm{KL}} + \mathcal{L}_{\mathrm{AC}}. \quad (4)$$
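Combining the three terms gives the full training objective of Eq. (4). The self-contained sketch below again assumes a hypothetical encoder returning the posterior mean and log-variance and a decoder acting as the propagator; it is meant only to make the objective concrete and is not the published VDE code.

import torch

def vde_loss(encoder, decoder, x_t, x_t_tau, eps=1e-8):
    """Full VDE objective of Eq. (4): time-lagged reconstruction + KL + latent autocorrelation."""
    mu, log_var = encoder(x_t)
    z_t = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterized latent sample
    x_t_tau_hat = decoder(z_t)                                   # decoder acts as a propagator

    recon = torch.mean(torch.sum((x_t_tau - x_t_tau_hat) ** 2, dim=-1))
    kl = -0.5 * torch.mean(torch.sum(1.0 + log_var - mu ** 2 - log_var.exp(), dim=-1))

    z_t_tau = encoder(x_t_tau)[0]                                # posterior mean as the latent at t + tau
    z_t_flat, z_t_tau_flat = z_t.flatten(), z_t_tau.flatten()
    z_mean, z_std = z_t_flat.mean(), z_t_flat.std() + eps
    ac = torch.mean((z_t_flat - z_mean) * (z_t_tau_flat - z_mean)) / z_std ** 2

    return recon + kl - ac                                       # minus sign: maximize autocorrelation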
Autocorrelation loss is needed to obtain a meaningful encoding. To analyze our modifications to the standard VAE loss function discussed above, we compared the latent coordinates of models trained either with or without $\mathcal{L}_{\mathrm{AC}}$ and either with the standard VAE reconstruction loss, $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$, or with $\mathcal{L}_{\mathrm{recon}:\hat{P}}$. Additionally, we compared models trained only with $\mathcal{L}_{\mathrm{AC}}$, to isolate the effects of $\mathcal{L}_{\mathrm{AC}}$ from the two reconstruction losses $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ and $\mathcal{L}_{\mathrm{recon}:\hat{P}}$. All compared loss functions contained $\mathcal{L}_{\mathrm{KL}}$, as this term is essential for performing variational inference.1
For each condition, we trained 10 independent models on simulation data10 of the villin headpiece domain as described previously2 with a 44 ns lag time. For all models that included $\mathcal{L}_{\mathrm{AC}}$, the training loss converged within 10 epochs and the models identified qualitatively very similar latent coordinates, while all models trained without $\mathcal{L}_{\mathrm{AC}}$ did not converge.
Figure 1(a) depicts simulation data projected onto the latent coordinates identified in each condition. For comparison with the commonly used dimensionality reduction technique, time-structure independent component analysis (tICA),11 the same data are projected onto the slowest tIC from an optimized tICA model12 [Fig. 1(a–i)]. Neither a standard VAE loss function, $\mathcal{L}_{\mathrm{recon}:\mathbb{1}} + \mathcal{L}_{\mathrm{KL}}$ [Fig. 1(a–ii)], which is analogous to the loss function used in Ref. 6, nor $\mathcal{L}_{\mathrm{recon}:\hat{P}} + \mathcal{L}_{\mathrm{KL}}$ [Fig. 1(a–iii)], which is analogous to the loss function used in Ref. 3, is able to find a meaningful latent coordinate. Instead, both encode to a minimally informative Gaussian-shaped distribution. By contrast, training with $\mathcal{L}_{\mathrm{AC}}$ alone [Fig. 1(a–iv)] identifies a latent coordinate separating the misfolded state [labeled MF in Fig. 1(a)] from the folded (F) and unfolded (UF) states. Additionally incorporating either $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ [Fig. 1(a–v)] or $\mathcal{L}_{\mathrm{recon}:\hat{P}}$ [Fig. 1(a–vi)], the latter combination being the loss function used in the VDE,2 results in a richer encoding that is able to separate the folded and unfolded states.
FIG. 1. The latent coordinate autocorrelation loss ($\mathcal{L}_{\mathrm{AC}}$) is needed to obtain a useful encoding for the villin headpiece folding landscape. (a-ii)–(a-vi) depict free energy landscapes transformed by VAE encodings with altered loss functions, compared to the slowest coordinate obtained from an optimized linear tICA model (i). Without $\mathcal{L}_{\mathrm{AC}}$, neither reconstruction loss type, $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ (ii) nor $\mathcal{L}_{\mathrm{recon}:\hat{P}}$ (iii), is able to encode the landscape. Training with $\mathcal{L}_{\mathrm{AC}}$ (iv) is needed to encode the slow process spanning F/UF to MF. Including $\mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ (v) or $\mathcal{L}_{\mathrm{recon}:\hat{P}}$ (vi) further benefits reconstruction, resolving the difference between the folded (F) and unfolded (UF) states. Higher autocorrelation of the encoded landscape also indicates model quality (b). Using $\mathcal{L}_{\mathrm{AC}} + \mathcal{L}_{\mathrm{recon}:\hat{P}}$ [(b), red curve] results in a more autocorrelated latent coordinate than $\mathcal{L}_{\mathrm{AC}}$ alone (cyan), whereas using $\mathcal{L}_{\mathrm{AC}} + \mathcal{L}_{\mathrm{recon}:\mathbb{1}}$ (magenta) results in a less autocorrelated latent coordinate. Both $\mathcal{L}_{\mathrm{recon}:\mathbb{1}} + \mathcal{L}_{\mathrm{KL}}$ and $\mathcal{L}_{\mathrm{recon}:\hat{P}} + \mathcal{L}_{\mathrm{KL}}$ result in encodings with negligible autocorrelation (blue and green curves).
The VDE produces a variationally optimized model. To determine which loss function provided the optimal model as defined by the variational principle for conformational dynamics, i.e., the model identifying the process with the longest time scale, we computed the autocorrelation of each latent coordinate over a range of lag times [Fig. 1(b)]. The model trained with $\mathcal{L}_{\mathrm{AC}}$ and $\mathcal{L}_{\mathrm{recon}:\hat{P}}$, the framework presented in the VDE,2 has the most autocorrelated latent coordinate (red curve), indicating that it is the optimal model among those compared.
In this work, we provide evidence that using the autocorrelation of the latent coordinate as a loss function ($\mathcal{L}_{\mathrm{AC}}$) is useful and possibly essential for characterizing protein systems with VAE-based neural network models. As deep VAE models become more widely used for studying biophysical systems, we recommend including a loss term to maximize the autocorrelation of the latent space in VAE frameworks. Doing so directly couples model training with finding a latent representation with the longest possible time scale, leveraging existing theory regarding conformational dynamics and its implications for optimizing models.
The authors thank C. X. Hernández, B. E. Husic, and M. M. Sultan for insightful discussion. H.K.W.S. acknowledges support from NSF GRFP (Grant No. DGE-114747).