The accurate definition of suitable metastable conformational states is fundamental for the construction of a Markov state model describing biomolecular dynamics. Following the dimensionality reduction in a molecular dynamics trajectory, these microstates can be generated by a recently proposed density-based geometrical clustering algorithm [F. Sittel and G. Stock, J. Chem. Theory Comput. 12, 2426 (2016)], which by design cuts the resulting clusters at the energy barriers and allows for a data-based identification of all parameters. Nevertheless, projection artifacts due to the inevitable restriction to a low-dimensional space combined with insufficient sampling often leads to a misclassification of sampled points in the transition regions. This typically causes intrastate fluctuations to be mistaken as interstate transitions, which leads to artificially short life times of the metastable states. As a simple but effective remedy, dynamical coring requires that the trajectory spends a minimum time in the new state for the transition to be counted. Adopting molecular dynamics simulations of two well-established biomolecular systems (alanine dipeptide and villin headpiece), dynamical coring is shown to considerably improve the Markovianity of the resulting metastable states, which is demonstrated by Chapman-Kolmogorov tests and increased implied time scales of the Markov model. Providing high structural and temporal resolution, the combination of density-based clustering and dynamical coring is particularly suited to describe the complex structural dynamics of unfolded biomolecules.
REFERENCES
This can be tested by projecting the data onto one or two coordinates and comparing the original MD distribution to the number of neighbors the points have within R (normalized to 1) as calculated for this projection. If R is chosen too large, the latter yields blurred features such that details in the point distribution cannot be recovered.
We note that due to the low dimension in the case of AD there is a large number of points that are geometrically isolated from the main cluster of points, e.g., the αL-helical region. However, this region forms a cluster of more than 0.1% of data and is therefore not defined as noise.