A dimensionality reduction method for high-dimensional circular data is developed, which is based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, various distance measures on a torus are introduced and the associated problem of projecting data onto the principal subspaces is discussed. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can be computed in a standard manner. Adopting molecular dynamics simulations of two well-established biomolecular systems (Aib9 and villin headpiece), the potential of the method to analyze the dynamics of backbone dihedral angles is demonstrated. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Apart from its application to PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.
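The two steps outlined above (shift the maximal sampling gap of each angle to the periodic boundary, then run a standard PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `shift_max_gap` and `torus_pca` are hypothetical, angles are assumed to be given in radians, and the largest gap is assumed to be found from the sorted samples of each angle separately.

```python
import numpy as np


def shift_max_gap(angles):
    """Shift one circular variable (radians) so that its largest
    sampling gap is centred on the periodic boundary at +/- pi.

    Hypothetical helper for illustration only.
    """
    a = np.mod(np.asarray(angles, dtype=float), 2.0 * np.pi)  # wrap to [0, 2*pi)
    s = np.sort(a)
    # Gaps between sorted neighbours, including the wrap-around gap
    # from the last sample back to the first one.
    gaps = np.diff(np.append(s, s[0] + 2.0 * np.pi))
    i = np.argmax(gaps)
    centre = s[i] + 0.5 * gaps[i]  # midpoint of the largest gap
    # Move the gap centre to 0, wrap, then move it to the boundary.
    return np.mod(a - centre, 2.0 * np.pi) - np.pi


def torus_pca(angles):
    """Standard PCA on gap-shifted angles.

    angles: array of shape (n_samples, n_angles), in radians.
    Returns eigenvalues (descending), eigenvectors (columns),
    and the projected data.
    """
    angles = np.asarray(angles, dtype=float)
    x = np.column_stack([shift_max_gap(col) for col in angles.T])
    x = x - x.mean(axis=0)                     # centre the shifted data
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)         # symmetric eigendecomposition
    order = np.argsort(evals)[::-1]            # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, x @ evecs


if __name__ == "__main__":
    # Samples clustered around +/- pi: a naive variance is large only
    # because the cluster straddles the periodic boundary.
    phi = np.array([3.0, -3.0, 3.1, -3.1])
    print("naive variance:  ", phi.var())
    print("shifted variance:", shift_max_gap(phi).var())
```

Because the shift is a rigid rotation of each angle, distances on the circle between samples are unchanged, and in this sketch the resulting covariances and principal components remain expressed directly in the angular variables.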

