We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that it can be obtained by optimizing the score of the variational approach for Markov processes (VAMP). As a result, we show that coherent sets of particle trajectories can be computed by kernel CCA. We demonstrate the efficiency of this approach with several examples, namely, the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Finally, we propose a straightforward generalization of dynamic mode decomposition called coherent mode decomposition. Our results provide a generic machine learning approach to the computation of coherent sets with an objective score that can be used for cross-validation and the comparison of different methods.
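To make the claim that coherent sets can be computed by kernel CCA more concrete, here is a minimal sketch under our own assumptions. It is not the authors' implementation. The inputs are snapshot pairs, with $x_i$ the particle positions at time $t$ and $y_i$ the positions at $t + \tau$. The sketch uses a Gaussian kernel with bandwidth `sigma` and centered Gram matrices $G_X$ and $G_Y$. It solves one common regularized kernel CCA formulation (cf. Refs. 13 and 55), $(G_X + n\varepsilon I)^{-1} G_Y (G_Y + n\varepsilon I)^{-1} G_X v = \rho^2 v$. The function names, parameter values, and synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """Gram matrix G[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def center(G):
    """Center a Gram matrix, i.e., subtract the empirical mean in feature space."""
    n = G.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ G @ H


def kernel_cca(X, Y, sigma=1.0, eps=5e-3, k=2):
    """Regularized kernel CCA for snapshot pairs (x_i, y_i), x_i at t and y_i at t + tau.

    Solves (Gx + n*eps*I)^{-1} Gy (Gy + n*eps*I)^{-1} Gx v = rho^2 v and returns the
    k largest squared canonical correlations together with the corresponding
    canonical functions evaluated at the points x_i.
    """
    n = X.shape[0]
    Gx = center(gaussian_gram(X, sigma))
    Gy = center(gaussian_gram(Y, sigma))
    reg = n * eps * np.eye(n)
    # Linear solves instead of explicit inverses.
    M = np.linalg.solve(Gx + reg, Gy @ np.linalg.solve(Gy + reg, Gx))
    evals, evecs = np.linalg.eig(M)  # eigenvalues are real up to round-off
    order = np.argsort(-evals.real)[:k]
    return evals.real[order], Gx @ evecs[:, order].real


# Hypothetical toy data: two groups of particles that keep their group identity
# between t and t + tau but are completely scrambled within each group.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = np.array([[0.0, 0.0], [3.0, 0.0]])[labels] + 0.3 * rng.standard_normal((200, 2))
Y = np.array([[0.0, 3.0], [3.0, 3.0]])[labels] + 0.3 * rng.standard_normal((200, 2))

rho2, F = kernel_cca(X, Y)
# Coherent sets from the sign of the leading canonical function.
partition = (F[:, 0] > 0).astype(int)
print("squared canonical correlations:", np.round(rho2, 3))
```

In this toy setting, the leading squared canonical correlation should be close to one. Group membership is the only information carried from $t$ to $t + \tau$, and the sign of the corresponding canonical function recovers the two groups. The second value reflects only finite-sample effects. The retained squared canonical correlations play the role of the squared singular values that a VAMP-2-type score sums. Evaluating such a score on held-out pairs for cross-validation is not shown here.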

1. H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol. 24(6), 417–441 (1933).
2. H. Hotelling, “Relations between two sets of variates,” Biometrika 28, 321–377 (1936).
3. A. Hyvärinen and E. Oja, “Independent component analysis: Algorithms and applications,” Neural Netw. 13(4–5), 411–430 (2000).
4. L. Molgedey and H. G. Schuster, “Separation of a mixture of independent signals using time delayed correlations,” Phys. Rev. Lett. 72, 3634–3637 (1994).
5. G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, “Identification of slow molecular order parameters for Markov model construction,” J. Chem. Phys. 139(1), 015102 (2013).
6. H. Wu and F. Noé, “Variational approach for learning Markov processes from time series data,” J. Nonlinear Sci. (published online).
7. P. Schmid and J. Sesterhenn, “Dynamic mode decomposition of numerical and experimental data,” in 61st Annual Meeting of the APS Division of Fluid Dynamics (American Physical Society, 2008).
8. A. Mardt, L. Pasquali, H. Wu, and F. Noé, “VAMPnets for deep learning of molecular kinetics,” Nat. Commun. 9(1), 5 (2018).
9. Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, “Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator,” Chaos 27, 103111 (2017).
10. S. E. Otto and C. W. Rowley, “Linearly-recurrent autoencoder networks for learning dynamics,” SIAM J. Appl. Dyn. Syst. 18(1), 558–593 (2019).
11. B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Comput. 10(5), 1299–1319 (1998).
12. T. Melzer, M. Reiter, and H. Bischof, “Nonlinear feature extraction using generalized canonical correlation analysis,” in Artificial Neural Networks—ICANN 2001, edited by G. Dorffner, H. Bischof, and K. Hornik (Springer, Berlin, 2001), pp. 353–360.
13. F. R. Bach and M. I. Jordan, “Kernel independent component analysis,” J. Mach. Learn. Res. 3, 1–48 (2002); available at http://www.jmlr.org/papers/v3/bach02a
14. S. Harmeling, A. Ziehe, M. Kawanabe, and K.-R. Müller, “Kernel-based nonlinear blind source separation,” Neural Comput. 15(5), 1089–1124 (2003).
15. M. O. Williams, C. W. Rowley, and I. G. Kevrekidis, “A kernel-based method for data-driven Koopman spectral analysis,” J. Comput. Dyn. 2(2), 247–265 (2015).
16. F. Noé and F. Nüske, “A variational approach to modeling slow processes in stochastic dynamical systems,” Multiscale Model. Simul. 11(2), 635–655 (2013).
17. C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, “A direct approach to conformational dynamics based on hybrid Monte Carlo,” J. Comput. Phys. 151, 146–168 (1999).
18. A. Bovier, “Metastability: a potential theoretic approach,” in Proceedings of the International Congress of Mathematicians (Springer, 2006), pp. 499–518.
19. C. R. Schwantes and V. S. Pande, “Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9,” J. Chem. Theory Comput. 9, 2000–2009 (2013).
20. L. Song, J. Huang, A. Smola, and K. Fukumizu, “Hilbert space embeddings of conditional distributions with applications to dynamical systems,” in Proceedings of the 26th Annual International Conference on Machine Learning (ACM, 2009), pp. 961–968.
21. K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, “Kernel mean embedding of distributions: A review and beyond,” Found. Trends Mach. Learn. 10(1–2), 1–141 (2017).
22. B. Koopman, “Hamiltonian systems and transformation in Hilbert space,” Proc. Natl. Acad. Sci. U.S.A. 17(5), 315 (1931).
23. A. Lasota and M. C. Mackey, Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, Applied Mathematical Sciences, 2nd ed., Vol. 97 (Springer, 1994).
24. S. Klus, I. Schuster, and K. Muandet, “Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces,” J. Nonlinear Sci. (published online).
25. G. Froyland and O. Junge, “Robust FEM-based extraction of finite-time coherent sets using scattered, sparse, and incomplete trajectories,” SIAM J. Appl. Dyn. Syst. 17(2), 1891–1924 (2018).
26. G. Froyland, N. Santitissadeekorn, and A. Monahan, “Transport in time-dependent dynamical systems: Finite-time coherent sets,” Chaos 20(4), 043116 (2010).
27. G. Froyland and O. Junge, “On fast computation of finite-time coherent sets using radial basis functions,” Chaos 25(8), 087409 (2015).
28. M. O. Williams, I. I. Rypina, and C. W. Rowley, “Identifying finite-time coherent sets from limited quantities of Lagrangian data,” Chaos 25(8), 087408 (2015).
29. A. Hadjighasem, D. Karrasch, H. Teramoto, and G. Haller, “Spectral-clustering approach to Lagrangian vortex detection,” Phys. Rev. E 93, 063107 (2016).
30. R. Banisch and P. Koltai, “Understanding the geometry of transport: Diffusion maps for Lagrangian trajectory data unravel coherent sets,” Chaos 27(3), 035804 (2017).
31. B. E. Husic, K. L. Schlueter-Kuck, and J. O. Dabiri, “Simultaneous coherent structure coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity,” PLoS One 14(3), e0212442 (2019).
32. M. R. Allshouse and T. Peacock, “Lagrangian based methods for coherent structure detection,” Chaos 25(9), 097617 (2015).
33. S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, “Data-driven model reduction and transfer operator approximation,” J. Nonlinear Sci. 28, 985–1010 (2018).
34. P. Koltai, H. Wu, F. Noé, and C. Schütte, “Optimal data-driven estimation of generalized Markov state models for non-equilibrium dynamics,” Computation 6(1), 22 (2018).
35. B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (MIT Press, Cambridge, 2001).
36. I. Steinwart and A. Christmann, Support Vector Machines, 1st ed. (Springer, New York, 2008).
37. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis (Cambridge University Press, 2004).
38. C. Baker, “Mutual information for Gaussian processes,” SIAM J. Appl. Math. 19(2), 451–458 (1970).
39. C. Baker, “Joint measures and cross-covariance operators,” Trans. Am. Math. Soc. 186, 273–289 (1973).
40. J. R. Baxter and J. S. Rosenthal, “Rates of convergence for everywhere-positive Markov chains,” Stat. Probab. Lett. 22(4), 333–338 (1995).
41. S. Klus, P. Koltai, and C. Schütte, “On the numerical approximation of the Perron–Frobenius and Koopman operator,” J. Comput. Dyn. 3(1), 51–79 (2016).
42. J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philos. Trans. R. Soc. 209, 415–446 (1909).
43. M. Reed and B. Simon, Methods of Modern Mathematical Physics I: Functional Analysis, 2nd ed. (Academic Press Inc., 1980).
44. L. Rosasco, M. Belkin, and E. De Vito, “On learning with integral operators,” J. Mach. Learn. Res. 11, 905–934 (2010); available at http://www.jmlr.org/papers/v11/rosasco10a.html
45. C. W. Groetsch, Inverse Problems in the Mathematical Sciences (Vieweg, 1993).
46. H. Engl and C. W. Groetsch, Inverse and Ill-Posed Problems (Academic Press, 1996).
47. H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems (Kluwer, 1996).
48. L. Song, K. Fukumizu, and A. Gretton, “Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models,” IEEE Signal Process. Mag. 30(4), 98–111 (2013).
49. K. Fukumizu, L. Song, and A. Gretton, “Kernel Bayes’ rule: Bayesian inference with positive definite kernels,” J. Mach. Learn. Res. 14, 3753–3783 (2013); available at http://www.jmlr.org/papers/v14/fukumizu13a.html
50. K. Fukumizu, “Nonparametric Bayesian inference with kernel mean embedding,” in Modern Methodology and Applications in Spatial-Temporal Modeling, edited by G. Peters and T. Matsui (Springer, 2017).
51. M. Mollenhauer, I. Schuster, S. Klus, and C. Schütte, “Singular value decomposition of operators on reproducing kernel Hilbert spaces,” e-print arXiv:1807.09331 (2018).
52. J. Shawe-Taylor, C. K. I. Williams, N. Cristianini, and J. Kandola, “On the eigenspectrum of the Gram matrix and its relationship to the operator eigenspectrum,” in Algorithmic Learning Theory. ALT 2002, Lecture Notes in Computer Science Vol. 2533 (Springer, 2002), pp. 23–40.
53. S. Klus, A. Bittracher, I. Schuster, and C. Schütte, “A kernel-based approach to molecular conformation analysis,” J. Chem. Phys. 149, 244109 (2018).
54. M. Borga, “Canonical correlation: A tutorial” (2001).
55. K. Fukumizu, F. Bach, and A. Gretton, “Statistical consistency of kernel canonical correlation analysis,” J. Mach. Learn. Res. 8, 361–383 (2007); available at http://www.jmlr.org/papers/v8/fukumizu07a.html
56. G. Froyland, “An analytic framework for identifying finite-time coherent sets in time-dependent dynamical systems,” Physica D 250, 1–19 (2013).
57. M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition,” J. Nonlinear Sci. 25(6), 1307–1346 (2015).
58. C. R. Schwantes and V. S. Pande, “Modeling molecular kinetics with TICA and the kernel trick,” J. Chem. Theory Comput. 11(2), 600–608 (2015).
59. R. Penrose, “A generalized inverse for matrices,” Math. Proc. Cambridge Philos. Soc. 51(3), 406–413 (1955).
60. F. Noé, “Machine learning for molecular dynamics on long timescales,” e-print arXiv:1812.07669 (2018).
61. F. Noé and C. Clementi, “Kinetic distance and kinetic maps from molecular dynamics simulation,” J. Chem. Theory Comput. 11, 5002–5011 (2015).
62. P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” J. Fluid Mech. 656, 5–28 (2010).
63. J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, “On dynamic mode decomposition: Theory and applications,” J. Comput. Dyn. 1(2), 391–421 (2014).
64. J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016).
65. N. B. Erichson, L. Mathelin, S. L. Brunton, and J. N. Kutz, “Randomized dynamic mode decomposition,” SIAM J. Appl. Dyn. Syst. 18(4), 1867–1891 (2019).
66. I. I. Rypina, M. G. Brown, F. J. Beron-Vera, H. Koçak, M. J. Olascoaga, and I. A. Udovydchenkov, “On the Lagrangian dynamics of atmospheric zonal jets and the permeability of the stratospheric polar vortex,” J. Atmos. Sci. 64(10), 3595–3610 (2007).
67. M. Lange and E. van Sebille, “Parcels v0.9: Prototyping a Lagrangian ocean analysis framework for the petascale age,” Geosci. Model Dev. 10(11), 4175–4186 (2017).
68. A. Bittracher, P. Koltai, S. Klus, R. Banisch, M. Dellnitz, and C. Schütte, “Transition manifolds of complex metastable systems: Theory and data-driven computation of effective dynamics,” J. Nonlinear Sci. 28(2), 471–512 (2018).
69. S. Röblitz and M. Weber, “Fuzzy spectral clustering by PCCA+: Application to Markov state models and data classification,” Adv. Data Anal. Classif. 7(2), 147–179 (2013).
70. Such a feature map $\phi\colon \mathbb{X} \to \mathbb{H}$ admitting the property $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathbb{H}}$ is not uniquely defined. There are other feature space representations such as, for instance, the Mercer feature space.35,36,42 As long as we are only interested in kernel evaluations, however, it does not matter which one is considered.
71. In general, all kernel transfer operators considered in this paper are compositions of compact and bounded operators and are therefore compact. They admit series representations in terms of singular value decompositions as well as eigendecompositions in the self-adjoint case.43 The functional analytic details and the convergence of $\hat{S}$ and its spectral properties in the infinite-data limit depend on the specific scenario and are beyond the scope of this paper.
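Note 70 states that a feature map reproducing a given kernel is not unique. Here is a small illustration of our own choosing, not taken from the paper. The homogeneous quadratic kernel $k(x, x') = (x^\top x')^2$ on $\mathbb{R}^2$ can be reproduced by a feature map into $\mathbb{R}^3$ and by a different one into $\mathbb{R}^4$. Both give identical kernel evaluations and therefore identical Gram matrices. The kernel and the helper names `phi_3d`, `phi_4d`, and `gram` are assumptions for illustration only.

```python
import numpy as np


def phi_3d(x):
    """Feature map into R^3: (x1^2, x2^2, sqrt(2) x1 x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])


def phi_4d(x):
    """A different feature map into R^4: all ordered monomials x_i x_j."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] * x[0], x[1] ** 2])


def gram(feature_map, X):
    """Gram matrix G[i, j] = <phi(x_i), phi(x_j)> for the rows x_i of X."""
    Phi = np.array([feature_map(x) for x in X])
    return Phi @ Phi.T


rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))
G_kernel = (X @ X.T) ** 2  # direct kernel evaluations k(x_i, x_j) = <x_i, x_j>^2
G_3d = gram(phi_3d, X)
G_4d = gram(phi_4d, X)
print(np.allclose(G_kernel, G_3d), np.allclose(G_kernel, G_4d))  # expected: True True
```

The four-dimensional map duplicates the mixed monomial. This shows that the kernel does not even determine the dimension of the feature space, while kernel-based computations that only use Gram matrices are unaffected by the choice.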