We merge computational mechanics’ definition of causal states (predictively equivalent histories) with reproducing-kernel Hilbert space (RKHS) representation inference. The result is a widely applicable method that infers causal structure directly from observations of a system’s behaviors, whether those observations are over discrete or continuous events or time. A structural representation, a finite- or infinite-state kernel ϵ-machine, is extracted by a reduced-dimension transform that efficiently represents the causal states and their topology. In this way, the system dynamics are represented by a stochastic (ordinary or partial) differential equation that acts on causal states. We introduce an algorithm to estimate the associated evolution operator. Paralleling the Fokker–Planck equation, the estimated operator efficiently evolves causal-state distributions and makes predictions in the original data space via an RKHS functional mapping. We demonstrate these techniques, together with their predictive abilities, on discrete-time, discrete-value, infinite Markov-order processes generated by finite-state hidden Markov models with (i) finitely many or (ii) uncountably infinitely many causal states, and on (iii) continuous-time, continuous-value processes generated by thermally driven chaotic flows. The method robustly estimates causal structure across varying levels of external and measurement noise and for very high-dimensional data.
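The pipeline summarized above (compare histories by what they predict, embed those predictions in an RKHS, reduce dimension, and evolve over the resulting states) can be illustrated with a small NumPy sketch. It is not the authors' algorithm. The fixed-length past and future windows, the Gaussian kernels and their bandwidths, the kernel-ridge conditional mean embedding, classical multidimensional scaling as the reduced-dimension transform, and the greedy threshold clustering with an empirical one-step transition matrix (a crude finite-state stand-in for the evolution operator) are all assumptions chosen for brevity, and every function name is hypothetical.

```python
# Illustrative sketch only (not the paper's algorithm). Assumptions: fixed-length
# past/future windows, Gaussian kernels, kernel-ridge conditional mean embeddings,
# classical MDS for dimension reduction, and greedy threshold clustering.
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gaussian (RBF) kernel Gram matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def windows(x, past_len, future_len):
    """Split a 1-D series into time-aligned (past, future) window pairs."""
    n = len(x) - past_len - future_len + 1
    pasts = np.stack([x[i:i + past_len] for i in range(n)])
    futures = np.stack([x[i + past_len:i + past_len + future_len] for i in range(n)])
    return pasts, futures


def predictive_distances(x, past_len=3, future_len=3, bw_past=0.5, bw_future=0.5, reg=1e-3):
    """Squared RKHS distances between estimated embeddings of P(future | past)."""
    pasts, futures = windows(np.asarray(x, dtype=float), past_len, future_len)
    n = len(pasts)
    K = gaussian_gram(pasts, pasts, bw_past)        # kernel on histories
    L = gaussian_gram(futures, futures, bw_future)  # kernel on futures
    # Column j of W holds weights so that mu_j = sum_i W[i, j] * phi(future_i)
    # estimates the conditional mean embedding E[phi(future) | past_j].
    W = np.linalg.solve(K + n * reg * np.eye(n), K)
    G = W.T @ L @ W                                 # G[i, j] = <mu_i, mu_j>
    d = np.diag(G)
    # ||mu_i - mu_j||^2; near-zero values flag predictively equivalent histories.
    return np.maximum(d[:, None] + d[None, :] - 2.0 * G, 0.0)


def mds_coordinates(D2, n_components=2):
    """Classical MDS: low-dimensional coordinates approximating squared distances D2."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    vals, vecs = np.linalg.eigh(-0.5 * J @ D2 @ J)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def cluster_and_transitions(D2, threshold=0.1):
    """Greedy threshold clustering of histories, then an empirical one-step transition matrix."""
    n = D2.shape[0]
    labels = -np.ones(n, dtype=int)
    k = 0
    for i in range(n):
        if labels[i] < 0:
            # D2[i, i] == 0, so history i always joins its own new cluster.
            labels[(labels < 0) & (np.sqrt(D2[i]) < threshold)] = k
            k += 1
    T = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):      # window i -> i + 1 is a one-step shift
        T[a, b] += 1
    rows = T.sum(axis=1, keepdims=True)
    return labels, np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)


if __name__ == "__main__":
    x = np.tile([0.0, 1.0], 20)                    # noise-free period-2 series
    D2 = predictive_distances(x)
    coords = mds_coordinates(D2)
    labels, T = cluster_and_transitions(D2)
    print(labels[:6], T, sep="\n")
```

On a noise-free period-2 series, histories ending in 0 and histories ending in 1 predict different futures, so their embeddings lie far apart while histories of the same phase coincide. Clustering then yields two states, and the estimated transition matrix is the deterministic swap [[0, 1], [1, 0]]. A continuous-time process or one with uncountably many causal states would need the operator-based treatment described in the abstract rather than this finite matrix.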

1. J. P. Crutchfield and K. Young, “Inferring statistical complexity,” Phys. Rev. Lett. 63, 105–108 (1989).
2. J. P. Crutchfield, “Between order and chaos,” Nat. Phys. 8, 17–24 (2012).
3. C. R. Shalizi and J. P. Crutchfield, “Computational mechanics: Pattern and prediction, structure and simplicity,” J. Stat. Phys. 104, 817–879 (2001).
4. S. E. Marzen and J. P. Crutchfield, “Inference, prediction, and entropy-rate estimation of continuous-time, discrete-event processes,” arXiv:2005.03750 (2020).
5. S. Marzen and J. P. Crutchfield, “Structure and randomness of continuous-time discrete-event processes,” J. Stat. Phys. 169(2), 303–315 (2017).
6. C. R. Shalizi, K. L. Shalizi, and J. P. Crutchfield, “Pattern discovery in time series, part I: Theory, algorithm, analysis, and convergence,” arXiv:cs/0210025 (2002).
7. G. M. Goerg and C. R. Shalizi, “Mixed LICORS: A nonparametric algorithm for predictive state reconstruction,” in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (PMLR, 2013), pp. 289–297, see https://proceedings.mlr.press/v31/goerg13a.html.
8. A. Rupe, N. Kumar, V. Epifanov, K. Kashinath, O. Pavlyk, F. Schlimbach, M. Patwary, S. Maidanov, V. Lee, M. Prabhat et al., “DisCo: Physics-based unsupervised discovery of coherent structures in spatiotemporal systems,” in 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) (IEEE, 2019), pp. 75–87.
9. N. Brodu, “Quantifying the effect of learning on recurrent spiking neurons,” in 2007 International Joint Conference on Neural Networks (IEEE, 2007), pp. 512–517.
10. S. Klus, I. Schuster, and K. Muandet, “Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces,” J. Nonlinear Sci. 30(1), 283–315 (2020).
11. C. C. Strelioff and J. P. Crutchfield, “Bayesian structural inference for hidden processes,” Phys. Rev. E 89, 042119 (2014).
12. S. Marzen and J. P. Crutchfield, “Predictive rate-distortion for infinite-order Markov processes,” J. Stat. Phys. 163(6), 1312–1338 (2016).
13. R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models: Estimation and Control, Applications of Mathematics Vol. 29 (Springer, New York, 1995).
14. C. R. Shalizi and K. L. Klinkner, “Blind construction of optimal nonlinear recursive predictors for discrete sequences,” in Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004), edited by M. Chickering and J. Y. Halpern (AUAI Press, Arlington, VA, 2004), pp. 504–511.
15. N. Brodu, “Reconstruction of epsilon-machines in predictive frameworks and decisional states,” Adv. Complex Syst. 14(05), 761–794 (2011).
16. R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models: Estimation and Control (Springer, New York, 1994).
17. C. R. Shalizi, K. L. Shalizi, and R. Haslinger, “Quantifying self-organization with optimal predictors,” Phys. Rev. Lett. 93, 118701 (2004).
18. A. Jurgens and J. P. Crutchfield, “Shannon entropy rate of hidden Markov processes,” J. Stat. Phys. 183, 32 (2021).
19. A. M. Jurgens and J. P. Crutchfield, “Divergent predictive states: The statistical complexity dimension of stationary, ergodic hidden Markov processes,” Chaos 31(8), 083114 (2021).
20. A. Jurgens and J. P. Crutchfield, “Minimal embedding dimension of minimally infinite hidden Markov processes,” unpublished (2020).
21. C. J. Ellison, J. R. Mahoney, and J. P. Crutchfield, “Prediction, retrodiction, and the amount of information stored in the present,” J. Stat. Phys. 136(6), 1005–1034 (2009).
22. J. P. Crutchfield, C. J. Ellison, and J. R. Mahoney, “Time’s barbed arrow: Irreversibility, crypticity, and stored information,” Phys. Rev. Lett. 103(9), 094101 (2009).
23. P. M. Riechers and J. P. Crutchfield, “Spectral simplicity of apparent complexity. II. Exact complexities and complexity spectra,” Chaos 28, 033116 (2018).
24. J. E. Hanson and J. P. Crutchfield, “Computational mechanics of cellular automata: An example,” Physica D 103, 169–189 (1997).
25. C. S. McTague and J. P. Crutchfield, “Automated pattern discovery—An algorithm for constructing optimally synchronizing multi-regular language filters,” Theor. Comput. Sci. 359(1–3), 306–328 (2006).
26. C. R. Shalizi, R. Haslinger, J.-B. Rouquier, K. L. Klinkner, and C. Moore, “Automatic filters for the detection of coherent structure in spatiotemporal systems,” Phys. Rev. E 73(3), 036104 (2006).
27. A. Rupe and J. P. Crutchfield, “Local causal states and discrete coherent structures,” Chaos 28(7), 075312 (2018).
28. P. M. Riechers and J. P. Crutchfield, “Fraudulent white noise: Flat power spectra belie arbitrarily complex processes,” Phys. Rev. Res. 3(1), 013170 (2021).
29. S. E. Marzen and J. P. Crutchfield, “Nearly maximally predictive features and their dimensions,” Phys. Rev. E 95(5), 051301 (2017).
30. S. Marzen and J. P. Crutchfield, “Informational and causal architecture of continuous-time renewal processes,” J. Stat. Phys. 168, 109–127 (2017).
31. S. Marzen, M. R. DeWeese, and J. P. Crutchfield, “Time resolution dependence of information measures for spiking neurons: Scaling and universality,” Front. Comput. Neurosci. 9, 89 (2015).
32. A. Smola, A. Gretton, L. Song, and B. Schölkopf, “A Hilbert space embedding for distributions,” in Algorithmic Learning Theory: 18th International Conference (Springer, Berlin, Heidelberg, 2007), Vol. 31, pp. 13–31.
33. L. Song, J. Huang, A. Smola, and K. Fukumizu, “Hilbert space embeddings of conditional distributions with applications to dynamical systems,” in Proceedings of the 26th Annual International Conference on Machine Learning (ACM, 2009), pp. 961–968.
34. N. Aronszajn, “Theory of reproducing kernels,” Trans. Am. Math. Soc. 68(3), 337–404 (1950).
35. B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory (ACM, 1992), pp. 144–152.
36. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” J. Mach. Learn. Res. 13, 723–773 (2012).
37. B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet, “Hilbert space embeddings and metrics on probability measures,” J. Mach. Learn. Res. 11, 1517–1561 (2010).
38. L. Song, K. Fukumizu, and A. Gretton, “Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models,” IEEE Signal Process. Mag. 30(4), 98–111 (2013).
39. K. Fukumizu, L. Song, and A. Gretton, “Kernel Bayes’ rule: Bayesian inference with positive definite kernels,” J. Mach. Learn. Res. 14(1), 3753–3783 (2013).
40. S. Grünewälder, G. Lever, L. Baldassarre, S. Patterson, A. Gretton, and M. Pontil, “Conditional mean embeddings as regressors,” in Proceedings of the 29th International Conference on Machine Learning (ACM, 2012).
41. B. Schölkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in International Conference on Computational Learning Theory (Springer, 2001), pp. 416–426.
42. P. Honeine and C. Richard, “Solving the pre-image problem in kernel machines: A direct method,” in 2009 IEEE International Workshop on Machine Learning for Signal Processing (IEEE, 2009), pp. 1–6.
43. I. Schuster, M. Mollenhauer, S. Klus, and K. Muandet, “Kernel conditional density operators,” in International Conference on Artificial Intelligence and Statistics (PMLR, 2020), pp. 993–1004.
44. For Itô diffusions, these f are compactly supported and twice differentiable.
45. K. Jacobs, Stochastic Processes for Physicists: Understanding Noisy Systems (Cambridge University Press, 2010).
46. T. Berry, D. Giannakis, and J. Harlim, “Nonparametric forecasting of low-dimensional dynamical systems,” Phys. Rev. E 91(3), 032915 (2015).
47. It may be that S’s dimension is actually infinite, so that its estimate grows with the number of samples. This can be detected using the spectral method presented in the main text.
48. R. R. Coifman and S. Lafon, “Diffusion maps,” Appl. Comput. Harmon. Anal. 21(1), 5–30 (2006).
49. T. Berry and J. Harlim, “Variable bandwidth diffusion kernels,” Appl. Comput. Harmon. Anal. 40(1), 68–96 (2016).
50. T. Berry and T. Sauer, “Local kernels and the geometric structure of data,” Appl. Comput. Harmon. Anal. 40(3), 439–469 (2016).
51. R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, “Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps,” Proc. Natl. Acad. Sci. U.S.A. 102(21), 7426–7431 (2005).
52. R. R. Coifman, Y. Shkolnisky, F. J. Sigworth, and A. Singer, “Graph Laplacian tomography from unknown random projections,” IEEE Trans. Image Process. 17(10), 1891–1899 (2008).
53. R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, “Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods,” Proc. Natl. Acad. Sci. U.S.A. 102(21), 7432–7437 (2005).
54. P. Drineas and M. W. Mahoney, “On the Nyström method for approximating a Gram matrix for improved kernel-based learning,” J. Mach. Learn. Res. 6, 2153–2175 (2005).
55. L. Song, X. Zhang, A. Smola, A. Gretton, and B. Schölkopf, “Tailoring density estimation via reproducing kernel moment matching,” in Proceedings of the 25th International Conference on Machine Learning (ACM, 2008), pp. 992–999.
56. R. Friedrich, J. Peinke, M. Sahimi, and M. R. R. Tabar, “Approaching complexity by stochastic methods: From biological systems to turbulence,” Phys. Rep. 506(5), 87–162 (2011).
57. R. Alexander and D. Giannakis, “Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques,” Physica D 409, 132520 (2020).
58. Here, DBSCAN with a threshold of 0.1 was used. However, any reasonable clustering algorithm will work, provided the clusters are well separated.
59. The probabilities sum exactly to 1. We show all the transitions found.
60. E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20, 130 (1963).
61. J. P. Crutchfield and B. S. McNamara, “Equations of motion from a data series,” Complex Syst. 1, 417–452 (1987), see https://www.complex-systems.com/abstracts/v01_i03_a03/.
62. E. N. Lorenz, “Predictability: A problem partly solved,” in Proceedings of the Seminar on Predictability (ECMWF, 1996), Vol. 1.
63. A. Rupe, K. Kashinath, N. Kumar, V. Lee, M. Prabhat, and J. P. Crutchfield, “Towards unsupervised segmentation of extreme weather events,” arXiv:1909.07520 (2019).
64. A. Rupe and J. P. Crutchfield, “Spacetime autoencoders using local causal states,” in AAAI Fall Series 2020 Symposium on Physics-Guided AI for Accelerating Scientific Discovery, arXiv:2010.05451 (2020).