Identifying diffusion processes is challenging for many real-world systems because the available observation data are often sparsely sampled. In this work, we propose a data augmentation-based sparse Bayesian learning method to identify a class of diffusion processes from such data. We impute the latent, unsampled diffusion paths between adjacent observations and construct a candidate model for the diffusion process with a sparsity-inducing prior on the model parameters. Given the augmented data and the candidate model, we investigate the full joint posterior distribution of all parameters and latent diffusion paths under a Bayesian learning framework. We then design a Markov chain Monte Carlo sampler, whose acceptance probability does not degenerate as the system dimension grows, to draw samples from the posterior distribution and thereby estimate the parameters and latent diffusion paths. In particular, the proposed method can handle sparse data sampled either regularly or irregularly in time. Simulations on the well-known Langevin equation, homogeneous diffusion in a symmetric double-well potential, and the stochastic Lotka–Volterra equation demonstrate the effectiveness and accuracy of the proposed method.
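To make the data augmentation step concrete, the following minimal Python sketch shows one common way to impute latent points between adjacent sparse observations: each gap is filled with a Brownian bridge that is pinned to the two observed values. This is an illustration under stated assumptions, not the sampler proposed in this work. The function names `simulate_ou` and `impute_bridge`, the Ornstein–Uhlenbeck test system, the constant known diffusion coefficient `sigma`, and the number of latent points `m` are all hypothetical choices made for the example. In an MCMC scheme, such bridge draws would typically serve only as an initialization or a proposal for the latent paths.

```python
import numpy as np

def simulate_ou(theta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW (a Langevin-type test system)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def impute_bridge(t_obs, x_obs, m, sigma, rng):
    """Insert m latent points between each pair of adjacent observations.

    Each gap is filled with a Brownian bridge (assumed constant diffusion sigma)
    conditioned on both endpoints, so the observed values are left unchanged.
    Works for regularly or irregularly spaced observation times.
    """
    t_aug, x_aug = [t_obs[0]], [x_obs[0]]
    for i in range(len(t_obs) - 1):
        t0, t1 = t_obs[i], t_obs[i + 1]
        x1 = x_obs[i + 1]
        dt = (t1 - t0) / (m + 1)          # latent step inside this gap
        s, x = t0, x_obs[i]
        for _ in range(m):
            remaining = t1 - s            # time left until the next observation
            mean = x + (x1 - x) * dt / remaining
            var = sigma**2 * dt * (remaining - dt) / remaining
            x = mean + np.sqrt(var) * rng.standard_normal()
            s += dt
            t_aug.append(s)
            x_aug.append(x)
        t_aug.append(t1)
        x_aug.append(x1)
    return np.array(t_aug), np.array(x_aug)

# Usage: simulate a fine path, keep 20 irregularly spaced samples, then augment.
rng = np.random.default_rng(0)
x_fine = simulate_ou(theta=1.0, sigma=0.5, x0=1.0, dt=0.01, n_steps=1000, rng=rng)
t_fine = 0.01 * np.arange(1001)
idx = np.sort(rng.choice(np.arange(1, 1000), size=18, replace=False))
idx = np.concatenate(([0], idx, [1000]))
t_obs, x_obs = t_fine[idx], x_fine[idx]
t_aug, x_aug = impute_bridge(t_obs, x_obs, m=9, sigma=0.5, rng=rng)
```

With `m = 9`, every tenth entry of the augmented path is an original observation and the entries in between are imputed latent values. A posterior sampler would then alternate between updating these latent values and updating the model parameters.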

Dataset: Y. Wang (2023). “Data augmentation-based statistical inference of diffusion processes,” GitHub. https://github.com/ArthinYS/Data-augmentation-based-sparse-Bayesian-learning/tree/main.