A semiparametric methodology for reconstructing Markovian and non-Markovian Langevin equations from time series data using unscented Kalman filtering is introduced and explored. The drift function and the logarithm of the diffusion function are expanded into sets of polynomial basis functions. In contrast to the more common state augmentation approach, the Kalman filter is here used only for state estimation and propagation; the model parameters are determined by maximum likelihood based on the predictive distribution generated by the Kalman filter. Model selection regarding the number of included drift and diffusion basis functions is performed with the Bayesian information criterion. The method is successfully validated on various simulated datasets with known system dynamics; it achieves accurate identification of drift and diffusion functions, also outside the prescribed model class, from datasets of moderate length with medium computational cost.

The oldest and most established scientific modeling strategy is the physics-based or forward or direct modeling approach. Dynamical models are built from the known governing equations of motion of the system derived from first principles. A second and complementary approach is offered by data-driven or inverse modeling; here, dynamical models are extracted from datasets of the underlying system using statistical techniques. Data-driven modeling is useful if the equations of motion of the system are not readily known and, in particular, if some sort of coarse-graining is involved, that is, if the goal is to derive reduced or effective equations modeling particular macroscopic variables or modes of the system. Such equations will then naturally have a stochastic nature and incorporate memory effects.

The time evolution of many dynamical systems in physics, chemistry, biology, neuroscience, climate science, or finance can be well described by stochastically forced, first-order differential equations (see Refs. 1–3 for an overview). For a scalar process x ( t ), such a stochastic differential equation (SDE) or Langevin equation reads
dx = f(x) dt + g(x) dW,  (1)
where f ( x ) is the drift function, g 2 ( x ) is the diffusion function, and W denotes a standard Wiener process or Brownian motion. In this paper, we assume that the drift and diffusion functions do not depend explicitly on time, that is, we consider only stationary or autonomous processes. The drift function describes the deterministic internal dynamics of the macroscopic variable x. The influence of the unresolved microscopic variables on x is summarized in the noise term; there is the implicit assumption that the time scales of the microscopic variables are very much smaller than the time scale of x, leading to the white noise formulation. The process x ( t ) governed by Eq. (1) possesses the Markov property.
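
To fix ideas, Eq. (1) can be integrated numerically with the Euler–Maruyama scheme. The following sketch is purely illustrative (the drift, diffusion, and parameter values are arbitrary choices, not those used later in the paper):

```python
import numpy as np

def simulate_langevin(f, g, x0, dt, n_steps, rng):
    """Euler-Maruyama integration of dx = f(x) dt + g(x) dW (Ito interpretation)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))        # Wiener increment over dt
        x[n + 1] = x[n] + f(x[n]) * dt + g(x[n]) * dW
    return x

# Illustrative choice: linear drift towards zero, constant noise amplitude.
rng = np.random.default_rng(0)
x = simulate_langevin(f=lambda x: -x, g=lambda x: 0.5, x0=0.0,
                      dt=0.01, n_steps=100_000, rng=rng)
```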

The reconstruction of the drift and diffusion functions in Eq. (1) from data is a non-trivial estimation problem. Possible approaches include the direct estimation method,4,5 maximum likelihood estimation,6,7 Kalman filtering8,9 and Markov chain Monte Carlo methods10 in a Bayesian framework. All these techniques have advantages and disadvantages regarding issues such as finite sampling interval, partial observation of the system, observational noise, and computation time.

For real-world systems, the assumption of white driving noise is always an idealization. Many systems are not purely Markovian but have a finite Markov–Einstein time scale, and a Markovian description is only valid for increments over time intervals larger than that time scale.3,11 This motivates SDEs driven by correlated or colored noise. The simplest realization of such a noise process is an Ornstein–Uhlenbeck process, which leads to a non-Markovian Langevin model described by the two SDEs,
dx = f(x) dt + g(x) y dt,  (2)
dy = −γ y dt + ν dW,  (3)
with γ > 0 and ν > 0. The noise process has expectation value E(y) = 0 and variance Var(y) = ν^2/(2γ); the autocorrelation function is ρ(τ) = e^(−γ|τ|), and the correlation time is T_y = 1/γ. We here define the correlation time of a scalar variable with autocorrelation function ρ as
T = ∫_0^∞ ρ(τ) dτ.  (4)
In the limit T_y → 0, the standard Langevin model with white driving noise is recovered. More precisely, it follows from the Wong–Zakai theorem12 that in the limit γ → ∞ and ν → ∞ with γ = ν, the solutions of Eq. (2) converge to the solutions of the Stratonovich SDE,
dx = f(x) dt + g(x) ∘ dW,  (5)
or equivalently the Itô SDE,
dx = [f(x) + (1/2) g(x) g′(x)] dt + g(x) dW.  (6)

Estimating the drift and diffusion functions in a non-Markovian Langevin equation from time series data of only x is an even more challenging task than reconstructing a Markovian Langevin equation. An extension of the direct estimation method was proposed;13 it is based on a perturbation approach, admits a non-zero correlation time of the driving noise, but still needs it to be much smaller than the correlation time of x. A direct estimation method without this restriction was introduced recently.14 However, the proposed optimization procedure does not appear to be straightforward; a complicated regularization guided by trial and error for each example system is necessary. Moreover, very large datasets are required for the method to be applicable.

This study introduces and explores a methodology for reconstructing drift and diffusion functions in Markovian and non-Markovian SDEs based on the unscented Kalman filter. The estimation is semiparametric as the drift and diffusion functions are expanded into nested sets of basis functions of increasing complexity. Unlike previous work using the state augmentation technique for parameter estimation,8,9,15,16 the Kalman filter is here only used for state estimation and propagation of the probability density of the state. The model parameters are determined by maximizing the likelihood of the observations based on the predictive probability density generated by the Kalman filter.

The remainder of the paper is organized as follows: The method is described in detail in Sec. II. In Sec. III, the example systems used to validate the method and their properties are illustrated. Then, the results are given and discussed in Sec. IV. Section V concludes.

Given a time series of the variable x of length N, {x_n}_{n=0}^{N}, observed at sampling interval δt, the goal is to estimate the drift and diffusion functions of the underlying Langevin equation, driven by white noise [Eq. (1)] or correlated noise [Eqs. (2) and (3)]. In the correlated noise case, as long as y is not observed, its amplitude cannot be determined: the process x(t) remains invariant when rescaling the diffusion function as g̃(x) = C g(x) with some constant C > 0 and at the same time rescaling the noise strength of the Ornstein–Uhlenbeck process as ν̃ = ν/C. One way to eliminate this ambiguity is to set the variance of the driving noise process to unity. Hence, for the estimation, the non-Markovian Langevin model is formulated as
dx = f(x) dt + g(x) y dt,  (7)
dy = −γ y dt + √(2γ) dW,  (8)
with only the correlation time parameter γ left in the equation for y.
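
For completeness, a minimal Euler–Maruyama integration of the reformulated model of Eqs. (7) and (8) might look as follows; the drift and diffusion functions and all numerical values are illustrative assumptions only:

```python
import numpy as np

def simulate_correlated_noise_model(f, g, gamma, x0, dt, n_steps, rng):
    """Euler-Maruyama integration of dx = f(x) dt + g(x) y dt,
    dy = -gamma y dt + sqrt(2 gamma) dW, so that y has unit stationary variance."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    y = rng.normal()                 # start y from its stationary distribution N(0, 1)
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        x[n + 1] = x[n] + (f(x[n]) + g(x[n]) * y) * dt
        y += -gamma * y * dt + np.sqrt(2.0 * gamma) * dW
    return x

rng = np.random.default_rng(1)
x = simulate_correlated_noise_model(f=lambda x: 1.0 - x, g=lambda x: 0.5,
                                    gamma=0.2, x0=1.0, dt=0.01,
                                    n_steps=200_000, rng=rng)
```
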
The drift function and the logarithm of the diffusion function are each expanded into a set of basis functions as
f(x) = ∑_{i=1}^{I} A_i φ_i(x)  (9)
with expansion coefficients {A_i}_{i=1}^{I} and
ln g^2(x) = ∑_{j=1}^{J} B_j ψ_j(x)  (10)
with expansion coefficients {B_j}_{j=1}^{J}. The log-transform for the diffusion function naturally ensures positivity while having g(x) and g^2(x) in the same model class. The numbers of basis functions I and J describe the complexity of the model and are hyperparameters of the method.
The drift and diffusion basis functions are here canonically chosen as polynomials, and the flexible notation
f(x) = ∑_{i∈I} α_i x^i  (11)
and
ln g^2(x) = ∑_{j∈J} β_j x^j  (12)
is adopted, where I and J are sets of distinct non-negative integers with | I | = I and | J | = J. One can use all powers up to a certain degree or use only particular powers if there is prior knowledge about the underlying system supporting this, for example, symmetries in the drift and/or diffusion function.
For the inference, empirical orthonormal polynomials are constructed. For two functions h_1 and h_2, the empirical scalar product over the dataset
⟨h_1, h_2⟩ = (1/(N + 1)) ∑_{n=0}^{N} h_1(x_n) h_2(x_n)  (13)
is introduced. Starting from the set of monomials {x^i}_{i∈I}, the Gram–Schmidt orthogonalization procedure is applied to generate a set of drift basis functions {φ_i}_{i=1}^{I} satisfying
⟨φ_i, φ_k⟩ = δ_{ik};  (14)
analogously, the set of monomials {x^j}_{j∈J} is turned into the set of diffusion basis functions {ψ_j}_{j=1}^{J} with
⟨ψ_j, ψ_k⟩ = δ_{jk}.  (15)
The empirical orthonormal polynomials are then used as the basis functions in the expansions of Eqs. (9) and (10). The orthonormalization of the basis functions significantly improves the conditioning of the estimation problem and accelerates the maximization of the log-likelihood function by generating less stiff optimization paths in parameter space (see Subsection II C). It has no influence on the reconstructed drift and diffusion functions, as the expansions of Eqs. (9) and (10) and, in particular, Eqs. (11) and (12) are invariant under a linear transformation applied to the set of basis functions combined with the inverse transformation applied to the vector of expansion coefficients.
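
One possible implementation of the empirical orthonormalization is sketched below: Gram–Schmidt applied to the monomials with respect to the empirical scalar product of Eq. (13). The routine returns the coefficient vectors of the orthonormal polynomials in the monomial basis; the index set and the stand-in data are illustrative.

```python
import numpy as np

def empirical_orthonormal_polynomials(data, powers):
    """Gram-Schmidt orthonormalization of the monomials x**p, p in powers,
    with respect to the empirical scalar product <h1, h2> = mean(h1(x_n) h2(x_n)).
    Returns one coefficient vector (in the monomial basis x^0..x^max) per polynomial."""
    max_pow = max(powers)
    M = np.column_stack([data**p for p in powers])   # monomials evaluated on the data
    basis_values, coeffs = [], []
    for k, p in enumerate(powers):
        v = M[:, k].copy()
        c = np.zeros(max_pow + 1)
        c[p] = 1.0
        for u, cu in zip(basis_values, coeffs):      # subtract projections, Eq. (13)
            proj = np.mean(v * u)
            v -= proj * u
            c -= proj * cu
        norm = np.sqrt(np.mean(v * v))               # empirical norm
        basis_values.append(v / norm)
        coeffs.append(c / norm)
    return coeffs

rng = np.random.default_rng(2)
data = rng.normal(size=5000)                         # stand-in for the observed time series
phi_coeffs = empirical_orthonormal_polynomials(data, powers=[0, 1, 2, 3])
```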

Depending on the domain of the system, the method could be extended by also using other types of basis functions, such as spline, trigonometric, logarithmic, or rational functions, which would then also be included in the orthonormalization; this is not discussed here. A similar strategy of combining different types of basis functions was recently applied in the context of maximum likelihood probability density estimation.17

In the non-Markovian case, the correlation time parameter γ also has to be estimated. It is represented as γ = e^η to guarantee positivity. The total parameter vector of the Langevin model is ω = (A_1, …, A_I, B_1, …, B_J)^T in the white noise case and ω = (A_1, …, A_I, B_1, …, B_J, η)^T in the correlated noise case. There are no constraints on any of the parameters. When quoting values of expansion coefficients, we always refer to the parameters {α_i}_{i∈I} and {β_j}_{j∈J} of the monomial expansions in Eqs. (11) and (12), which are linked by a linear transformation to the parameters {A_i}_{i=1}^{I} and {B_j}_{j=1}^{J}.

The unscented Kalman filter18,19 (UKF) is used for state estimation and propagation of state probability densities in the Langevin models. The UKF allows for recursive estimation of unobserved states in stochastic nonlinear systems from incomplete observations. It keeps the full nonlinear system dynamics rather than linearizing it but truncates the filter probability density to a Gaussian in each step by only propagating means and covariances.

We put the Langevin models in the framework of a continuous-discrete nonlinear state space model,
dz = μ(z) dt + Σ^(1/2)(z) dW,  (16)
x_n = O z(n δt).  (17)
The dynamical equation [Eq. (16)] is a set of SDEs for the m-dimensional state vector z = (z_1, …, z_m)^T ∈ R^(m×1) with drift vector μ ∈ R^(m×1), diffusion matrix Σ ∈ R^(m×m), and W ∈ R^(m×1) being a vector of independent standard Wiener processes. The case m = 1 refers to the white noise model of Eq. (1); we have the state z = z_1 = x, the drift μ(z) = f(z_1), and the diffusion Σ(z) = g^2(z_1). The case m = 2 refers to the correlated noise model of Eqs. (7) and (8); we have the state vector z = (z_1, z_2)^T = (x, y)^T, the drift vector
μ(z) = (f(z_1) + g(z_1) z_2, −γ z_2)^T,
and the diffusion matrix
Σ(z) = diag(0, 2γ).
The observation equation [Eq. (17)] with the observation operator O ∈ R^(1×m) describes the (here noise-free) observation of the first component of the system at discrete times, where n is the time index of the time series {x_n}_{n=0}^{N}. For m = 1, the observation operator is O = 1; for m = 2, it is O = (1, 0). Here and in the following, we use bold notation for vectors and matrices, even when, in the case m = 1, they are in fact scalars.
The UKF propagates estimates of the system state z and the associated uncertainty between observations and updates them when reaching a new observation. For n = 1, …, N, let ẑ_{n−1|n−1} be the mean estimate of the state vector and P_{n−1|n−1} its covariance matrix at time step n − 1, having processed all data up to and including time step n − 1. The filter density is represented by a small number of so-called sigma points that are propagated through the full nonlinear dynamical equations. The sampling interval of size δt is divided into L equal subintervals of size h = δt/L, and a sequence of estimates {ẑ_l, P_l}_{l=0}^{L} is generated. Having arrived at ẑ_{l−1} and P_{l−1}, we use 2m sigma points, {z_{l−1|l−1}^{(i)}}_{i=1}^{2m}, given as {ẑ_{l−1} − w_{l−1}^{(j)}, ẑ_{l−1} + w_{l−1}^{(j)}}_{j=1}^{m}. The vectors {w_{l−1}^{(j)}}_{j=1}^{m} are the columns of A, where A can be any matrix satisfying A A^T = m P_{l−1}. Here, we calculate A using the Cholesky decomposition of P_{l−1}. The sigma points are propagated as
z_l^{(i)} = z_{l−1|l−1}^{(i)} + h μ(z_{l−1|l−1}^{(i)}),  (18)
and new mean and covariance estimates are given by
ẑ_l = (1/(2m)) ∑_{i=1}^{2m} z_l^{(i)}  (19)
and
P_l = (1/(2m)) ∑_{i=1}^{2m} (z_l^{(i)} − ẑ_l)(z_l^{(i)} − ẑ_l)^T + h Σ(ẑ_l).  (20)
The sequence is initialized with ẑ_0 = ẑ_{n−1|n−1} and P_0 = P_{n−1|n−1}. We set ẑ_{n|n−1} = ẑ_L and P_{n|n−1} = P_L.
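
A sketch of this propagation step is given below. It follows the reconstruction of Eqs. (18)–(20) above, i.e., a deterministic Euler step for each sigma point and the process noise contribution added at the propagated mean; this placement of the noise term is an assumption, not necessarily the author's exact implementation.

```python
import numpy as np

def ukf_predict(z_hat, P, mu, Sigma, dt, n_sub):
    """Propagate the mean z_hat and covariance P over one sampling interval dt,
    split into n_sub Euler substeps of size h, using 2m symmetric sigma points."""
    m = z_hat.size
    h = dt / n_sub
    for _ in range(n_sub):
        A = np.linalg.cholesky(m * P)                            # A A^T = m P
        sigma_pts = np.concatenate([z_hat - A.T, z_hat + A.T])   # shape (2m, m)
        sigma_pts = sigma_pts + h * np.array([mu(z) for z in sigma_pts])  # Euler drift step
        z_hat = sigma_pts.mean(axis=0)
        diff = sigma_pts - z_hat
        P = diff.T @ diff / (2 * m) + h * Sigma(z_hat)           # add process noise contribution
    return z_hat, P

# Illustrative m = 2 correlated-noise model with f(x) = -x, g(x) = 0.5, gamma = 1.
mu = lambda z: np.array([-z[0] + 0.5 * z[1], -1.0 * z[1]])
Sigma = lambda z: np.diag([0.0, 2.0])
z_hat, P = ukf_predict(np.array([0.1, 0.0]), np.diag([1e-14, 1.0]), mu, Sigma, dt=0.1, n_sub=10)
```
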
Then, the estimates of the states and their uncertainties are updated according to the Kalman update equations using the new observation,
ẑ_{n|n} = ẑ_{n|n−1} + K_n ζ_n,  (21)
P_{n|n} = P_{n|n−1} − K_n O P_{n|n−1}.  (22)
Here,
ζ_n = x_n − O ẑ_{n|n−1}  (23)
is the innovation or residual,
K_n = P_{n|n−1} O^T S_n^(−1)  (24)
is the Kalman gain matrix, and
S_n = O P_{n|n−1} O^T + R  (25)
is the predicted residual variance. The variance of the observational noise is denoted by R; it is zero here but is in practice set to a very small positive number, say R = 10^(−14), to guarantee that P_{n|n} is positive definite and the Cholesky decomposition does not break down.
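
The update step of Eqs. (21)–(25), together with the Gaussian log-likelihood contribution of the new observation [see Eqs. (26) and (27) below], can be sketched as follows for a scalar observation; this is a generic textbook implementation, not the author's code:

```python
import numpy as np

def ukf_update(z_pred, P_pred, x_obs, O, R=1e-14):
    """Kalman update for a scalar observation x_obs of the state with mean z_pred
    and covariance P_pred; O is the observation operator as a 1-D array."""
    S = O @ P_pred @ O + R                       # predicted residual variance, Eq. (25)
    K = P_pred @ O / S                           # Kalman gain, Eq. (24)
    resid = x_obs - O @ z_pred                   # innovation, Eq. (23)
    z_filt = z_pred + K * resid                  # Eq. (21)
    P_filt = P_pred - np.outer(K, O @ P_pred)    # Eq. (22)
    loglik_n = -0.5 * (np.log(2.0 * np.pi * S) + resid**2 / S)   # contribution to Eq. (26)
    return z_filt, P_filt, loglik_n

O = np.array([1.0, 0.0])                         # observe the first state component
z_filt, P_filt, ll = ukf_update(np.array([0.2, 0.1]), np.diag([0.05, 1.0]), x_obs=0.3, O=O)
```
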
At the first data point, x_0, the Kalman filter is initialized as ẑ_{0|0} = x_0 and P_{0|0} = R for the white noise model, and as ẑ_{0|0} = (x_0, 0)^T and P_{0|0} = diag(R, 1) for the correlated noise model, the unit entry reflecting the stationary variance of y.
The UKF generates a probabilistic prediction for each observation based on all previous data. The predictive probability density for the residual ζ n is actually a Gaussian with zero mean and variance S n. Hence, the log-likelihood function of the dataset is
l = ∑_{n=1}^{N} ln p_n  (26)
with
p_n = (2π S_n)^(−1/2) exp(−ζ_n^2/(2 S_n))  (27)
depending on the model parameters ω. The negative log-likelihood function is minimized numerically with respect to the model parameters using standard iterative techniques for unconstrained optimization. Here, a quasi-Newton method is used as implemented in the built-in MATLAB routine fminunc. A robust initial guess for the optimization is to set all parameters to zero, but random initializations can also be considered to explore possible multiple local maxima of the likelihood function. The parameters {A_i}_{i=1}^{I}, {B_j}_{j=1}^{J}, and η have different roles in the models and scale differently with respect to the log-likelihood function. For the correlated noise model, it therefore speeds up the optimization to perform it in turns with respect to {A_i}_{i=1}^{I}, {B_j}_{j=1}^{J}, and η, keeping the other variables constant in each turn. Only in the close vicinity of the solution do we switch to an optimization with respect to all variables simultaneously, in order to exploit correlations between the different types of variables and to calculate the Hessian matrix at the solution, which is needed for the uncertainty estimates (see Sec. II D).
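
In a Python setting, the counterpart of MATLAB's fminunc would be an unconstrained quasi-Newton optimizer such as BFGS from scipy.optimize. The sketch below assumes a user-supplied function negative_log_likelihood(omega, data) that runs the filter of this section and returns −l(ω); it also includes the BIC used for model selection below [Eq. (28)]:

```python
import numpy as np
from scipy.optimize import minimize

def fit_parameters(negative_log_likelihood, data, n_params):
    """Maximum likelihood fit by quasi-Newton (BFGS) minimization of -l(omega),
    starting from omega = 0 as a robust initial guess."""
    omega0 = np.zeros(n_params)
    result = minimize(negative_log_likelihood, omega0, args=(data,), method="BFGS")
    omega_hat = result.x
    loglik_hat = -result.fun
    return omega_hat, loglik_hat, result.hess_inv   # BFGS approximation to the inverse Hessian

def bic(loglik_hat, n_params, n_data):
    """Bayesian information criterion, Eq. (28)."""
    return -2.0 * loglik_hat + n_params * np.log(n_data)
```
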
Model selection over the number of drift and diffusion basis functions is performed using the Bayesian information criterion20 (BIC), which is here given as
BIC = −2 l̂ + p ln N,  (28)
where l̂ is the log-likelihood function at the maximum likelihood parameter estimate ω̂ and p is the number of model parameters, which is p = I + J in the white noise case and p = I + J + 1 in the correlated noise case.
The observed Fisher information matrix of the estimation problem is
J = −Ĥ,  (29)
where Ĥ denotes the Hessian of the log-likelihood function at the maximum likelihood solution ω̂. Hence, under the assumption of independent data points, for large N, parameter errors are normally distributed with zero mean and covariance matrix,
C = J^(−1).  (30)
However, this assumption is clearly violated here, in particular, for the correlated noise model. We, therefore, implement an improved uncertainty estimate based on an empirical covariance inflation technique17,21 to account for serial correlation in the time series data. The general expression for the error covariance matrix is
C = J^(−1) V J^(−1)  (31)
with V = E_ω̂(G G^T) and G = ∇_ω l, where ω is here defined as a column vector. In the case of independent data, we have V = J and recover the result above. A block length l_b is chosen such that data in disjoint blocks of l_b consecutive data points can be considered independent. We assume that N is an integer multiple of l_b. The variables
Ḡ_i = ∑_{n=(i−1) l_b + 1}^{i l_b} ∇_ω ln p_n  (32)
are formed for i = 1, …, N/l_b. The covariance matrix of Ḡ_i is estimated empirically from the sample {Ḡ_i}_{i=1}^{N/l_b}, and we obtain
V = (N/l_b) Cov(Ḡ).  (33)
If N is not an integer multiple of l_b, the number of blocks is [N/l_b], the largest integer smaller than or equal to N/l_b. Then, V is multiplied by N/([N/l_b] l_b) to obtain the final estimate. The derivatives of the likelihood function with respect to the parameters are here calculated using a standard second-order centered finite-difference scheme.
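
A possible implementation of this block-based estimate of V is sketched below. It assumes that the per-observation score contributions (gradients of the individual log-likelihood terms, e.g., from finite differences) are stacked in an array of shape (N, p), and it follows the reconstruction of Eqs. (32) and (33) above rather than a published reference implementation.

```python
import numpy as np

def inflated_error_covariance(J, G, block_length):
    """Error covariance C = J^{-1} V J^{-1}, Eq. (31), with V estimated from
    block sums of the per-observation score contributions G (shape (N, p))."""
    N, p = G.shape
    n_blocks = N // block_length
    # block sums of the score contributions, one row per block
    G_bar = G[:n_blocks * block_length].reshape(n_blocks, block_length, p).sum(axis=1)
    V = n_blocks * np.cov(G_bar, rowvar=False, bias=True)
    V *= N / (n_blocks * block_length)       # correct for discarded remainder points
    J_inv = np.linalg.inv(J)
    return J_inv @ V @ J_inv
```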

An ensemble of size M of parameter vectors consistent with the parameter uncertainty is generated as {ω̂ + L v_i}_{i=1}^{M}, where the v_i are column vectors of length p of independent standard normal random variables and L is obtained from the Cholesky decomposition C = L L^T. This ensemble translates into an ensemble of drift and diffusion functions from which confidence intervals can be constructed. In this study, ensembles of size M = 1000 are used to calculate 90% confidence intervals, delimited by the 5% and 95% quantiles.
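
The ensemble construction and the pointwise confidence band can be sketched as follows; drift_of_omega and the evaluation grid in the commented usage line are placeholders for the reconstructed functions of this section:

```python
import numpy as np

def parameter_ensemble(omega_hat, C, size, rng):
    """Draw parameter vectors omega_hat + L v with C = L L^T (Cholesky) and v standard normal."""
    L = np.linalg.cholesky(C)
    v = rng.normal(size=(size, omega_hat.size))
    return omega_hat + v @ L.T

def pointwise_confidence_band(func_of_omega, ensemble, x_grid, lower=5, upper=95):
    """Evaluate a reconstructed function for every ensemble member on x_grid and
    return the pointwise 5% and 95% quantiles, i.e., a 90% confidence band."""
    values = np.array([func_of_omega(omega, x_grid) for omega in ensemble])
    return np.percentile(values, [lower, upper], axis=0)

# Usage (illustrative): drift_of_omega(omega, x) evaluates the drift expansion at x.
# band_lo, band_hi = pointwise_confidence_band(drift_of_omega,
#                        parameter_ensemble(omega_hat, C, 1000, np.random.default_rng(3)),
#                        np.linspace(-3, 3, 200))
```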

A possible ill-conditioning or ill-posedness of the parameter estimation problem can be monitored with the (spectral) condition number of J given as κ = λ max / λ min, where λ max and λ min are the largest and smallest eigenvalues of the symmetric and positive definite matrix J.

The estimation methodology is explored and validated on a set of simulated datasets from prototype example systems with known dynamics of increasing complexity. Similar systems were discussed in Ref. 14.

For each system, first the model selection is performed by increasing the number of drift and diffusion basis functions in steps of one and finding the model with the smallest BIC. Then, the reconstructed drift and diffusion functions for the chosen model are examined. Additionally, we look at some important summary statistics in long-term integrations of the true and the reconstructed model: the (stationary) probability density function of the process x(t), the probability density function of the increment process x(t + δt) − x(t), and the autocorrelation function of x(t). These summary statistics are calculated from very long time series of 4 × 10^6 data points at a sampling interval of 0.025. The probability densities are estimated using a recently proposed semiparametric maximum likelihood estimator17 unless they are available analytically. The correlation times of the systems are estimated by evaluating the integral of Eq. (4) from the sample autocorrelation function using the (composite) trapezoidal quadrature rule. In practice, a finite upper limit of the integral has to be adopted. As the sample autocorrelation function is rather noisy at large time lags, even if calculated from a very long time series, we monitor the correlation time as a function of the upper limit and look for a plateau.
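
A sketch of the correlation time estimate, i.e., the running trapezoidal integral of the sample autocorrelation function in Eq. (4) monitored as a function of the upper integration limit, is given below (generic code, not the author's):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function up to lag max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(x * x)
    return np.array([np.mean(x[:len(x) - k] * x[k:]) / var for k in range(max_lag + 1)])

def running_correlation_time(x, dt, max_lag):
    """Running trapezoidal integral of the sample ACF, Eq. (4), as a function of the
    upper integration limit; a plateau in the returned array indicates the estimate."""
    rho = sample_acf(x, max_lag)
    return np.array([np.trapz(rho[:k + 1], dx=dt) for k in range(max_lag + 1)])
```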

The (stationary) probability density of the white noise model of Eq. (1) is known to be given as1,2
p(x) ∝ (1/g^2(x)) exp(∫^x [2 f(x′)/g^2(x′)] dx′),  (34)
where the normalization constant is calculated by numerical integration using, for example, the (composite) Simpson rule or analytically, if possible. The probability density of x in the correlated noise model of Eqs. (2) and (3) or Eqs. (7) and (8) cannot be given in simple form.
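
The density of Eq. (34) can be evaluated numerically along the following lines (cumulative trapezoidal rule for the exponent, composite Simpson rule for the normalization constant; the drift and diffusion functions shown are illustrative):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid, simpson

def stationary_density(f, g2, x_grid):
    """Stationary density of the white noise model, Eq. (34):
    p(x) proportional to exp(int 2 f/g2 dx) / g2(x), normalized by Simpson's rule."""
    integrand = 2.0 * f(x_grid) / g2(x_grid)
    exponent = cumulative_trapezoid(integrand, x_grid, initial=0.0)
    p = np.exp(exponent - exponent.max()) / g2(x_grid)   # shift exponent for numerical stability
    return p / simpson(p, x=x_grid)

x_grid = np.linspace(-4.0, 4.0, 2001)
p = stationary_density(f=lambda x: -x, g2=lambda x: 0.25 + 0.0 * x, x_grid=x_grid)
```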

We first consider a linear model with strongly multiplicative noise.

1. White noise

In the white noise case, the governing equation is
(35)
with a > 0, b > 0, and c > 0. Still, as in the case of additive noise, that is, the Ornstein–Uhlenbeck process, the mean state of the system is at the stable fixed point of the linear drift term, E(x) = x_0, the autocorrelation function is ρ(τ) = e^(−a|τ|), and the correlation time is T_x = 1/a. The probability density is derived as
(36)

The parameter values a = 1, x 0 = 1, b = 1, c = 1, and x = 2 are chosen here. The system exhibits a markedly positively skewed probability density; the mode of the distribution is below the mean value, and there is a long right tail in the region of high noise level. The correlation time of the system is T x = 1. The reconstruction is based on a time series of length N = 5000 with sampling interval δ t = 0.1.

2. Correlated noise

The corresponding non-Markovian Langevin model is
(37)
(38)
The parameter setting is a = 2, x 0 = 1, b = 1, c = 1, x = 2, and γ = 1 / 5. Again, we observe a positively skewed probability density. Now also the mean state of the system is shifted. The correlation times are T x = 5.4 and T y = 1 / γ = 5; that is, they are almost equal, and there is no time scale separation. A dataset of length N = 5000 with sampling interval δ t = 0.2 is used for the estimation.

Next, we examine a nonlinear bistable system with parabolic diffusion function. The drift function has odd and the diffusion function has even symmetry.

1. White noise

For white noise, the Langevin equation is
dx = (−a x^3 − b x) dt + √(c + d x^2) dW  (39)
with a > 0, c > 0, and d > 0. For b < 0, the deterministic dynamics are bistable with two stable fixed points at x_0^(1,2) = ±√(−b/a) and an unstable fixed point at x_0^(3) = 0; otherwise, they are monostable with a stable fixed point at x_0 = 0. The probability density is derived as
p(x) ∝ (c + d x^2)^ξ e^(−a x^2/d)  (40)
with ξ = a c/d^2 − b/d − 1. It is bimodal only if b + d < 0, with the modes located at x_m^(1,2) = ±√(−(b + d)/a) and a local minimum at x_m^(3) = 0; otherwise, it is unimodal with the mode located at x_m = 0.

The parameter setting here is a = 1, b = −1, c = 0.8, and d = 0.25. The dynamics are bistable with stable fixed points at x_0^(1,2) = ±1; the probability density is bimodal with the modes at x_m^(1,2) ≈ ±0.87. The correlation time is T_x = 2.3. A time series of length N = 10 000 at a sampling interval of δt = 0.2 is used for the reconstruction.

2. Correlated noise

The model equations for correlated noise are
dx = (−a x^3 − b x) dt + √(c + d x^2) y dt,  (41)
dy = −γ y dt + √(2γ) dW.  (42)
The parameter values a = 1, b = −1, c = 0.8, d = 0.25, and γ = 1 are chosen. The shift of the modes of the probability density away from the stable fixed points is now even more prominent. The correlation times are T_x = 4.6 and T_y = 1/γ = 1. Again, a time series of length N = 10 000 with a sampling interval of δt = 0.2 is used for the system reconstruction.

Finally, we consider the Rayleigh process, here extended to a setting with multiplicative and correlated noise. This system has a non-polynomial drift function.

1. White noise

The Markovian Langevin equation is
dx = (a/x − b x) dt + c √x dW  (43)
with a > 0, b > 0, and c > 0. The system is restricted to the domain x > 0. The deterministic part of the system has a stable fixed point at x_0 = √(a/b). The probability density is a special case of the generalized inverse Gaussian distribution,
p(x) = [2 K_0(4√(ab)/c^2)]^(−1) x^(−1) exp(−2a/(c^2 x) − 2b x/c^2),  (44)
where K_0 is a modified Bessel function of the second kind. The density is unimodal with the mode located at x_m = (√(c^4 + 16 a b) − c^2)/(4 b).

The parameter setting a = 1, b = 1, and c = 1 is adopted here. The system has a stable fixed point at x_0 = 1; the mode of the probability density is at x_m ≈ 0.78. The correlation time of the system is T_x = 0.6. A dataset of length N = 10 000 with sampling interval δt = 0.05 is used for the estimation.

2. Correlated noise

The model with correlated noise is
dx = (a/x − b x) dt + c √x y dt,  (45)
dy = −γ y dt + √(2γ) dW,  (46)
with parameter values a = 1, b = 1, c = 1, and γ = 1 / 2. The system displays strongly intermittent behavior as can be seen from the very heavy tails of the probability density of the increment process (see below). The correlation times are T x = 2.6 and T y = 1 / γ = 2. The system reconstruction is based on a time series of length N = 10 000 at a sampling interval of δ t = 0.2.

The model selection is illustrated in Tables I–VIII. The log-likelihood function at the maximum likelihood solution ω̂ and the BIC are listed when successively increasing the number of drift and diffusion basis functions. The values are rounded to integers as more precision is not needed here. For all example systems, the sampling interval δt is divided into L = 10 subintervals for propagating the probability densities of the states with the UKF (see Sec. II B).

For all example systems with a polynomial drift function (A.1, A.2, B.1, and B.2) the correct drift model is clearly chosen. The rational drift function of the Rayleigh process is approximated with a third-order polynomial in the white noise case and a fourth-order polynomial in the correlated noise case.

For all example systems, the true diffusion function lies outside the model classes covered by semiparametric modeling. For systems A.1, A.2, B.1, and B.2, the logarithm of the diffusion function can be well approximated on the data range by a quadratic polynomial; for system C.1 a fourth-order, and for system C.2, a fifth-order polynomial is chosen.

In the examples B.1 and B.2, the pattern of symmetry in both drift and diffusion is clearly visible (Tables III and V). A significant model improvement is only achieved when adding a basis function with the correct symmetry. This suggests rerunning the model selection with only odd drift and even diffusion basis functions, which leads to more parsimonious models (Tables IV and VI).

TABLE I.

Model selection for example system A.1: Log-likelihood function and Bayesian information criterion (BIC) for various numbers of drift and diffusion basis functions. The model with minimum BIC is marked with an asterisk.

I \ J           {0}           {0, 1}        {0, 1, 2}       {0, 1, 2, 3}  {0, 1, 2, 3, 4}
{0}             818/−1619     1124/−2222    1143/−2252      1145/−2247    1145/−2239
{0, 1}          942/−1858     1221/−2409    1259/−2476*     1260/−2470    1260/−2461
{0, 1, 2}       982/−1931     1221/−2400    1259/−2468      1260/−2461    1260/−2453
{0, 1, 2, 3}    1003/−1963    1234/−2416    1260/−2460      1260/−2453    1261/−2444
TABLE II.

As Table I but for example system A.2.

I \ J           {0}              {0, 1}           {0, 1, 2}         {0, 1, 2, 3}     {0, 1, 2, 3, 4}
{0}             11 187/−22 349   11 456/−22 877   11 474/−22 906    11 485/−22 918   11 486/−22 912
{0, 1}          11 243/−22 453   11 503/−22 964   11 539/−23 026*   11 539/−23 019   11 539/−23 011
{0, 1, 2}       11 333/−22 623   11 504/−22 956   11 539/−23 018    11 539/−23 011   11 539/−23 002
{0, 1, 2, 3}    11 340/−22 628   11 527/−22 995   11 539/−23 010    11 540/−23 003   11 540/−22 994
TABLE III.

As Table I but for example system B.1.

I \ J              {0}             {0, 1}          {0, 1, 2}        {0, 1, 2, 3}    {0, 1, 2, 3, 4}
{0}                −5365/10 749    −5364/10 756    −5341/10 718     −5340/10 726    −5308/10 672
{0, 1}             −5100/10 228    −5099/10 235    −5091/10 229     −5090/10 236    −5060/10 184
{0, 1, 2}          −5098/10 233    −5097/10 241    −5089/10 234     −5089/10 243    −5057/10 188
{0, 1, 2, 3}       −4769/9584      −4769/9593      −4744/9552*      −4744/9561      −4742/9568
{0, 1, 2, 3, 4}    −4768/9592      −4768/9601      −4743/9560       −4743/9569      −4742/9576
TABLE IV.

As Table III but with only odd drift basis functions and only even diffusion basis functions.

I \ J        {0}             {0, 2}           {0, 2, 4}
{1}          −5102/10 222    −5094/10 215     −5063/10 164
{1, 3}       −4771/9569      −4745/9527*      −4744/9534
{1, 3, 5}    −4770/9577      −4744/9534       −4744/9543
TABLE V.

As Table I but for example system B.2.

I \ J              {0}              {0, 1}           {0, 1, 2}         {0, 1, 2, 3}     {0, 1, 2, 3, 4}
{0}                10 635/−21 243   10 636/−21 235   10 662/−21 278    10 662/−21 269   10 665/−21 267
{0, 1}             10 746/−21 455   10 747/−21 448   10 906/−21 757    10 907/−21 750   10 940/−21 806
{0, 1, 2}          10 751/−21 455   10 751/−21 447   10 908/−21 752    10 908/−21 743   10 942/−21 801
{0, 1, 2, 3}       11 476/−22 897   11 476/−22 888   11 520/−22 966*   11 520/−22 957   11 520/−22 948
{0, 1, 2, 3, 4}    11 476/−22 888   11 476/−22 878   11 520/−22 957    11 520/−22 948   11 520/−22 939
TABLE VI.

As Table V but with only odd drift basis functions and only even diffusion basis functions.

I \ J        {0}              {0, 2}            {0, 2, 4}
{1}          10 745/−21 462   10 906/−21 774    10 940/−21 833
{1, 3}       11 475/−22 913   11 518/−22 990*   11 518/−22 982
{1, 3, 5}    11 481/−22 916   11 519/−22 983    11 519/−22 974
TABLE VII.

As Table I but for example system C.1.

I \ J              {0}           {0, 1}        {0, 1, 2}     {0, 1, 2, 3}  {0, 1, 2, 3, 4}  {0, 1, 2, 3, 4, 5}
{0}                394/−769      959/−1891     984/−1930     985/−1923     985/−1914        988/−1910
{0, 1}             605/−1183     1162/−2288    1205/−2365    1210/−2365    1215/−2365       1215/−2357
{0, 1, 2}          710/−1384     1170/−2294    1215/−2375    1223/−2381    1227/−2380       1227/−2371
{0, 1, 2, 3}       770/−1495     1193/−2332    1221/−2377    1228/−2382    1234/−2386*      1234/−2377
{0, 1, 2, 3, 4}    786/−1516     1197/−2329    1222/−2371    1228/−2373    1234/−2376       1234/−2368
TABLE VIII.

As Table I but for example system C.2.

I \ J                 {0}              {0, 1}           {0, 1, 2}        {0, 1, 2, 3}     {0, 1, 2, 3, 4}   {0, 1, 2, 3, 4, 5}  {0, 1, 2, 3, 4, 5, 6}
{0}                   12 807/−25 586   13 720/−27 403   13 821/−27 595   13 848/−27 641   13 865/−27 665    13 870/−27 667      13 871/−27 659
{0, 1}                13 003/−25 969   13 986/−27 926   14 089/−28 123   14 121/−28 177   14 142/−28 211    14 165/−28 248      14 171/−28 249
{0, 1, 2}             13 326/−26 605   14 018/−27 982   14 133/−28 202   14 159/−28 244   14 178/−28 274    14 196/−28 299      14 198/−28 296
{0, 1, 2, 3}          13 561/−27 066   14 114/−28 163   14 142/−28 211   14 184/−28 284   14 200/−28 307    14 213/−28 324      14 214/−28 318
{0, 1, 2, 3, 4}       13 695/−27 326   14 130/−28 186   14 172/−28 262   14 186/−28 279   14 208/−28 314    14 219/−28 327*     14 219/−28 319
{0, 1, 2, 3, 4, 5}    13 730/−27 386   14 156/−28 230   14 186/−28 279   14 201/−28 302   14 208/−28 305    14 221/−28 321      14 221/−28 313

A comprehensive investigation of the model selection would involve running ensembles of realizations of the example systems and monitoring how often which model is chosen in the reconstruction as a function of the length of the data set N and the sampling interval δ t. We leave this to future work.

The reconstructed drift and diffusion functions of the various example systems are displayed in Figs. 1 and 2 together with pointwise 90% confidence intervals. A canonical choice for the block length in the uncertainty estimates (see Sec. II D) is l_b = [T_x/δt], the largest integer smaller than or equal to T_x/δt, that is, the number of data points per correlation time. Outside the range of the learning dataset, a linear extrapolation is applied to the drift functions and a constant extrapolation to the diffusion functions. For the systems driven by correlated noise (A.2, B.2, and C.2), the estimates of the correlation time parameter γ with their uncertainties are listed in Table IX.

TABLE IX.

Estimation of the correlation time parameter γ for the systems driven by correlated noise (A.2, B.2, and C.2): True and reconstructed values with 90% confidence intervals.

System True Reconstructed
A.2  0.200  0.189 ([0.153, 0.234]) 
B.2  1.000  0.950 ([0.869, 1.042]) 
C.2  0.500  0.559 ([0.491, 0.635]) 

Within the range of the learning datasets, both the drift and diffusion functions are identified very accurately. In the long tails of the linear models and the Rayleigh processes, the errors and the uncertainties are somewhat larger. The correlation time parameter γ is also reasonably well estimated in all cases. The uncertainties of the diffusion function estimates are clearly smaller for the white noise models than for the correlated noise models, whereas the drift functions are slightly more accurately estimated in the correlated noise models. The reason is that in the white noise case, the process increments are at lowest order (of order √δt) determined by the noise contribution, with the drift entering only at the next order (of order δt), whereas in the correlated noise case, drift and diffusion contribute at the same order.

FIG. 1.

Systems driven by white noise: True and reconstructed drift function (top) and diffusion function (bottom) with pointwise 90% confidence intervals. From left to right: linear model (A.1), bistable model (B.1), and Rayleigh process (C.1). Dotted vertical lines indicate the range of the learning dataset.

FIG. 2.

As Fig. 1 but for systems driven by correlated noise (A.2, B.2, and C.2).

For all of the example systems except for system A.2, the likelihood function appears to possess only one local and global maximum, which is found with both zero and random initial values of the model parameters. For system A.2, two local maxima of the likelihood function are found; the corresponding model parameters as well as the values of the log-likelihood and the BIC are listed in Table X. The global maximum very clearly identifies the correct model in which the long correlation time T x stems mainly from the long memory of the driving noise. The secondary local maximum corresponds approximately to a swap of α 1 and γ while keeping the position of the fixed point and a modification of the multiplicative noise; here, the autocorrelation of x comes from the long damping time scale of the drift term in the x-equation. To understand this behavior, we consider the correlated noise model with linear drift term and only additive noise,
dx = −a x dt + σ y dt,  (47)
dy = −γ y dt + √(2γ) dW.  (48)
Inference in this model has a fundamental non-identifiability issue (see also Ref. 14). When observing only x, the correct model is indistinguishable by any property of x from a model that swaps a and γ (ã = γ and γ̃ = a) while at the same time rescaling the noise amplitude as σ̃ = √(γ/a) σ; up to data sample fluctuations, the likelihood function has two identical maxima corresponding to the two different models. This can be seen from the solution of the system2,14,22 and also from an analysis of the UKF equations (or, here equivalently, the linear Kalman filter equations). The ambiguity is removed by nonlinear drift terms and/or multiplicative noise in the x-equation. In example system A.2, the imprint of the ambiguity is still visible in the bimodality of the likelihood function, but the state dependence of the noise is strong enough to identify the correct model. If the nonlinearity of the drift function and the state dependence of the noise are weak, a very long dataset may be needed to distinguish between the two models.
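
The swap symmetry can be checked numerically: the two parameterizations below yield, up to sampling fluctuations, the same autocorrelation function of x. The square-root rescaling of σ follows the reconstruction of the swap relation above and should be treated as an assumption; all numerical values are illustrative.

```python
import numpy as np

def simulate_linear_additive(a, sigma, gamma, dt, n_steps, rng):
    """Euler-Maruyama for dx = -a x dt + sigma y dt, dy = -gamma y dt + sqrt(2 gamma) dW."""
    x, y = 0.0, rng.normal()
    out = np.empty(n_steps)
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        x += (-a * x + sigma * y) * dt
        y += -gamma * y * dt + np.sqrt(2.0 * gamma) * dW
        out[n] = x
    return out

def acf(x, max_lag):
    x = x - x.mean()
    return np.array([np.mean(x[:len(x) - k] * x[k:]) for k in range(max_lag + 1)]) / np.mean(x * x)

rng = np.random.default_rng(4)
a, gamma, sigma = 2.0, 0.2, 1.0
x1 = simulate_linear_additive(a, sigma, gamma, dt=0.01, n_steps=400_000, rng=rng)
x2 = simulate_linear_additive(gamma, sigma * np.sqrt(gamma / a), a, dt=0.01, n_steps=400_000, rng=rng)
# acf(x1, 500) and acf(x2, 500) agree up to sampling fluctuations.
```
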
TABLE X.

Bimodality of the likelihood function for system A.2: True and estimated system parameters as well as log-likelihood function and Bayesian information criterion for the two maxima.

                         α0       α1       β0       β1      β2       γ       l̂/BIC
True                     2.000    −2.000   –        –       –        0.200   –
Global maximum           1.997    −1.958   −1.568   1.524   −0.377   0.189   11 539/−23 026
Secondary local maximum  0.274    −0.265   −2.487   1.161   −0.228   1.751   11 481/−22 911

Due to the symmetries of the drift and diffusion functions, the probability density in the systems B.1 and B.2 is symmetric about zero. The same holds for the reconstructed systems, as the reconstruction preserves the symmetries of the drift and diffusion functions. The symmetry of the probability density is prescribed in the semiparametric estimator17 as there may be an imbalance between the two states due to their metastability, even in very long time series.

It follows from detailed balance, or equivalently time reversal symmetry, that the increment probability density in all of the example systems is symmetric about zero for any time increment. Detailed balance is obvious for the one-dimensional white noise model of Eq. (1) for any drift and diffusion functions admitting long-term solutions: it can be seen from the Fokker–Planck equation that the probability flux or probability current of the stationary probability density vanishes everywhere, provided it vanishes at the boundaries, which may lie at plus and/or minus infinity. Detailed balance is less clear for the correlated noise model of Eqs. (2) and (3) or Eqs. (7) and (8), as the stationary probability density is not known explicitly. Detailed balance is confirmed numerically for the systems A.2, B.2, and C.2 by verifying the condition22
p(x_1, t + τ; x_2, t) = p(x_2, t + τ; x_1, t)  (49)
from a long time series. Numerical experiments indicate that the correlated noise model is actually in detailed balance for any drift and diffusion functions, which admit long-term solutions given zero probability flux boundary conditions. The symmetry of the increment probability density is not prescribed here in the semiparametric estimator but is clearly visible in the plots below.

For example systems C.1 and C.2, we apply a reflecting boundary condition at zero when integrating the reconstructed models. Otherwise, there would be a finite probability of reaching negative values of x as the singularity of the drift function at zero cannot be captured in the reconstruction. This probability is very small, though, and the boundary at zero is actually never hit in our integrations. We here also apply a reflecting boundary condition at the largest data point in the learning dataset. The estimate of the drift function in the right tail of the data set is very uncertain due to the high noise level there and the extrapolation may lead to a positive drift for very large x and, thus, the escape of trajectories to infinity.
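
A reflecting boundary can be imposed in the numerical integration simply by folding the state back into the admissible interval after each step; the sketch below uses an arbitrary drift and diffusion, not the reconstructed Rayleigh model:

```python
import numpy as np

def reflect(x, lower, upper):
    """Fold x back into the interval [lower, upper] (reflecting boundaries)."""
    if x < lower:
        x = 2.0 * lower - x
    if x > upper:
        x = 2.0 * upper - x
    return x

def simulate_reflected(f, g, x0, dt, n_steps, lower, upper, rng):
    """Euler-Maruyama integration with reflecting boundary conditions."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        x_prop = x[n] + f(x[n]) * dt + g(x[n]) * dW
        x[n + 1] = reflect(x_prop, lower, upper)
    return x

rng = np.random.default_rng(5)
x = simulate_reflected(f=lambda x: 1.0 - x, g=lambda x: 0.3 * np.sqrt(max(x, 1e-12)),
                       x0=1.0, dt=0.01, n_steps=100_000, lower=1e-6, upper=5.0, rng=rng)
```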

The summary statistics as estimated from a long-term integration of the true and reconstructed models are displayed in Figs. 3 and 4. We observe excellent agreement in all these statistics between the true and the reconstructed systems with only very small errors. Even details of heavy-tailed increment probability densities are very well captured.

FIG. 3.

Summary statistics for true and reconstructed systems driven by white noise: probability density (top), increment probability density (middle), and autocorrelation function (bottom). From left to right: linear model (A.1), bistable model (B.1), and Rayleigh process (C.1).

FIG. 4.

As Fig. 3 but for systems driven by correlated noise (A.2, B.2, and C.2).


The time series used here extend over 185 to 870 correlation times of x, with 10 to 27 samples per correlation time. The method is fairly robust with respect to these choices. It still provides virtually unbiased estimates of the drift and diffusion functions from shorter time series; obviously, the uncertainty increases with decreasing time series length N. Practically useful estimates can be obtained down to a length of about 50 correlation times. If the sampling interval δt becomes too large, the model parameter estimates are biased, as the Gaussian approximation of the UKF is no longer sufficiently valid. When and how strongly this effect shows up depends on the system under consideration, in particular, on the strength of the nonlinearity and the state dependence of the driving noise.

A methodology for reconstructing Markovian and non-Markovian SDEs from time series data is introduced and discussed. It makes use of the unscented Kalman filter to estimate states and propagate their probability densities and then employs the filter predictions to estimate model parameters, leading to a principled maximum likelihood technique that naturally comes with uncertainty estimates. The drift and diffusion functions are represented in a semiparametric manner.

The technique is validated on a set of synthetic time series datasets with known underlying dynamics, ranging from linear systems to nonlinear bistable systems to intermittent systems. The maximization of the likelihood function is robust and fast. Drift and diffusion functions are accurately reconstructed, even if they lie outside of the prescribed polynomial model class.

The computational effort of the present method is considerably higher than that of direct estimation methods,13,14 but it is still not onerous and much lower than the computational cost of Markov chain Monte Carlo techniques.10 The direct estimation method of Ref. 13 has the restriction that the correlation time of the driving noise is required to be much shorter than that of the variable x. The system reconstruction with the present method is very accurate for datasets of moderate length, whereas direct estimation methods13,14 require extremely long datasets to produce reasonable results. Furthermore, the present method provides a straightforward optimization procedure without the need to tune hyperparameters by hand for each example system, and it comes naturally with uncertainty estimates for the drift and diffusion functions, neither of which is the case for direct estimation methods.13,14 In summary, the Kalman filter maximum likelihood approach to the identification of nonlinear stochastic systems appears to offer a very good compromise in terms of computation time, accuracy, and data requirements.

A detailed study of the model selection and reconstruction accuracy depending on the length of the time series and the sampling interval using multiple realizations of example systems is left for future work.

The authors have no conflicts to disclose.

F. Kwasniok: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review and editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. W. Horsthemke and R. Lefever, Noise-Induced Transitions: Theory and Applications in Physics, Chemistry, and Biology, 2nd ed. (Springer, 2006), p. 338.
2. C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences, 4th ed. (Springer, 2010), p. 468.
3. R. Friedrich, J. Peinke, M. Sahimi, and M. R. R. Tabar, "Approaching complexity by stochastic methods: From biological systems to turbulence," Phys. Rep. 506, 87 (2011).
4. S. Siegert, R. Friedrich, and J. Peinke, "Analysis of data sets of stochastic systems," Phys. Lett. A 243, 275 (1998).
5. R. Friedrich, S. Siegert, J. Peinke, S. Lück, M. Siefert, M. Lindemann, J. Raethjen, G. Deuschl, and G. Pfister, "Extracting model equations from experimental data," Phys. Lett. A 271, 217 (2000).
6. D. Kleinhans and R. Friedrich, "Maximum likelihood estimation of drift and diffusion functions," Phys. Lett. A 368, 194 (2007).
7. D. Kleinhans, "Estimation of drift and diffusion functions from time series data: A maximum likelihood framework," Phys. Rev. E 85, 026705 (2012).
8. F. Kwasniok and G. Lohmann, "Deriving dynamical models from paleoclimatic records: Application to glacial millennial-scale climate variability," Phys. Rev. E 80, 066104 (2009).
9. F. Kwasniok, "Analysis and modelling of glacial climate transitions using simple dynamical systems," Phil. Trans. Roy. Soc. A 371, 20110472 (2013).
10. S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo, 1st ed. (Chapman & Hall/CRC, 2011), p. 618.
11. S. Lück, C. Renner, J. Peinke, and R. Friedrich, "The Markov-Einstein coherence length – a new meaning for the Taylor length in turbulence," Phys. Lett. A 359, 335 (2006).
12. E. Wong and M. Zakai, "On the convergence of ordinary integrals to stochastic integrals," Ann. Math. Stat. 36, 1560-1564 (1965).
13. B. Lehle and J. Peinke, "Analyzing a stochastic process driven by Ornstein–Uhlenbeck noise," Phys. Rev. E 97, 012113 (2018).
14. C. Willers and O. Kamps, "Non-parametric estimation of a Langevin model driven by correlated noise," Eur. Phys. J. B 94, 149 (2021).
15. A. Sitz, U. Schwarz, J. Kurths, and H. U. Voss, "Estimation of parameters and unobserved components for nonlinear systems from noisy time series," Phys. Rev. E 66, 016210 (2002).
16. F. Kwasniok and G. Lohmann, "A stochastic nonlinear oscillator model for glacial millennial-scale climate transitions derived from ice-core data," Nonl. Proc. Geophys. 19, 595-603 (2012).
17. F. Kwasniok, "Semiparametric maximum likelihood probability density estimation," PLoS One 16, e0259111 (2021).
18. S. J. Julier, J. K. Uhlmann, and H. F. Durrant-Whyte, "A new method for the nonlinear transformation of means and covariances in filters and estimators," IEEE Trans. Autom. Control 45, 477-482 (2000).
19. S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE 92, 401-422 (2004).
20. G. Schwarz, "Estimating the dimension of a model," Ann. Stat. 6, 461-464 (1978).
21. L. Fawcett and D. Walshaw, "Improved estimation for temporally clustered extremes," Environmetrics 18, 173-188 (2007).
22. G. A. Pavliotis, Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations (Springer, 2014), p. 352.