This paper presents a numerical method to implement the parameter estimation method using response statistics that was recently formulated by the authors. The proposed approach formulates the parameter estimation problem of Itô drift diffusions as a nonlinear least-squares problem. To avoid solving the model repeatedly when an iterative scheme is used for the resulting least-squares problem, a polynomial surrogate model is employed on appropriate response statistics with smooth dependence on the parameters. The existence of minimizers of the approximate polynomial least-squares problems that converge to the solution of the true least-squares problem is established under appropriate regularity assumptions on the essential statistics as functions of the parameters. Numerical implementation of the proposed method is conducted on two prototypical examples that belong to classes of models with a wide range of applications: the Langevin dynamics and the stochastically forced gradient flows. Several important practical issues, such as the selection of an appropriate response operator to ensure the identifiability of the parameters and the reduction of the parameter space, are discussed. From the numerical experiments, we find that the proposed approach is superior to the conventional approach that uses equilibrium statistics to determine the parameters.

In many dynamical systems, some of the model parameters may not be identifiable from equilibrium statistics. For problems where the parameters can be identified from two-point statistics, this paper provides a method to estimate these parameters numerically. Mathematically, we formulate the inverse problem as a well-posed, dynamically constrained nonlinear least-squares problem that fits a set of linear response statistics under small external perturbations. Since solving this problem directly is computationally infeasible, we consider a polynomial surrogate model to approximate the least-squares problem. Convergence of the solution of the approximate problem to the solution of the original nonlinear least-squares problem is established under appropriate regularity assumptions. Supporting numerical examples are given on two classes of models with a wide range of applications: the Langevin dynamics and the stochastically forced gradient flows.

Parameter estimation is ubiquitous in the modeling of nature. The goal of this inverse problem is to infer the parameters from observations with sufficient accuracy so that the resulting model becomes a useful predictive tool. Existing parameter estimation techniques often belong to one of two classes of approaches, maximum likelihood estimation (Ref. 1) and Bayesian inference such as Markov chain Monte-Carlo (MCMC) (Ref. 2), depending on the availability of prior information about the parameters. Since the model parameters are usually not directly measured, the success of any inference method depends crucially on the identifiability of the parameters from the given observations. When the dependence of the observations on the parameters is implicit, that is, through the model, sensitivity analysis (see, e.g., the review article in Ref. 3) is often a useful practical tool to determine parameter identifiability.

In this paper, we consider the parameter estimation of ergodic stochastic differential equations driven by Brownian noise. When the corresponding invariant measure of the dynamical system has an explicit dependence on the parameters, these parameters can usually be inferred from appropriate equilibrium statistical moments. A popular approach that exploits this idea is the reverse Monte Carlo method (Ref. 4) developed in chemistry. In particular, it formulates an appropriate nonlinear least-squares system of integral equations by matching the equilibrium averages of some pre-selected observables. Subsequently, Newton's iterations are used to estimate the parameters. At each iterative step, samples at the current parameter estimate are generated for a Monte-Carlo approximation of the least-squares integral equations and the corresponding Jacobian matrix. In practice, this method can be rather slow due to the repeated sampling procedure. A more severe limitation is that this approach is restricted to inferring parameters that appear in the equilibrium density.

This limitation can be overcome by fitting appropriate two-point statistics. As shown in our previous work (Ref. 5), we formulated the parameter estimation problem based on the linear response statistics subjected to an external forcing, which drives the system out of equilibrium. The fluctuation-dissipation theory (FDT), a hallmark of non-equilibrium statistical physics, suggests that the change of the average of an observable under a small perturbation can be approximated by appropriate two-point statistics, called the FDT response operators (Ref. 6). The key point is that these FDT response operators can be estimated using the available samples of the equilibrium unperturbed dynamics, so long as we know the exact form of the invariant measure of the dynamics, which we will assume to be the case for a large class of problems, such as the Langevin dynamics and stochastic gradient flows. We should point out that the proposed approach relies on the validity of the FDT response statistics, which has been studied rigorously for a large class of stochastic systems that includes the stochastic differential equation (SDE) setup in this paper (Ref. 7). For deterministic dynamics, one can use, for example, the statistical technique introduced in Ref. 8 to verify the validity of the FDT linear response. Following the ideas from Refs. 9–11, we developed a parameter inference method using these response operators in Ref. 5. While the method in Refs. 9–11 involves minimizing an information-theoretic functional that depends on both the mean and variance response operators, our approach fits a finite number of quantities of appropriate observables (beyond just the mean and variance), to be referred to as the essential statistics, which allow us to approximate the FDT response operators.

In our previous work (Ref. 5), we showed the well-posedness of the formulation on three examples, in the sense that under infinite sampling, appropriate choices of essential statistics ensure the identifiability of the model parameters. On one of these examples, the simplified model for turbulence (Ref. 12), which is a nonlinear system of SDEs with a Gaussian invariant measure, we were able to explicitly determine the choice of observables and external forcings (which in turn determine the FDT response operators) that allows us to identify all of the model parameters uniquely. In this case, one can directly estimate the parameters by solving the system of equations involving these essential statistics. While this result suggests that the choice of observables and external forcings is problem dependent, the FDT provides a guideline for choosing appropriate two-point statistics for a well-posed parameter estimation. On another example, the Langevin dynamics (which will be discussed in Sec. IV), while the parameters can be identified by the essential statistics as shown in Ref. 5, the formulation suggests that one must include essential statistics that involve higher-order derivatives of the FDT response operators. This can be problematic because these higher-order statistics are rarely available in practice unless the available data are solutions of high-order SDE solvers. Given this practical issue, we propose to fit only the essential statistics that involve the zeroth- and first-order derivatives of the FDT response operator (if possible, using only the zeroth-order information). The price paid for restricting to the lower-order essential statistics is that the dependence of the essential statistics on some of the model parameters becomes implicit, through the solutions of the dynamical model, and the identifiability of these parameters becomes questionable.

In this paper, we devise a concrete numerical method to estimate the parameters by solving dynamically constrained least-squares problems of integral equations involving these low-order essential statistics. As one would imagine, the implementation of this approach faces several challenges. First, just as in the reverse Monte-Carlo method, a naive implementation of an iterative method for solving this dynamically constrained least-squares problem requires solving the model repeatedly. To avoid this in the minimization steps, we employ a polynomial surrogate model approach (Refs. 13 and 14) on the least-squares cost function that involves the essential statistics. This approach is motivated by the fact that the cost function (or, effectively, the essential statistics) is subject to sampling error and that the essential statistics depend smoothly on the parameters, assuming that the Fokker-Planck operator of the underlying dynamics depends smoothly on the parameters. Under appropriate regularity assumptions on the essential statistics, we provide a theoretical guarantee for the existence of minimizers of the approximate polynomial least-squares problem that converge to the solution of the true least-squares problem. The second, related issue, as mentioned above, is that certain (lower-order) essential statistics might not be sensitive enough to some of the parameters, which will lead to inaccurate estimates. To ensure the practical identifiability of the parameters, we employ an empirical a priori sensitivity analysis based on the training data used in constructing the polynomial surrogate models, and an a posteriori local sensitivity analysis to verify the validity of the a priori analysis.

While the proposed technique can be used to infer all of the model parameters, in practice, one should use the polynomial approximation only for parameters that cannot be estimated directly, to avoid the curse of dimensionality. This issue is a consequence of using the polynomial expansion to approximate the least-squares problem, where the required training data set for constructing the surrogate model grows exponentially with the parameter dimension. In the numerical examples below, we apply the approximate least-squares problems to estimate parameters that cannot be estimated directly from matching equilibrium statistics and/or two-point statistics that involve the first-order derivative of the FDT response operator.

The rest of the paper is organized as follows. In Sec. II, we briefly review the concept of essential statistics and the parameter inference method using the linear response statistics developed in Ref. 5. In Sec. III, we present the proposed numerical algorithm based on the polynomial-based surrogate model and discuss its convergence (with detailed proofs in Appendixes A–C). In Secs. IV and V, we show applications to two nonlinear examples, a Langevin model and a stochastic gradient system with a triple-well potential, respectively. In Sec. VI, we conclude the paper with a summary and discussion.

We begin by reviewing the parameter estimation formulation introduced in our previous paper (Ref. 5). Consider an n-dimensional Itô diffusion

dX=b(X;θ)dt+σ(X;θ)dWt,
(1)

where the vector field b(X;θ) denotes the drift and σ(X;θ) is the diffusion tensor. Both coefficients are assumed to be smooth functions of X and θ; W_t denotes the standard Wiener process, and the variable θ ∈ D contains the model parameters, where D ⊂ ℝ^N is the parameter domain. In this paper, we only consider the case where D is bounded and, for simplicity, we assume D = [−1,1]^N. Throughout this paper, we denote the underlying true parameter value by θ* and the corresponding estimate by θ^. We assume that Eq. (1) is ergodic (see Ref. 15 for precise conditions) with equilibrium density p_eq(x;θ) for all θ ∈ D, and that p_eq(x;θ) depends smoothly on both x and θ. When θ = θ*, we assume that we have access to the explicit formula of p_eq(x;θ*) as a function of x only, which we shorten to p_eq(x). As we shall see in many applications, p_eq(x;θ) often depends on only a subset of the parameters θ.
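Evaluating statistics at a candidate θ will repeatedly require sampling (1). As a minimal illustration of this step, the following sketch samples one trajectory with the Euler-Maruyama scheme (a first-order method used here only for exposition; the numerical experiments in Secs. IV and V use second-order integrators, and the callables b and sigma are user-supplied assumptions):

```python
import numpy as np

def euler_maruyama(b, sigma, x0, dt, num_steps, theta, rng=None):
    """Sample a path of dX = b(X;theta) dt + sigma(X;theta) dW_t, Eq. (1).
    Assumes sigma(x, theta) returns an n-by-n diffusion matrix."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x0)
    x = np.empty((num_steps + 1, n))
    x[0] = x0
    for k in range(num_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n)
        x[k + 1] = x[k] + b(x[k], theta) * dt + sigma(x[k], theta) @ dW
    return x
```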

The main idea of the parameter estimation formulation introduced in Ref. 5 is to infer the parameters from the linear response operator associated with the fluctuation-dissipation theory (FDT). Recall that, in the FDT setting, we are interested in an order-δ (0 < δ ≪ 1) external perturbation of the form f(x,t) = c(x)δf(t), such that the solutions of the perturbed dynamics

dX^{\delta} = \left[ b(X^{\delta};\theta) + c(X^{\delta})\,\delta f(t) \right] dt + \sigma(X^{\delta};\theta)\, dW_t,

which would otherwise remain at the equilibrium state of the unperturbed dynamics (1), are characterized by a perturbed density p^δ(x,t;θ). This density is governed by the corresponding Fokker-Planck equation with initial condition p^δ(x,0;θ) = p_eq(x;θ).

By a standard perturbation technique, e.g., Ref. 1, the difference between the perturbed and unperturbed statistics of any integrable function A(x) can be estimated by a convolution integral, that is,

\Delta\mathbb{E}[A](t) := \mathbb{E}_{p^{\delta}}[A(X)](t) - \mathbb{E}_{p_{eq}}[A(X)] = \int_0^t k_A(t-s)\,\delta f(s)\, ds + \mathcal{O}(\delta^2).
(2)

In (2), the term kA(t) is known as the linear response operator. The FDT formulates the linear response operator as the following two-point statistics:

k_A(t;\theta) := \mathbb{E}_{p_{eq}}\left\{ A[X(t)] \otimes B[X(0);\theta] \right\}, \qquad \text{with } B_i(X;\theta) := -\frac{\partial_{X_i}\left[ c_i(X)\, p_{eq}(X;\theta) \right]}{p_{eq}(X;\theta)},
(3)

where B_i and c_i denote the ith components of B and c, respectively. We should point out that the validity of the FDT has been studied rigorously under mild conditions (Ref. 7). A more explicit form is given by

k_A(t;\theta) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} A(x) \otimes B(y)\, \rho(x,t|y,0)\, p_{eq}(y;\theta)\, dx\, dy,
(4)

where ρ is the solution of the Fokker-Planck equation

\partial_t \rho = \mathcal{L}^{\dagger} \rho, \qquad \rho(x,0|y,0) = \delta(x-y).
(5)

Here, 𝓛 denotes the generator of the unperturbed dynamics (1), and 𝓛† is its adjoint operator in L²(ℝⁿ).

Remark 2.1

The result in Ref. 16 states that if the coefficients of a linear parabolic PDE are C^k as functions of θ, then the weak solutions are also k-times differentiable with respect to θ. In our case, if b, σ ∈ C^k([−1,1]^N) as functions of θ, then the linear response operator k_A(t;θ) in (4) is also C^k with respect to the parameters θ, under the mild assumption that p_eq(x;θ) is smooth with respect to θ. In Sec. III B, we will conduct a convergence analysis of the proposed parameter estimation method to determine the necessary degree of smoothness k of the linear response operators as functions of θ.

Notice that B(X) := B(X;θ*) can be determined analytically based on our assumption of knowing the explicit formula of p_eq. Given t, the value of k_A(t;θ*) can be computed using a Monte-Carlo sum based on the time series of the unperturbed system X at p_eq(x). Therefore, the linear response operator can be estimated without knowing the underlying unperturbed system in (1), so long as the time series of X at p_eq(x) and the explicit formula for this equilibrium density are known.
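To make this step concrete, here is a minimal sketch of the Monte-Carlo estimator, assuming a scalar observable A, the conjugate function B computed analytically from p_eq, and a long stationary time series x sampled every dt (all names are illustrative):

```python
import numpy as np

def estimate_response(x, A, B, lags, dt):
    """Estimate k_A(t) = E_eq{ A[X(t)] B[X(0)] } at the requested lags by
    averaging over all time origins of a single stationary trajectory."""
    a, b = A(x), B(x)                 # pointwise observables along the path
    k = np.empty(len(lags))
    for m, t in enumerate(lags):
        i = int(round(t / dt))
        k[m] = np.mean(a[i:] * b[:len(b) - i])
    return k
```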

The FDT, along with the expression of the kernel function (3), has been implemented to predict the change of the expectation of the observable A (Refs. 17–22). The advantage is that the response operator is defined with respect to the equilibrium distribution, which does not require prior knowledge of the perturbed density p^δ. In statistical physics, the FDT is a crucial formulation for deriving transport coefficients, e.g., electron conductivity, viscosity, and diffusion coefficients (Ref. 6). In the current paper, however, we propose to use the response statistics to infer the underlying true parameter value θ*.

Since this linear response operator is in principle infinite-dimensional, it is necessary for practical purposes to introduce a finite-dimensional approximation. The following parametric form of the response operator is motivated by the rational approximation of the Laplace transform (Ref. 23). To explain the idea, consider the Laplace transform of k_A(t;θ), denoted by K(s;θ), which can be approximated by a rational function of the form

K(s;\theta) \approx \left( I - s^{-1}\beta_1 - \cdots - s^{-m}\beta_m \right)^{-1} \left( s^{-1}\alpha_1 + \cdots + s^{-m}\alpha_m \right), \qquad \alpha_i, \beta_i \in \mathbb{R}^{n\times n}.

In the time domain, kA(t;θ) can be written explicitly as

k_A(t;\theta) \approx g_m(t;\theta) := \begin{pmatrix} I & 0 & \cdots & 0 \end{pmatrix} e^{tG} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{pmatrix}, \qquad \text{with } G = \begin{pmatrix} \beta_1 & I & & \\ \beta_2 & 0 & I & \\ \vdots & & & \ddots \\ \beta_m & 0 & \cdots & 0 \end{pmatrix},
(6)

where I denotes the n-by-n identity matrix. Here, m stands for the order of the rational approximation. In Ref. 23, the coefficients in the rational approximation were determined by certain interpolation conditions, and such a rational approximation has been proven to be an excellent approximation of the time correlation function. For the current problem, g_m will be determined using finitely many essential statistics, defined as follows.
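For concreteness, the parametric form (6) can be evaluated as follows, assuming the coefficient blocks α_1,…,α_m and β_1,…,β_m have already been determined (e.g., by fitting the essential statistics); this is a sketch of the evaluation only, not the fitting procedure of Ref. 23:

```python
import numpy as np
from scipy.linalg import expm

def g_m(t, alphas, betas):
    """Evaluate (6): g_m(t) = (I 0 ... 0) exp(tG) (alpha_1; ...; alpha_m),
    where G carries the beta blocks in its first block column and identity
    blocks on the superdiagonal."""
    m = len(betas)
    n = betas[0].shape[0]
    G = np.zeros((m * n, m * n))
    for i in range(m):
        G[i * n:(i + 1) * n, :n] = betas[i]
        if i < m - 1:
            G[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = np.eye(n)
    E1 = np.zeros((m * n, n))
    E1[:n, :n] = np.eye(n)
    return E1.T @ expm(t * G) @ np.vstack(alphas)
```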

Definition 2.2

The values {k_A^{(j)}(t_i;θ)}, i = 1,2,…,K, j = 0,…,2m−1, where j indicates the order of the time derivative and t_1 < t_2 < ⋯ < t_K, are called essential statistics if they are sufficient to determine the order-m approximation (6) of the response operator k_A(t;θ).

Remark 2.3

In Ref. 5, we considered the derivatives of k_A at t = 0 for a Langevin dynamics model, and explicit formulas involving the model parameters were found. Theoretically, all the parameters can be identified from {k_A^{(j)}(0^+;θ)}, which to some extent shows the consistency of the formulation. In practice, however, we use only derivatives of order no greater than one, since estimating higher-order derivatives of k_A(t;θ) requires a time series X(t_i) of an accuracy that is not necessarily available. By using essential statistics corresponding to lower-order derivatives at t_i other than zero, we give up the explicit expressions of the higher-order statistics in terms of the parameters. This prompts us to solve the problem in the least-squares sense. Furthermore, it requires other, more practical methods, such as a sensitivity analysis, to determine the identifiability of the parameters from these statistics.

Figure 1 shows the performance of such a discrete representation, where the linear response operator (25) arises from a Langevin model subjected to a constant external forcing; this example will be discussed in Sec. IV. With such a discrete representation, we reduce the problem of inferring from the infinite-dimensional k_A(t;θ) to fitting a finite number of essential statistics. To derive a system of equations involving both θ and those essential statistics, we introduce

\hat{k}_A(t;\theta) := \mathbb{E}_{p_{eq}(\theta)}\left\{ A[X(t)] \otimes B[X(0)] \right\}.
(7)

We also define for j=0,1,

M_j(t_i) := k_A^{(j)}(t_i;\theta^*) = \frac{d^j}{dt^j} \mathbb{E}_{p_{eq}}\left\{ A[X(t)] \otimes B[X(0)] \right\} \Big|_{t = t_i}.

We should stress that (7) is not the FDT response, since B(x) in (7) is defined with respect to p_eq = p_eq(x;θ*) rather than p_eq(x;θ). The key idea in Ref. 5 is to estimate the true parameter value θ* by solving

M_j(t_i) = \hat{k}_A^{(j)}(t_i;\theta), \qquad j \in \{0,1\}, \quad i = 1,2,\ldots,K.
(8)

Here, the term on the left-hand side is estimated from the available samples at p_eq, and the term on the right-hand side is computed from the time series generated by solving (1) for a given θ ∈ D.

FIG. 1. The response function (solid lines) and its order-4 approximation (dashed lines).
Remark 2.4

Since p_eq usually depends on only a subset of the true parameter values θ*, the assumption of knowing the explicit form of p_eq as a function of x alone is too strong. In general, we cannot even compute the left-hand side of (8), since the term B, which may depend on the unknown true value θ*, is not available to us. We will discuss how to address this issue on two examples in Secs. IV and V, each of which belongs to a class of models with a wide range of applications.

To ensure solvability, we assume that the total number of equations in (8) is greater than the dimension of the unknown parameters θ. For this over-determined scenario, we propose to solve (8) in the least-squares sense, addressing the practical issue raised in Ref. 5 when explicit solutions are not available. The main difficulty in solving the nonlinear least-squares problem comes from evaluating k̂_A^{(j)}(t_i;θ), which often requires solving the model (1) repeatedly. In Sec. III, we will propose an efficient numerical method to address this issue.

In Sec. II, we formulated a parameter estimation method as a problem of solving a nonlinear system (8) subject to a dynamical constraint in (1). In a compact form, we denote the system (8) as

f_i(\theta) := M_0(t_i) - \hat{k}_A(t_i;\theta) = 0, \qquad i = 1,\ldots,K,
(9)

where θ ∈ [−1,1]^N and K > N. Notice that we have neglected the derivative constraints in (8). While similar algorithms and convergence results as shown below also hold if the first-order derivative constraints are included in the least-squares procedure, we will only numerically solve the least-squares problem for the zeroth-order constraints as in (9). As we shall see in Secs. IV and V, we will use the derivative constraints to estimate some of the parameters directly.

Our goal is to solve (9) in the sense of least-squares, that is,

\min_{\theta \in [-1,1]^N} \sum_{i=1}^{K} f_i^2(\theta).
(10)

However, as we mentioned in Sec. II, we do not necessarily have explicit expressions for f_i(θ), and evaluating f_i at a given value of θ requires solving the true model (1) to approximate the integral in f_i(θ) by a Monte-Carlo sum. For example, if we apply the Gauss-Newton method to solve the nonlinear least-squares problem (10), the overall computation becomes very expensive, since we have to solve the model (1) repeatedly. As a remedy, we apply the idea of a polynomial-based surrogate model, introduced in Refs. 13 and 14 for solving Bayesian inference problems, to alleviate this part of the computation. The main difference of our approach from Refs. 13 and 14 is that they use orthogonal polynomials to expand the dynamical model X(t,θ), whereas here, we use orthogonal polynomials to expand f_i(θ). While the convergence of the approximate posterior density estimate of the Bayesian inference problem is established in Ref. 14, we will end this section by analyzing the convergence of the approximate least-squares problem.

The key idea of our method is to approximate f_i(θ) by a polynomial function f_i^M(θ). This is motivated by the fact that sampling error cannot be avoided in computing the essential statistics; that is, there is uncertainty on the left-hand side of (8). Thus, it is natural to think of the nonlinear least-squares solution of (10) as a random variable with some distribution μ over [−1,1]^N. Furthermore, since {f_i(θ)} are C^k with respect to θ (Remark 2.1) whenever b and σ in (1) are C^k with respect to θ, a polynomial chaos expansion in the orthogonal polynomials with respect to μ provides a natural choice of f_i^M(θ).

For instance, if we consider μ ∼ U[−1,1], i.e., the uniform distribution, then the corresponding orthogonal polynomials are the Legendre polynomials {p_n}. They are given by p_0 = 1, p_1 = x, and the recursive formula

(n+1)\, p_{n+1}(x) = (2n+1)\, x\, p_n(x) - n\, p_{n-1}(x),

which form a basis for L²([−1,1],μ). The orthogonality of {p_n} over L²([−1,1],μ) is described by

\int_{-1}^{1} p_m(x)\, p_n(x)\, d\mu(x) = \frac{2}{2n+1}\, \delta_{mn}.

One can normalize p_n by introducing P_n := p_n \sqrt{n + 1/2}.
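In code, the normalized Legendre polynomials can be evaluated from the standard ones, e.g., with NumPy (a small helper we reuse below; the function name is our own):

```python
import numpy as np
from numpy.polynomial import legendre

def normalized_legendre(n, x):
    """P_n(x) = p_n(x) * sqrt(n + 1/2), so that the P_n are orthonormal
    on [-1, 1] with respect to dx."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return np.sqrt(n + 0.5) * legendre.legval(x, coeffs)
```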

For the multi-dimensional case, one can define P_k(θ) := \prod_{j=1}^{N} P_{k_j}(\theta_j), where k = (k_1,…,k_N) ∈ {0,1,2,…}^N is a multi-index and θ = (θ_1,…,θ_N) ∈ [−1,1]^N (N represents the dimension of θ). We can consider an order-M approximation of f_i(θ),

f_i(\theta_1,\ldots,\theta_N) \approx \sum_{\|k\|_\infty \le M} \alpha_k^{(i)}\, P_k(\theta) =: f_i^M(\theta).
(11)

In practice, there are two approaches to determine the coefficients α_k^{(i)}. The first is the Galerkin method, which requires the approximation error in (11) to be orthogonal to the finite-dimensional subspace of L²([−1,1],μ) spanned by {P_k(θ)}, ‖k‖_∞ ≤ M. In our convergence analysis, we will deduce the error estimates based on the coefficients obtained through this Galerkin (or least-squares) fitting. Alternatively, one can determine the coefficients α_k^{(i)} by the collocation method, that is, by matching the values of f_i at a set of points Θ ⊂ D, which leads to the following linear equations:

f_i(\theta) = \sum_{\|k\|_\infty \le M} \alpha_k^{(i)}\, P_k(\theta), \qquad \theta \in \Theta.
(12)

Since (12) is equivalent to a polynomial interpolation, a common choice of Θ is the tensor product of order-M_C Chebyshev nodes, where M_C ≥ M+1 [when M_C > M+1, (12) becomes a linear least-squares problem]. In the numerical examples in this paper, we use the collocation method to determine the coefficients.
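The collocation step then amounts to a small linear-algebra problem. Here is a sketch of (12) on a tensor grid of Chebyshev nodes, reusing the normalized_legendre helper above (the grid sizes are illustrative):

```python
from itertools import product
import numpy as np
from numpy.polynomial import chebyshev

def fit_surrogate(f_at_nodes, nodes, M, N):
    """Solve (12): fit f(theta) ~ sum_k alpha_k P_k(theta) over the
    collocation nodes; a least-squares fit when len(nodes) > (M+1)^N."""
    multi_indices = list(product(range(M + 1), repeat=N))
    Phi = np.array([[np.prod([normalized_legendre(k[j], th[j]) for j in range(N)])
                     for k in multi_indices] for th in nodes])
    alpha, *_ = np.linalg.lstsq(Phi, f_at_nodes, rcond=None)
    return multi_indices, alpha

# tensor grid of order-M_C Chebyshev nodes in [-1, 1]^N
M_C, N = 8, 2
nodes = list(product(chebyshev.chebpts1(M_C), repeat=N))
```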

Replacing f_i(θ) in (10) by f_i^M(θ), we obtain a new least-squares problem for the order-M polynomial-based surrogate model,

\min_{\theta \in [-1,1]^N} \sum_{i=1}^{K} (f_i^M)^2(\theta).
(13)

For ease of notation, let f^M(θ) and J^M(θ) denote the vector-valued function [f_1^M(θ),…,f_K^M(θ)] and its Jacobian matrix, respectively.

In summary, we have Algorithm 1 based on the Gauss-Newton method.

Algorithm 1.

Parameter estimation: polynomial-based surrogate model

Let Θ be a set of collocation nodes in [−1,1]^N.
1. For each θ ∈ Θ, solve (1).
2. Estimate f_i(θ), i = 1,…,K, using a Monte-Carlo sum.
3. Compute the coefficients α_k^{(i)} by solving the linear system (12) and obtain the approximations f_i^M(θ) for i = 1,…,K.
4. For an initial guess θ_0 ∈ [−1,1]^N and a threshold δ > 0, compute
\theta_k = \theta_{k-1} - \left\{ [J^M(\theta_{k-1})]^{T} J^M(\theta_{k-1}) \right\}^{-1} [J^M(\theta_{k-1})]^{T} f^M(\theta_{k-1}).
(14)

while the step length ‖θ_k − θ_{k−1}‖ ≥ δ do
 repeat (14)
end while
return θ_k

Computationally, the most expensive step is solving (1) at the collocation nodes Θ, which can be done in parallel.
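For completeness, a compact sketch of the Gauss-Newton loop in Step 4 follows; the projection of the iterate back onto [−1,1]^N is our own safeguard and is not part of Algorithm 1:

```python
import numpy as np

def gauss_newton(f_M, J_M, theta0, delta=1e-8, max_iter=100):
    """Iterate (14) on the surrogate residuals: f_M(theta) -> R^K,
    J_M(theta) -> K x N Jacobian, both cheap polynomial evaluations."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        J, r = J_M(theta), f_M(theta)
        step = np.linalg.solve(J.T @ J, J.T @ r)
        theta = np.clip(theta - step, -1.0, 1.0)   # stay in the domain
        if np.linalg.norm(step) < delta:
            break
    return theta
```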

In this section, we will provide the convergence analysis of the solutions of the approximate least-squares problem in (13) to the solution of the true least-squares problem in (10) as M → ∞, under appropriate conditions. We denote the solution of (13) by θ^M, that is,

\theta^M := \arg\min_{\theta \in [-1,1]^N} \sum_{i=1}^{K} (f_i^M)^2(\theta).

Since each f_i^M is a multivariate polynomial, the minimum in (13) is attained in [−1,1]^N, but such θ^M may not be unique. Before applying a numerical method to this problem, a fundamental question is: assuming that the true parameter value θ* ∈ (−1,1)^N is the unique solution of the true least-squares problem (10), can we find a sequence of minimizers {θ^M} of the approximate problem (13) such that θ^M → θ* as M → +∞? For the case K = N and sufficiently smooth f_i, we are able to prove the following theorem, which partially answers this question.

Theorem 3.1
Consider the N-dimensional nonlinear least-squares problem (10) and its surrogate model (13). In particular, we let K = N and assume that

  1. f_i ∈ C^ℓ([−1,1]^N), with ℓ = ⌈3N/2⌉ + 2, for i = 1,2,…,N,

  2. θ* is the solution of (10) such that f_i(θ*) = 0 for i = 1,2,…,N,

  3. the Jacobian matrix of f := (f_1,f_2,…,f_N) at θ*, denoted by ∇f(θ*), is invertible.

Then, there exists a sequence of minimizers {θ^M},
\theta^M = \arg\min_{\theta \in [-1,1]^N} \sum_{i=1}^{N} (f_i^M)^2(\theta),
such that θ^M → θ* as M → +∞. Moreover, the residual error f^M(θ^M) converges to 0 as M → +∞.
Remark 3.2

We set N = K since the existence of the minimizers relies on the technical assumption in Lemma 3.6. The second and third assumptions provide the well-posedness of the original nonlinear least-squares problem (10). Since we discuss the parameter estimation problem under the perfect-model setting, the second assumption holds naturally.

The proof of Theorem 3.1 can be found in Appendix A. Here, we present the proof for the one-dimensional case (K = N = 1) to illustrate the main ideas. Notice that in the one-dimensional case, the formula in (11) reduces to

f^M(\theta) := \sum_{k=0}^{M} \alpha_k P_k(\theta),
(15)

where {P_k} are the normalized Legendre polynomials. We start with two lemmas pertaining to the pointwise convergence of f^M; their proofs can be found in Ref. 24.

Lemma 3.3
Let f ∈ C^ℓ([−1,1]). Then the corresponding least-squares approximation given by (15) satisfies
\lim_{k \to +\infty} \alpha_k\, k^{\ell} = 0.

Lemma 3.3 connects the smoothness of f to the decay rate of the generalized Fourier coefficients α_k. The following lemma is a direct consequence of Lemma 3.3.

Lemma 3.4
Let f ∈ C²([−1,1]) and let f^M be the corresponding least-squares approximation given by (15). Then for any ε > 0, there exists N > 0 such that for all M > N, we have
\|f - f^M\|_\infty \le \varepsilon/\sqrt{M}.
Thus, f^M → f pointwise on [−1,1].

Here, ‖·‖_∞ denotes the L^∞-norm on C([−1,1]). In the proof of Proposition 3.7, we also need ‖(f^M)' − f'‖_∞ → 0 as M → +∞. To obtain such convergence, one needs higher regularity of f. In particular, we have

Lemma 3.5

Let f ∈ C⁴([−1,1]) and let f^M be the corresponding least-squares approximation given by (15). Then ‖f' − (f^M)'‖_∞ → 0 as M → +∞.

Proof.
Following the same idea as in the proof of Lemma 3.4 (Ref. 24), we show that {(f^M)'} is a Cauchy sequence in the L^∞-norm. Notice that for m > n,
\|(f^m - f^n)'\|_\infty = \Big\| \sum_{k=n+1}^{m} \alpha_k P_k' \Big\|_\infty \le \sum_{k=n+1}^{m} |\alpha_k|\, \|P_k'\|_\infty,
which suggests that we need an L^∞ bound for P_k'. Using the recursion relation p'_{n+1} - p'_{n-1} = (2n+1)\, p_n and ‖p_n‖_∞ = 1, we have
\|P_n'\|_\infty = \sqrt{n + 1/2}\, \|p_n'\|_\infty \le \tfrac{1}{2}\, n(n+1) \sqrt{n + 1/2}.
(16)
As a result,
\|(f^m - f^n)'\|_\infty \le \tfrac{1}{2} \sum_{k=n+1}^{m} |\alpha_k|\, k(k+1) \sqrt{k + 1/2}.
From Lemma 3.3, we know that f ∈ C⁴([−1,1]) is enough to ensure that {(f^M)'} is Cauchy. Let (f^M)' → φ ∈ C⁰([−1,1]) in the L^∞ sense. The remaining issue is whether φ = f'. This is straightforward: by Lemma 3.4, f^M → f pointwise. Moreover,
f^M(\theta) - f^M(-1) = \int_{-1}^{\theta} (f^M)'(t)\, dt, \qquad \lim_{M \to +\infty} \int_{-1}^{\theta} (f^M)'(t)\, dt = \int_{-1}^{\theta} \varphi(t)\, dt, \quad \forall\, \theta \in [-1,1],
and the convergence is uniform with respect to θ. Thus, for all θ ∈ [−1,1],
\lim_{M \to +\infty} \left[ f^M(\theta) - f^M(-1) \right] = f(\theta) - f(-1) = \int_{-1}^{\theta} \varphi(t)\, dt,
that is, f' = φ.

So far, we have shown that with enough regularity of f, we have uniform convergence of both f^M → f and (f^M)' → f' on [−1,1]. To show the existence of a minimizer near θ*, the following lemma is crucial.

Lemma 3.6

Let F: D ⊂ ℝⁿ → ℝⁿ be continuously differentiable in the domain D, and suppose that there is an open ball B(x_0,r) ⊂ D and a positive γ such that ‖F'(x)^{-1}‖ ≤ γ for all x ∈ B(x_0,r), and r > γ‖F(x_0)‖. Then, F(x) = 0 has a solution in B(x_0,r).

The proof can be found in Ref. 25. In the one-dimensional case, Lemma 3.6 basically says that if a C¹ function f changes with a minimum speed (|f'| ≥ 1/γ) on the interval (x_0−r, x_0+r) and the function value at x_0 is small enough [|f(x_0)| < r/γ], then the function must reach zero in this interval. With Lemmas 3.4–3.6, we are able to prove the following result, which is the one-dimensional analog of Theorem 3.1.

Proposition 3.7
Consider the following one-dimensional nonlinear least-squares problem:
\min_{\theta \in [-1,1]} f^2(\theta),
with solution θ*. If f is approximated by f^M given by (15) and we assume that

  1. f ∈ C⁴([−1,1]),

  2. f(θ*) = 0,

  3. |f'(θ*)| ≠ 0,

then there exists a sequence of minimizers {θ^M} such that
(f^M)^2(\theta^M) = \min_{\theta \in [-1,1]} (f^M)^2(\theta) =: e_M^2
and
\lim_{M \to +\infty} \theta^M = \theta^*, \qquad \lim_{M \to +\infty} |f^M(\theta^M)| = 0.
Proof.
From Lemmas 3.4 and 3.5, the first assumption provides
\lim_{M \to +\infty} \|f - f^M\|_\infty = 0, \qquad \lim_{M \to +\infty} \|f' - (f^M)'\|_\infty = 0.
Let F_M(θ) := f^M(θ) − e_M; we are going to apply Lemma 3.6 to F_M at θ*. Since f'(θ*) ≠ 0, by continuity, we know that there exist positive constants γ and r such that
|f'(\theta)| \ge 2/\gamma, \qquad \forall\, \theta \in (\theta^* - r, \theta^* + r) \subset (-1,1).
(17)
Further notice that (F_M)' = (f^M)' and ‖f' − (f^M)'‖_∞ → 0. This implies that there exists a positive constant L₁ such that for all M > L₁,
|(F_M)'(\theta)^{-1}| \le \gamma, \qquad \forall\, \theta \in (\theta^* - r, \theta^* + r) \subset (-1,1).
(18)
At θ = θ*, we have
|F_M(\theta^*)| = |f^M(\theta^*) - e_M| \le |f^M(\theta^*)| + e_M \le 2\,|f^M(\theta^*)|,
where we used the fact that e_M ≤ |f^M(θ*)|, since e_M² is the minimum of (f^M)² on [−1,1]. As a result,
\lim_{M \to +\infty} |F_M(\theta^*)| \le \lim_{M \to +\infty} 2\,|f^M(\theta^*)| = 2\,|f(\theta^*)| = 0.
Thus, we are able to select L₂ > 0 such that for all M > L₂,
|F_M(\theta^*)| < r/\gamma.
(19)
Since both (18) and (19) hold for all M > max{L₁, L₂} =: L, applying Lemma 3.6 to F_M at θ*, we conclude that F_M(θ) = 0 has a solution in (θ*−r, θ*+r). Denote this solution by θ^M, which is a minimizer of the approximate least-squares problem, that is,
(f^M)^2(\theta^M) = e_M^2 = \min_{\theta \in [-1,1]} (f^M)^2(\theta), \qquad \forall\, M > L.
Applying the mean value theorem to f(θ^M) − f(θ*) for M > L, we have
|\theta^* - \theta^M| \le |f'(\xi_M)|^{-1} |f(\theta^*) - f(\theta^M)| = |f'(\xi_M)|^{-1} |f(\theta^M)|,
for some ξ_M ∈ (θ*−r, θ*+r). Thus, the stability condition in (17) leads to
|\theta^* - \theta^M| \le \frac{\gamma}{2} |f(\theta^M)| \le \frac{\gamma}{2} \left[ |f(\theta^M) - f^M(\theta^M)| + |f^M(\theta^M)| \right] \le \frac{\gamma}{2} \left[ \|f - f^M\|_\infty + |f^M(\theta^*)| \right] \le \gamma \varepsilon/\sqrt{M} \to 0,
as M → ∞ and for any ε > 0. Here, we have used Lemma 3.4 to bound the first term, and the residual error satisfies
|f^M(\theta^M)| \le |f^M(\theta^*)| = |f^M(\theta^*) - f(\theta^*)| \le \varepsilon/\sqrt{M}.

This result guarantees the existence of minimizers of the approximate polynomial least-squares problem that converge to the solution of the true least-squares problem under a fairly strong condition, namely, that f is smooth enough. In practice (see Sec. IV), we shall see reasonably accurate estimates even when the essential statistics are not as regular as this theoretical estimate requires.

In Algorithm 1, we are solving a least-squares problem (13). For the Gauss-Newton method, a standard local convergence result under a full rank assumption of the Jacobian matrix is well known (e.g., see Chapter 2 of Ref. 26 for details). To achieve such a local convergence in our numerical algorithm, we will now verify the full rank condition in the polynomial chaos setting.

For our problem, the Jacobian matrix J^M(θ) ∈ ℝ^{K×N} is given by

(J^M)_{ij} = \frac{\partial f_i^M}{\partial \theta_j}, \qquad i = 1,\ldots,K, \quad j = 1,\ldots,N,

where f_i^M is an element of the finite-dimensional polynomial space Γ_N^M defined by

\Gamma_N^M := \mathrm{span}\left\{ \theta_1^{k_1} \theta_2^{k_2} \cdots \theta_N^{k_N} \,\middle|\, k_i = 0,1,\ldots,M \right\}.

Using linear independence over the function space Γ_N^M and some fundamental results in algebra, we can prove the following theorem.

Theorem 3.8
If F := {f_i}_{i=1}^{K} ⊂ Γ_N^M is linearly independent and K > max{N, (M+1)^{N−1}}, then the column space of the corresponding Jacobian matrix J(θ) is full rank in (Γ_N^M)^K. Furthermore, the set N(J), defined by
\mathcal{N}(J) := \left\{ \theta \in \mathbb{R}^N \,\middle|\, \mathrm{rank}[J(\theta)] \le N-1 \right\},
is nowhere dense in ℝ^N.
Proof.

See  Appendix B.

Remark 3.9

Here, N(J) is the set of θ for which J(θ) is not full rank in the matrix sense. In our application, N is the dimension of the parameter θ, K is the total number of equations in (9), and M is the order of the approximation in (11). In practice, since the orthogonal polynomials in (11) form a basis of Γ_N^M, it is enough to check the rank of the coefficient matrix A^M, whose ith column consists of the coefficients α_k^{(i)} in (11) for all possible k. The linear independence hypothesis is indeed sensible in practice, since we want to avoid solving underdetermined least-squares problems.

For the first example, we consider a classical model in statistical mechanics: the dynamics of a particle driven by a conservative force, a damping force, and a stochastic force. In particular, we choose the conservative force based on the Morse potential

U(x) = U_0[a(x - x_0)], \qquad U_0(x) = \epsilon\left( e^{-2x} - 2e^{-x} + 0.01\, x^2 \right),
(20)

where the last quadratic term in U_0 acts as a retaining potential (also known as a confining potential), preventing the particle from moving to infinity. For this one-dimensional model, we rescale the mass to unity (m = 1) and write the dynamics as follows:

\dot{x} = v, \qquad \dot{v} = -U'(x) - \gamma v + \sqrt{2\gamma k_B T}\, \dot{W},
(21)

where W˙ is a white noise. The generator of the system (21), denoted by L, is given by

\mathcal{L} = v\, \partial_x + \left[ -U'(x) - \gamma v \right] \partial_v + \gamma k_B T\, \partial_v^2.
(22)

The smooth retaining potential U(x) guarantees the ergodicity of the Langevin system (21) (see  Appendix C for details). Namely, there is an equilibrium distribution (Gibbs measure) peq(x,v), given by

p_{eq} \propto \exp\left\{ -\frac{1}{k_B T} \left[ U(x) + \frac{1}{2} v^2 \right] \right\},
(23)

which is independent of γ. In particular, v ∼ N(0, k_B T) at equilibrium. For this Langevin model, there are a total of five unknown parameters θ := (γ, T, ϵ, a, x_0), with true values θ* := (γ*, T*, ϵ*, a*, x_0*).

For this specific problem, notice that all parameters except γ appear in the equilibrium density function. Intuitively, this suggests that one can estimate the four parameters {T, ϵ, a, x_0} from equilibrium statistics and γ from two-point statistics. We will present a conventional estimation method based on this guideline for diagnostic purposes. In particular, the parameter γ will be estimated directly from a two-point statistic, as shown in Sec. IV A. Subsequently, the remaining four parameters are obtained via one-point statistics, as shown in (27) and Sec. IV B 1.

As for the proposed approach, we will consider estimating all the parameters using the essential statistics in the most efficient manner. First, two of the parameters, {γ, T}, will be estimated using the essential statistics in (27). Then, via the sensitivity analysis discussed in Sec. IV B 2, we will show that the essential statistics are insensitive to x_0. As a consequence, we formulate a least-squares problem to estimate θ = (ϵ, a) and use equilibrium statistics to estimate x_0.

In our previous work (Ref. 5), we suggested an approach to estimate T and γ directly from the essential statistics. To construct such essential statistics, we consider a constant external forcing δf with δ ≪ 1. The corresponding perturbed system is given by

\dot{x} = v, \qquad \dot{v} = -U'(x) - \gamma v + \delta f + \sqrt{2\gamma k_B T}\, \dot{W},
(24)

that is, c(x) = (0,1)^T in the FDT formula (3). By selecting the observable A = (0,v)^T, we work with the (2,2) entry of the response operator, given by

k_A(t;\theta) = \frac{1}{k_B T}\, \mathbb{E}_{p_{eq}}[v(t)\, v(0)] = \frac{1}{k_B T} \int_{\mathbb{R}^2} \int_{\mathbb{R}^2} v\, v_0\, \rho(x,v,t|x_0,v_0,0)\, p_{eq}(x_0,v_0)\, dx\, dv\, dx_0\, dv_0,
(25)

where ρ is the solution of the corresponding Fokker-Planck equation (5) associated with the Langevin dynamics. Taking the time derivative of E_{p_eq}[v(t)v(0)] for t > 0, we obtain

\partial_t \mathbb{E}_{p_{eq}}[v(t)v(0)] = \int_{\mathbb{R}^2}\int_{\mathbb{R}^2} v\, v_0\, \mathcal{L}^{\dagger}\rho(x,v,t|x_0,v_0,0)\, p_{eq}(x_0,v_0)\, dx\, dv\, dx_0\, dv_0 = \int_{\mathbb{R}^2}\int_{\mathbb{R}^2} (\mathcal{L} v)\, v_0\, \rho(x,v,t|x_0,v_0,0)\, p_{eq}(x_0,v_0)\, dx\, dv\, dx_0\, dv_0 = \int_{\mathbb{R}^2}\int_{\mathbb{R}^2} \left[ -U'(x) - \gamma v \right] v_0\, \rho(x,v,t|x_0,v_0,0)\, p_{eq}(x_0,v_0)\, dx\, dv\, dx_0\, dv_0 = \mathbb{E}_{p_{eq}}\left( \left\{ -U'[x(t)] - \gamma v(t) \right\} v(0) \right).
(26)

Letting t → 0+ in (26) and recalling that v ∼ N(0, k_B T) at equilibrium, we have

\mathbb{E}_{p_{eq}}[v(t) v(0)]\big|_{t=0} = k_B T, \qquad \partial_t \mathbb{E}_{p_{eq}}[v(t) v(0)]\big|_{t=0^+} = -\gamma k_B T,
(27)

where both left-hand sides can be computed from the sample. Thus, (27) provides direct estimates for T and γ. As a result, the original parameter estimation problem reduces to estimating (ϵ, a, x_0) in the potential function U, which is a non-trivial task since the dependence of p_eq on these parameters is quite complicated. With these estimates, B becomes available since it depends only on k_B T, which allows one to compute M_j(t_i) in (8) and avoids the issue pointed out in Remark 2.4.
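In practice, the two statistics in (27) can be read off a sampled equilibrium velocity series v (step size dt) with a one-sided finite difference; the first-order difference below is a simplifying assumption:

```python
import numpy as np

def estimate_T_gamma(v, dt):
    """Direct estimates (27): k_B T = E[v^2] and
    gamma = -(d/dt E[v(t)v(0)])|_{0+} / (k_B T)."""
    c0 = np.mean(v * v)                  # autocorrelation at lag 0
    c1 = np.mean(v[1:] * v[:-1])         # autocorrelation at lag dt
    kBT_hat = c0
    gamma_hat = -(c1 - c0) / (dt * c0)   # one-sided finite difference
    return kBT_hat, gamma_hat
```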

In this subsection, we focus on the three-dimensional parameter estimation problem with θ:=(ϵ,a,x0). First, we review the conventional approach, which matches the equilibrium statistics of x. Then, we discuss the proposed new approach using the essential statistics. In particular, we perform a sensitivity analysis to determine the identifiability of the parameters from the essential statistics. Finally, we present the numerical results including a comparison between the conventional and the new approaches.

1. Conventional method

We first look at what we can learn from the equilibrium statistics. Since the marginal distribution of x at the equilibrium state is proportional to exp[−U(x;θ)/k_BT], a natural idea is to match the moments of x with respect to this equilibrium density. In particular, one can introduce the following three equations:

\mathbb{E}_{p_{eq}}[x^i] = \mathbb{E}_{p_{eq}}[x^i; \epsilon, a, x_0], \qquad i = 1,2,3,
(28)

where the left-hand sides are computed from the given data, while the right-hand sides are treated as functions of (ϵ, a, x_0), obtained from solving the model (24). The following pre-computation simplifies Eq. (28) into a one-dimensional problem.

To begin with, we define the probability density functions

p_{eq}^{a,x_0}(x) := \frac{1}{\mathcal{N}} \exp\left\{ -\frac{1}{k_B T} U_0[a(x - x_0)] \right\}, \qquad p_{eq}^{1,0}(x) := \frac{1}{\mathcal{N}_0} \exp\left[ -\frac{1}{k_B T} U_0(x) \right],

in terms of U and U_0, respectively. With the change of variables y := a(x − x_0), the normalizing constants 𝒩 and 𝒩_0 satisfy

\mathcal{N} = \int_{\mathbb{R}} \exp\left\{ -\frac{1}{k_B T} U_0[a(x - x_0)] \right\} dx = \frac{1}{a} \int_{\mathbb{R}} \exp\left[ -\frac{1}{k_B T} U_0(y) \right] dy = \frac{1}{a}\, \mathcal{N}_0.

As a result, the first two equations in (28) can be rewritten as

\mathbb{E}_{p_{eq}}[x] = \mathbb{E}_{p_{eq}^{a,x_0}}[x] = \frac{1}{a}\, \mathbb{E}_{p_{eq}^{1,0}}[x] + x_0, \qquad \mathbb{E}_{p_{eq}}[x^2] = \mathbb{E}_{p_{eq}^{a,x_0}}[x^2] = \frac{1}{a^2}\, \mathbb{E}_{p_{eq}^{1,0}}[x^2] + \frac{2 x_0}{a}\, \mathbb{E}_{p_{eq}^{1,0}}[x] + x_0^2,

with a unique solution

a = \sqrt{\frac{\mathrm{Var}_{p_{eq}^{1,0}}[x]}{\mathrm{Var}_{p_{eq}}[x]}}, \qquad x_0 = \mathbb{E}_{p_{eq}}[x] - \mathbb{E}_{p_{eq}^{1,0}}[x] \sqrt{\frac{\mathrm{Var}_{p_{eq}}[x]}{\mathrm{Var}_{p_{eq}^{1,0}}[x]}},
(29)

where Var_{p_eq^{1,0}}[x] stands for the variance of x with respect to the rescaled equilibrium density p_eq^{1,0}. In practice, both E_{p_eq}[x] and Var_{p_eq}[x] can be estimated empirically from the available data, while Var_{p_eq^{1,0}}[x] is a function of ϵ only. Thus, (29) can be written as

a = a(\epsilon), \qquad x_0 = x_0(\epsilon),

and (28) reduces to a one-dimensional problem for ϵ,

\mathbb{E}_{p_{eq}}[x^3] = \mathbb{E}_{p_{eq}}[x^3; \epsilon, a, x_0] \left[ = \int_{\mathbb{R}} x^3\, p_{eq}(x; \epsilon, a, x_0)\, dx \right], \qquad \text{s.t. } a = a(\epsilon), \; x_0 = x_0(\epsilon).
(30)

Note that the left-hand side of (30) is estimated by Monte-Carlo averaging of the unperturbed data, whereas the right-hand side integral is approximated using numerical quadrature.
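A sketch of this pre-computation is given below; the quadrature limits and the use of scipy.integrate.quad are illustrative assumptions, and the remaining scalar equation (30) in ϵ can then be passed to any one-dimensional root finder:

```python
import numpy as np
from scipy.integrate import quad

def U0(x, eps):
    return eps * (np.exp(-2 * x) - 2 * np.exp(-x) + 0.01 * x ** 2)

def rescaled_moment(n, eps, kBT=1.0, lo=-5.0, hi=60.0):
    """E[x^n] under p_eq^{1,0} proportional to exp(-U0(x)/kBT)."""
    w = lambda x: np.exp(-U0(x, eps) / kBT)
    Z = quad(w, lo, hi)[0]
    return quad(lambda x: x ** n * w(x), lo, hi)[0] / Z

def a_x0_of_eps(eps, mean_data, var_data, kBT=1.0):
    """The maps a(eps) and x0(eps) in (29); the data moments are estimated
    empirically from the unperturbed time series."""
    m1 = rescaled_moment(1, eps, kBT)
    var10 = rescaled_moment(2, eps, kBT) - m1 ** 2
    a = np.sqrt(var10 / var_data)
    x0 = mean_data - m1 * np.sqrt(var_data / var10)
    return a, x0
```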

2. Sensitivity analysis of the essential statistics

To ensure a successful parameter estimation, it is useful to gather some a priori knowledge of the parameter identifiability from the proposed essential statistics. This is especially important since the choice of essential statistics to fit is not unique, and we want to ensure that the parameters can be identified from the chosen statistics; recall that the essential statistics are defined by the choices of external forcing and observable.

While a local sensitivity analysis, such as the pathwise derivative method described in Ref. 27, is desirable, it requires knowledge of the true parameters, which is not available to us. Essentially, the pathwise derivative method computes ȳ_θ := E[Y(t,ω)], where Y(t,ω) := D_θ X(t,θ), and the expectation is defined with respect to the density of the Itô diffusion (written in its integral form)

X(t,\theta) = X_0(\theta) + \int_0^t b[X(s,\theta);\theta]\, ds + \int_0^t \sigma[X(s,\theta);\theta]\, dW_s.
(31)

Since we have no explicit solution for the density ρ(x,t;θ), one way to approximate ȳ_θ is by an ensemble average of the solutions of the following equation:

Y(t,\theta) = Y_0(\theta) + \int_0^t \left\{ \nabla_X b[X(s,\theta);\theta]\, Y(s,\theta) + \nabla_\theta b[X(s,\theta);\theta] \right\} ds + \int_0^t \left\{ \nabla_X \sigma[X(s,\theta);\theta]\, Y(s,\theta) + \nabla_\theta \sigma[X(s,\theta);\theta] \right\} dW_s.
(32)

Notice that this equation depends on (31) and on the unknown parameters θ. The dependence on θ implies that one cannot use this method for an a priori sensitivity analysis. However, it can be used for an a posteriori sensitivity analysis, evaluated at the estimates, to verify the difficulties of particular parameter regimes.

As an empirical method for a priori sensitivity analysis, we simply check the deviation of k̂(t,θ) as a function of the training collocation nodes θ ∈ Θ, which are available from Step 2 of Algorithm 1. Of course, a more elaborate global sensitivity analysis technique, such as the Sobol index (Ref. 28), could be performed as well, but we will not pursue this here. From this empirical a priori sensitivity analysis, we found that the corresponding essential statistic, E_{p_eq}[v(t)v(0)], is not sensitive to x_0; it is, however, sensitive to a and strongly sensitive to ϵ. Also, the sensitivity in the low damping case (γ = 0.5) is stronger than in the high damping case (γ = 5.0). To verify the validity of this empirical method, we compute the local sensitivity indices ȳ_θ := E[Y(t,ω)], where Y(t,ω) solves (32) with the true parameters (again, this is not feasible in practice since the true parameters are not available).
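The empirical check itself is simple: sweep the training nodes along one parameter coordinate at a time and measure how much the estimated statistic varies. The following crude heuristic is one way to implement it (k_hat collects the essential statistics computed at the nodes in Step 2; the function is our own sketch):

```python
import numpy as np

def empirical_sensitivity(k_hat, nodes):
    """Crude a priori sensitivity: total variation of the statistic when
    the nodes are swept along each parameter coordinate.
    k_hat: (num_nodes, num_lags), nodes: (num_nodes, N)."""
    nodes = np.asarray(nodes)
    for j in range(nodes.shape[1]):
        order = np.argsort(nodes[:, j])
        tv = np.abs(np.diff(k_hat[order], axis=0)).sum(axis=0).max()
        print(f"parameter {j}: total variation of k_hat = {tv:.3e}")
```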

For our Langevin model, we have the corresponding drift and diffusion coefficients

b = \left[ v, \; -U'(x) - \gamma v \right]^{T}, \qquad \sigma = \left( 0, \; \sqrt{2\gamma k_B T} \right)^{T}.

Suppose we are interested in the dependence on the parameter x_0. Introducing the notation x_{x_0} := ∂x(t,x_0)/∂x_0 and v_{x_0} := ∂v(t,x_0)/∂x_0, the joint system of (31) and (32) (written in differential form) is given as follows:

\dot{x} = v, \qquad \dot{v} = -U'(x;x_0) - \gamma v + \sqrt{2\gamma k_B T}\, \dot{W}, \qquad \dot{x}_{x_0} = v_{x_0}, \qquad \dot{v}_{x_0} = -U''(x;x_0)\, x_{x_0} - \gamma v_{x_0} - \partial_{x_0} U'(x;x_0).
(33)

There is no stochastic term in the last equation, since the noise in the Langevin model is additive with a coefficient independent of x_0. Notice that (x_{x_0}, v_{x_0}) ≡ (1, 0) is a fixed point of the last two equations in (33), because

\partial_{x_0} U'(x;x_0) = a\, \partial_{x_0} U_0'[a(x - x_0)] = -a^2\, U_0''[a(x - x_0)] = -\partial_x U'(x;x_0).

As a result, by choosing the initial condition [x(0), v(0)] = (x_0, c) for an arbitrary constant c, one can conclude that the solution v(t) of the Langevin model (21) is independent of x_0. Thus, under this circumstance, E_{p_eq}[v(t)v(0)] is independent of x_0, confirming the conclusion of our empirical a priori sensitivity analysis.

While this parameter insensitivity may seem discouraging, we can use it to our advantage to simplify the problem. That is, we can assign an arbitrary value to x0 in applying Algorithm 1 and consider estimating only θ=(a,ϵ). Once these two parameters are estimated, we can use the formula in (29) to estimate x0.

We also use this local sensitivity analysis to verify the validity of the a priori sensitivity analysis with respect to ϵ and a. Since explicit behavior is difficult to analyze as above, for a given realization of the Langevin model, we solve the remaining ODEs by the fourth-order Runge-Kutta method. Figure 2 shows the numerical results for v̄_a and v̄_ϵ, where the average is estimated over 3000 realizations of the Langevin model. Based on the scales of v̄_a and v̄_ϵ in Fig. 2, we confirm the sensitivity of the two-point statistic E_{p_eq}[v(t)v(0)] that was found empirically. Namely, the identifiability of ϵ is stronger than that of a, and identifiability is stronger in the low damping case (γ = 0.5) than in the high damping case (γ = 5.0).
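A sketch of this a posteriori computation is given below; it integrates the last two ODEs of (33) along a stored realization with forward Euler for brevity (the reported results use fourth-order Runge-Kutta), and U2 and dUp_dx0, denoting U''(x;x_0) and ∂_{x_0}U'(x;x_0), are user-supplied:

```python
import numpy as np

def sensitivity_path(x_path, dt, gamma, U2, dUp_dx0, xs0=0.0, vs0=0.0):
    """Integrate the sensitivity ODEs of (33) along a stored Langevin
    realization x_path; returns the sensitivity of x(t) to the parameter."""
    xs, vs = xs0, vs0
    out = np.empty(len(x_path))
    for k, xk in enumerate(x_path):
        out[k] = xs
        xs_dot = vs
        vs_dot = -U2(xk) * xs - gamma * vs - dUp_dx0(xk)
        xs, vs = xs + dt * xs_dot, vs + dt * vs_dot
    return out

# averaging sensitivity_path over many independent realizations
# approximates the local sensitivity index E[Y(t)]
```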

FIG. 2. The absolute value of v̄_a and v̄_ϵ for both γ = 0.5 and γ = 5.0.

We now present several numerical results. Based on a time series of size 10⁷ (with temporal lag h = 2×10⁻³), we compare the conventional method and the proposed scheme that fits the essential statistics, for γ = 0.5 and γ = 5.0, representing a low damping case and a high damping case, respectively. The true values of the remaining four parameters are set to (k_BT, ϵ, a, x_0) = (1, 0.2, 10, 0). The data, in the form of a time series, are generated from the model (24) using an operator-splitting method (Ref. 29), which is a strong second-order method.

1. Low damping case

We start with the conventional method. To examine the sensitivity of the estimates of (ϵ, a, x_0) to the estimate T^, we also include, as a comparison, the estimation in which the true value k_BT = 1 is used in solving (30) (we refer to the results of this scenario as partial estimates). The results are listed in Table I. We clearly observe that the results of the conventional method are sensitive to the value of T^. To explain this sensitivity, recall that in solving (30), a and x_0 are viewed as functions of ϵ. This suggests that high sensitivity occurs if

\frac{\partial \epsilon}{\partial (k_B T)} = \left( \frac{\partial \mathbb{E}_{p_{eq}}[x^3; T, \epsilon, a(\epsilon), x_0(\epsilon)]}{\partial \epsilon} \right)^{-1} \frac{\partial \mathbb{E}_{p_{eq}}[x^3; T, \epsilon, a(\epsilon), x_0(\epsilon)]}{\partial (k_B T)} \gg 1,

where a(ϵ) and x_0(ϵ) are given by (29), and the implicit function theorem has been used. By direct computation, we get

\frac{\partial}{\partial (k_B T)} \mathbb{E}_{p_{eq}}[x^n] = \frac{1}{(k_B T)^2} \left\{ \mathbb{E}_{p_{eq}}[x^n U(x)] - \mathbb{E}_{p_{eq}}[x^n]\, \mathbb{E}_{p_{eq}}[U(x)] \right\}, \qquad n = 1, 2, \ldots.

Since the explicit formula for the partial derivative with respect to ϵ is not easy to compute, its value is obtained using a finite difference formula. Evaluating at (k_BT, ϵ, a, x_0) = (1, 0.2, 10, 0), we have

\frac{\partial}{\partial (k_B T)} \mathbb{E}_{p_{eq}}[x^3; T, \epsilon, a(\epsilon), x_0(\epsilon)] = 9.034, \qquad \frac{\partial}{\partial \epsilon} \mathbb{E}_{p_{eq}}[x^3; T, \epsilon, a(\epsilon), x_0(\epsilon)] = 0.06396,

which supports our observation.

TABLE I.

Full and partial estimates of the conventional method using the equilibrium statistics (above) and the estimates using essential statistics (below): the low damping case.

                              k_BT      γ         ϵ         a        x_0
True                          1.0000    0.50000   0.20000   10.000   0.0000
Eq. stat. full estimates      0.99903   0.50003   0.16870   10.888   0.001120
Eq. stat. partial estimates             0.50003   0.20039   10.005   0.011263
Essential stat. estimates     0.99903   0.50003   0.20004   9.9741   0.008265

Next, we report the estimates obtained from the new approach using essential statistics; the results are also listed in Table I. The improvement in accuracy for ϵ and a is noticeable compared with the full estimates of the conventional method. Here, we used a total of 20 essential statistics given by k_A(t_i;θ) for t_i = 0.1i, i = 1,2,…,20. We applied Algorithm 1 to solve an ensemble of two-dimensional (ϵ and a) nonlinear least-squares problems based on 300 uniformly generated random initial conditions. Figure 3 (left) shows the contour plot of the cost function together with all of the estimates. Notice that, except for a few outliers, most of the estimates lie along the low-value contours of the cost function. The estimate ϵ^ reported is the average of all the estimates (excluding the outliers). To satisfy the equilibrium constraint, a(ϵ^) given by (29) is used as the estimate of a.

FIG. 3. The contour plot of the cost function together with the estimates: low damping case (left) and high damping case (right).

The results from this test indicate that for the conventional method, it is difficult to obtain very accurate estimates of ϵ unless T can be estimated accurately, e.g., by using a longer time series. In contrast, the proposed approach using essential statistics is much less sensitive to the error in T^. This approach, however, requires a pre-computing step, as noted in Algorithm 1. In our numerical test, we solved the model (21) on 64 collocation nodes to evaluate the essential statistics over different values of (ϵ, a); that is, order-M_C = 8 Chebyshev nodes were used to construct the set Θ used in (12). However, as we have previously alluded to, this can be done in parallel, and it will not become a serious issue as long as the dimension of the parameter space is relatively small. As for the value of M in (12), we picked M = 6. (The same scheme was applied to the high damping case.)

Another interesting issue arises when the damping parameter is large. Since the conventional method relies fully on one-point statistics with respect to the equilibrium density, it is important to have high-quality independent samples. In Fig. 4 (left), we show the time auto-correlation of x for both the low damping regime (γ = 0.5) and the high damping regime (γ = 5.0). We observe that in the latter case, the auto-correlation function of x decays much more slowly, indicating strong correlation among samples with small lags. In this case, the estimates from the conventional method deteriorate due to the difficulty of obtaining high-quality independent samples.

FIG. 4. The time auto-correlation of x (left) and v (right).

2. High damping case

In this section, we focus on the high damping regime γ = 5.0. We will also make a connection between Fig. 2 and the estimation using the essential statistics. As shown in Fig. 4 (left), we no longer have a fast decay of the time auto-correlation of x. As a result, without changing the sample size, significant error occurs in estimating the moments of x used in the conventional method. This can be clearly seen from the results listed in Table II. In particular, besides suffering from the sensitivity to T^, the error in estimating E_{p_eq}[x^n] for n = 1,2,3 also leads to inaccurate estimates of ϵ, a, and x_0. We should point out an interesting fact: although the marginal distribution of x at equilibrium is independent of γ, a large value of γ causes difficulties in estimating the moments of x in practice.

TABLE II.

Full and partial estimates of the conventional method using the equilibrium statistics (above) and the estimates using essential statistics (below): the high damping case.

                              k_BT      γ        ϵ         a        x_0
True                          1.0000    5.0000   0.20000   10.000   0.0000
Eq. stat. full estimates      0.99949   4.9993   0.69579   5.4717   0.12990
Eq. stat. partial estimates             4.9993   0.25931   8.9295   −0.0034460
Essential stat. estimates     0.99949   4.9993   0.21160   9.8663   −0.01822

Table II and Fig. 3 (right) show the numerical results using the essential statistics. Although the relative error is not as small as in the low damping case, it is still lower than that of the estimates from the conventional method. Our method is implemented as follows: similar to the low damping case, (29) is used to estimate the value of a based on the estimate ϵ^. The 20 essential statistics here are given by k_A(t_i;θ) for t_i = 0.04i + 0.2, i = 1,2,…,20 [suggested by the time auto-correlation of v shown in Fig. 4 (right)], which lie in a much shorter time interval. Compared to the low damping case, the loss of accuracy can be explained by the sensitivity analysis (Fig. 2), which indicated that the parameters (ϵ, a) in this regime are less identifiable when γ is large.

Our next example is a gradient system driven by white noise. We consider a two-dimensional stochastic system as follows:

dx(t) = -C \nabla V(x)\, dt + \sqrt{2 k_B T}\, dW_t,
(34)

where Wt is a two-dimensional Wiener process, V is a potential energy, and C is a matrix defined by

C = \begin{pmatrix} 1 & d \\ -d & 1 \end{pmatrix}, \qquad d \in (-1,1).

The generator L of the system (34) is given by

\mathcal{L} f = -\left( V_{x_1} + d\, V_{x_2} \right) f_{x_1} - \left( -d\, V_{x_1} + V_{x_2} \right) f_{x_2} + k_B T\, \Delta f.
(35)

We choose a triple-well potential function V similar to the model in Ref. 30,

V(x_1, x_2) = v\left( x_1^2 + x_2^2 \right) - (1 - \gamma)\, v\left[ (x_1 - 2a)^2 + x_2^2 \right] - (1 + \gamma)\, v\left[ (x_1 - a)^2 + (x_2 - \sqrt{3}\, a)^2 \right] + 0.2\left[ (x_1 - a)^2 + (x_2 - a/\sqrt{3})^2 \right],
(36)

with

v(z) = 10 \exp\left( \frac{1}{z^2 - a^2} \right) \chi_{(-a,a)}(z), \qquad z \in \mathbb{R},

where χ_{(−a,a)}(z) denotes the characteristic function of the interval (−a,a). Notice that the matrix C is positive definite. The additional quadratic term 0.2[(x_1−a)² + (x_2−a/√3)²] in the triple-well potential (36) is, again, a smooth retaining potential. It is well known (Ref. 1) that the triple-well model (34) yields an equilibrium distribution given by

p_{eq}(x) \propto \exp\left( -\frac{V(x)}{k_B T} \right), \qquad \forall\, x \in \mathbb{R}^2,
(37)

which is independent of parameter d, the off-diagonal element of C.

In the numerical tests, we set (d, a, k_BT, γ) = (0.5, 1, 1.5, 0.25) as the true values of the parameters. Figure 5 shows the contour plot of the potential and a scatter plot of the time series under this set of parameters. To generate the data from (34), we applied the weak trapezoidal method introduced in Ref. 31, which is a weak second-order method. In the remainder of this section, we will choose appropriate essential statistics from which {d, T} can be estimated directly, and the remaining parameters, {a, γ}, will be estimated by solving an appropriate least-squares problem.

FIG. 5. The contour plot of the potential (left) and the scatter plot of the solution with 10³ sample points (right).

Similar to the Langevin model, we consider an external forcing that is constant in x. Subsequently, the perturbed dynamics is given by

dx(t) = \left[ -C \nabla V(x) + \delta f(t) \right] dt + \sqrt{2 k_B T}\, dW_t,
(38)

where |δ| ≪ 1. If we select A(x) := x as the observable, the corresponding linear response operator reads

k_A(t;\theta) = \mathbb{E}_{p_{eq}}\left\{ A[x(t)] \otimes B[x(0)] \right\},

where B=(B1,B2) with

B_i(x_1, x_2) = \frac{1}{k_B T}\, \frac{\partial V}{\partial x_i}(x_1, x_2), \qquad i = 1, 2,

given by (3). As a result, the entries of the linear response operator kA(t;θ) satisfy

k^{(i,j)}(t;\theta) := \frac{1}{k_B T}\, \mathbb{E}_{p_{eq}}\left\{ x_i(t)\, V_{x_j}[x_1(0), x_2(0)] \right\}, \qquad i,j = 1,2,
(39)

where V_{x_i} denotes the partial derivative of V with respect to x_i. Using integration by parts, one can show that

\int_{\mathbb{R}} x_i\, V_{x_j}(x_1, x_2)\, \exp\left[ -V(x_1,x_2)/k_B T \right] dx_j = \delta_{ij}\, k_B T \int_{\mathbb{R}} \exp\left[ -V(x_1,x_2)/k_B T \right] dx_j, \qquad i,j = 1,2,

which leads to

k^{(i,j)}(0;\theta) = \delta_{ij}.
(40)

This is known as the equipartition of energy in statistical mechanics.

Notice that the response operators k^{(i,j)}(t;θ*) are not accessible, due to the fact that the function V_{x_j}(x;θ*) depends on the unknown true parameter values (k_BT*, a*, γ*). This is precisely one of the issues raised in Remark 2.4. For gradient flow systems, this problem can be overcome by introducing a linear transformation to another set of two-point statistics. We define

m_{i,j}(t;\theta) := \mathbb{E}_{p_{eq}}[x_i(t)\, x_j(0)], \qquad i,j = 1,2,
(41)

and consider their time derivatives. Following the same calculations that led to (26), we obtain the following identities for t>0:

\frac{d}{dt} m_{i,j}(t;\theta) = -k_B T \left[ k^{(j,i)}(t;\theta) + (-1)^{i-1} d\, k^{(j,3-i)}(t;\theta) \right], \qquad i,j = 1,2.
(42)

It is more convenient to rewrite (42) as a linear system for t > 0,

-k_B T \begin{pmatrix} 1 & d & 0 & 0 \\ 0 & 0 & 1 & d \\ -d & 1 & 0 & 0 \\ 0 & 0 & -d & 1 \end{pmatrix} \begin{pmatrix} k^{(1,1)}(t) \\ k^{(1,2)}(t) \\ k^{(2,1)}(t) \\ k^{(2,2)}(t) \end{pmatrix} = \begin{pmatrix} m'_{1,1}(t) \\ m'_{1,2}(t) \\ m'_{2,1}(t) \\ m'_{2,2}(t) \end{pmatrix},
(43)

where the coefficient matrix is non-singular since d ∈ (−1,1). This linear relationship suggests that one can fit m_{i,j} in place of k^{(i,j)}, since the former is numerically accessible.

Letting t → 0+ and applying (40) to the first and third equations of (43) at θ = θ*, we obtain the following estimates for d and T:

k_B T = -m'_{1,1}(0^+;\theta^*), \qquad d = m'_{2,1}(0^+;\theta^*)/(k_B T),
(44)

where m'_{i,j}(0^+;θ*) is computable from the available samples. Equation (44) reduces the problem to the two-dimensional problem of estimating (a, γ) only.
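A sketch of how (44) can be evaluated from data follows; x is the sampled equilibrium trajectory of shape (num_steps, 2) with step dt, the one-sided first-order difference at t = 0+ is a simplifying assumption, and the signs follow (44) as written above:

```python
import numpy as np

def estimate_T_d(x, dt):
    """Direct estimates (44) from m_ij(t) = E[x_i(t) x_j(0)],
    via one-sided finite differences at t = 0+."""
    def m(i, j, lag):
        n = len(x) - lag
        return np.mean(x[lag:, i] * x[:n, j])
    dm11 = (m(0, 0, 1) - m(0, 0, 0)) / dt
    dm21 = (m(1, 0, 1) - m(1, 0, 0)) / dt
    kBT_hat = -dm11                  # k_B T = -m'_{11}(0+)
    d_hat = dm21 / kBT_hat           # d = m'_{21}(0+) / (k_B T)
    return kBT_hat, d_hat
```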

As we pointed out earlier, the linear response operator k^{(i,j)}(t;θ) in (39) is not directly accessible. Motivated by the linear relation in (43), we consider inferring the values (a, γ) from the two-point statistics m_{i,j}(t;θ) in (41). In particular, we choose m_{1,1}(t_i;θ) with t_i = 0.1i, i = 1,…,20, as the essential statistics in our numerical test.

Since m_{1,1}(t;θ) is a two-point statistic of x_1, we can apply the empirical a priori sensitivity test by checking the dependence of k̂(t,θ) over the training collocation nodes θ = (a,γ) ∈ Θ, which are available from Step 2 of Algorithm 1. In this numerical experiment, the training used 25 collocation nodes θ ∈ Θ. To validate this empirical sensitivity test, we also perform the local sensitivity analysis described in Sec. IV B 2 for the parameters a and γ (which, again, is not possible in practice since the true parameters are unknown). The results are shown in Fig. 6 (left). From the scales of x̄_a and x̄_γ (averaged over 2000 realizations of x), we can see that this two-point statistic, m_{1,1}(t;θ), is quite sensitive to both parameters γ and a, with stronger dependence on the parameter a.

FIG. 6. Local sensitivity test of x_1 with respect to the parameters a and γ (left) and the contour plot of the cost function together with the estimates (right).

Based on a time series of size 4×10^6 (with time lag h = 1×10^{-3}), we implemented Algorithm 1 with M = 4 and the collocation nodes Θ constructed from order M_C = 5 Chebyshev nodes in (12).
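A minimal sketch of the surrogate construction and the least-squares solve follows, assuming the essential statistics have been precomputed at the 5×5 tensor grid of Chebyshev nodes, with the parameters (a, γ) rescaled to the reference square [−1,1]², and stored row-wise in stats_at_nodes. The function names are hypothetical, and SciPy's Levenberg-Marquardt solver (a damped Gauss-Newton iteration) is our stand-in for the Gauss-Newton method discussed in the paper.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.optimize import least_squares

def cheb_nodes(n):
    """n Chebyshev-Gauss nodes on [-1, 1]."""
    return np.cos((2.0 * np.arange(1, n + 1) - 1.0) * np.pi / (2.0 * n))

def fit_surrogate(stats_at_nodes, nodes1d, deg):
    """Degree-(deg, deg) tensor-Chebyshev least-squares fit per time t_i."""
    A, G = np.meshgrid(nodes1d, nodes1d, indexing="ij")
    V = C.chebvander2d(A.ravel(), G.ravel(), [deg, deg])
    coef, *_ = np.linalg.lstsq(V, stats_at_nodes, rcond=None)
    return [c.reshape(deg + 1, deg + 1) for c in coef.T]

def estimate_parameters(coefs, m_obs, theta0):
    """Solve the approximate least-squares problem on the surrogate."""
    def residual(theta):
        return np.array([C.chebval2d(theta[0], theta[1], c)
                         for c in coefs]) - m_obs
    return least_squares(residual, theta0, method="lm").x

nodes = cheb_nodes(5)                  # M_C = 5 nodes per direction
# stats_at_nodes: (25, 20) array of m_{1,1}(t_i; theta_p), assumed given
# coefs = fit_surrogate(stats_at_nodes, nodes, deg=4)          # M = 4
# theta_hat = estimate_parameters(coefs, m_obs, np.zeros(2))
```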

The estimates are shown in Table III. We also show in Fig. 6 (right) the contour plot of the cost function together with the estimates based on 300 uniformly generated initial guesses. Notice that the true parameter value does not lie on the lowest contour level; these discrepancies are due to the surrogate modeling error and the quality of the samples. However, the estimates â and γ̂ are still reasonably accurate, as reported in Table III. We should point out that the estimates â and γ̂ reported in this table are the averages of all the estimates (excluding the outliers), while the estimates k̂_BT and d̂ are obtained by solving (44). The contour plot also confirms the sensitivity analysis, which suggested that estimating a is easier than estimating γ.

TABLE III. The estimation results: the Triple-Well model.

             a        γ        k_BT     d
True         1.0000   0.2500   1.5000   0.5000
Estimates    1.0249   0.2463   1.4946   0.5077

One interesting question is whether the approximate parameter values can reproduce the essential statistics, which is useful for predicting non-equilibrium averages in the presence of external forces. Using the estimates reported in Table III, we compare the resulting two-point statistics used in estimating the parameters, m_{1,1}(t;\hat\theta), with the corresponding true statistics, m_{1,1}(t;\theta^*) (see Fig. 7, left). In Fig. 7 (right), we also compare the estimated response operator, k_{1,1}(t;\hat\theta), with the true response operator, k_{1,1}(t;\theta^*). Excellent agreement is found.

FIG. 7. The recovery of the two-point statistics \mathbb{E}_{p^{eq}}[x_1(t)x_1(0)] (left) and the response statistics \mathbb{E}_{p^{eq}}\{x_1(t)V_{x_1}[x_1(0),x_2(0)]\} (right). All the trajectories have been normalized such that they start at the point (0,1).

This paper presented a parameter estimation method for stochastic models in the form of Itô drift diffusions. To infer parameters that do not appear in the equilibrium density function, we proposed to fit two-point essential statistics formulated using the fluctuation-dissipation theory. Building upon the framework established in our previous work,5 we formulated the problem as a nonlinear least-squares problem subject to a dynamical constraint when the parameters cannot be estimated directly by solving the corresponding statistical constraints. To avoid the expensive cost of evaluating the essential statistics at each iteration when the Gauss-Newton method is used, we proposed to solve an approximate least-squares problem based on a polynomial surrogate modeling approach. This approach is motivated by two facts: the sampling error cannot be avoided in computing the values of the essential statistics, and the essential statistics are smooth functions of the parameters. We established the existence of minimizers of the approximate least-squares problem that converge to the solution of the true least-squares problem under the assumption that the essential statistics are smooth functions of the parameters. We also showed that the polynomial approximate least-squares problem has a Jacobian that is full rank almost everywhere, which implies the local convergence of Gauss-Newton solutions. We tested the proposed method on two examples that belong to two large classes of stochastic models: a Langevin dynamics model and a stochastic gradient system. In general, we expect that the parameter estimation procedure should be carried out as follows:

  1. Reduce the parameter space by direct estimation using appropriate statistics that are easily computable, whenever this is possible.

  2. Based on the observable of interest, the functional form of the equilibrium density, the external forcing term, and the available data, identify appropriate essential statistics for the remaining parameters using the a priori sensitivity analysis test. In our implementation, we compare the essential statistics computed on the training parameter values (e.g., collocation nodes). A more elaborate global sensitivity analysis technique such as the Sobol index28 can be used to determine the parameter identifiability.

  3. For the remaining parameters, formulate a nonlinear least-squares problem using the appropriate essential statistics at some training parameter values.

  4. Apply Algorithm 1 to obtain the estimates θ̂ of the true parameter values θ*.

  5. Apply a sensitivity analysis, such as the local sensitivity analysis discussed in Sec. IV B 2, as an a posteriori confidence check for the estimates. If the parameters are found to be insensitive to the selected response statistics, go back to Step 3 and choose a different set of response statistics.

One restriction of our formulation is the need to compute B in (7), which may require knowledge of (some of) the true parameter values θ*; this is not feasible in general. As a result, the nonlinear system (8) cannot be evaluated without knowing the true parameters. For the Langevin dynamics, we found that the parameter k_BT in B can be estimated from the equilibrium statistics, namely, the variance of v. For the gradient flow problem, there exists an invertible linear transformation between the linear response statistics that are not computable and other two-point statistics that can be estimated without knowing the true parameter values. Since Algorithm 1 does not rely on the FDT formulation, one can apply it to any available two-point statistics, e.g., time auto-correlations of the solution. We are not aware of a general routine that bypasses this difficulty. However, as we have demonstrated, at least for general Langevin dynamics models and stochastic gradient systems, this idea works.

While the proposed method requires knowledge of the equilibrium density function of the unperturbed system, one can relax this condition and estimate the density from data. For low-dimensional problems, one can use the non-parametric kernel density estimation method.32,33 If the invariant distribution of the dynamical system can be characterized by a density function defined on a smooth manifold (embedded in a high-dimensional phase space) that the data lie on or close to, then one can estimate the density function by applying the theory of kernel embedding of distributions34 on a Hilbert space with basis functions defined on the data manifold. Analogous to the conditional density estimation proposed in Refs. 37 and 38, these data-driven basis functions can be obtained via the diffusion maps algorithm.35,36 An alternative to nonparametric methods is a parametric density estimation using the moment constrained maximum entropy principle.39 For densities of moderate, say five to seven, dimensions, one can use the recently developed Equation-By-Equation (EBE) method40 for solving the moment constrained maximum entropy problems. The source code of the EBE scheme is available at Ref. 41.

One of our ultimate goals is to use this method for parameter estimation in molecular modeling, in which a Langevin type of model is usually used. The potential energy typically involves a large set of parameters, e.g., the stiffness constants associated with bond stretching and the deformation of bond angles, as well as the coefficients in the van der Waals and screened electrostatic interactions. The selection of damping coefficients is also non-trivial; see the review paper42 for various perspectives. Compared to the existing methods, the novelty of our approach is that we use the response statistics to formulate the parameter estimation problem, which can reveal parameters that do not appear in the equilibrium density. In addition, the polynomial surrogate model provides an efficient means to solve the nonlinear least-squares problem.

To achieve this goal, however, there are some remaining challenges. First, for a large parameter space, we suspect that one needs to combine the existing methods, which compute the parameters associated with the equilibrium density as one way to reduce the problem (as suggested in Step 1 above), with the proposed method to estimate the remaining parameters. Second, the proposed method requires high-quality and possibly large samples for an accurate evaluation of the essential statistics. However, since the essential statistics depend on the choice of the observable and the external forcing, approximating high-dimensional integrals can be avoided so long as the dimensions of the ranges of these two functions are small. Finally, the underlying model might be subject to modeling error. For example, the model may be derived from a multiscale expansion or just empirically postulated. The formulation of the response statistics in the presence of modeling error will be investigated in separate works.

This research was partially supported by the National Science Foundation (NSF) (Grant No. DMS-1619661). X.L. is also supported by the NSF (Grant No. DMS-1522617). J.H. is also supported by the Office of Naval Research (ONR) (Grant No. N00014-16-1-2888).

In Sec. III B, we presented the convergence result, Theorem 3.1, and included the proof for the one-dimensional case. Here, we show the proof for the general N-dimensional case. We first generalize Lemmas 3.3–3.5 to the N-dimensional case.

Lemma A.1
Let f ∈ C^ℓ([−1,1]^N). For the coefficients α_k of the corresponding least-squares approximation, we have

\lim_{\lambda\to+\infty} \alpha_{k+\lambda e_i}\,(k_i+\lambda)^{\ell} = 0,

where k_i denotes the ith component of k and e_i is the ith unit vector.

Proof.
To apply Lemma 3.3, we introduce the single-variable function

g(\theta_i) := \int_{[-1,1]^{N-1}} f(\theta)\,\frac{P_k(\theta)}{P_{k_i}(\theta_i)}\,d\mu(\theta_1)\cdots d\mu(\theta_{i-1})\,d\mu(\theta_{i+1})\cdots d\mu(\theta_N).

It is easy to check that g(\theta_i) ∈ C^ℓ([−1,1]) and

\int_{-1}^{1} g(\theta_i)\,P_{k_i+\lambda}(\theta_i)\,d\mu(\theta_i) = \alpha_{k+\lambda e_i}.

Considering the least-squares approximation of g(\theta_i) and applying Lemma 3.3 to the corresponding coefficients, we are able to draw the conclusion.
Lemma A.2

Let f ∈ C^ℓ([−1,1]^N) with ℓ > (3/2)N, and consider its least-squares approximation f^M given by (11). Then, we have ‖f − f^M‖_∞ → 0 as M → +∞.

Proof.
From the proof of Lemma 3.4, it is enough to show that {f^M} is a Cauchy sequence with respect to the L^∞-norm. For m > n, we have

\|f^m - f^n\|_\infty \le \sum_{\lambda=n+1}^{m}\;\sum_{\{k;\,\|k\|_\infty=\lambda\}} |\alpha_k|\,\|P_k\|_\infty \le \sum_{\lambda=n+1}^{m} \left(\lambda+\tfrac12\right)^{\frac{N}{2}} \sum_{\{k;\,\|k\|_\infty=\lambda\}} |\alpha_k|,

where we have used the fact that, for ‖k‖_∞ = λ,

\|P_k\|_\infty = \prod_{i=1}^{N} \|P_{k_i}\|_\infty \le \prod_{i=1}^{N} \sqrt{k_i+\tfrac12} \le \left(\lambda+\tfrac12\right)^{\frac{N}{2}}.

We further notice that the multi-index set {k; ‖k‖_∞ = λ} has in total (λ+1)^N − λ^N = O(λ^{N−1}) elements, and by Lemma A.1, we know

|\alpha_k| = o(\lambda^{-\ell}), \qquad \lambda = \|k\|_\infty,

for λ large enough. Thus, for large enough m > n, we have

\|f^m - f^n\|_\infty \le C \sum_{\lambda=n+1}^{m} \left(\lambda+\tfrac12\right)^{\frac{N}{2}} \lambda^{N-1}\,\lambda^{-\ell},

where C is a constant that depends only on N and ℓ. Finally, with ℓ > (3/2)N, the exponent satisfies N/2 + N − 1 − ℓ < −1, so the series converges and {f^M} is indeed a Cauchy sequence with respect to the L^∞-norm, which completes the proof.
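The convergence asserted by Lemma A.2 is easy to visualize numerically. The one-dimensional (N = 1) sketch below fits normalized Legendre least-squares approximations of increasing degree to a smooth test function of our own choosing and reports the sup-norm error on a dense grid; the rapid error decay is consistent with the lemma, though the test function and grid are purely illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as L

f = lambda x: np.exp(np.sin(2.0 * x))    # smooth test function on [-1, 1]
x = np.linspace(-1.0, 1.0, 5001)         # dense evaluation grid

for M in (4, 8, 16, 32):
    coef = L.legfit(x, f(x), M)          # discrete least-squares fit
    err = np.max(np.abs(f(x) - L.legval(x, coef)))
    print(f"degree M = {M:2d}, sup-norm error = {err:.2e}")
```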

To generalize Lemma 3.5, it is enough to consider the partial derivatives, that is, to find the condition under which ‖∂_{θ_i}(f^M − f)‖_∞ → 0 as M → +∞.

Lemma A.3
For f ∈ C^ℓ([−1,1]^N) with ℓ > (3/2)N + 2, consider its least-squares approximation f^M given by (11). Then, we have

\lim_{M\to+\infty} \left\|\partial_{\theta_i}(f^M - f)\right\|_\infty = 0, \qquad i = 1, 2, \ldots, N.

Proof.
Similar to the proof of Lemma A.2, we first show that {f^M_{\theta_i}} is a Cauchy sequence. For m > n, we have

\|f^m_{\theta_i} - f^n_{\theta_i}\|_\infty \le \sum_{\lambda=n+1}^{m}\;\sum_{\{k;\,\|k\|_\infty=\lambda\}} |\alpha_k|\,\|\partial_{\theta_i} P_k\|_\infty \le \sum_{\lambda=n+1}^{m} \tfrac12\lambda(\lambda+1)\left(\lambda+\tfrac12\right)^{\frac{N}{2}} \sum_{\{k;\,\|k\|_\infty=\lambda\}} |\alpha_k|,

where we have used the inequality (16). By Lemma A.1, and similar to the proof of Lemma A.2, f ∈ C^ℓ([−1,1]^N) provides

\|f^m_{\theta_i} - f^n_{\theta_i}\|_\infty \le C \sum_{\lambda=n+1}^{m} \tfrac12\lambda(\lambda+1)\left(\lambda+\tfrac12\right)^{\frac{N}{2}} \lambda^{N-1}\,\lambda^{-\ell},

where C is a constant that depends only on N and ℓ. Finally, with ℓ > (3/2)N + 2, we are able to conclude that {f^M_{\theta_i}} is a Cauchy sequence with respect to the L^∞-norm. With the same trick as in the proof of Lemma 3.5, one can show that f^M_{\theta_i} indeed converges to f_{\theta_i} uniformly on [−1,1]^N.

Now, we are ready to prove Theorem 3.1.

Proof.
First, by Lemmas A.2 and A.3, the regularity assumption directly provides

\lim_{M\to+\infty} \|f^M - f\|_\infty = 0, \qquad \lim_{M\to+\infty} \|\nabla f^M - \nabla f\|_\infty = 0.

To construct a function F^M similar to the one used in the proof of Proposition 3.7, we introduce the notation

\min_{\theta\in[-1,1]^N} \sum_{i=1}^{N} (f_i^M)^2(\theta) =: \sum_{i=1}^{N} (e_i^M)^2, \qquad e^M := (e_1^M, e_2^M, \ldots, e_N^M),

and define F^M := f^M − e^M. First, we show the existence of a sequence of solutions, {θ^M}, that are close to θ* by checking the hypotheses of Lemma 3.6 for F^M at θ*. Since ∇f(θ*) is invertible, by continuity, for some r > 0 small enough, there exists a positive constant γ such that

\|\nabla f(\theta)^{-1}\| \le \frac{\gamma}{2}, \qquad \forall\,\theta \in B(\theta^*, r).

Later, we will specify r for the convergence of θ^M → θ* as M → ∞.
We further notice that ∇F^M = ∇f^M and that ∇f^M converges to ∇f uniformly. This implies that there exists L_1 such that, ∀ M > L_1,

\|\nabla F^M(\theta)^{-1}\| \le \gamma, \qquad \forall\,\theta \in B(\theta^*, r).
(A1)

By the definition of e^M, we know that at θ = θ*,

\|F^M(\theta^*)\| = \|f^M(\theta^*) - e^M\| \le 2\|f^M(\theta^*)\|,

which leads to

\lim_{M\to+\infty} \|F^M(\theta^*)\| \le \lim_{M\to+\infty} 2\|f^M(\theta^*)\| = 2\|f(\theta^*)\| = 0.

Thus, there exists L_2 such that, ∀ M > L_2,

\|F^M(\theta^*)\| < \frac{r}{\gamma}.
(A2)

Let M > max{L_1, L_2} =: L. Combining conditions (A1) and (A2) and applying Lemma 3.6 to F^M at θ*, we conclude that F^M(θ) = 0 has a solution in B(θ^*, r). We denote this solution by θ^M, ∀ M > L; it is a minimizer of the approximate least-squares problem (13).
Our next task is to show that θ^M converges to θ*. Notice that in the multi-dimensional case, we do not have the mean-value theorem. As a remedy, we introduce the following matrix-valued function:

J(\theta;\theta^*) := \int_0^1 \nabla f\left[\theta^* + t(\theta-\theta^*)\right] dt.

From the definition, we know that J(θ*;θ*) = ∇f(θ*), which is invertible, and

f(\theta) - f(\theta^*) = \int_0^1 \nabla f\left[\theta^* + t(\theta-\theta^*)\right](\theta-\theta^*)\,dt = J(\theta;\theta^*)(\theta-\theta^*).
(A3)

Since J(θ;θ*)^{−1} exists at θ = θ* and J is a continuous function of θ near θ*, we are able to select r > 0 [the same r as in (A1) and (A2)] such that J(θ;θ*)^{−1} exists and

\|J(\theta;\theta^*)^{-1}\| < \Delta, \qquad \forall\,\theta \in B(\theta^*, r),

for some positive constant Δ. Letting θ = θ^M in (A3), we obtain

\|\theta^M - \theta^*\| \le \|J(\theta^M;\theta^*)^{-1}\|\,\|f(\theta^M) - f(\theta^*)\| < \Delta\|f(\theta^M)\| \le \Delta\left[\|f(\theta^M) - f^M(\theta^M)\| + \|f^M(\theta^M)\|\right] \le \Delta\left(\|f - f^M\|_\infty + \|f^M(\theta^M)\|\right).

Thus,

\lim_{M\to+\infty} \|\theta^M - \theta^*\| \le \lim_{M\to+\infty} \Delta\left[\|f - f^M\|_\infty + \|f^M(\theta^M)\|\right] = 0.

As for the residual error ‖f^M(θ^M)‖, since F^M(θ^M) = 0 implies f^M(θ^M) = e^M, the definition of e^M gives

\lim_{M\to+\infty} \|f^M(\theta^M)\| \le \lim_{M\to+\infty} \|f^M(\theta^*)\| = 0.

This concludes the proof.

The full rank condition of the Jacobian matrix is usually required in the local convergence theorems of Newton-like methods. We discuss this issue over the finite-dimensional function space Γ_N^M defined by

\Gamma_N^M := \mathrm{span}\left\{\theta_1^{k_1}\theta_2^{k_2}\cdots\theta_N^{k_N}\;\middle|\;k_i = 0, 1, \ldots, M\right\}.

Since the zero element in Γ_N^M is the function that is identically zero, we can introduce the following definition of linear independence.

Definition B.1
{f_1, …, f_n} ⊂ Γ_N^M is said to be a linearly independent set if

\sum_{i=1}^{n} c_i f_i \equiv 0 \;\Rightarrow\; c_i = 0, \quad i = 1, \ldots, n.

We define the differential operators D_{\theta_i} : \Gamma_N^M \to \Gamma_N^M by

D_{\theta_i} f := \partial_{\theta_i} f, \qquad i = 1, \ldots, N.

Now, let us deduce some relevant properties of these operators.

Lemma B.2

The operators \{D_{\theta_i}\}_{i=1}^{N} satisfy

  1. rank(D_{\theta_i}) = M(M+1)^{N-1};

  2. ∀ i ≠ j, there exists a permutation P_{ij} satisfying D_{\theta_j} = P_{ij} D_{\theta_i} P_{ij}.

Proof.
We first prove rank(D_{\theta_1}) = M(M+1)^{N-1}. It is enough to show that the dimension of the null space of D_{\theta_1} is (M+1)^{N-1}, that is, dim[ker(D_{\theta_1})] = (M+1)^{N-1}. With the help of multi-indices, one can see that \{\theta^k \,|\, \|k\|_\infty \le M\} provides a natural basis for Γ_N^M, and

D_{\theta_1}(\theta^k) \equiv 0 \;\Leftrightarrow\; k_1 = 0,

which yields dim[ker(D_{\theta_1})] = (M+1)^{N-1}. We further notice that any 2-cycle (i,j) in the symmetric group S_N induces a permutation P_{ij} over the basis,

P_{ij} : \theta_i^{k_i}\theta_j^{k_j} \mapsto \theta_i^{k_j}\theta_j^{k_i}.

This permutation leads to the identity D_{\theta_j} = P_{ij} D_{\theta_i} P_{ij}, which implies that rank(D_{\theta_i}) = rank(D_{\theta_1}) = M(M+1)^{N-1} for i = 1, \ldots, N.

The following lemma will also be useful to prove our main result.

Lemma B.3
Let \sum_{i=1}^{N} \lambda_i D_{\theta_i} be a non-trivial linear combination of \{D_{\theta_i}\}_{i=1}^{N}; then

\mathrm{rank}\Big(\sum_{i=1}^{N} \lambda_i D_{\theta_i}\Big) \ge M(M+1)^{N-1}.

Proof.
Let \Lambda = \sum_{i=1}^{N} \lambda_i D_{\theta_i} be a non-trivial linear combination of \{D_{\theta_i}\}_{i=1}^{N}, i.e., \lambda_j \ne 0 for some j. Since \Lambda is a linear operator over Γ_N^M, we have

\mathrm{rank}(\Lambda) = \dim(\Gamma_N^M) - \dim[\ker(\Lambda)] = (M+1)^N - \dim[\ker(\Lambda)].

We notice that the image of \theta^k can be decomposed into

\Lambda(\theta^k) = \lambda_j D_{\theta_j}\big(\theta_1^{k_1}\cdots\theta_j^{k_j}\cdots\theta_N^{k_N}\big) + \sum_{i\ne j} \lambda_i D_{\theta_i}\big(\theta_1^{k_1}\cdots\theta_j^{k_j}\cdots\theta_N^{k_N}\big) = \lambda_j k_j\,\theta_1^{k_1}\cdots\theta_j^{k_j-1}\cdots\theta_N^{k_N} + \theta_j^{k_j}\,R(\theta_1,\ldots,\theta_{j-1},\theta_{j+1},\ldots,\theta_N),

where R := \sum_{i\ne j} \lambda_i D_{\theta_i}(\theta_1^{k_1}\cdots\theta_{j-1}^{k_{j-1}}\theta_{j+1}^{k_{j+1}}\cdots\theta_N^{k_N}) is a polynomial independent of \theta_j. It is easy to see that if k_j \ne 0, the image cannot be identically zero, since the two terms have different degrees in \theta_j. This implies that ker(\Lambda) \subseteq span\{\theta^k;\, k_j = 0\}; thus, rank(\Lambda) \ge M(M+1)^{N-1}.

To gain intuition, one can consider the following example with N = 2. Select the ordered basis for Γ_2^M,

\mathcal{B} := \{1, \theta_2, \theta_2^2, \ldots, \theta_2^M, \theta_1, \theta_1\theta_2, \ldots, \theta_1\theta_2^M, \theta_1^2, \ldots, \theta_1^M\theta_2^M\}.

The matrix representations of D_{\theta_1} and D_{\theta_2} are

D_{\theta_1} = \begin{pmatrix} 0 & I & & & \\ & 0 & 2I & & \\ & & \ddots & \ddots & \\ & & & 0 & MI \\ & & & & 0 \end{pmatrix}, \qquad D_{\theta_2} = \begin{pmatrix} E & & & \\ & E & & \\ & & \ddots & \\ & & & E \end{pmatrix},

where I \in \mathbb{R}^{(M+1)\times(M+1)} is the identity matrix and E \in \mathbb{R}^{(M+1)\times(M+1)} is defined as

E := \begin{pmatrix} 0 & 1 & & & \\ & 0 & 2 & & \\ & & \ddots & \ddots & \\ & & & 0 & M \\ & & & & 0 \end{pmatrix}.

The linear combination D_{\theta_1} + \lambda D_{\theta_2} satisfies

D_{\theta_1} + \lambda D_{\theta_2} = \begin{pmatrix} \lambda E & I & & & \\ & \lambda E & 2I & & \\ & & \ddots & \ddots & \\ & & & \lambda E & MI \\ & & & & \lambda E \end{pmatrix},

and rank(D_{\theta_1} + \lambda D_{\theta_2}) \ge M(M+1) for all \lambda \in \mathbb{R}.
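The block structure above is easy to check numerically. A small sketch follows, assuming the monomial basis ordering θ_1^i θ_2^j with flat index i(M+1)+j; it builds D_{θ_1} and D_{θ_2} for N = 2 and verifies rank(D_{θ_i}) = M(M+1) from Lemma B.2 as well as the lower bound of Lemma B.3 for a few values of λ.

```python
import numpy as np

M = 4
dim = (M + 1) ** 2                       # dimension of Gamma_2^M
idx = lambda i, j: i * (M + 1) + j       # basis element theta1^i theta2^j

D1 = np.zeros((dim, dim))                # d/d(theta1) on the monomial basis
D2 = np.zeros((dim, dim))                # d/d(theta2)
for i in range(M + 1):
    for j in range(M + 1):
        if i > 0:
            D1[idx(i - 1, j), idx(i, j)] = i
        if j > 0:
            D2[idx(i, j - 1), idx(i, j)] = j

rank = np.linalg.matrix_rank
assert rank(D1) == rank(D2) == M * (M + 1)        # Lemma B.2, item 1
for lam in (0.5, -2.0, 7.3):
    assert rank(D1 + lam * D2) >= M * (M + 1)     # Lemma B.3 lower bound
```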

We now turn to the discussion of the full rank condition of the Jacobian matrix,

J(\theta) := \left(\frac{\partial f_i}{\partial \theta_j}\right)_{i=1,\ldots,K;\;j=1,\ldots,N},
(B1)

where f_i \in \Gamma_N^M and K \ge N. Let f be the column vector [f_1(\theta), \ldots, f_K(\theta)]^\top; then f \in (\Gamma_N^M)^K, the K-fold product space of \Gamma_N^M. We can rewrite the Jacobian matrix as

J(\theta) = \big(D_{\theta_1} f, \ldots, D_{\theta_N} f\big),

where D_{\theta_i} f is defined by applying D_{\theta_i} to each component of f. With the above observations, we are ready to prove Theorem 3.8.

Proof of Theorem 3.8.
By definition, the column space of J(\theta) is generated by

\{D_{\theta_1} f, \ldots, D_{\theta_N} f\} \subset (\Gamma_N^M)^K.

Consider a non-trivial linear combination of these vectors, \sum_{i=1}^{N} c_i D_{\theta_i} f; then we have

\sum_{i=1}^{N} c_i D_{\theta_i} f \equiv 0 \;\Leftrightarrow\; \Big(\sum_{i=1}^{N} c_i D_{\theta_i}\Big) f \equiv 0 \;\Leftrightarrow\; \{f_i\}_{i=1}^{K} \subset \ker\Big(\sum_{i=1}^{N} c_i D_{\theta_i}\Big).

By Lemma B.3, we know that the dimension of ker(\sum_{i=1}^{N} c_i D_{\theta_i}) is less than or equal to (M+1)^{N-1}, while dim[span(F)] > (M+1)^{N-1} by the assumption of Theorem 3.8. Therefore, the columns of J(\theta), viewed as elements of (\Gamma_N^M)^K, are linearly independent. Furthermore, this implies that p(\theta) := \det[J(\theta)^\top J(\theta)] is not the zero polynomial. Since N(J), the set where J(\theta) fails to have full rank, corresponds to the roots of p(\theta), by standard results in algebra, we know that N(J) is a finite set for N = 1 and a nowhere dense set for N \ge 2.

With the change of variables [q = a(x − x_0), p = av], we reduce (21) to

\begin{cases} \dot q = p, \\ \dot p = -U_0'(q) - \gamma p + \sqrt{2\gamma k_BT}\,\dot W, \end{cases} \qquad U_0(q) = \epsilon\left(e^{-2q} - 2e^{-q} + 0.01q^2\right).
(C1)

To verify the ergodicity of (C1), we apply Theorem 3.2 in Ref. 15, which requires the potential U_0 to satisfy the following two conditions:

  1. U_0(q) \ge 0 for all q \in \mathbb{R}.

  2. There exist \alpha > 0 and \beta \in (0,1) such that
     \frac12 U_0'(q)\,q \ge \beta U_0(q) + \gamma^2 \frac{\beta(2-\beta)}{8(1-\beta)} q^2 - \alpha.
     (C2)

For the first condition, we simply notice that adding a constant to U_0 affects neither the dynamics nor the equilibrium distribution. Replacing U_0 in (C1) by

U_0(q) = \epsilon\left(e^{-2q} - 2e^{-q} + 0.01q^2 + 1\right),
(C3)

we have

\epsilon\left(e^{-2q} - 2e^{-q} + 0.01q^2 + 1\right) = \epsilon\left[(e^{-q} - 1)^2 + 0.01q^2\right] \ge 0.

For the second condition, substituting U_0 in (C2) by (C3), the inequality becomes, \forall\, q \in \mathbb{R},

-q e^{-2q} + q e^{-q} - \beta e^{-2q} + 2\beta e^{-q} + \left[0.01(1-\beta) - \frac{\gamma^2\beta(2-\beta)}{8\epsilon(1-\beta)}\right] q^2 \ge -\frac{\alpha}{\epsilon} + \beta.

It is enough to show that

  1. the function f(q) = -q e^{-2q} + q e^{-q} - \beta e^{-2q} + 2\beta e^{-q} has a lower bound, and

  2. for any given \gamma, \epsilon > 0, there exists \beta \in (0,1) such that
     0.01(1-\beta) - \frac{\gamma^2\beta(2-\beta)}{8\epsilon(1-\beta)} > 0.
     (C4)

Since f(q) is continuous, by checking its limits on both sides,

\lim_{q\to+\infty} f(q) = 0, \qquad \lim_{q\to-\infty} f(q) = \lim_{q\to-\infty}\; -(q+\beta)\left[\left(e^{-q} - \tfrac12\right)^2 - \tfrac14\right] + \beta e^{-q} = +\infty,

one can see that f(q) must have a lower bound. Given \epsilon, \gamma > 0, (C4) can be satisfied by taking

\beta \in \left(0,\; 1 - \sqrt{1 - \frac{0.08}{0.08 + \gamma^2/\epsilon}}\right).

Thus, one can conclude that both conditions can be satisfied, and the Langevin dynamics (21) is indeed ergodic under all admissible values of the parameters.
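These two items can also be checked numerically for concrete parameter values. The sketch below, with illustrative values \epsilon = 1 and \gamma = 0.25, picks \beta inside the admissible interval, evaluates f(q) on a wide grid to confirm that it is bounded below, and verifies that the bracketed coefficient in (C4) is positive.

```python
import numpy as np

def check_ergodicity_conditions(eps, gamma):
    """Numerical check of items 1 and 2 for given eps, gamma > 0."""
    beta_max = 1.0 - np.sqrt(1.0 - 0.08 / (0.08 + gamma**2 / eps))
    beta = 0.5 * beta_max                       # any beta in (0, beta_max)
    q = np.linspace(-30.0, 30.0, 200001)
    f = (-q * np.exp(-2.0 * q) + q * np.exp(-q)
         - beta * np.exp(-2.0 * q) + 2.0 * beta * np.exp(-q))
    c4 = (0.01 * (1.0 - beta)
          - gamma**2 * beta * (2.0 - beta) / (8.0 * eps * (1.0 - beta)))
    return f.min(), c4                          # lower bound of f, and c4 > 0

fmin, c4 = check_ergodicity_conditions(eps=1.0, gamma=0.25)
print(f"min f(q) on grid = {fmin:.4f}, (C4) coefficient = {c4:.6f}")
```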

1. G. A. Pavliotis, Stochastic Processes and Applications (Springer, 2016).
2. D. Gamerman and H. Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed., Chapman & Hall/CRC Texts in Statistical Science (Taylor & Francis, 2006).
3. E. Borgonovo and E. Plischke, Eur. J. Oper. Res. 248, 869 (2016).
4. A. P. Lyubartsev and A. Laaksonen, Phys. Rev. E 52, 3730 (1995).
5. J. Harlim, X. Li, and H. Zhang, J. Stat. Phys. 168, 146 (2017).
6. M. Toda, R. Kubo, and N. Hashitsume, Statistical Physics II. Nonequilibrium Statistical Mechanics (Springer, 1983).
7. M. Hairer and A. J. Majda, Nonlinearity 23, 909 (2010).
8. G. Gottwald, J. Wormell, and J. Wouters, Physica D 331, 89 (2016).
9. D. Qi and A. J. Majda, J. Atmos. Sci. 73, 4609 (2016).
10. A. Majda and D. Qi, J. Nonlinear Sci. 26, 233 (2016).
11. A. Majda and D. Qi, SIAM Rev. 60, 491 (2018).
12. A. Majda, Introduction to Turbulent Dynamical Systems in Complex Systems (Springer, 2016).
13. Y. M. Marzouk, H. N. Najm, and L. A. Rahn, J. Comput. Phys. 224, 560 (2007).
14. Y. Marzouk and D. Xiu, Commun. Comput. Phys. 6, 826 (2009).
15. J. C. Mattingly, A. M. Stuart, and D. J. Higham, Stoch. Process. Their Appl. 101, 185 (2002).
16. J. R. Singler, Math. Comput. Model. 47, 422 (2008).
17. C. E. Leith, J. Atmos. Sci. 32, 2022 (1975).
18. A. Gritsun, G. Branstator, and A. Majda, J. Atmos. Sci. 65, 2824 (2008).
19. R. Abramov and A. Majda, J. Nonlinear Sci. 18, 303 (2008).
20. R. Abramov and A. Majda, J. Atmos. Sci. 66, 286 (2009).
21. A. Majda and X. Wang, Commun. Math. Sci. 8, 187 (2010).
22. R. Abramov and A. Majda, J. Phys. Oceanogr. 42, 243 (2011).
23. L. Ma, X. Li, and C. Liu, J. Chem. Phys. 145, 204117 (2016).
24. E. Isaacson and H. B. Keller, Analysis of Numerical Methods (Courier Corporation, 2012).
25. J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (SIAM, 1970), Vol. 30.
26. C. T. Kelley, Iterative Methods for Optimization (SIAM, 1999).
27. P. Sheppard, M. Rathinam, and M. Khammash, J. Chem. Phys. 136, 034115 (2012).
28. I. M. Sobol, Math. Model. Comput. Exp. 1, 407 (1993).
29. A. Telatovich and X. Li, e-print arXiv:1706.04237 (2017).
30. A. Hannachi and A. O'Neill, Q. J. R. Meteorol. Soc. 127, 939 (2001).
31. D. F. Anderson and J. C. Mattingly, Commun. Math. Sci. 9(1), 301–318 (2009).
32. M. Rosenblatt, Ann. Math. Stat. 27, 832 (1956).
33. E. Parzen, Ann. Math. Stat. 33, 1065 (1962).
34. A. Smola, A. Gretton, L. Song, and B. Schölkopf, in International Conference on Algorithmic Learning Theory (Springer, 2007), pp. 13–31.
35. R. R. Coifman and S. Lafon, Appl. Comput. Harmon. Anal. 21, 5 (2006).
36. T. Berry and J. Harlim, Appl. Comput. Harmon. Anal. 40, 68 (2016).
37. T. Berry and J. Harlim, Mon. Wea. Rev. 145, 2833 (2017).
38. S. Jiang and J. Harlim, e-print arXiv:1804.03272.
39. E. Jaynes, Phys. Rev. 106, 620 (1957).
40. W. Hao and J. Harlim, Commun. Appl. Math. Comput. Sci. 13, 189 (2018).
41. W. Hao and J. Harlim, "Supplementary material: MATLAB software for the equation-by-equation method for solving the maximum entropy problem," see https://github.com/whao2008/EBE (2018).
42. W. Noid, J. Chem. Phys. 139, 090901 (2013).