This paper presents a numerical method to implement the parameter estimation method using response statistics that was recently formulated by the authors. The proposed approach formulates the parameter estimation problem of Itô drift diffusions as a nonlinear least-squares problem. To avoid solving the model repeatedly when an iterative scheme is used for the resulting least-squares problem, a polynomial surrogate model is employed for appropriate response statistics with smooth dependence on the parameters. The existence of minimizers of the approximate polynomial least-squares problems that converge to the solution of the true least-squares problem is established under appropriate regularity assumptions on the essential statistics as functions of the parameters. Numerical implementation of the proposed method is conducted on two prototypical examples that belong to classes of models with a wide range of applications, including the Langevin dynamics and the stochastically forced gradient flows. Several important practical issues, such as the selection of an appropriate response operator to ensure the identifiability of the parameters and the reduction of the parameter space, are discussed. From the numerical experiments, we find that the proposed approach is superior to the conventional approach that uses equilibrium statistics to determine the parameters.
In many dynamical systems, some of the model parameters may not be identifiable from their equilibrium statistics. For problems where the parameters can be identified from their two-point statistics, this paper provides a method to numerically estimate these parameters. Mathematically, we formulate the inverse problem as a well-posed, dynamically constrained nonlinear least-squares problem that fits a set of linear response statistics under small external perturbations. Since solving this problem directly is computationally infeasible, we consider a polynomial surrogate model to approximate this least-squares problem. Convergence of the solution of the approximate problem to the solution of the original nonlinear least-squares problem is established under appropriate regularity assumptions. Supporting numerical examples on two classes of models with a wide range of applications, the Langevin dynamics and the stochastically forced gradient flows, are given.
I. INTRODUCTION
Parameter estimation is ubiquitous in modeling of nature. The goal in this inverse problem is to infer the parameters from observations with sufficient accuracy so that the resulting model becomes a useful predictive tool. Existing parameter estimation techniques often belong to one of the following two classes of approaches, the maximum likelihood estimation1 and the Bayesian inference such as Markov chain Monte-Carlo (MCMC),2 depending on the availability of prior information about the parameters. Since the model parameters are usually not directly measured, the success of any inference method depends crucially on the identifiability of the parameters from the given observations. When the dependence of the observations on the parameters is implicit, that is, through the model, sensitivity analysis (see, e.g., the review article3) is often a useful practical tool to determine the parameter identifiability.
In this paper, we consider the parameter estimation of ergodic stochastic differential equations driven by Brownian noise. When the corresponding invariant measure of the dynamical system has an explicit dependence on the parameters, these parameters can usually be inferred from appropriate equilibrium statistical moments. A popular approach that exploits this idea is the reverse Monte Carlo method4 developed in chemistry. In particular, it formulates an appropriate nonlinear least-squares system of integral equations by matching the equilibrium averages of some pre-selected observables. Subsequently, Newton's iterations are used to estimate the parameters. At each iterative step, samples at the current parameter estimate are generated for a Monte-Carlo estimation that approximates the least-squares integral equations and the corresponding Jacobian matrix. In practice, this method can be rather slow due to the repeated sampling procedure. A more severe limitation is that the approach is restricted to inferring parameters that appear in the equilibrium density.
This limitation can be overcome by fitting appropriate two-point statistics. As shown in our previous work,5 we formulated the parameter estimation problem based on the linear response statistics subjected to an external forcing, which drives the system out of equilibrium. The fluctuation-dissipation theory (FDT), a hallmark of non-equilibrium statistical physics, suggests that the change in the average of an observable under small perturbations can be approximated by appropriate two-point statistics, called the FDT response operators.6 The key point is that these FDT response operators can be estimated using the available samples of the equilibrium unperturbed dynamics so long as we know the exact form of the invariant measure of the dynamics, which we will assume to be the case for a large class of problems, such as the Langevin dynamics and stochastic gradient flows. We should point out that the proposed approach relies on the validity of the FDT response statistics, which has been studied rigorously for a large class of stochastic systems that includes the stochastic differential equation (SDE) setup in this paper.7 For deterministic dynamics, one can use, for example, the statistical technique introduced in Ref. 8 to verify the validity of the FDT linear response. Following the ideas from Refs. 9–11, we developed a parameter inference method using these response operators in Ref. 5. While the method in Refs. 9–11 involves minimizing an information-theoretic functional that depends on both the mean and variance response operators, our approach fits a finite number of quantities of appropriate observables (beyond just the mean and variance), to be referred to as the essential statistics, which allow us to approximate the FDT response operators.
In our previous work,5 we showed the well-posedness of the formulation on three examples, in the sense that under infinite sampling, appropriate choices of essential statistics will ensure the identifiability of the model parameters. On one of these examples, the simplified model for turbulence,12 which is a nonlinear system of SDEs with a Gaussian invariant measure, we were able to explicitly determine the choice of observables and external forcings (which in turn determine the FDT response operators) that allow us to identify all of the model parameters uniquely. In this case, one can directly estimate the parameters by solving the system of equations involving these essential statistics. While this result suggests that the choice of observables and external forcings is problem dependent, the FDT theory provides a guideline for choosing the appropriate two-point statistics for a well-posed parameter estimation. On another example, the Langevin dynamics (which will be discussed in Sec. IV), while the parameters can be identified by the essential statistics as shown in Ref. 5, the formulation suggests that one must include essential statistics that involve higher-order derivatives of the FDT response operators. This can be problematic because these higher-order statistics are rarely available in practice unless the available data are solutions of high-order SDE solvers. Given this practical issue, we propose to fit only the essential statistics that include the zeroth- and first-order derivatives of the FDT response operator (if possible, use only the zeroth-order derivative information). The price paid by restricting to only the lower-order essential statistical information is that the dependence of the essential statistics on some of the model parameters becomes implicit through the solutions of the dynamical model and the identifiability of these parameters becomes questionable.
In this paper, we devise a concrete numerical method to estimate the parameters by solving dynamically constrained least-squares problems of integral equations involving these low-order essential statistics. As one would imagine, the implementation of this approach faces several challenges. First, just like in the reverse Monte-Carlo method, a naive implementation of an iterative method for solving this dynamically constrained least-squares problem requires solving the model repeatedly. To avoid this repeated solving in the minimization steps, we employ a polynomial surrogate model approach13,14 on the least-squares cost function that involves the essential statistics. This approach is motivated by the fact that the cost function (or effectively the essential statistics) is subjected to sampling error and that the essential statistics have smooth dependence on the parameters, assuming that the Fokker-Planck operator of the underlying dynamics has smooth dependence on the parameters. Under appropriate regularity assumptions on the essential statistics, we will provide a theoretical guarantee for the existence of minimizers of the approximate polynomial least-squares problem that converge to the solution of the true least-squares problem. The second, related issue, as mentioned above, is that certain (lower-order) essential statistics might not be sensitive enough to some of the parameters, which will lead to inaccurate estimates. To ensure the practical identifiability of the parameters, we employ an empirical a priori sensitivity analysis based on the training data used in constructing the polynomial surrogate models, and an a posteriori local sensitivity analysis to verify the validity of the a priori sensitivity analysis.
While the proposed technique can be used to infer all of the model parameters, in practice, one should use the polynomial approximation to infer parameters that cannot be estimated directly to avoid the curse of dimensionality. This issue is a consequence of using the polynomial expansion in approximating the least-squares problem, where the required training data set for constructing the surrogate model increases exponentially as a function of the parameter dimension. In the numerical examples below, we will apply the approximate least-squares problems to estimate parameters that cannot be directly estimated from matching equilibrium and/or two-point statistics that involve the first-order derivative of the FDT response operator.
The rest of the paper is organized as follows. In Sec. II, we briefly review the concept of essential statistics and parameter inference method using the linear response statistics developed in Ref. 5. In Sec. III, we present the proposed numerical algorithm based on the polynomial based surrogate model and discuss its convergence (with detailed proof in Appendixes A– C). In Secs. IV and V, we show applications on two nonlinear examples, a Langevin model and a stochastic gradient system with a triple-well potential, respectively. In Sec. VI, we conclude the paper with a summary and discussion.
II. A REVIEW OF ESSENTIAL STATISTICS
We begin by reviewing the parameter estimation formulation introduced in our previous paper.5 Consider an -dimensional Itô diffusion
where the vector field denotes the drift and is the diffusion tensor. Both coefficients are assumed to be smooth functions in and ; represents the standard Wiener process, and the variable contains the model parameters, where is the parameter domain. In this paper, we only consider the case where is bounded and, for simplicity, we assume . Throughout this paper, we denote the underlying true parameter value as and the corresponding estimate as . We assume that Eq. (1) is ergodic (see Ref. 15 for precise conditions) with equilibrium density for all , which depends smoothly on both and . When , we assume that we have access to the explicit formula of as a function of only, which is shortened as . As we shall see in many applications, only depends on a subset of the parameters .
The main idea of the parameter estimation formulation introduced in Ref. 5 is to infer the parameters from the linear response operator associated with the fluctuation dissipation theory (FDT). Recall that, in the FDT setting, we are interested in an order- () external perturbation of the form , such that the solutions of the perturbed dynamics
which would otherwise remain at the equilibrium state of the unperturbed dynamics (1) and can be characterized by a perturbed density . The density function is governed by the corresponding Fokker-Planck equation under initial condition .
By a standard perturbation technique, e.g., Ref. 1, the difference between the perturbed and unperturbed statistics of any integrable function can be estimated by a convolution integral, that is,
In (2), the term is known as the linear response operator. The FDT formulates the linear response operator as the following two-point statistics:
where and denote the components of and , respectively. We should point out that the validity of the FDT has been studied rigorously under mild conditions.7 A more explicit form is given by
where is the solution of the Fokker-Planck equation
Here, denotes the generator of the unperturbed dynamics (1) and is its adjoint operator in the .
The result in Ref. 16 states that if the coefficients of linear parabolic PDEs are as functions of , then the weak solutions are also -times differentiable with respect to . In our case, if , then the linear response operator (4) is also with respect to the parameters, under the mild assumption that is smooth with respect to . In Sec. III B, we will conduct a convergence analysis of the proposed parameter estimation method to determine the degree of smoothness of the linear response operators, as functions of , that is required.
Notice that can be determined analytically based on our assumption of knowing the explicit formula of . Given , the value of can be computed using a Monte-Carlo sum based on the time series of the unperturbed system, , at . Therefore, the linear response operator can be estimated without knowing the underlying unperturbed system in (1) so long as the time series of at and the explicit formula for this equilibrium density are known.
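Since the response operator in (3) is a stationary two-point statistic, its Monte-Carlo estimation from a single equilibrium trajectory amounts to averaging products of the observable and the conjugate variable over time origins. The sketch below illustrates this step under generic assumptions; the function name and its inputs (precomputed arrays of the observable along the trajectory and of the conjugate function obtained from the known equilibrium density) are ours, not from the paper.

```python
import numpy as np

def fdt_response(A_vals, B_vals, max_lag):
    """Estimate k(t) = <A(x(t)) B(x(0))> at discrete lags by averaging
    over time origins of a single stationary (equilibrium) time series.
    A_vals: observable evaluated along the trajectory.
    B_vals: conjugate function (from the known equilibrium density)
            evaluated along the same trajectory."""
    n = len(A_vals)
    k = np.empty(max_lag)
    for lag in range(max_lag):
        # stationarity lets us average A at time t+lag against B at time t
        k[lag] = np.mean(A_vals[lag:] * B_vals[:n - lag])
    return k
```

At lag zero this reduces to an equilibrium covariance, which serves as a basic consistency check of the estimator.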
The FDT, along with the expression of the kernel function (3), has been implemented to predict the change of the expectation of the observable .17–22 The advantage is that the response operator is defined with respect to the equilibrium distribution, which does not involve the prior knowledge of the perturbed density . In statistical physics, the FDT is a crucial formulation to derive transport coefficients, e.g., electron conductivity, viscosity, diffusion coefficients, etc.6 In the current paper, however, we propose to use the response statistics to infer the underlying true parameter value .
Since this linear response operator is in principle infinite-dimensional, it is necessary for practical purposes to introduce a finite-dimensional approximation. The following parametric form of the response operator has been motivated by the rational approximation of the Laplace transform.23 To explain the idea, consider the Laplace transformation of denoted by , which can be approximated by a rational function in the following form:
In the time domain, can be written explicitly as
where denotes the -by- identity matrix. Here, stands for the order of the rational approximation. In Ref. 23, the coefficients in the rational approximations were determined based on certain interpolation conditions, and such a rational approximation has been proven to be an excellent approximation for the time correlation function. For the current problem, will be determined using finitely many essential statistics defined as follows.
The values are called essential statistics if they are sufficient to approximate the response operator up to order- using (6), for and where indicates the order of derivatives.
In Ref. 5, we considered the derivatives of at for a Langevin dynamics model, and explicit formulas involving the model parameters were found. Theoretically, all the parameters can be identified from these, which to some extent shows the consistency. In practice, however, we use only the derivatives of order no greater than one, since estimating higher-order derivatives of requires time series with sufficient accuracy, which are not necessarily available. By using the essential statistics corresponding to lower-order derivatives at points other than zero, we give up the explicit expressions of the higher-order statistics with respect to the parameters. This prompted us to solve the problem in the least-squares sense. Furthermore, it requires other, more practical methods, such as a sensitivity analysis test, to determine the identifiability of the parameters from these statistics.
Figure 1 shows the performance of such a discrete representation, where the linear response operator arises from a Langevin model subjected to a constant external forcing (25), which will be discussed in Sec. IV. With such discrete representation, we reduce our problem of inferring from an infinite-dimensional to a finite number of essential statistics. To derive a system of equations involving both and those essential statistics, we introduce
We also define for ,
We should stress that (7) is not the FDT response, since the in (7) is defined with respect to . The key idea in Ref. 5 is to estimate the true parameter values by solving
Here, the term on the left-hand-side is estimated from the available sample at , and the term on the right-hand-side will be computed from the time series generated by solving (1) for a given .
The response function (solid lines) and its order-4 approximation (dashed lines).
Since usually depends on a subset of the true parameter value , the assumption of knowing the explicit form of as a function of only is too strong. In general, we cannot even compute the left-hand-side term in (8), since the term , which may depend on the unknown true value, is not available to us. We will discuss how to address this issue on two examples in Secs. IV and V, each of which belongs to a class of models with a wide range of applications.
To ensure the solvability, we assume that the total number of equations in (8) is always greater than the dimension of the unknown parameters . For this over-determined scenario, we propose to solve (8) in the least-squares sense, addressing the practical issue raised in Ref. 5 when the explicit solutions are not available. The main difficulty in solving the nonlinear least-squares problem comes from evaluating , which often requires solving the model (1) repeatedly. In Sec. III, we will propose an efficient numerical method to address this issue.
III. AN EFFICIENT ALGORITHM FOR PARAMETER ESTIMATION
In Sec. II, we formulated a parameter estimation method as a problem of solving a nonlinear system (8) subject to a dynamical constraint in (1). In a compact form, we denote the system (8) as
where , . Notice that we have neglected the derivative constraints in (8). While a similar algorithm and convergence result to those shown below also hold if the first-order derivative constraints are included in the least-squares procedure, we will only numerically solve the least-squares problem for the zeroth-order derivative constraints as in (9). As we shall see in Secs. IV and V, we will use the derivative constraints to directly estimate some of the parameters.
Our goal is to solve (9) in the sense of least-squares, that is,
However, as we mentioned in Sec. II, we do not necessarily have explicit expressions for , and evaluating for certain values of requires solving the true model (1) to approximate the integral in by a Monte-Carlo sum. For example, if we apply the Gauss-Newton method to solve the nonlinear least-squares problem (10), the overall computation becomes very expensive since we have to solve the model (1) repeatedly. As a remedy, we apply the idea of a polynomial based surrogate model, which was introduced in Refs. 13 and 14 for solving Bayesian inference problems, to alleviate this part of the computation. The main difference between our approach and those of Refs. 13 and 14 is that they use orthogonal polynomials to expand the dynamical model, , whereas here we use orthogonal polynomials to expand . While the convergence of the approximate posterior density estimate of the Bayesian inference problem is established in Ref. 14, we will end this section by analyzing the convergence of the approximate least-squares problem.
A. Polynomial based surrogate model
The key idea of our method is to approximate by a polynomial function . This is motivated by the fact that the sampling error cannot be avoided in computing the value of essential statistics, that is, there is uncertainty on the left-hand-side of (8). Thus, it is more reasonable to think of the nonlinear least-squares solution of (10) as a random variable with certain distribution over . Furthermore, since are with respect to (Remark 2.1) whenever and in (1) are with respect to , polynomial chaos expansion provides a natural choice of based on the orthogonal polynomials with respect to .
For instance, if we consider , i.e., the uniform distribution, then the corresponding orthogonal polynomials are the Legendre polynomials . They are given by , , and the recursive formula
which form a basis for . The orthogonality of over is described by
One can normalize by introducing .
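As an illustration, the three-term recurrence and the orthogonality relation above can be checked numerically. This is a minimal sketch (the function name is ours), using Gauss-Legendre quadrature to evaluate the inner products on [-1, 1]:

```python
import numpy as np

def legendre_eval(m, x):
    """Evaluate P_0,...,P_m at the points x via the three-term recurrence
    (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)."""
    x = np.asarray(x, dtype=float)
    P = np.empty((m + 1,) + x.shape)
    P[0] = 1.0
    if m >= 1:
        P[1] = x
    for n in range(1, m):
        P[n + 1] = ((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1)
    return P

# Orthogonality check: the Gram matrix should be diag(2/(2n+1)).
nodes, weights = np.polynomial.legendre.leggauss(20)
P = legendre_eval(5, nodes)
G = P @ np.diag(weights) @ P.T
```

With 20 quadrature nodes the rule is exact for polynomials up to degree 39, so the off-diagonal entries of `G` vanish to machine precision.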
For multi-dimensional cases, one can define ( represents the dimension of ), where is a multi-index with and . We can consider an order- approximation of
In practice, there are two approaches to determine the coefficients . The first approach is the Galerkin method, which requires the errors in the approximation in (11) to be orthogonal to the finite-dimensional subspace of generated by . In our convergence analysis, we will deduce the error estimates based on the coefficients obtained through this Galerkin (or least-squares) fitting. Alternatively, one can determine the coefficients by the collocation method, that is, by matching the values of at certain points of in , denoted by , which leads to the following linear equations:
Since (12) is equivalent to a polynomial interpolation, a common choice of is the product space of order- Chebyshev nodes, where [when , (12) turns into a linear least-squares problem]. In the numerical examples in this paper, we will use the collocation method to determine the coefficients.
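A minimal sketch of the collocation step for a one-dimensional parameter follows: the surrogate coefficients are obtained by interpolating an expensive-to-evaluate map at Chebyshev nodes; here a cheap stand-in function plays the role of the Monte-Carlo estimated statistics. The function names are ours, not from the paper.

```python
import numpy as np

def chebyshev_nodes(K, a=-1.0, b=1.0):
    """Order-K Chebyshev nodes (first kind), mapped to [a, b]."""
    k = np.arange(K + 1)
    x = np.cos((2 * k + 1) * np.pi / (2 * (K + 1)))
    return 0.5 * (a + b) + 0.5 * (b - a) * x

def fit_surrogate(f, K, a=-1.0, b=1.0):
    """Collocation: match an order-K Legendre expansion to f at the nodes,
    i.e., solve the square linear system (12) for the coefficients."""
    theta = chebyshev_nodes(K, a, b)
    s = (2 * theta - a - b) / (b - a)            # map nodes back to [-1, 1]
    V = np.polynomial.legendre.legvander(s, K)   # (K+1) x (K+1) design matrix
    coeffs = np.linalg.solve(V, f(theta))
    def surrogate(t):
        u = (2 * np.asarray(t) - a - b) / (b - a)
        return np.polynomial.legendre.legval(u, coeffs)
    return surrogate
```

Once fitted, the surrogate can be evaluated at negligible cost inside any iterative minimization scheme, which is precisely what motivates this construction.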
Replacing in (10) by , we obtain a new least-squares problem for the order- polynomial based surrogate model
For easier notations, let us use and to denote the vector-valued function, , and its Jacobian matrix, respectively.
In summary, we have Algorithm 1 based on the Gauss-Newton method.
Parameter estimation: Polynomial based surrogate model
Let be a set of collocation nodes in . |
1. For each , solve (1). |
2. Estimate using a Monte-Carlo sum over . |
3. Compute the coefficients by solving the linear system (12) and obtain the approximations for |
4. For an initial guess and a threshold , compute (14) |
while The step length do |
Repeat (14). |
end while |
return |
Computationally, the most expensive step is in solving (1) on the collocation nodes , which can be done in parallel.
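Step 4 of Algorithm 1 is a standard Gauss-Newton iteration applied to the surrogate residual, which is cheap because the surrogate and its Jacobian are polynomials. The following generic sketch uses our own names; the residual `F` and Jacobian `J` stand in for the surrogate quantities produced in step 3.

```python
import numpy as np

def gauss_newton(F, J, theta0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration for min ||F(theta)||^2:
    theta <- theta - (J^T J)^{-1} J^T F(theta),
    stopping when the step length falls below tol."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        Jt = np.atleast_2d(J(theta))
        # lstsq computes the least-squares step (J^T J)^{-1} J^T F
        step = np.linalg.lstsq(Jt, np.atleast_1d(F(theta)), rcond=None)[0]
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta
```

For an overdetermined linear residual the iteration converges in a single step, which is a convenient sanity check before applying it to the polynomial surrogate.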
B. The convergence of the approximate solutions
In this section, we will provide the convergence analysis of the solutions of the approximate least-squares problem in (13) to the solution of the true least-squares problem in (10) as , under appropriate conditions. We denote the solution of (13) by , that is,
Since each is a multivariate polynomial, the minimum (13) can be attained in ; but such may not be unique. For the case of and sufficiently smooth , assuming that the true parameter value is the unique solution of the true least-squares problem in (10), we can show that there exists a sequence of minimizers of the approximated problem (13) such that as . To be more precise,
Before we apply a numerical method to solve the problem, a fundamental question is: assuming that , the true parameter value, is the unique solution of (10), can we find a sequence of minimizers of the approximated problem (13) such that as ? We are able to prove the following theorem, which partially answers this question.
Let be the solution of (10) such that
the Jacobian matrix of at denoted by is invertible.
We set since the existence of the minimizers relies on the technical assumption in Lemma 3.6. The second and third assumptions provide the well-posedness of the original nonlinear least-squares problem (10). Since we discuss the parameter estimation problem under the perfect-model setting, the second assumption naturally holds.
The proof of Theorem 3.1 can be found in Appendix A. Here, we will present the proof for the one-dimensional case () to illustrate the main ideas. Notice that in the one-dimensional case, the formula in (11) is reduced to
where are normalized Legendre polynomials. We start with two lemmas, which pertain to the pointwise convergence of ; the proofs can be found in Ref. 24.
Lemma 3.3 connects the smoothness of and the decay rate of the generalized Fourier coefficients . The following lemma is a direct result of Lemma 3.3.
Here, denotes the -norm on . In the proof of Proposition 3.7, we also need as . To obtain such convergence, one needs higher regularity on . In particular, we have
Let and be the corresponding least-squares approximation given by (15). We have as .
So far, we have shown that with enough regularity of , we have the uniform convergence of both and on . To show the existence of minimizer near , the following lemma is crucial.
Let be continuously differentiable in the domain and suppose that there is an open ball and a positive such that and . Then, has a solution in .
The proof can be found in Ref. 25. In the one-dimensional case, Lemma 3.6 basically says that if a function changes with a minimum speed () on the interval and the function value at is small enough [], then the function must reach zero in this interval. With Lemmas 3.4–3.6, we are able to prove the following result, which is the one-dimensional analog of Theorem 3.1.
This result guarantees the existence of minimizers of the approximate polynomial least-squares problem that converge to the solution of the true least-squares problem under a fairly strong condition, namely, that is smooth enough. In practice (see Sec. IV), we shall see reasonably accurate estimates even when the essential statistics are not as regular as required by this theoretical estimate.
C. Convergence of the algorithm
In Algorithm 1, we are solving a least-squares problem (13). For the Gauss-Newton method, a standard local convergence result under a full rank assumption of the Jacobian matrix is well known (e.g., see Chapter 2 of Ref. 26 for details). To achieve such a local convergence in our numerical algorithm, we will now verify the full rank condition in the polynomial chaos setting.
For our problem, the Jacobian matrix is given by
where is an element of the finite-dimensional polynomial space defined by
Using linear independence over the function space and some fundamental results in algebra, we can prove the following theorem.
See Appendix B.
Here, is the set of for which is not full rank in the matrix sense. In our application, is the dimension of the parameter , is the total number of equations in (9), and is the order of the approximation used in (11). In practice, since the orthogonal polynomials involved in (11) form a basis for , it is enough to check the rank of the coefficient matrix , whose column consists of the coefficients in (11) for all possible . The linear independence hypothesis is indeed sensible in practice, since we want to avoid solving underdetermined least-squares problems.
IV. EXAMPLE I: THE LANGEVIN MODEL
For the first example, we consider a classical model in statistical mechanics: the dynamics of a particle driven by a conservative force, a damping force, and a stochastic force. In particular, we choose the conservative force based on the Morse potential
where the last quadratic term in acts as a retaining potential (also known as a confining potential), preventing the particle from moving to infinity. For this one-dimensional model, we rescale the mass to unity and write the dynamics as follows:
where is a white noise. The generator of the system (21), denoted by , is given by
The smooth retaining potential guarantees the ergodicity of the Langevin system (21) (see Appendix C for details). Namely, there is an equilibrium distribution (Gibbs measure) , given by
which is independent of . In particular, we have at equilibrium. For this Langevin model, there are a total of five unknown parameters with true values .
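For readers who wish to reproduce such equilibrium time series, the unperturbed Langevin dynamics can be integrated with the Euler-Maruyama scheme; the Gibbs invariant measure fixes the noise amplitude through the fluctuation-dissipation relation (for unit mass, the noise variance is twice the damping times the temperature). This is an illustrative sketch with hypothetical parameter values and a hypothetical form of the Morse-plus-quadratic potential, not the paper's exact setup.

```python
import numpy as np

def simulate_langevin(grad_V, gamma, kBT, x0, v0, dt, n_steps, rng):
    """Euler-Maruyama sketch of dx = v dt, dv = (-V'(x) - gamma v) dt + sigma dW,
    with sigma^2 = 2 gamma kBT so that the Gibbs measure is invariant (unit mass)."""
    sigma = np.sqrt(2.0 * gamma * kBT)
    xs = np.empty(n_steps)
    vs = np.empty(n_steps)
    x, v = x0, v0
    for i in range(n_steps):
        x, v = (x + dt * v,
                v + dt * (-grad_V(x) - gamma * v)
                  + sigma * np.sqrt(dt) * rng.standard_normal())
        xs[i], vs[i] = x, v
    return xs, vs

# Hypothetical Morse + quadratic retaining potential:
# V(x) = D (1 - exp(-alpha x))^2 + 0.5 k x^2  (parameter values are illustrative)
D, alpha, k = 1.0, 1.0, 0.5
grad_V = lambda x: 2 * D * alpha * np.exp(-alpha * x) * (1 - np.exp(-alpha * x)) + k * x
```

A quick diagnostic is equipartition: after discarding a burn-in segment, the empirical second moment of the velocity should be close to the temperature.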
For this specific problem, notice that all parameters except appear in the equilibrium density function. Intuitively, this suggests that one can estimate four parameters, , from equilibrium statistics and from two-point statistics, respectively. We will present a conventional estimation method based on this guideline for diagnostic purposes. In particular, the parameter will be directly estimated from a two-point statistic, as shown in Sec. IV A. Subsequently, the remaining four parameters are obtained via one-point statistics, as shown in (27) and Sec. IV B 1.
As for the proposed approach, we will consider estimating all the parameters using the essential statistics in the most efficient manner. First, two of the parameters, , will be estimated using the essential statistics in (27). Then, via a sensitivity analysis discussed in Sec. IV B 2, we will show that the essential statistics are insensitive to the parameter . As a consequence, we formulate a least-squares problem to estimate and use equilibrium statistics to estimate .
A. Reduction of the parameter space
In our previous work,5 we have suggested an approach to estimate and directly from the essential statistics. To construct such essential statistics, we consider a constant external forcing with . The corresponding perturbed system is given by
that is, in the FDT formula (3). By selecting the observable , we work with the entry of the response operator given by
where is the solution of corresponding Fokker-Planck equation (5) associated with the Langevin dynamics. Taking the time derivative of for , we obtain
Let in (26) and recall that at equilibrium, we have
where both left-hand-sides can be computed from the sample. Thus, (27) provides direct estimates for and . As a result, the original parameter estimation problem can be reduced to estimating in the potential function , which is a non-trivial task since the dependence of on these parameters is quite complicated. With these estimates, becomes available since it only depends on , which allows one to compute in (8) and avoids the issue pointed out in Remark 2.4.
B. Parameter estimation approaches
In this subsection, we focus on the three-dimensional parameter estimation problem with . First, we review the conventional approach, which matches the equilibrium statistics of . Then, we discuss the proposed new approach using the essential statistics. In particular, we perform a sensitivity analysis to determine the identifiability of the parameters from the essential statistics. Finally, we present the numerical results including a comparison between the conventional and the new approaches.
1. Conventional method
We first examine what can be learned from the equilibrium statistics. Since the marginal distribution of at the equilibrium state is proportional to , a natural idea is to match the moments of with respect to this equilibrium density. In particular, one can introduce the following three equations:
where the left-hand sides are computed from the given data, while the right-hand sides are treated as functions of , obtained by solving the model (24). The following pre-computation simplifies Eq. (28) into a one-dimensional problem.
To begin with, we define the probability density functions
in terms of and , respectively. With a change of variables , the normalizing constants and satisfy
As a result, the first two equations in (28) can be written as
with a unique solution
where stands for the variance of with respect to the rescaled equilibrium density . In practice, both and can be estimated empirically from the available data, while is a function of only. Thus, (29) can be written as
and (28) reduces to a one-dimensional problem for ,
Note that the left-hand side of (30) is estimated by Monte Carlo averaging from the unperturbed data, whereas the right-hand side integral is approximated by numerical quadrature.
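To make this reduction concrete, the following sketch illustrates the generic structure of such a one-dimensional moment-matching problem: the left-hand side is a (here, idealized) sample moment, the right-hand side is a quadrature approximation of an equilibrium average, and the scalar equation is solved by bisection. The potential U(x; a) = x^4/4 + a x^2/2 and all names are hypothetical stand-ins, not the paper's model (24).

```python
import math

def equilibrium_moment(a, k=2, L=6.0, n=2001):
    """E[x^k] with respect to the density proportional to exp(-U(x; a)),
    U(x; a) = x^4/4 + a*x^2/2, via trapezoidal quadrature on [-L, L]."""
    h = 2.0 * L / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = -L + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        p = math.exp(-(x**4 / 4.0 + a * x**2 / 2.0))
        num += w * x**k * p
        den += w * p
    return num / den

def fit_parameter(target, lo=-2.0, hi=5.0, tol=1e-10):
    """Bisection for the 1D problem E[x^2](a) = target; the second
    moment is monotonically decreasing in a on this bracket."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if equilibrium_moment(mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# "data" side: the sample moment is replaced by the exact moment at a
# hypothetical true value a = 1, so the fit should recover a = 1
m2 = equilibrium_moment(1.0)
a_est = fit_parameter(m2)
```

In practice the target moment carries Monte Carlo sampling error, so the recovered parameter inherits that error; this is one motivation for the sensitivity considerations that follow.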
2. Sensitivity analysis of the essential statistics
To ensure successful parameter estimation, it is useful to gather some a priori knowledge of parameter identifiability from the proposed essential statistics. This is especially important since the choice of essential statistics to fit is non-unique, and we want to ensure that the parameters can be identified from the chosen statistics; recall that the essential statistics are defined by the choices of external forcing and observable.
While a local sensitivity analysis such as the pathwise derivative method described in Ref. 27 is desirable, it requires knowledge of the true parameters, which is not available to us. Essentially, the pathwise derivative method computes , where and the expectation is defined with respect to the density of the Itô diffusion (written in its integral form)
Since we have no explicit solution for the density , one way to approximate is by an ensemble average of the solutions of the following equation:
Notice that this equation depends on (31) and the unknown parameters . The dependence on implies that this method cannot be used for a priori sensitivity analysis. It can, however, be used for a posteriori sensitivity analysis, evaluated at the estimates, to assess the difficulty of particular parameter regimes.
As an empirical method for a priori sensitivity analysis, we simply check the deviation of as a function of the training collocation nodes , which are available from Step 2 of Algorithm 1. Of course, a more elaborate global sensitivity analysis technique, such as the Sobol index,28 could be used as well, but we do not pursue this here. From this empirical a priori sensitivity analysis, we found that the corresponding essential statistic is not sensitive to . However, it is sensitive to and strongly sensitive to . Also, the sensitivity in the low damping case () is stronger than in the high damping case (). To verify the validity of this empirical method, we compute the local sensitivity indices , where solves (32) with the true parameters (again, this is not feasible in practice since the true parameters are not available).
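The empirical check described above can be sketched as follows: tabulate a statistic on a tensor grid of training collocation nodes and compare its spread along each parameter axis. The toy statistic and the node counts are assumptions for illustration, not the paper's essential statistics.

```python
import math

def cheb_nodes(n, a, b):
    """Chebyshev nodes of the first kind mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * n))
            for i in range(n)]

def axis_sensitivity(stat, nodes1, nodes2):
    """Spread (max - min) of the statistic along each parameter axis,
    averaged over the nodes of the other axis."""
    tab = [[stat(t1, t2) for t2 in nodes2] for t1 in nodes1]
    cols = [[row[j] for row in tab] for j in range(len(nodes2))]
    s1 = sum(max(c) - min(c) for c in cols) / len(nodes2)   # sensitivity to theta1
    s2 = sum(max(r) - min(r) for r in tab) / len(nodes1)    # sensitivity to theta2
    return s1, s2

# toy statistic (assumed form): strongly dependent on the first
# parameter, weakly dependent on the second
stat = lambda t1, t2: math.exp(-t1) + 0.01 * t2
s1, s2 = axis_sensitivity(stat, cheb_nodes(8, 0.5, 2.0), cheb_nodes(8, 0.5, 2.0))
```

A large spread along an axis signals that the statistic can identify that parameter; a flat profile (as for the second parameter here) signals poor identifiability.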
For our Langevin model, we have the corresponding drift and diffusion coefficients
Suppose we are interested in the dependence on parameter . Introducing the notations and , the joint system of (31) and (32) (written in differential form) is given as follows:
There is no stochastic term in the last equation, since the noise in the Langevin model is additive with coefficient independent of . Notice that is a fixed point for the last two equations in (33) because of
As a result, by choosing the initial condition for an arbitrary constant , one can conclude that the solution of the Langevin model (21) is independent of . Thus, under this circumstance, is independent of , confirming the conclusion of our empirical a priori sensitivity analysis.
While this parameter insensitivity may seem discouraging, we can use it to our advantage to simplify the problem. That is, we can assign an arbitrary value to in applying Algorithm 1 and consider estimating only . Once these two parameters are estimated, we can use the formula in (29) to estimate .
We also use this local sensitivity analysis to verify the validity of the a priori sensitivity analysis with respect to and . Since it is difficult to derive explicit behavior as above, for a given realization of the Langevin model, we solve the remaining ODEs with the fourth-order Runge-Kutta method. Figure 2 shows the numerical results for and , where the average is estimated over 3000 realizations of the Langevin model. Based on the scales of and in Fig. 2, we confirm the validity of the empirically found sensitivity of the two-point statistics. Namely, the identifiability of is stronger than that of , and the identifiability in the low damping case () is stronger than in the high damping case ().
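The structure of such a computation is a forward sensitivity equation integrated jointly with the state by Runge-Kutta. The following minimal sketch uses a toy linear ODE (not the Langevin sensitivity system (32)/(33)) so that the result can be checked against a closed form.

```python
def rk4_sensitivity(theta, T=1.0, n=100):
    """Classical fourth-order Runge-Kutta integration of a toy model
    dx/dt = -theta*x, x(0) = 1, jointly with its forward sensitivity
    ds/dt = -theta*s - x, s(0) = 0, where s = dx/dtheta.  The closed
    forms are x(T) = exp(-theta*T) and s(T) = -T*exp(-theta*T)."""
    h = T / n

    def f(x, s):
        return -theta * x, -theta * s - x

    x, s = 1.0, 0.0
    for _ in range(n):
        k1x, k1s = f(x, s)
        k2x, k2s = f(x + 0.5 * h * k1x, s + 0.5 * h * k1s)
        k3x, k3s = f(x + 0.5 * h * k2x, s + 0.5 * h * k2s)
        k4x, k4s = f(x + h * k3x, s + h * k3s)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        s += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
    return x, s

xT, sT = rk4_sensitivity(0.7)
```

For a stochastic model, this deterministic integration would be repeated per noise realization and the sensitivities averaged over the ensemble, as done for Fig. 2.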
C. Numerical results
We now present several numerical results. Based on a time series of size (with temporal lag ), we compare the conventional method and the proposed scheme that fits the essential statistics under and , representing a low damping case and a high damping case, respectively. The true values of the remaining four parameters are set to . The data, in the form of a time series, are generated from the model (24) using an operator-splitting method,29 which is strongly second-order accurate.
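For readers who wish to reproduce a comparable data set, the sketch below simulates a unit-mass Langevin equation with a BAOAB-type splitting. This is a common splitting integrator used here as a stand-in for the scheme of Ref. 29, not a reproduction of it; the harmonic force and all parameter values are assumptions for a sanity check.

```python
import math, random

def baoab_step(q, p, h, gamma, kT, force, rng):
    """One step of a BAOAB-type splitting for the unit-mass Langevin
    equation dq = p dt, dp = F(q) dt - gamma*p dt + sqrt(2*gamma*kT) dW."""
    p += 0.5 * h * force(q)                      # B: half kick
    q += 0.5 * h * p                             # A: half drift
    c = math.exp(-gamma * h)                     # O: exact OU substep
    p = c * p + math.sqrt(kT * (1.0 - c * c)) * rng.gauss(0.0, 1.0)
    q += 0.5 * h * p                             # A: half drift
    p += 0.5 * h * force(q)                      # B: half kick
    return q, p

# harmonic sanity check: for F(q) = -k*q the equilibrium variance of q
# is kT/k, which the long-run sample variance should approach
rng = random.Random(0)
k, gamma, kT, h = 1.0, 0.5, 1.0, 0.05
q = p = 0.0
qs = []
for i in range(200000):
    q, p = baoab_step(q, p, h, gamma, kT, lambda x: -k * x, rng)
    if i >= 20000:                               # discard transient
        qs.append(q)
mean = sum(qs) / len(qs)
var_q = sum((v - mean) ** 2 for v in qs) / len(qs)
```

The exact Ornstein-Uhlenbeck substep keeps the scheme stable at moderate step sizes, which matters when long trajectories are needed to estimate statistics accurately.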
1. Low damping case
We start with the conventional method. To examine the sensitivity of the estimates of to the estimates , we also include, as a comparison, the estimation in which the true value is used in solving (30) (we refer to the results of this scenario as partial estimates). The results are listed in Table I. We clearly observe that the results of the conventional method are sensitive to the value of . To explain this sensitivity, recall that in solving (30), and are viewed as functions of . This suggests that high sensitivity occurs if
where and are given by (29) and the implicit function theorem has been used. By direct computation, we get
Since an explicit formula for the partial derivative with respect to is not easy to obtain, its value is computed using a finite difference formula. Evaluating it at , we have
which supports our observation.
Full and partial estimates of the conventional method using the equilibrium statistics (above) and the estimates using essential statistics (below): the low damping case.
|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| True | 1.0000 | 0.50000 | 0.20000 | 10.000 | 0.0000 |
| Eq. stat. full estimates | 0.99903 | 0.50003 | 0.16870 | 10.888 | 0.001120 |
| Eq. stat. partial estimates | 0.50003 | 0.20039 | 10.005 | 0.011263 |  |
| Essential stat. estimates | 0.99903 | 0.50003 | 0.20004 | 9.9741 | 0.008265 |
Next, we report the estimates obtained from the new approach using essential statistics; the results are listed in Table I. The improvement in accuracy for and is noticeable compared with the full estimates of the conventional method. Here, we used a total of essential statistics given by for , . We applied Algorithm 1 to solve an ensemble of two-dimensional ( and ) nonlinear least-squares problems based on uniformly generated random initial conditions. Figure 3 (left) shows the contour plot of the cost function together with all of the estimates. Notice that except for a few outliers, most of the estimates lie along the low-value contour of the cost function. The estimate reported is the average over all the estimates (excluding the outliers). To satisfy the equilibrium constraint, given by (29) is used as the estimate of .
The contour plot of the cost function together with the estimates: low damping case (left) and high damping case (right).
The results from this test indicate that for the conventional method, it is difficult to obtain very accurate estimates for , unless can be estimated accurately, e.g., by using a longer time series. In contrast, the proposed approach using essential statistics is much less sensitive to the error in . This approach, however, requires a precomputing step, as noted in Algorithm 1. In our numerical test, we solved the model (21) on 64 collocation nodes to evaluate the essential statistics over different values of ; that is, order Chebyshev nodes were used to construct the used in (12). However, as we have previously alluded to, this can be done in parallel, and it does not become a serious issue as long as the dimension of the parameter space is relatively small. As for the value of in (12), we picked . (The same scheme was applied to the high damping case.)
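The two-stage structure of the precomputation can be sketched as follows: expensive statistics are evaluated once on Chebyshev collocation nodes, a polynomial surrogate is fit, and the least-squares fit is then run entirely on the cheap surrogate. The scalar statistic, the interval, and the degree below are hypothetical choices for illustration.

```python
import numpy as np

# hypothetical scalar "essential statistic" with smooth parameter
# dependence; in practice each evaluation would require a simulation
def statistic(theta):
    return np.exp(-theta) * np.cos(theta)

a, b, deg = 0.2, 3.0, 15
scale = lambda t: (2.0 * t - (a + b)) / (b - a)      # map [a, b] -> [-1, 1]

# offline stage: evaluate the statistic on 64 Chebyshev nodes and fit a
# Chebyshev surrogate (these evaluations are embarrassingly parallel)
nodes = 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * np.arange(64) + 1) * np.pi / 128)
coef = np.polynomial.chebyshev.chebfit(scale(nodes), statistic(nodes), deg)
surrogate = lambda t: np.polynomial.chebyshev.chebval(scale(t), coef)

# online stage: least-squares fit of theta against an "observed" value,
# done on the surrogate instead of re-running the model
theta_true = 1.3
target = statistic(theta_true)
grid = np.linspace(a, b, 2001)
theta_est = grid[np.argmin((surrogate(grid) - target) ** 2)]
```

In the paper's setting the online stage is an iterative (Gauss-Newton) solve over several statistics; the grid search above is only the simplest stand-in for that minimization.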
Another interesting issue arises when the damping parameter is large. Since the conventional method relies entirely on one-point statistics with respect to the equilibrium density, it is important to have high-quality independent samples. In Fig. 4 (left), we show the time correlation of for both the low damping regime () and the high damping regime (). We observe that in the latter case, the auto-correlation function of decays much more slowly, indicating a strong correlation among samples with small lags. In this case, the estimates from the conventional method deteriorate due to the difficulty of obtaining high-quality independent samples.
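A quick way to diagnose this in practice is to compute the empirical auto-correlation of the series; a slowly decaying curve signals few effective samples. The AR(1) surrogate series below is an assumption that merely mimics the strongly correlated (high damping) regime.

```python
import random

def autocorrelation(x, max_lag):
    """Normalized empirical time auto-correlation of a 1-D time series."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for lag in range(max_lag + 1):
        c = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n - lag)
        acf.append(c / var)
    return acf

# an AR(1) series with coefficient close to 1 plays the role of the
# strongly correlated (high damping) samples
rng = random.Random(1)
x = [0.0]
for _ in range(5000):
    x.append(0.95 * x[-1] + rng.gauss(0.0, 1.0))
acf = autocorrelation(x, 20)
```

The integrated auto-correlation time estimated from such a curve can be used to thin the series or to inflate error bars on the moment estimates.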
2. High damping case
In this section, we focus on the high damping regime . We will also make a connection between Fig. 2 and the estimation using the essential statistics. As shown in Fig. 4 (left), we no longer have a fast decay of the time auto-correlation of . As a result, without changing the sample size, significant error occurs in estimating the moments of used in the conventional method. This can be clearly seen from the results listed in Table II. In particular, besides suffering from the sensitivity to , the error in estimating for also leads to inaccurate estimates of , , and . Interestingly, although the marginal distribution of at the equilibrium state is independent of , a large value of makes estimating the moments of difficult in practice.
Full and partial estimates of the conventional method using the equilibrium statistics (above) and the estimates using essential statistics (below): the high damping case.
|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| True | 1.0000 | 5.0000 | 0.20000 | 10.000 | 0.0000 |
| Eq. stat. full estimates | 0.99949 | 4.9993 | 0.69579 | 5.4717 | 0.12990 |
| Eq. stat. partial estimates | 4.9993 | 0.25931 | 8.9295 | −0.0034460 |  |
| Essential stat. estimates | 0.99949 | 4.9993 | 0.21160 | 9.8663 | −0.01822 |
Table II and Fig. 3 (right) show the numerical results using the essential statistics. Although the relative error is not as small as in the low damping case, it is still lower than that of the estimates from the conventional method. Our method is implemented as follows: similar to the low damping case, (29) is used to estimate the value of based on the estimates . The 20 essential statistics here are given by for , [suggested by the time auto-correlation of shown in Fig. 4 (right)], which span a much shorter time interval. Compared to the low damping case, the loss of accuracy is consistent with the sensitivity analysis (Fig. 2), which indicated that the parameters in this regime are less identifiable when is large.
V. EXAMPLE II: A GRADIENT SYSTEM WITH A TRIPLE-WELL POTENTIAL
Our next example is a gradient system driven by white noise. We consider a two-dimensional stochastic system as follows:
where is a two-dimensional Wiener process, is a potential energy, and is a matrix defined by
The generator of the system (34) is given by
We choose a triple-well potential function similar to the model in Ref. 30,
with
where denotes the characteristic function over the interval . Notice that the matrix is positive definite. The additional quadratic term in the triple-well potential (36) is, again, a smooth retaining potential. It is well known1 that the triple-well model (34) yields an equilibrium distribution given by
which is independent of parameter , the off-diagonal element of .
In the numerical tests, we set as the true values of the parameters. Figure 5 shows the contour plot of the potential and the scatter plot of the time series under this set of parameters. To generate the data from (34), we applied the weak trapezoidal method introduced in Ref. 31, which is second-order accurate in the weak sense. In the remainder of this section, we choose appropriate essential statistics from which can be estimated directly, and the remaining parameters, , will be estimated using an appropriate least-squares problem.
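The key structural fact used below, that the equilibrium density is independent of the off-diagonal element of the matrix, can be illustrated with a minimal Euler-Maruyama simulation. The sketch assumes a drift of the form -(I + J) grad V with a constant antisymmetric J, and swaps in a toy quadratic potential (not the triple-well (36)) so that the equilibrium variance is known exactly.

```python
import math, random

def sample_var_x1(alpha, nsteps=250000, h=0.01, seed=0):
    """Euler-Maruyama for dX = -(I + J) grad V(X) dt + sqrt(2) dW in 2D,
    with antisymmetric J = [[0, alpha], [-alpha, 0]] and the toy quadratic
    potential V(x) = |x|^2/2 (an assumption for this sketch).  The
    equilibrium density exp(-V)/Z is a standard Gaussian for every alpha,
    so Var(X_1) should stay near 1."""
    rng = random.Random(seed)
    x = y = 0.0
    s = math.sqrt(2.0 * h)
    acc = acc2 = 0.0
    count = 0
    for i in range(nsteps):
        gx, gy = x, y                              # grad V for V = |x|^2/2
        x, y = (x - (gx + alpha * gy) * h + s * rng.gauss(0.0, 1.0),
                y - (-alpha * gx + gy) * h + s * rng.gauss(0.0, 1.0))
        if i >= nsteps // 10:                      # discard transient
            acc += x
            acc2 += x * x
            count += 1
    m = acc / count
    return acc2 / count - m * m

v0 = sample_var_x1(0.0)
v1 = sample_var_x1(1.5)    # same equilibrium despite the off-diagonal term
```

Since one-point equilibrium statistics cannot see the antisymmetric part, the off-diagonal parameter must be inferred from two-point (response) statistics, which is exactly the route taken below.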
The contour plot of the potential (left) and the scatter plot of the solution with sample points (right).
A. The essential statistics and reduction of the parameter space
Similar to the Langevin model, we consider an external forcing that is constant in . The perturbed dynamics is then given by
where . If we select as the observable, the corresponding linear response operator reads
where with
given by (3). As a result, the entries of the linear response operator satisfy
where denotes the partial derivative of with respect to . Using integration by parts, one can show that
which leads to
This is known as the equipartition of the energy in statistical mechanics.
Notice that the response operators are not accessible due to the fact that the function depends on unknown true parameter values . This is precisely one of the issues raised in Remark 2.4. For gradient flow systems, this problem can be overcome by introducing a linear transformation to another set of two-point statistics. We define
and consider their time derivatives. Following the same calculations that led to (26), we obtain the following identities for :
It is more convenient to rewrite (42) as a linear system for ,
where the coefficient matrix is non-singular since . This linear relationship suggests that one can consider fitting in place of , since the former is numerically accessible.
Letting and applying (40) to the first and the third equations of (43) under , we obtain the following estimates for and :
where is computable from the available sample. Equation (44) reduces the problem into a two-dimensional problem of estimating only.
B. Numerical results
As pointed out earlier, the linear response operator (39) is not directly accessible. Motivated by the linear relation in (43), we infer the values from the two-point statistics (41). In particular, we choose with , , as the essential statistics in our numerical test.
Since is a two-point statistic of , we can apply the empirical a priori sensitivity test by checking the dependence of on the training collocation nodes , which are available from Step 2 of Algorithm 1. In this numerical experiment, the training used 25 collocation nodes, . To validate this empirical sensitivity test, we also perform the local sensitivity analysis described in Sec. IV B 2 with respect to the parameters and (which, again, is not possible in practice since the true parameters are unknown). The results are shown in Fig. 6 (left). From the scales of and (averaged over realizations of ), we see that this two-point statistic, , is quite sensitive to the parameters and , with a stronger dependence on .
Local sensitivity test of with respect to parameter and (left) and the contour plot of the cost function together with the estimates (right).
Based on a time series of size (with time lag ), we implemented Algorithm 1 with and collocation nodes, , constructed from order Chebyshev nodes in (12).
The estimates are shown in Table III. We also show in Fig. 6 (right) the contour plot of the cost function together with the estimates based on uniformly generated initial guesses. Notice that the true parameter value does not lie in the lowest contour of the cost function; these discrepancies are due to the surrogate modeling error and the quality of the samples. However, the estimates and are still reasonably accurate, as reported in Table III. We should point out that the estimates and reported in this table are averages over all the estimates (excluding the outliers), while the estimates and are obtained by solving (44). The contour plot also confirms the sensitivity analysis, which suggested that estimating is easier than estimating .
The estimation results: the Triple-Well model.
|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| True | 1.0000 | 0.2500 | 1.5000 | 0.5000 |
| Estimates | 1.0249 | 0.2463 | 1.4946 | 0.5077 |
An interesting question is whether the approximate parameter values can reproduce the essential statistics, which is useful for predicting non-equilibrium averages in the presence of external forces. Using the estimates reported in Table III, we compare the resulting two-point statistics used in estimating the parameters, , with the corresponding true statistics, (see Fig. 7, left). In Fig. 7 (right), we also compare the estimated response operator, , with the true response operator, . Excellent agreement is found.
The recovery of the response statistics (left) and the two-point statistics (right). All the trajectories have been normalized such that they start at point .
VI. SUMMARY AND FURTHER DISCUSSION
This paper presented a parameter estimation method for stochastic models in the form of Itô drift diffusions. To infer parameters that do not appear in the equilibrium density function, we proposed to fit two-point essential statistics formulated using the fluctuation dissipation theory. Building upon the framework established in our previous work,5 we formulated the problem as a nonlinear least-squares problem subject to a dynamical constraint when the parameters cannot be estimated directly by solving the corresponding statistical constraints. To avoid the expensive cost of evaluating the essential statistics at each iteration of the Gauss-Newton method, we proposed solving an approximate least-squares problem based on a polynomial surrogate modeling approach. This approach is motivated by the fact that the sampling error cannot be avoided in computing the essential statistics and that the essential statistics are smooth functions of the parameters. We established the existence of minimizers of the approximate least-squares problem that converge to the solution of the true least-squares problem under the assumption that the essential statistics are smooth functions of the parameters. We also showed that the polynomial approximate least-squares problem has a Jacobian that is full rank almost everywhere, which implies the local convergence of Gauss-Newton solutions. We tested the proposed methods on two examples that belong to two large classes of stochastic models: a Langevin dynamics model and a stochastic gradient system. In general, we expect the parameter estimation procedure to be carried out as follows:
1. Reduce the parameter space by direct estimation using appropriate statistics that are easily computable, whenever this is possible.
2. Based on the observable of interest, the functional form of the equilibrium density, the external forcing term, and the available data, identify appropriate essential statistics for the remaining parameters using the a priori sensitivity analysis test. In our implementation, we compare the essential statistics computed on the training parameter values (e.g., collocation nodes). A more elaborate global sensitivity analysis technique, such as the Sobol index,28 can be used to determine parameter identifiability.
3. For the remaining parameters, formulate a nonlinear least-squares problem using the appropriate essential statistics at some training parameter values.
4. Apply Algorithm 1 to obtain the estimates of the true parameter values .
5. Apply a sensitivity analysis, such as the local sensitivity analysis discussed in Sec. IV B 2, as an a posteriori confidence check on the estimates. If the parameters are found to be insensitive to the selected response statistics, go back to Step 3 and choose a different set of response statistics.
One restriction of our formulation is the need to compute in (7), which may require knowledge of (some of) the true parameter values; this is not feasible in general. As a result, the nonlinear system (8) cannot be evaluated without knowing the true parameters. For the Langevin dynamics, we found that the parameter in can be estimated from the equilibrium statistics, namely the variance of . For the gradient flow problem, there exists an invertible linear transformation between the linear response statistics, which are not computable, and other two-point statistics that can be estimated without knowing the true parameter values. Since Algorithm 1 does not rely on the FDT formulation, one can apply it to any available two-point statistics, e.g., time auto-correlations of the solution. We are not aware of a general routine that bypasses this difficulty. However, as we have demonstrated, at least for general Langevin dynamics models and stochastic gradient systems, this idea works.
While the proposed method requires the knowledge of the equilibrium density function of the unperturbed system, one can relax this condition and estimate it from the data. For low-dimensional problems, one can use the non-parametric kernel density estimation method.32,33 If the invariant distribution of the dynamical system can be characterized by a density function defined on a smooth manifold (embedded in a high-dimensional phase space) that the data lie on or are close to, then one can estimate the density function by applying the theory of the kernel embedding of distribution34 on a Hilbert space with basis functions defined on the data manifold. Analogous to the conditional density estimation proposed in Refs. 37 and 38, these data-driven basis functions can be obtained via the diffusion maps algorithm.35,36 Another alternative to nonparametric methods is to consider a parametric density function estimation using the moment constrained maximum entropy principle.39 For densities with moderate, say five to seven, dimensions, one can use a recently developed Equation-By-Equation (EBE) method40 for solving the moment constrained maximum entropy problems. The source code of the EBE scheme is available at Ref. 41.
One of our ultimate goals is to use this method for parameter estimation in molecular modeling, in which a Langevin type of model is usually used. The potential energy typically involves a large set of parameters, e.g., the stiffness constants associated with bond stretching and deformation of bond angles, as well as the coefficients in the van der Waals and screened electro-static interactions. The selection of damping coefficients is also non-trivial; see the review paper42 for various perspectives. Compared to the existing methods, the novelty of our approach is that we use the response statistics to formulate the parameter estimation problem, which can reveal parameters that do not appear in the equilibrium density. In addition, the polynomial surrogate model provides an efficient means of solving the nonlinear least-squares problem.
To achieve this goal, however, some challenges remain. For a large parameter space, we suspect that one needs to combine existing methods for computing the parameters associated with the equilibrium density, as one way to reduce the problem (as suggested in Step 1 above), with the proposed method for estimating the remaining parameters. Second, the proposed method requires high-quality and possibly large samples for accurate evaluation of the essential statistics. However, since the essential statistics depend on the choice of observable and external forcing, approximating a high-dimensional integral can be avoided so long as the dimensions of the ranges of these two functions are small. Finally, the underlying model might be subject to modeling error; for example, the model may be derived from a multiscale expansion or simply postulated empirically. The formulation of the response statistics in the presence of modeling error will be investigated in separate works.
ACKNOWLEDGMENTS
This research was partially supported by the National Science Foundation (NSF) (Grant No. DMS-1619661). X.L. is also supported by the NSF (Grant No. DMS-1522617). J.H. is also supported by the Office of Naval Research (ONR) (Grant No. N00014-16-1-2888).
APPENDIX A: PROOF OF THEOREM 3.1
In Sec. III B, we presented the convergence result, Theorem 3.1, and included the proof for the one-dimensional case. Here, we give the proof for the general -dimensional case. We first generalize Lemmas 3.3–3.5 to the -dimensional case.
Let with and consider its least-squares approximation given by (11). Then, we have as .
To generalize Lemma 3.5, it is enough to consider the partial derivative case, that is, finding the condition for as .
Now, we are ready to prove Theorem 3.1.
APPENDIX B: PROOF OF THEOREM 3.8
The full rank condition of the Jacobian matrix is usually required in the local convergence theorem of Newton-like methods. We are going to discuss this issue over a finite dimensional function space defined by
Since the zero element in is the function that is identically zero, we can introduce the following definition of linear independence.
We define the differential operator , which is as follows:
Now, let us deduce some relevant properties of such an operator.
The operators satisfy
there exists a permutation satisfying .
The following lemma will also be useful to prove our main result.
To gain intuition, one can consider the following example with . One can select the following ordered basis for
We can derive the matrix representation of and
where is the identity matrix and is defined as follows:
The linear combination satisfies
and rank for .
We now turn to our discussion of the full rank condition of the Jacobian matrix,
where and . Let be the column vector ; then , which is the -fold product space of . We can rewrite the Jacobian matrix as follows:
where is defined by operating on each component of . With the above observation, we are ready to prove Theorem 3.8.
APPENDIX C: PROOF OF THE ERGODICITY OF THE LANGEVIN DYNAMICS (21)
With a change of variables [, ], we reduce (21) into
To verify the ergodicity of (C1), we apply Theorem 3.2 in Ref. 15, which requires that the potential satisfy the following two conditions:
for all .
- There exists an and such that(C2)
For the first condition, we simply notice that adding a constant to does not affect the dynamics or the equilibrium distribution. Replacing in (C1) by
we have
It is enough to show that
the function has a lower bound, and
- for any given , such that(C4)
Since is continuous, by checking its limits on both sides