This study presents a general framework, namely, Sparse Spatiotemporal System Discovery ( S 3 d), for discovering dynamical models given by Partial Differential Equations (PDEs) from spatiotemporal data. S 3 d is built on the recent development of sparse Bayesian learning, which enforces sparsity in the estimated PDEs. This approach enables a balance between model complexity and fitting error with theoretical guarantees. The proposed framework integrates Bayesian inference and a sparse priori distribution with the sparse regression method. It also introduces a principled iterative re-weighted algorithm to select dominant features in PDEs and solve for the sparse coefficients. We have demonstrated the discovery of the complex Ginzburg–Landau equation from a traveling-wave convection experiment, as well as several other PDEs, including the important cases of Navier–Stokes and sine-Gordon equations, from simulated data.

Data-driven methods provide a viable alternative to more traditional approaches, such as first-principles derivation, for accelerating the discovery of physical laws. This work aims to develop a sparse Bayesian learning-based framework for modeling spatiotemporal dynamical systems described by Partial Differential Equations (PDEs). The proposed framework, based on sparse Bayesian learning (SBL), can reconstruct a variety of prototypical PDEs, including the Navier–Stokes equation, the FitzHugh–Nagumo equation for nerve conduction, and the Schr o ¨dinger equation in quantum mechanics, using a small number of samples compared to state-of-the-art (SOTA) methods. In particular, we demonstrate that the data-driven discovery method can automatically extract the first-principles-based coupled complex Ginzburg–Landau equations from data collected in a convection experiment.

Recently, there has been a surge of interest in the data-driven discovery of physical laws, which applies computational methods to discover representations from data. There have been reports of many successes using data-driven methods, such as learning a nonlinear mapping from time series,1 predicting catastrophes in nonlinear systems,2 creating autonomous smart systems capable of self-adaption and self-regulation for short- and long-term production management,3 equation-free modeling multi-scale/complex systems,4 inferring models of nonlinear coupled dynamical systems,5 reconstructing networks with stochastic dynamical processes,6 and extracting Koopman modes from data.7–10 

In the system identification community, a standard way is to transform the model discovery problem into a regression problem and design a good optimization algorithm to solve the problem.11,12 There are several numerical algorithms to find solutions to problems. The first kind of such a method builds on the ordinary least-squares regression. Examples include orthogonal reduction,13,14 the backward elimination scheme,15,16 least-squares estimator,17 and the alternating conditional expectation algorithm.18–20 These approaches do have a good fitting accuracy as described by Ref. 21 which, however, requires a relatively large number of data to train. The second type of method is the sparse regression approach, where a sparse prior assumption of underlying governing equations is incorporated to tackle the outstanding issue, overfitting. By converting the discovery problem of differential equations to a sparse regression problem, Ref. 22, and later, Brunton et al.23 developed a sparse identification of nonlinear dynamics framework to extract ODEs from noisy measurement data. Rudy et al.24 and Schaeffer25 consequently extended the method in Ref. 23 to recover PDEs from spatiotemporal data.

On the development of sparse optimization problem, sparse Bayesian learning (SBL) method, originally proposed in Ref. 26 and further developed in Refs. 27–29, adopts a Bayesian probabilistic framework to find sparse solutions. By treating the parameter as a random variable with sparse-inducing prior distributions determined by a set of hyperparameters and then calculating the posterior distribution of the to-be-estimated parameter given the data, it has been shown to be more general and powerful in finding a maximally sparse solution. For instance, the maximum a posteriori (MAP) estimation with a fixed weight prior is its special case;28 the Lasso problem can be interpreted as the explicit MAP with Laplace prior.30,31 Specifically, Wipf et al.32 established the equivalence of the SBL cost function and the objectives in canonical MAP-based sparse methods (e.g., the Lasso) via a dual-space analysis.

In this work, we propose a SBL framework for data-driven PDE discovery problems. This framework integrates Bayesian inference and sparse priori distribution with the sparse regression method. In our sparse Bayesian method derivation, an iterative re-weighted 1-minimization algorithm is derived to approximate the original 0-minimization. Compared with PDE-FIND with the Ridge Regression, which uses a thresholding 2-minimization algorithm, 1-minimization algorithm is known to pick the sparser solution and, therefore, can better approximate the solution to the original problem. Additionally, we also analyze spatiotemporal data observed from an experiment on convection in an ethanol water mixture in a long, narrow, annular container which is heated from below and demonstrate that the proposed data-driven discovery method is able to automatically extract the governing PDEs from spatiotemporal binary-fluid convection data. This is distinct from the previous works,24,33 in which the data used for discovering the PDEs are generated from the numerical solution to the known PDEs. Our key empirical findings are the following:

  • We demonstrate that an interpretable first-principle model can be discovered from inherently noisy experimental data. The proposed SBL method discovers a coupled complex Ginzburg–Landau equation (CGLE) using spatiotemporal binary-fluid convection data. Furthermore, it has been proven that the discovered values of various coefficients in the CGLE, such as the group velocity, the dispersion coefficients, the saturation parameter and the coupling coefficient, can be accurately determined in terms of the theoretical values and experimental ones provided in Refs. 19,34, and 35. Moreover, the numerical solution to the discovered CGLE matches the observed experimental phenomenon.

  • We demonstrate that the proposed SBL framework is capable of reconstructing a variety of prototypical PDEs, including the Navier–Stokes equation in fluid mechanics,36 the FitzHugh–Nagumo for nerve conduction,37 the Schr o ¨dinger equation in quantum mechanics,38 the sine-Gordon equation, and the Korteweg–de Vries equation. By comparing with the state-of-the-art PDE-FIND method in Ref. 24 and the Douglas-Rachford method in Ref. 25, the proposed algorithm is robust to measurement noise and is able to discover the underlying PDEs using a much smaller number of samples.

The rest of this paper is organized as follows: In Sec. II, a Bayesian framework for discovering PDEs from spatiotemporal data is presented. In Secs. III and IV, the proposed framework has been applied to the discovery of PDE models using both experimental and synthetic data. Conclusions and perspectives are drawn in Sec. V.

Notations: Throughout this paper, we denote vectors by lowercase letters, e.g., x, and matrices by uppercase letters, e.g., X. For a vector x and a matrix X, we denote its transpose by x T and X T, respectively. Furthermore, x i refers to its ith element of x. x 2 = x T x and x 1 = | x | denote its Euclidean norm and 1 norm, respectively. For a differentiable function f : R n R, we call f and k f / x i k (abbreviated as f x i) the derivative of f and the kth partial derivative of f with respect to x i, respectively. For a convex function f : R n R, we say vector v R n is a sub-gradient of f at x R n if for all y R n, f ( y ) f ( x ) + v T ( y x ). The set of all subgradients at x is called the subdifferential at x and is denoted f ( x ).

This section proposes a framework of the S 3 d method for data-driven discovery of PDEs. The proposed S 3 d framework is illustrated in Fig. 1, which includes the following three steps: data collection, model specification, and SBL solver. We will consider the dynamics of a spatiotemporal system, which is governed by a PDE of the general form
u t ( x , t ) = F ( x , t , u ( x , t ) , u ( x , t ) x 1 , , u ( x , t ) x d , × 2 u ( x , t ) x 1 2 , , 2 u ( x , t ) x 1 x d , , Θ ) ,
(2.1)
where the dynamical variable, u, is N-component with temporal and multi-dimensional spatial variables, t and x = ( x 1 , , x d ) T R d, respectively, defining the state. Θ is a parameter vector and F is an unknown, N-dimensional function, which depends on the dynamical variable u and its derivatives. In the following, we shall restrict the discussion to the case d = 1  and  N = 1 for notational simplicity, yet the proposed framework can be applied to general cases when d > 1  or  N > 1.
FIG. 1.

Schematics of the S 3 d method used to infer the CGLE from data. (a) Data collection. Data are collected from experiments of a real system (or from a numerical solution to a PDE). The collected data are formed into a matrix. (b) Model specification. A large library of candidate terms including the potential features for the to-be-identified PDE is constructed. The candidate terms include the state u, its spatial derivatives up to two orders, and nonlinear coupling terms between them. The time derivative u t is regarded as the model output. The collected data are used to approximate the time derivative u t of the state u and all the candidate terms. A set of linear equations (i.e., y = Φ θ) is then defined to represent the underlying PDE. (c) Sparse Bayesian learning. Sparse regression problem is solved using an iterative reweighted 1-minimization algorithm.

FIG. 1.

Schematics of the S 3 d method used to infer the CGLE from data. (a) Data collection. Data are collected from experiments of a real system (or from a numerical solution to a PDE). The collected data are formed into a matrix. (b) Model specification. A large library of candidate terms including the potential features for the to-be-identified PDE is constructed. The candidate terms include the state u, its spatial derivatives up to two orders, and nonlinear coupling terms between them. The time derivative u t is regarded as the model output. The collected data are used to approximate the time derivative u t of the state u and all the candidate terms. A set of linear equations (i.e., y = Φ θ) is then defined to represent the underlying PDE. (c) Sparse Bayesian learning. Sparse regression problem is solved using an iterative reweighted 1-minimization algorithm.

Close modal
To identify the form of F and the corresponding parameter Θ in Eq. (2.1), we construct a large library of candidate terms that may appear in the function F,
u t = F ( k u x k , k 1 u x k 1 , , u x , x ; Θ ) = j = 1 m F j ( k u x k , k 1 u x k 1 , , u x , x ) θ j ,
(2.2)
where F j ( ) is a candidate feature term, such as the polynomial, u 2 , u u 2, partial derivatives, u x , u x x , u x x x, or their combinations, u 2 u x , u 3 u x x, and θ = ( θ 1 , θ 2 , , θ m ) T is the corresponding parameter vector. A key observation we had is that θ should be sparse.22 For instance, consider that the preselected candidate terms include the state u and the derivative of u up to the third order. Then, Eq. (2.2) can be rewritten into the following regression form:
u t = [ 1 u u x u x x u x x x u 2 u 2 u x x x ] θ .
(2.3)
Given the dataset sampled on a spacetime mesh of n x spatial points by n t time steps, the identification problem for F is to solve the following linear equations for the parameter θ:
[ u t ( x 1 , t 1 ) u t ( x 2 , t 1 ) u t ( x n x , t n t ) ] y = [ 1 u u x u 2 u x x x 1 u u x u 2 u x x x 1 u u x u 2 u x x x ] Dictionary\,matrix Φ [ θ 1 θ 2 θ m ] θ
(2.4)
where each column of Φ is a compilation of all the values of a specific candidate function on the right side of Eq. (2.3) over all grid points, and the output vector y contains the values of u t on the left side of Eq. (2.3) over all grid points. One sequent task is to compute the time and spatial derivatives of u in Eq. (2.4). Various numerical methods including the numerical differentiation39–41 and the symbolic or automatic differentiation42 can be used for the estimation of the derivatives in the dictionary matrix Φ. In our framework, the polynomial approximation method is selected to estimate the derivatives from the noise-contained measurements. Indeed, the polynomial approximation method has been proved to be capable of effectively estimating the derivatives from the noise-contained data in Ref. 24. For completeness, we explain the polynomial approximation method in detail in the  Appendix.
Given these approximation error, we consider the following matrix form with the noise vector ν:
y = Φ θ + ν ,
(2.5)
where the model error either arises from experimental measurements or results from numerical approximation of the derivatives included in the model output y and the dictionary matrix Φ. Thus, the type of the model error is unknown for the PDE discovery problem in practice. For simplicity, we assume that ν is i.i.d. Gaussian distributed noise with zero mean and variance σ 2, N ( 0 , σ 2 I n ), with I n being an n × n identity matrix.
A general solution to Eq. (2.5) can be obtained by using the least-squares method. For example, θ ^ is the minimizer of the following optimization problem:
θ ^ = arg min θ y Φ θ 2 2 .
(2.6)
The solution, θ ^, is uniquely identified only if the number of grid points, n = n x n t, is larger than the number of columns in Φ and the matrix, Φ, has full column rank. However, for the PDE discovery problem, these two conditions, in general, are not always satisfied. Besides, if the dictionary is over-determined, the resulting optimization easily leads to overfitting.
To tackle these issues, we take a Bayesian perspective to Eq. (2.5) by considering the likelihood of the output y is
p ( y | θ ) = ( 2 π σ 2 ) n 2 exp ( 1 2 σ 2 y Φ θ 2 ) .
(2.7)
To enforce the sparsity, we assume the following prior distribution:
p ( θ ; γ ) = i = 1 m ( 2 π γ i ) 1 2 exp ( θ i 2 2 γ i ) ,
(2.8)
where γ := [ γ 1 , γ 2 , , γ m ] T is a vector of m hyperparameters. Although the hyperparametric vector γ is unknown, it determines the variance of each entry of θ and ultimately produces sparsity properties. In particular, if γ i = 0, one can get P ( θ i = 0 ) = 1 immediately. Moreover, the hyperparametric vector can be estimated by type- Π maximum likelihood. To this end, we first compute the marginalized probability distribution
p ( y ; γ ) = p ( y | θ ) p ( θ ; γ ) d θ = ( 2 π ) n 2 | Σ y | 1 2 exp ( 1 2 y T Σ y 1 y ) ,
(2.9)
where Σ y := σ 2 I n + Φ Γ Φ T and Γ := diag ( γ ). Then, we estimate γ by maximizing the marginalized probability distribution function. This is equivalent to minimizing log p ( y ; γ ), giving the cost function
L γ ( γ ) = log | Σ y | + y T Σ y 1 y ,
(2.10)
that is,
L γ ( γ ) = log | σ 2 I n + Φ Γ Φ T | + y T ( σ 2 I n + Φ Γ Φ T ) 1 y .
(2.11)
On the other hand, for fixed hyperparameters, one can compute the posterior distribution of θ from the Bayes formula. Then, one has
p ( θ | y ; γ ) = N ( μ θ , Σ θ ) ,
(2.12)
with μ θ = σ 2 Σ θ Φ T y and Σ θ = ( σ 2 Φ T Φ + Γ 1 ) 1. After obtaining the estimate γ ^ of the hyperparameter vector, the optimal estimation of coefficient vector θ is
θ ^ = μ ^ θ = ( Φ T Φ + σ 2 Γ ^ 1 ) 1 Φ T y = Γ ^ Φ T ( σ 2 I n + Φ Γ ^ Φ T ) 1 y ,
(2.13)
where Γ ^ = diag ( γ ^ ). Consider that
y T ( σ 2 I n + Φ Γ Φ T ) 1 y = 1 σ 2 y T y 1 σ 4 y T Φ Σ θ Φ T y = 1 σ 2 y Φ μ θ 2 + μ θ T Γ 1 μ θ = min θ { 1 σ 2 y Φ θ 2 + θ T Γ 1 θ } .
Thereby, we define the cost function with respect to θ and γ as follows:
L ( θ , γ ) = 1 σ 2 y Φ θ 2 + θ T Γ 1 θ + log | σ 2 I n + Φ Γ Φ T | = Δ f ( θ , γ ) + g ( θ , γ ) ,
(2.14)
where
f ( θ , γ ) = 1 σ 2 y Φ θ 2 + θ T Γ 1 θ , g ( θ , γ ) = log | σ 2 I n + Φ Γ Φ T | .
It can be proven that f ( θ , γ ) is convex with respect to ( θ , γ ) and g ( θ , γ ) is concave with respect to ( θ , γ ).
The problem to determine the sparse coefficient θ is now equivalent to finding
( θ ^ , γ ^ ) arg min ( θ , γ ) Ω L ( θ , γ ) .
(2.15)
This is a DC (difference in convex functions) programming problem that can be generally solved by the CCP (concave–convex procedure) methods.
The CCP-based SBL algorithm is first derived in Ref. 43, where an principled iterative reweighted 1 approach is designed to optimize the constructed auxiliary cost functions as shown in Eq. (2.14). Here, we take a slightly different formulation to present the used CCP-based SBL algorithm. Start with the CCP procedure, it is an iterative algorithm that solves the following sequence of convex programs,
( θ ( k + 1 ) , γ ( k + 1 ) ) arg min ( θ , γ ) Ω L ^ ( θ , γ ; θ ( k ) , γ ( k ) ) ,
(2.16)
or equivalently,
( θ ( k + 1 ) , γ ( k + 1 ) ) arg min ( θ , γ ) Ω { 1 σ 2 y Φ θ 2 + i ( θ i 2 γ i + c i ( k ) γ i ) } ,
where
L ^ ( θ , γ ; θ ( k ) , γ ( k ) ) := f ( θ , γ ) + g ( θ ( k ) , γ ( k ) ) + g γ ( θ ( k ) , γ ( k ) ) , γ γ ( k ) , c ( k ) := γ log | σ 2 I n + Φ Γ ( k ) Φ T | , Γ ( k ) := diag ( γ 1 ( k ) , γ 2 ( k ) , , γ m ( k ) ) .
Note that the supplementary definition for L can be extended to the function L ^. It follows that
θ ( k + 1 ) arg min θ { y Φ θ 2 + 2 σ 2 i c i ( k ) | θ i | } ,
(2.17a)
γ i ( k + 1 ) = | θ i ( k + 1 ) | c i ( k ) , i = 1 , 2 , , m .
(2.17b)
Given an initial point ( θ ( 0 ) , γ ( 0 ) ) int ( Ω ), we can obtain an iterative sequence { ( θ ( k ) , γ ( k ) ) } k = 0 . Moreover, c ( k ) can be analytically calculated by
c ( k ) = γ log | σ 2 I n + Φ Γ ( k ) Φ T | = diag [ Φ T ( σ 2 I n + Φ Γ ( k ) Φ T ) 1 Φ ] .
(2.18)
By the property of concave function, one has
L ( θ , γ ) L ^ ( θ , γ ; θ ( k ) , γ ( k ) ) .
Combining with (2.16) yields
L ( θ ( k + 1 ) , γ ( k + 1 ) ) L ^ ( θ ( k + 1 ) , γ ( k + 1 ) ; θ ( k ) , γ ( k ) ) L ^ ( θ ( k ) , γ ( k ) ; θ ( k ) , γ ( k ) ) = L ( θ ( k ) , γ ( k ) ) .
(2.19)
Then, some simple computations give
L ( θ ( k ) , γ ( k ) ) log | σ 2 I n | = 2 n log σ > .
which shows that { L ( θ ( k ) , γ ( k ) ) } k = 0 is a bounded monotonic non-increasing sequence. In fact, it is strictly monotonic before the iteration (2.17) reaches a local minimum. Therefore, the iterative process should be terminated when
L ( θ ( k ) , γ ( k ) ) L ( θ ( k + 1 ) , γ ( k + 1 ) ) τ ,
(2.20)
where τ is a tolerance. The above iterative process is summarized in Algorithm 1.

ALGORITHM 1

SBL algorithm based on CCP.

Input: Φ R n × m: design matrix; y R n: observation vector; τ
tolerance; 
Output: θ ( k + 1 ) 
1:  initial Γ ( 0 ) = I m, k = 0
2:  repeat 
3:       c ( k ) = diag [ Φ T ( σ 2 I n + Φ Γ ( k ) Φ T ) 1 Φ ]
4:       θ ( k + 1 ) arg min θ { y Φ θ 2 + 2 σ 2 i c i ( k ) | θ i | }
5:       γ j ( k + 1 ) = | θ j ( k + 1 ) | c j ( k ), Γ ( k + 1 ) = diag ( γ ( k + 1 ) )
6:       k = k + 1 
7:  until ( L ( θ ( k ) , γ ( k ) ) L ( θ ( k + 1 ) , γ ( k + 1 ) ) τ ) 
Input: Φ R n × m: design matrix; y R n: observation vector; τ
tolerance; 
Output: θ ( k + 1 ) 
1:  initial Γ ( 0 ) = I m, k = 0
2:  repeat 
3:       c ( k ) = diag [ Φ T ( σ 2 I n + Φ Γ ( k ) Φ T ) 1 Φ ]
4:       θ ( k + 1 ) arg min θ { y Φ θ 2 + 2 σ 2 i c i ( k ) | θ i | }
5:       γ j ( k + 1 ) = | θ j ( k + 1 ) | c j ( k ), Γ ( k + 1 ) = diag ( γ ( k + 1 ) )
6:       k = k + 1 
7:  until ( L ( θ ( k ) , γ ( k ) ) L ( θ ( k + 1 ) , γ ( k + 1 ) ) τ ) 

In this section, we apply the proposed S 3 d method to discover the complex Ginzburg–Landau equation (CGLE) from a binary-fluid convection experiment. Traveling-wave (TW) convection in binary fluids is a known, mainstream ansatz technique for studying the physical mechanism of non-equilibrium pattern-forming systems.44 Substantial experiments on TW convection unfold many different dynamical states of TW. A major challenge in the scientific study of TW is quantitatively understanding and predicting the underlying dynamics. First-principle models based on the CGLE and its variants45,46 have long been used to quantitatively explain experimental observations.

The experimental data are collected from an experiment on convection in an ethanol/water mixture in a long, narrow, annular container which is heated from below.19,34 The objective of the experiment is to probe into the counterpropagating wave packet formation regime. As reported in Refs. 34,47, and 48, a regular, small-amplitude traveling-wave (TW) state is observed in this experiment, which consists of pairs of quasilinear wave packets which propagate around an annular cell in opposite directions, referred to as “left” and “right.” In Fig. 2, we show the dynamics of the TW state for the bifurcation parameter ε (scaled by the characteristic time τ 0 defined below): ε τ 0 = 1.77 × 10 3, where the left-TW rapidly moves to the left with small amplitudes [denoted by A L ( x , t )] and collides periodically with the right-TW [denoted by A R ( x , t )].

FIG. 2.

The dynamics of the observed TW state for ε τ 0 = 1.77 × 10 3. The top line shows the physical change of the amplitudes of the left-going (blue full line and dashed line) and right-going (red full curve and dashed line) wave components at a particular time and at the next time’s data. (a) The behavior of left-going amplitudes; (b) the behavior of right-going amplitudes; (c) the behavior of the sum of left- and right-going amplitudes.

FIG. 2.

The dynamics of the observed TW state for ε τ 0 = 1.77 × 10 3. The top line shows the physical change of the amplitudes of the left-going (blue full line and dashed line) and right-going (red full curve and dashed line) wave components at a particular time and at the next time’s data. (a) The behavior of left-going amplitudes; (b) the behavior of right-going amplitudes; (c) the behavior of the sum of left- and right-going amplitudes.

Close modal
The data are the complex amplitudes, A L ( x , t ) and A R ( x , t ) of the TW state observed above onset ( ε = 0), which are independently measured by applying full complex demodulation techniques49 to the signals from the annular photodiode array, which records the flow-pattern image. The sampling period of these data is t = 1.5625, and the space sample interval is x = 0.458 167. There are seven sets of data collected from the same experimental setup with seven different bifurcation parameters ε shown in Table I. For each dataset, a total of 180 × 760 data points is used for model discovery. Due to the use of the complex amplitudes, which is different from Ref. 19, where the real amplitudes and phases, a L , R ( x , t ) and ϕ L , R ( x , t ) defined by A L , R ( x , t ) = a L , R ( x , t ) e i ϕ L , R ( x , t ), are used, Eq. (2.4) takes a complex form, that is, y ~ = Φ ~ θ ~, where y ~ , θ ~ are complex vectors and Φ ~ is a complex matrix. For this, we consider the following one-to-one mapping from the complex vector y ~ to a real vector y and construct the following real matrix Φ:
y = [ Re y ~ Im y ~ ] , Φ = [ Re ( Φ ~ ) Im ( Φ ~ ) Im ( Φ ~ ) Re ( Φ ~ ) ] .
With the above mapping and matrix construction, estimating the parameter θ ~ can be equivalently converted to determine the θ via
y ~ = Φ ~ θ ~ y = Φ θ .
This one-shot optimization for complex variables can reduce the influence of the inherent noise on the discovery result. This is particularly important for model discovery due to the TW amplitudes are small ( 10 3 magnitude as shown in Fig. 2). This directly affects the computing error of the real amplitudes and phases by a L , R ( x , t ) = ( Re A L , R ) 2 + ( Im A L , R ) 2 as well as ϕ L , R = arctan Im A L , R Re A L , R and aggravates the estimation error of their derivatives as demonstrated in Ref. 19.
TABLE I.

Overview of the TW data in the designed experiment. The special letter “cal” and “car,” respectively, represent the left-going waves and right-going ones. The files record the experimental data. Seven bifurcation parameters, ε, scaled by the characteristic time, τ0 (i.e., γ = ε τ 0 1), represent the same experiments but conducted with different bifurcation parameters. All samples used for analysis are truncated from the 21st sampling point to the 780th sampling point19 such that a total of 180 × 760 data points is used for the analysis. Then, we form the snapshot matrices, U C 180 × 760.

Bifurcation Sampling Space Number of points
Experiment label parameter Sample number interval interval after truncation
cal06172/car06172  γ = 9.32 × 10−3  988 × 180  1.5625  0.458 167  228 
cal06182/car06182  γ = 4.22 × 10−3  800 × 180  1.5625  0.458 167  40 
cal06192/car06192  γ = 1.77 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06212/car06212  γ = 6.38 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06222/car06222  γ = 12.07 × 10−3  1031 × 180  1.5625  0.458 167  271 
cal06242/car06242  γ = 14.03 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06252/car06252  γ = 16.28 × 10−3  1000 × 180  1.5625  0.458 167  240 
Bifurcation Sampling Space Number of points
Experiment label parameter Sample number interval interval after truncation
cal06172/car06172  γ = 9.32 × 10−3  988 × 180  1.5625  0.458 167  228 
cal06182/car06182  γ = 4.22 × 10−3  800 × 180  1.5625  0.458 167  40 
cal06192/car06192  γ = 1.77 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06212/car06212  γ = 6.38 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06222/car06222  γ = 12.07 × 10−3  1031 × 180  1.5625  0.458 167  271 
cal06242/car06242  γ = 14.03 × 10−3  1000 × 180  1.5625  0.458 167  240 
cal06252/car06252  γ = 16.28 × 10−3  1000 × 180  1.5625  0.458 167  240 

We proceed the discovery process by two steps: first, the proposed S 3 d method intensively determines the dominant physical terms from a large library of candidate terms. We construct the function dictionary using brief domain knowledge in fluid dynamics theory. Possible terms may include the terms appearing in the Navier–Stokes equations (such as A A , A) or the ones appearing in the first principles-based derived complex Ginzburg–Landau equations (such as the higher-order nonlinear terms A 2 x A , | A | 2 A). In general, the amplitude itself and its derivative up to some custom order k should be added in order to properly model the physical system. Furthermore, to correctly model states of counterpropagating TW, the coupling terms like | A L | 2 A R and | A R | 2 A L must be added to account for the interactions between the left- and right-going TW. In this example, we take the third-order Volterra expansions of A L , R and | A L , R | and use the fourth-order compact Pade scheme50 for the derivative estimation. It is shown that the Pade scheme is much more robust than the explicit fourth-order difference scheme for the noise. We investigate the performance of the S 3 d method on the last three sets of data with a larger signal-to-noise ratio ( that ε τ 0 1 is , ε τ 0 1 12.07 × 10 3 ) in Table I. The second step is to identify coefficients for each identified dominant term, where all the data displayed in Table I are used for the coefficient estimations. The used candidate terms include A, | A |, A 2, | A | 2, A 3, A 2 | A |, A | A | 2, | A | 3, A x x, A A x, A A x x, A 2 A x, A 2 A x x, A 3 A x, A 3 A x x, and | A R | 2 A for the right-going TW.

The first finding is that the proposed S 3 d method stably extracts five dominant feature terms to compose the following governing coupled equations:
t A R = ( 0.0079 + 0.0640 i ) A R 0.5982 x A R + ( 0.9210 0.1243 i ) x 2 A R + ( 17.7800 206.3442 i ) | A R | 2 A R + ( 25.2965 150.6041 i ) | A L | 2 A R , t A L = ( 0.0065 + 0.0443 i ) A L + 0.5895 x A L + ( 0.3392 0.2185 i ) x 2 A L + ( 17.1600 89.0851 i ) | A L | 2 A L + ( 6.4579 139.7492 i ) | A R | 2 A L ,
(3.1)
where the estimated coefficients correspond to the experimental data labeled “cal/car06172.” In Fig. 3, we show the relationship between the evolution of the fitting error and the change of the number of feature terms. It can be seen that for all used dataset, there exists a critical value before which the fitting-error slowly increases and after which the error begins to grow rapidly. The critical value to some extent illustrates that the retained five feature terms are best for describing the data.
FIG. 3.

The accelerations of fitting error due to a change in the number of feature terms. We empirically adjust the regulation parameter, λ, while observing the change in the fitting error. Three sets of data were chosen to test S 3 d: car/cal06242, car/cal06222, and car/cal06252 from top to bottom. For each dataset, the fitting error slowly increased and tended to a relatively stable value [five features in Eq. (3.1) emerge], then that error began to rapidly grow.

FIG. 3.

The accelerations of fitting error due to a change in the number of feature terms. We empirically adjust the regulation parameter, λ, while observing the change in the fitting error. Three sets of data were chosen to test S 3 d: car/cal06242, car/cal06222, and car/cal06252 from top to bottom. For each dataset, the fitting error slowly increased and tended to a relatively stable value [five features in Eq. (3.1) emerge], then that error began to rapidly grow.

Close modal
In Fig. 4, we outline all the used candidate terms and enumerate the key feature terms for the bifurcation parameter, γ = 14.03 × 10 3, which include the amplitude itself A, x A, x x A, | A | 2 A and the coupling term | A L | 2 A with the “R” subscript neglected. The same results also hold for two other datasets with bifurcation parameter γ = 12.07 × 10 3 and bifurcation parameter γ = 16.28 × 10 3. In fact, the discovered governing equations agree well with the simplest model of coupled complex Ginzburg–Landau equations (CGLEs) derived from first principles,
τ 0 ( t + s x ) A = ε ( 1 + i c 0 ) A + ω 0 2 ( 1 + i c 1 ) x 2 A + g ( 1 + i c 2 ) | A | 2 A + h ( 1 + i c 3 ) | A L | 2 A ,
where A is the complex amplitude of a right-going wave with group velocity s, parameter ω 0 is a characteristic length scale, and τ 0 is a characteristic time, which is determined experimentally by measuring the growth rate, γ = ε τ 0 1, at several values of ε and fitting the slope, c 0 c 3 are dispersion coefficients, g is a nonlinear saturation parameter, and h is a nonlinear coupling coefficient, which reflects the stabilizing interaction between oppositely propagating traveling-wave components (see Ref. 19 and the reference therein). According to substantial theoretical51–53 and experimental47,48 works, the CGLE model has been proven to be potential to qualitatively explain weakly nonlinear dynamics near onset and closely parallel the experimental observations. Here, we apply the sparse Bayesian learning method to make a quantitive comparison of the experimental data with the simplest CGLE model.
FIG. 4.

All the used candidate terms for the experimental data labeled car/cal06242, the proposed method identifies the feature terms with different λ.

FIG. 4.

All the used candidate terms for the experimental data labeled car/cal06242, the proposed method identifies the feature terms with different λ.

Close modal

Based on the analysis above, we further apply the proposed S 3 d method to fine-tune the coefficients of CGLE. For TW convection, most parameters in CGLE (the coupling coefficient is an exception) have been measured in Ref. 35. Here, to show our identified results in an accessible form, we introduce the concept of the leave-one-out in machine learning to discuss the best method to choose the regularization parameters λ and obtain the best-fit coefficients. We accept the parameters that lead to the smallest fitting error on the test sets: e r r = Y Φ ω 2 Y 2. In Table II, we summarized the identified coefficients, from which it can be seen that the estimated coefficients mostly resemble the theoretical values and experimental ones (Refs. 19,34, and 35 and reference therein), except for some obscure parameters in theory and experiment. Particularly, we observe that for smaller bifurcation parameters, the estimated coupling coefficients possess the same characteristic, that is, h τ 0 1 for the right-going TW component A R is smaller than ones for the A L, and the value of h c 3 τ 0 1 approximate 150. But for the three datasets with the highest signal-to-noise ratio, we clearly find that the estimated coefficients are unstable, which may arise from the bad estimates of the derivatives. Additionally, it is interesting to note that all the seven analyzed datasets have one thing in common, that is, the negativity of the coupling coefficients, which indicates a stabilizing nonlinear competition between the two TW and a slowing down of the TW group velocity, which is also similar to Ref. 19 and agrees with the experimental observation.34 

TABLE II.

Summary of the identified parameters of seven sets of real data: The data are labeled according to “right”(car) and “left”(cal). Each row represents the identified coefficients by S3d.

Coefficients in the discovered complex Ginzburg–Landau equation
Experiment label s ε τ 0 1 ξ 0 2 τ 0 1 g τ 0 1 h τ 0 1 ε τ 0 1 c 0 ξ 0 2 c 1 τ 0 1 g c 2 τ 0 1 h c 3 τ 0 1
cal06172cb  0.5895  0.0065  0.3392  −17.1619  −6.4595  0.0443  −0.2185  −89.0851  −139.7500 
car06172cb  0.5982  0.0079  0.9210  17.7800  −25.2965  0.0640  −0.1243  −206.3442  −150.6041 
cal06182cb  0.6268  0.0057  0.0000  −78.0836  −0.8076  −0.0094  −0.3774  0.0000  −154.4844 
car06182cb  0.6308  0.0081  0.0935  −90.2362  −35.0632  −0.0059  −0.4397  −55.8157  −157.3323 
cal06192cb  0.5930  0.0054  0.0148  −139.4494  −2.5230  −0.0066  −0.6457  −37.6346  −152.7833 
car06192cb  0.5916  0.0087  0.0158  −198.6799  −34.9115  −0.0066  −0.9189  −68.1182  −141.3105 
cal06212cb  0.6063  0.0057  0.0639  −45.0095  −7.9602  0.0033  −0.2732  −35.3580  −154.5495 
car06212cb  0.6131  0.0042  0.0202  −26.1553  −33.0555  0.0019  −0.1489  −17.7879  −168.7832 
cal06222cb  0.5741  0.0173  1.7492  40.1657  −15.1882  0.0281  −0.0424  −379.5853  −136.5903 
car06222cb  0.5771  0.0127  1.3000  29.9928  −16.6945  0.0114  −0.0894  −272.3181  −161.5083 
cal06242cb  0.5509  0.0138  1.0701  16.2302  −17.3846  0.0555  −0.1505  −234.8376  −144.0718 
car06242cb  0.5597  0.0150  1.1368  13.1359  −4.5687  0.0567  −0.2108  −237.9784  −149.1559 
cal06252cb  0.5634  0.0199  1.1690  2.1061  −5.2084  0.0867  −0.2690  −267.3739  −169.0035 
car06252cb  0.5481  0.0131  1.1075  22.3429  −3.9619  0.0794  −0.1245  −234.6182  −179.6777 
Coefficients in the discovered complex Ginzburg–Landau equation
Experiment label s ε τ 0 1 ξ 0 2 τ 0 1 g τ 0 1 h τ 0 1 ε τ 0 1 c 0 ξ 0 2 c 1 τ 0 1 g c 2 τ 0 1 h c 3 τ 0 1
cal06172cb  0.5895  0.0065  0.3392  −17.1619  −6.4595  0.0443  −0.2185  −89.0851  −139.7500 
car06172cb  0.5982  0.0079  0.9210  17.7800  −25.2965  0.0640  −0.1243  −206.3442  −150.6041 
cal06182cb  0.6268  0.0057  0.0000  −78.0836  −0.8076  −0.0094  −0.3774  0.0000  −154.4844 
car06182cb  0.6308  0.0081  0.0935  −90.2362  −35.0632  −0.0059  −0.4397  −55.8157  −157.3323 
cal06192cb  0.5930  0.0054  0.0148  −139.4494  −2.5230  −0.0066  −0.6457  −37.6346  −152.7833 
car06192cb  0.5916  0.0087  0.0158  −198.6799  −34.9115  −0.0066  −0.9189  −68.1182  −141.3105 
cal06212cb  0.6063  0.0057  0.0639  −45.0095  −7.9602  0.0033  −0.2732  −35.3580  −154.5495 
car06212cb  0.6131  0.0042  0.0202  −26.1553  −33.0555  0.0019  −0.1489  −17.7879  −168.7832 
cal06222cb  0.5741  0.0173  1.7492  40.1657  −15.1882  0.0281  −0.0424  −379.5853  −136.5903 
car06222cb  0.5771  0.0127  1.3000  29.9928  −16.6945  0.0114  −0.0894  −272.3181  −161.5083 
cal06242cb  0.5509  0.0138  1.0701  16.2302  −17.3846  0.0555  −0.1505  −234.8376  −144.0718 
car06242cb  0.5597  0.0150  1.1368  13.1359  −4.5687  0.0567  −0.2108  −237.9784  −149.1559 
cal06252cb  0.5634  0.0199  1.1690  2.1061  −5.2084  0.0867  −0.2690  −267.3739  −169.0035 
car06252cb  0.5481  0.0131  1.1075  22.3429  −3.9619  0.0794  −0.1245  −234.6182  −179.6777 
One important benefit of data-driven PDE discovery is to use the discovered model for simulation, analysis, and prediction. Here, we make an attempt to solve the identified CGLE and reconstruct the TW convection behaviors with the appropriate initial boundary data. Note that the considered experiments are conducted in an annular container, which implies that the physical system comprises periodic boundary conditions,19 that is,
A L ( x , t ) = A L ( x + p , t ) , A R ( x , t ) = A R ( x + p , t ) ,
where p is one periodic. We employ the Fourier spectral method presented in Ref. 41 to simulate the identified CGLE model, for instance, for the bifurcation parameter γ = 6.38 × 10 3,
t A R = 0.6131 x A R + ( 0.0042 + 0.0019 i ) A R + ( 0.0158 0.9189 i ) x 2 A R ( 26.1553 + 17.7879 i ) | A R | 2 A R ( 7.9602 + 154.5495 i ) | A L | 2 A R , t A L = 0.6063 x A L + ( 0.0057 + 0.0033 i ) A L + ( 0.0639 i 0.2723 ) x 2 A L ( 45.0095 + 35.3580 i ) | A L | 2 A L ( 33.0555 + 168.7832 i ) | A R | 2 A L ,
where 0 < x < l , t > 0, with the system length l = 82.0119 and the computing time T = 625 (i.e., t 0 = 0 , t f = 625). The simulation proceeds by discretizing the left-going complex wave amplitude, A L, and the right-going complex wave amplitude, A R, in space on a uniform 180 mesh with spectral approximations of the spatial derivatives, and integrating in time using an explicit Runge–Kutta formula with 401 time points. We use any column of the snapshot matrix, U, as an initial value. Figure 5 shows the simulation results for different initial values: the 622th column of U for data labeled “cal/car06172,” the 600th column for data labeled “cal/car06192,” the 600th column for data labeled “cal/car06212,” and the 1st column for data labeled “cal/car06182.” Therein, the first three rows have a resolution of 180 × 400 samples and the fourth row has a resolution of 180 × 200. Both resolutions correspond to reconstruction times T = 625 and T = 312.5, respectively. It can be seen that the TW behavior by simulation agrees well with the experimental observation. In simulation, we also find that the reconstruction results change with different initial values. One notable attempt is to take the 658th column for data labeled “cal/car06192” as the initial value. It can be observed that with the simulation running to T = 1187 both the left- and right-going TW possess regular bursts, and the amplitudes of wave approach the real values, as shown in Fig. 6. Thus, the reconstruction from our identified CGLE model bears a striking resemblance to the experimentally observed nonlinear states of TW convection on binary-fluid convection.
FIG. 5.

Comparison of spatiotemporal evolution of the superposition of the left- and right-TW amplitude between the simulation and the experiment. The different initial values, the 622th column of the snapshot matrix U for data labeled “cal/car06172,” the 600th column for data labeled “cal/car06192,” the 600th column for data labeled “cal/car06212,” and the 1st column for data labeled “cal/car06182,” are selected for simulation (the first column). The first 400 (or 200) snapshots of the real data and the simulation data are shown with a resolution of 180 × 400 (column 2 and column 3).

FIG. 5.

Comparison of spatiotemporal evolution of the superposition of the left- and right-TW amplitude between the simulation and the experiment. The different initial values, the 622th column of the snapshot matrix U for data labeled “cal/car06172,” the 600th column for data labeled “cal/car06192,” the 600th column for data labeled “cal/car06212,” and the 1st column for data labeled “cal/car06182,” are selected for simulation (the first column). The first 400 (or 200) snapshots of the real data and the simulation data are shown with a resolution of 180 × 400 (column 2 and column 3).

Close modal
FIG. 6.

Comparison of spatiotemporal evolution of the superposition of the left- and right- TW amplitude between the simulation and the experiment. Using the extended time T = 1187 and initial value (i.e., 658th column of snapshot matrix), a regular bust of TW is seen .

FIG. 6.

Comparison of spatiotemporal evolution of the superposition of the left- and right- TW amplitude between the simulation and the experiment. Using the extended time T = 1187 and initial value (i.e., 658th column of snapshot matrix), a regular bust of TW is seen .

Close modal
In this section, we validate the practicality of the proposed S 3 d method by using various synthetic data from numerical solutions to canonical PDEs as well as the dataset originally collected in Ref. 24. Similar to Ref. 24, we add the Gaussian noise into the synthetic data for seeing the robustness of the SBL method. The error from the noisy data is quantified by means of the root mean square error (RMSE),
E r r ( y ) = 1 n x n t j = 1 n x i = 1 n t ( u ( x j , t i ) y ( x j , t i ) ) 2 .
The mean-square error (MSE) and Standard Deviation (STD) are computed and used for assessing the discovery results. Additionally, we also use non-Gaussian noise to verify the robustness of the present method, which include the Gaussian mixture noise and the noise that stems from an Ornstein–Uhlenbeck process proposed in Refs. 54 and 55. Finally, we would like to mention that the assumption on the noise type for our theoretical analysis is unnecessary for the following experiments. The error analysis for data-driven PDE method is important, but which goes beyond our current research.

Table III demonstrates that S 3 d method has successfully discovered the sine-Gordon equation, the Korteweg–de Vries equation, the FitzHugh–Nagumo equation, the Schrödinger’s equation, and the Navier–Stokes equation. Due to page limit, we only demonstrate the discovery results for the Navier–Stokes equation.

TABLE III.

Summary of the discovery results for the sine-Gordon equation, the Korteweg–de Vries equation, the FitzHugh–Nagumo equation, the Schrödinger’s equation, and the NS equation.

PDE Noise case Points Identified PDE Mean(err)±STD(err)
Sine-Gordon equation  No noise  10  utt = 0.9999uxx − 0.9986sin (u 0.0732 % ± 0.0893 % 
  With noise  50  utt = 0.9934uxx − 0.9986sin (u 0.3999 % ± 0.3738 % 
Korteweg–de Vries equation  No noise  10 000  ut = −0.000 484uxxx − 0.999 246uux  0.0857 % ± 0.0146 % 
  With noise  18 750  ut = −0.000 474uxxx − 0.974 795uux  2.3186 % ± 0.2856 % 
FitzHugh–Nagumo equation  No noise  10000 ( u ) 10 000 ( w )  u t = 0.1989 u + 1.1991 u 2 1.0010 u 3 + 1.0023 u x x 1.0005 ω ω t = 0.000 990 ω + 0.001 998 u  0.2952 % ± 0.3502 % 
  With noise  18 000 ( u ) 27 000 ( w )  u t = 0.1983 u + 1.1812 u 2 0.9611 u 3 + 1.0025 u x x 0.9950 ω ω t = 0.001 000 ω + 0.001 998 u  1.0232 % ± 1.3728 % 
Schrödinger’s equation  No noise  10 000  u t = i 0.3 u x x + i 3.3333 u | u | 2 i 3.3333 u  0.0011 % ± 0.0007 % 
  With noise  10 000  u t = i 0.3013 u x x + i 3.2946 u | u | 2 i 3.3186 u  0.6788 % ± 0.4183 % 
Navier–Stokes equation  No noise  10 000  ω t ( x , t ) = 0.0101 ω x x + 0.0100 ω y y 1.0014 u ω x 0.9914 v ω y  0.3830 % ± 0.3781 % 
  With noise  20 000  ω t ( x , t ) = 0.0101 ω x x + 0.0097 ω y y 0.9988 u ω x 0.9996 v ω y  1.0306 % ± 1.5734 % 
PDE Noise case Points Identified PDE Mean(err)±STD(err)
Sine-Gordon equation  No noise  10  utt = 0.9999uxx − 0.9986sin (u 0.0732 % ± 0.0893 % 
  With noise  50  utt = 0.9934uxx − 0.9986sin (u 0.3999 % ± 0.3738 % 
Korteweg–de Vries equation  No noise  10 000  ut = −0.000 484uxxx − 0.999 246uux  0.0857 % ± 0.0146 % 
  With noise  18 750  ut = −0.000 474uxxx − 0.974 795uux  2.3186 % ± 0.2856 % 
FitzHugh–Nagumo equation  No noise  10000 ( u ) 10 000 ( w )  u t = 0.1989 u + 1.1991 u 2 1.0010 u 3 + 1.0023 u x x 1.0005 ω ω t = 0.000 990 ω + 0.001 998 u  0.2952 % ± 0.3502 % 
  With noise  18 000 ( u ) 27 000 ( w )  u t = 0.1983 u + 1.1812 u 2 0.9611 u 3 + 1.0025 u x x 0.9950 ω ω t = 0.001 000 ω + 0.001 998 u  1.0232 % ± 1.3728 % 
Schrödinger’s equation  No noise  10 000  u t = i 0.3 u x x + i 3.3333 u | u | 2 i 3.3333 u  0.0011 % ± 0.0007 % 
  With noise  10 000  u t = i 0.3013 u x x + i 3.2946 u | u | 2 i 3.3186 u  0.6788 % ± 0.4183 % 
Navier–Stokes equation  No noise  10 000  ω t ( x , t ) = 0.0101 ω x x + 0.0100 ω y y 1.0014 u ω x 0.9914 v ω y  0.3830 % ± 0.3781 % 
  With noise  20 000  ω t ( x , t ) = 0.0101 ω x x + 0.0097 ω y y 0.9988 u ω x 0.9996 v ω y  1.0306 % ± 1.5734 % 
Consider the incompressible Navier–Stokes (NS) equations for the unsteady two-dimensional flows on torus with vorticity/stream function formulation,36,
ω t + u ω x + v ω y = ( ω x x + ω y y ) / R e , ψ x x + ψ y y = ω ,
(4.1)
where u = ψ y represents the horizontal velocity component, v = ψ x represents the vertical velocity component, ω = u y v x represents the vorticity, and R e is the Reynolds number (based on the radius of the cylinder and the free-stream velocity, V ). The vorticity formulation is attractive for the accurate solution of high Reynolds number planar or axisymmetric NS equations.56 In this example with the NS equations, the flow field is described by four field quantities, each of which needs to be measured for data.
We use a combination of the Fourier spectral method and the Crank–Nicolson method56 to solve Eq. (4.1) with initial condition in x × y [ 0 , 2 π ] × [ 0 , 2 π ],
ω ( x , y ) = exp ( 1 5 [ x 2 + ( y + π 4 ) 2 ] ) + exp ( 1 5 [ x 2 + ( y π 4 ) 2 ] ) 1 2 exp ( 2 5 [ ( x π 4 ) 2 + ( y π 4 ) 2 ] ) .
We perform the spatial discretization on a uniform grid with x = 0.0628 and y = 0.0628. By substituting the Fourier approximation of the solution, u , v , ω , into Eq. (4.1), we further integrate the resultant ODEs using a time step of t = 0.1 using the Crank–Nicolson scheme. In Fig. 7, we plot the 2D NS pseudo-spectral solver on the torus with Reynolds number, R e = 100. The discrete version of the numerical solution forms the snapshot matrices, U , V , W R 100 × 100 × 1001.
FIG. 7.

Flow field for 2D NS equation (4.1) on the torus depicted in time t = 6 , t = 16 , t = 32, and t = 80.

FIG. 7.

Flow field for 2D NS equation (4.1) on the torus depicted in time t = 6 , t = 16 , t = 32, and t = 80.

Close modal

The dictionary matrix, Φ, consists of m = 60 basis functions. The underlying dynamics is described with the three state variables, u , v , w. Thus, our dictionary of basis functions include the linear combinations of the variables, u , v , ω , u x , v x , u x x , v x x. With snapshots, U , V , W, we use the explicit difference scheme to approximate the first- or second-order derivatives in the noise-free case. We see from the discovery results that our method has strong performance with the noise-free dataset, a small MSE, and a small STD. We enumerate the re-sampling region in Table III.

In the noisy case, we use the fourth-order compact Pade scheme with the S 3 d method. We add 1 % Gaussian noise to the original snapshot and the resulting RMSE is E r r = 0.4823 for u, E r r = 0.4338 for v, and E r r = 1.7111 for w. We apply the POD method used in Refs. 57 and 58 to de-noise our data. We reshape the snapshot U into a 10 000 × 1001 matrix, U ¯; we do the same shaping for W and V. By projecting the data onto 33 POD modes obtained using a threshold of 99.99 fixed, we generate a new snapshot, U ~ R 10000 × 1001, from the matrix, U ¯: U ~ = Ψ A with Ψ consisting of 33 POD modes and the corresponding POD coefficients, A. Then, we reverse the process such that snapshots, U ¯ , V ¯ , W ¯ R 100 × 100 × 1001. The computed RMSE is 0.0056 for u, 0.0051 for v, and 0.0412 for w. We pay special attention to the Pade scheme. With the de-noised dataset, we approximate the first-order derivatives in each grid point and then we compute the second-order derivatives using the Pade scheme. After we compute the dictionary matrix, we correctly discover the NS equation as shown in Table III.

We further verify the proposed S 3 d method on the dataset originally collected in Ref. 24. Therein, these datasets are generated by simulating the KdV equation with single- and double-soliton solutions, the Burger’s equation, the Quantum Harmonic Oscillator, and the nonlinear Schrödinger equation. Here, we neglect the simulation details and refer readers to Ref. 24. Table IV shows the identified results of applying the S 3 d method to the dataset. It can be seen that the proposed S 3 d method is predominant for small sample and obtains a better discovery of these PDEs in terms of a smaller parametric error. It is worthy noting that the dictionary matrices used for all PDEs in Table IV are different from Table III. For instance, the dictionary for the sine-Gordon equation is that
1 , u , u 2 , u 3 , u 4 , u x , u x x , u x x x , u u x , u u x x , u u x x x , u 2 u x , u 2 u x x , u 2 u x x x , u 3 u x , u 3 u x x , u 3 u x x x , u 4 u x , u 4 u x x , u 4 u x x x , sin ( u ) , cos ( u ) , sin ( u ) cos ( u ) , sin 2 ( u ) , cos 2 ( u ) ,
which are constructed by any combination of the state, u ( x , t ), the corresponding derivatives up to the third-order and the periodic trigonometric functions. The dictionary constructed for the FitzHugh–Nagumo equation is that
1 , u , u 2 , u 3 , u 3 u , u x , u x x , u x x x , u u x , u u x x , u u x x x , u 2 u x , u 2 u x x , u 2 u x x x , u 3 u x , u 3 u x x , u 3 u x x x , u 4 u x , u 4 u x x , u 4 u x x x , w , w 2 , w x , w w x , w 2 w x ,
which are constructed by any combination of the states, u ( x , t ) and w ( x , t ), and the corresponding derivatives up to the third order. The main difference between them is the number of states, the order of derivatives, and the degree of polynomials. In our proposed method, we always suggest that the constructed library is sufficiently large to have a sparse representation of the dynamics underlying the dataset, and the well-developed PDE theory can be integrated to add terms with physical meaning, such as convection terms and dissipative terms.
TABLE IV.

Comparison of identified results using PDE-FIND method and S3d method with the original dataset in Ref. 24.

Example S3d method PDE-FIND method
Identified results No. of samples MSE(err)-STD(err) No. of samples MSE(err)-STD(err)
Korteweg–de Vries (single-soliton) 
u t = 4.9999 u x ( c = 5 ) u t = 0.9997 u x ( c = 1 )  750  0.0028 % ± 0.0 % 0.0303 % ± 0.0 %  12 800  0.3745 % ± 0.0 % 0.0820 % ± 0.0 % 
ut = −6.0011uux − 1.0005uxxx  1500  0.0346 % ± 0.0239 %  25 600  2.5931 % ± 1.3601 % 
Korteweg–de Vries (double-soliton) 
ut = −5.9909uux − 0.9995uxxx  9000  0.0995 % ± 0.0742 %  102 912  0.9572 % ± 0.2322 % 
ut = −5.9250uux − 0.9867uxxx  16 600  1.2894 % ± 0.0548 %  85 432  7.4727 % ± 4.9306 % 
Burgers’ equation 
ut = −0.9999uux + 0.1000uxx  2000  0.0051 % ± 0.0054 %  25 856  0.1595 % ± 0.0608 % 
ut = −1.0011uux + 0.1001uxx  5000  0.0878 % ± 0.0323 %  19 116  1.9655 % ± 1.0000 % 
Quantum harmonic oscillator 
u t = 0.5000 i u x x 1.0000 i x 2 2 u  2000  0.0090 % ± 0.0082 %  205 312  0.2486 % ± 0.0128 % 
u t = 0.4996 i u x x 1.0005 i x 2 2 u  2000  0.0693 % ± 0.0269 %  187 452  9.6889 % ± 6.9705 % 
Nonlinear Schrödinger equation 
ut = 0.4998iuxx + 1.0000i|u|2u  4500  0.0185 % ± 0.0204 %  256 512  0.0473 % ± 0.0147 % 
ut = 0.5034iuxx + 0.9987i|u|2u  7200  0.4096 % ± 0.3934 %  236 652  3.0546 % ± 1.2193 % 
Example S3d method PDE-FIND method
Identified results No. of samples MSE(err)-STD(err) No. of samples MSE(err)-STD(err)
Korteweg–de Vries (single-soliton) 
u t = 4.9999 u x ( c = 5 ) u t = 0.9997 u x ( c = 1 )  750  0.0028 % ± 0.0 % 0.0303 % ± 0.0 %  12 800  0.3745 % ± 0.0 % 0.0820 % ± 0.0 % 
ut = −6.0011uux − 1.0005uxxx  1500  0.0346 % ± 0.0239 %  25 600  2.5931 % ± 1.3601 % 
Korteweg–de Vries (double-soliton) 
ut = −5.9909uux − 0.9995uxxx  9000  0.0995 % ± 0.0742 %  102 912  0.9572 % ± 0.2322 % 
ut = −5.9250uux − 0.9867uxxx  16 600  1.2894 % ± 0.0548 %  85 432  7.4727 % ± 4.9306 % 
Burgers’ equation 
ut = −0.9999uux + 0.1000uxx  2000  0.0051 % ± 0.0054 %  25 856  0.1595 % ± 0.0608 % 
ut = −1.0011uux + 0.1001uxx  5000  0.0878 % ± 0.0323 %  19 116  1.9655 % ± 1.0000 % 
Quantum harmonic oscillator 
u t = 0.5000 i u x x 1.0000 i x 2 2 u  2000  0.0090 % ± 0.0082 %  205 312  0.2486 % ± 0.0128 % 
u t = 0.4996 i u x x 1.0005 i x 2 2 u  2000  0.0693 % ± 0.0269 %  187 452  9.6889 % ± 6.9705 % 
Nonlinear Schrödinger equation 
ut = 0.4998iuxx + 1.0000i|u|2u  4500  0.0185 % ± 0.0204 %  256 512  0.0473 % ± 0.0147 % 
ut = 0.5034iuxx + 0.9987i|u|2u  7200  0.4096 % ± 0.3934 %  236 652  3.0546 % ± 1.2193 % 
To understand the influence of the choice of dictionary matrix on discovery results, consider the one-dimensional Schrödinger equation with a cubic nonlinear term,
z t = i z x x + i ν | z | 2 z ,
(4.2)
where the Schrödinger operator i z x x represents the advection (wave propagation) and the cubic nonlinear term | z | 2 z with the dissipation coefficient ν represents the diffusion. The actual data fields we analyze are the real representation and the imaginary one of z, denoted by u ( x , t ) and v ( x , t ), respectively. Inserting z ( x , t ) = u ( x , t ) + i v ( x , t ) into (4.2), the dynamics of u ( x , t ) and v ( x , t ) are described by
u t = v x x ν v ( u 2 + v 2 ) , v t = u x x + ν u ( u 2 + v 2 ) ,
which possess the traveling-wave solution with arbitrary constants c 1 , c 2
u ( x , t ) = c 1 cos ( x + ( c 1 2 ν 1 ) t + c 2 ) , v ( x , t ) = c 1 sin ( x + ( c 1 2 ν 1 ) t + c 2 ) .
(4.3)
We take c 1 = 5 , ν = 0.1 , c 2 = 6. In Fig. 8, we display the evolution of solution, where the uniform space grid possesses 512 elements and the time interval is t = 0.003. By computation, one can prove that the traveling-wave solution in (4.3) also satisfies
u t = v x x 25 ν v , v t = u x x + 25 ν u .
We construct a dictionary consisting of 25 basis functions. The basis functions include any combination of the state, u ( x , t ), and the corresponding derivatives up to the third order. By applying the S 3d algorithm to the data, the identified results are as follows:
u t = 0.75 v x x 0.75 v , v t = 0.75 u x x + 0.75 u .
It can be seen that our algorithm is prone to select the coefficients minimizing the 1 norm. Further, if the true term u x x is not included in the candidate terms, the identified results are
u t = 1.5 v , v t = 1.5 u ,
which demonstrates that our algorithm tends to select the parsimonious model.
FIG. 8.

Traveling solution of the Schrödinger equation (4.2).

FIG. 8.

Traveling solution of the Schrödinger equation (4.2).

Close modal
We further verify the robustness of our S 3d algorithm by using the Gaussian mixture noise and the noise that stems from an Ornstein–Uhlenbeck process proposed in Refs. 54 and 55 to construct two new sets of noisy data. The Gaussian mixture noise obeys the following distribution:
( 1 ξ ) N ( 0 , σ 2 ) + ξ N ( 0 , κ σ 2 ) ,
where ξ = 0.5 , σ = 1, and κ = 2. The magnitude of the noise is set to 1 %. We randomly sub-sample 10 000 data points from the domain [ 0.9940 , 0.2209 ] × [ 2.56 , 4.144 ]. Our algorithm correctly discovers the nonlinear Schrodinger’s equation with an accuracy of M S E = 0.6907 % , S T D = 0.4139 %. The identified result is as follows:
u t = i 0.3014 u x x + i 3.2944 u | u | 2 i 3.3184 u .
The noise stemming from an Ornstein–Uhlenbeck process violates the i.i.d. Gaussian noise. We write the measurements with added colored noise as
y ( x j , t i ) = u ( x j , t i ) + η j , i ,
in which
η j , i = η j , i 1 + Δ t θ ( μ η j , i 1 ) + Δ t ϵ ,
where Δ t is the time stepsize, μ is the long-term expectation of the mean of noise, θ indicates the desire for the noise to migrate back toward the mean, and ϵ conforms to a normal distribution N ( 0 , σ 2 ) with standard deviation σ. We set η j , 1 = 0 at all spatial grid points at t 1, θ = 5, μ = 0, and σ = 0.01. We use the same snapshots and the basis functions. The experiment shows that our algorithm can correctly identify the nonlinear Schrödinger’s equation from the data corrupted by the non-Gaussian noise. The experimental result is presented as follows:
u t = i 0.3000 u x x + i 3.2809 u | u | 2 i 3.3181 u ,
where mean = 0.6818 %, STD = 0.8018 %.

This work proposes a framework to discover PDEs from experimental observations and synthetic spatiotemporal data. The proposed S 3 d method discovers the dynamics underlying a state of traveling-wave convection from experimental data, i.e., the simplest complex Ginzburg–Landau equations. Additionally, many canonical PDEs are also re-discovered from synthetic spatiotemporal data. The merit of the proposed method is its ability to freely construct a model class with candidate functions and to automatically select the key ones that reproduce the observed spatiotemporal patterns. Benefiting from sparsity, the inferred PDEs are parsimonious and accurate, enabling interpretability. In the various examples, we observe that we are additionally able to robustly handle different types of noise and the measurement data with small sample size. We expect the S 3 d method to be useful for the modeling of spatiotemporal dynamics from experimental data. This framework, as demonstrated through numerous examples, could potentially accelerate the discovery of new laws and stimulate physical explanations for the discovered equations, which lead to the discovery of the underlying mechanisms.

Although the dynamics we refer to in this work are Eulerian dynamics described by PDEs, S 3 d, as a general method, is also able to infer ODEs and static functional relations using datasets in Refs. 24 and 59. S 3 d unifies results for the discovery of natural laws.5,59 Future work will focus on three important aspects of extension of our method for a wider range of practical applications. First, the currently proposed method relies on estimating temporal and spatial derivatives of the measured state variables from data, while there are cases that the system variables are not necessarily observable. Second, the current method can reconstruct equations that are linear with respect to parameters. We will extend this proposed method to the cases that are nonlinear in parameters. Finally, we assume that the constructed library is sufficiently large to have a sparse representation of the dynamics underlying the dataset. However, when there is no any prior knowledge on the to-be-identified governing equation, it is possible that the constructed library is insufficient. In these cases, genetic symbolic regression works much better. We may integrate genetic symbolic regression and the sparse regression into a unified framework, which could overcome this limitation.

The authors acknowledge financial support from the National Natural Science Foundation of China under Grant Nos. 92167201 and 61903148.

We wish to thank Dr. Paul Kolodner for useful discussion and allowing us to use the experimental data.

The authors have no conflicts to disclose.

Ye Yuan: Conceptualization (lead); Data curation (equal); Formal analysis (lead); Funding acquisition (lead); Investigation (lead); Methodology (lead); Project administration (lead); Resources (equal); Software (equal); Supervision (lead); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). Xiuting Li: Data curation (equal); Formal analysis (lead); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing – review & editing (equal). Liang Li: Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal). Frank Jiang: Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal). Xiuchuan Tang: Formal analysis (equal); Investigation (equal); Validation (equal). Fumin Zhang: Formal analysis (equal); Project administration (equal); Supervision (equal). Jorge Goncalves: Formal analysis (equal); Project administration (equal); Writing – review & editing (equal). Henning U. Voss: Data curation (equal); Investigation (equal); Supervision (equal). Han Ding: Funding acquisition (equal); Project administration (equal); Supervision (equal). Jürgen Kurths: Data curation (equal); Methodology (equal); Project administration (equal); Resources (equal); Writing – review & editing (equal).

All synthetic data and codes used in this paper are publicly available in GitHub at https://github.com/HAIRLAB/S3d, Ref. 60.

For noise-contaminated data, polynomial approximation is a better choice to alleviate effects due to noise. With sampled data, u ( x j , t i ), at time, t i, with j = 1 , , n x, we construct an approximation of the q-order derivative q u ( x , t ) x q by selecting the following sequence of polynomials of degree p N + with q < p:
L p ( x ) = a 0 + a 1 x + a 2 x 2 + + a p x p ,
subject to L p ( x j ) = u ( x j , t i ). For example,
a 0 + a 1 x 1 + a 2 x 1 2 + + a p x 1 p = u ( x 1 , t i ) , a 0 + a 1 x 2 + a 2 x 2 2 + + a p x 2 p = u ( x 2 , t i ) , a 0 + a 1 x n x + a 2 x n x 2 + + a p x n x p = u ( x n x , t i ) .
(A1)
Further, we write Eq. (A1) into matrix form and solve for parameters using a QR factorization. Then, with u ( x , t i ) = L p ( x ), we compute the qth-order derivative, q u ( x , t ) x q. We demonstrate in our experiment that the error values caused by noise can be removed using polynomial approximation, leading a closer-to-real estimation of the derivative.
1
J. D.
Farmer
and
J. J.
Sidorowich
,
Phys. Rev. Lett.
59
,
845
(
1987
).
2
W.-X.
Wang
,
R.
Yang
,
Y.-C.
Lai
,
V.
Kovanis
, and
C.
Grebogi
,
Phys. Rev. Lett.
106
,
154101
(
2011
).
3
B.
Dorneanu
,
S.
Zhang
,
H.
Ruan
,
M.
Heshmat
,
R.
Chen
,
V. S.
Vassiliadis
, and
H.
Arellano-Garcia
,
Front. Eng. Manag.
9
,
623
(
2022
).
4
I. G.
Kevrekidis
,
C. W.
Gear
,
J. M.
Hyman
,
P. G.
Kevrekidid
,
O.
Runborg
, and
C.
Theodoropoulos
,
Commun. Math. Sci.
1
,
715
(
2003
).
5
J.
Bongard
and
H.
Lipson
,
Proc. Natl. Acad. Sci.
104
,
9943
(
2007
).
6
Z.
Shen
,
W.-X.
Wang
,
Y.
Fan
,
Z.
Di
, and
Y.-C.
Lai
,
Nat. Commun.
5
,
4323
(
2014
).
7
I.
Mezić
,
Annu. Rev. Fluid Mech.
45
,
357
(
2013
).
8
M. O.
Williams
,
I. G.
Kevrekidis
, and
C. W.
Rowley
,
J. Nonlinear Sci.
25
,
1307
(
2015
).
9
J. H.
Tu
,
C. W.
Rowley
,
D. M.
Luchtenburg
,
S. L.
Brunton
, and
J. N.
Kutz
,
J. Comput. Dyn.
1
,
391
(
2014
).
10
D.
Giannakis
,
Appl. Comput. Harmon.
(
2019
).
11
L.
Ljung
,
System Identification: Theory for the User
(
Prentice-Hall, Inc.
,
Upper Saddle River, NJ
,
1986
). ISBN 0-138-81640-9.
12
A.
Lindquist
and
G.
Picci
,
Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification
(
Springer
,
2015
), Vol. 1.
13
D.
Xu
and
O.
Khanmohamadi
,
Chaos
18
,
043122
(
2008
).
14
O.
Khanmohamadi
and
D.
Xu
,
Chaos
19
,
033117
(
2009
).
15
M.
Bär
,
R.
Hegger
, and
H.
Kantz
,
Phys. Rev. E
59
,
337
(
1999
).
16
L.
Guo
and
S. A.
Billings
,
IEEE Trans. Circuits Syst. II: Express Briefs
53
,
657
(
2006
).
17
T.
Müller
and
J.
Timmer
,
Int. J. Bifurcation Chaos
14
,
2053
(
2004
).
18
L.
Breiman
and
J. H.
Friedman
,
J. Am. Stat. Assoc.
80
,
580
(
1985
).
19
H. U.
Voss
,
P.
Kolodner
,
M.
Abel
, and
J.
Kurths
,
Phys. Rev. Lett.
83
,
3422
(
1999
).
20
H.
Voss
,
M.
Bünner
, and
M.
Abel
,
Phys. Rev. E
57
,
2820
(
1998
).
21
R.
Tibshirani
,
M.
Wainwright
, and
T.
Hastie
,
Statistical Learning with Sparsity: The Lasso and Generalizations
(
Chapman and Hall/CRC
,
2015
).
22
W.
Pan
,
Y.
Yuan
,
J.
Gonçalves
, and
G.-B.
Stan
, in 2012 IEEE 51st Annual Conference on Decision and Control (CDC) (IEEE, 2012), pp. 2334–2339.
23
S. L.
Brunton
,
J. L.
Proctor
, and
J. N.
Kutz
,
Proc. Natl. Acad. Sci.
113
,
312
(
2016
).
24
S. H.
Rudy
,
S. L.
Brunton
,
J. L.
Proctor
, and
J. N.
Kutz
,
Sci. Adv.
3
,
e1602614
(
2017
).
25
H.
Schaeffer
,
Proc. R. Soc. A
473
,
20160446
(
2017
).
26
M. E.
Tipping
,
J. Mach. Learn. Res.
1
,
211
(
2001
).
27
A. C.
Faul
and
M. E.
Tipping
, in Advances in Neural Information Processing Systems (The MIT Press, 2002), pp. 383–389.
28
B. D.
Rao
,
K.
Engan
,
S. F.
Cotter
,
J.
Palmer
, and
K.
Kreutz-Delgado
,
IEEE Trans. Signal Process.
51
,
760
(
2003
).
29
J.
Palmer
,
K.
Kreutz-Delgado
,
B. D.
Rao
, and
D. P.
Wipf
, in Advances in Neural Information Processing Systems (The MIT Press, 2006), pp. 1059–1066.
30
M.
Figueiredo
, in Advances in Neural Information Processing Systems (The MIT Press, 2002), pp. 697–704.
31
D. P.
Wipf
and
B. D.
Rao
,
IEEE Trans. Signal Process.
52
,
2153
(
2004
).
32
D. P.
Wipf
,
B. D.
Rao
, and
S.
Nagarajan
,
IEEE Trans. Inf. Theory
57
,
6236
(
2011
).
33
M.
Raissi
,
P.
Perdikaris
, and
G. E.
Karniadakis
,
J. Comput. Phys.
378
,
686
(
2019
).
34
P.
Kolodner
,
Phys. Rev. Lett.
69
,
2519
(
1992
).
35
P.
Kolodner
,
S.
Slimani
,
N.
Aubry
, and
R.
Lima
,
Physica D
85
,
165
(
1995
).
36
C.
Shu
and
B. E.
Richards
,
Int. J. Numer. Methods Fluids
15
,
791
(
1992
).
38
E.
Schrödinger
,
Phys. Rev.
28
,
1049
(
1926
).
39
R. J.
LeVeque
,
Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems
(
SIAM
,
2007
), Vol. 98.
40
J.
Li
and
Y.-T.
Chen
,
Computational Partial Differential Equations Using MATLAB
(
Chapman and Hall/CRC
,
2008
).
41
L. N.
Trefethen
,
Spectral Methods in MATLAB
(
SIAM
,
2000
), Vol. 10.
42
A. G.
Baydin
,
B. A.
Pearlmutter
,
A. A.
Radul
, and
J. M.
Siskind
,
J. Mach. Learn. Res.
18
,
5595
(
2017
).
43
D. P.
Wipf
and
S. S.
Nagarajan
, in Advances in Neural Information Processing Systems (The MIT Press, 2008), pp. 1625–1632.
44
M. C.
Cross
and
P. C.
Hohenberg
,
Rev. Mod. Phys.
65
,
851
(
1993
).
45
M. C.
Cross
,
Phys. Rev. A
38
,
3593
(
1988
).
46
A. C.
Newell
, in Lectures in Applied Mathematics (American Mathematical Society, Providence, RI, 1974), Vol. 15, p. 237.
47
P.
Kolodner
,
J. A.
Glazier
, and
H.
Williams
,
Phys. Rev. Lett.
65
,
1579
(
1990
).
48
P.
Kolodner
,
Phys. Rev. A
44
,
6466
(
1991
).
49
P.
Kolodner
and
H.
Williams
, in Proceedings of the NATO Advanced Research Workshop on Nonlinear Evolution of Spatio-temporal Structures in Dissipative Continuous Systems (
Springer Science & Business Media
,
1990
).
50
G.
Strang
,
Computational Science and Engineering
(
Wellesley-Cambridge Press, Wellesley
,
2007
), Vol. 791.
51
S.
Linz
and
M.
Lücke
,
Phys. Rev. A
35
,
3997
(
1987
).
52
W.
Schöpf
and
W.
Zimmermann
,
EPL (Europhys. Lett.)
8
,
41
(
1989
).
53
W.
Schöpf
and
L.
Kramer
,
Phys. Rev. Lett.
66
,
2316
(
1991
).
54
G. E.
Uhlenbeck
and
L. S.
Ornstein
,
Phys. Rev.
36
,
823
(
1930
).
55
D. J.
Higham
,
SIAM Rev.
43
,
525
(
2001
).
56
G.
Strang
,
Computational Science and Engineering
(
Wellesley-Cambridge Press, Wellesley
,
2007
), Vol. 791.
57
G.
Berkooz
,
P.
Holmes
, and
J. L.
Lumley
,
Annu. Rev. Fluid Mech.
25
,
539
(
1993
).
58
L.
Sirovich
,
Q. Appl. Math.
45
,
561
(
1987
).
59
M.
Schmidt
and
H.
Lipson
,
Science
324
,
81
(
2009
).
60
Y.
Yuan
, et al. (
2018
)
Matlab code for 'Machine discovery of partial differential equations from spatiotemporal data
Github.
https://github.com/HAIRLAB/S3d