This study presents a general framework, Sparse Spatiotemporal System Discovery ($S^3\text{d}$), for discovering dynamical models given by Partial Differential Equations (PDEs) from spatiotemporal data. $S^3\text{d}$ builds on recent developments in sparse Bayesian learning, which enforce sparsity in the estimated PDEs and thereby enable a balance between model complexity and fitting error with theoretical guarantees. The proposed framework integrates Bayesian inference and a sparsity-inducing prior distribution with the sparse regression method. It also introduces a principled iterative re-weighted algorithm to select the dominant features in PDEs and to solve for the sparse coefficients. We demonstrate the discovery of the complex Ginzburg–Landau equation from a traveling-wave convection experiment, as well as several other PDEs, including the important cases of the Navier–Stokes and sine-Gordon equations, from simulated data.

Data-driven methods provide a viable alternative to more traditional approaches, such as first-principles derivation, for accelerating the discovery of physical laws. This work develops a sparse Bayesian learning-based framework for modeling spatiotemporal dynamical systems described by Partial Differential Equations (PDEs). The proposed framework, based on sparse Bayesian learning (SBL), can reconstruct a variety of prototypical PDEs, including the Navier–Stokes equation, the FitzHugh–Nagumo equation for nerve conduction, and the Schrödinger equation in quantum mechanics, using a small number of samples compared with state-of-the-art (SOTA) methods. In particular, we demonstrate that the data-driven discovery method can automatically extract the first-principles-based coupled complex Ginzburg–Landau equations from data collected in a convection experiment.

## I. INTRODUCTION

Recently, there has been a surge of interest in the data-driven discovery of physical laws, which applies computational methods to discover representations from data. There have been reports of many successes using data-driven methods, such as learning a nonlinear mapping from time series,^{1} predicting catastrophes in nonlinear systems,^{2} creating autonomous smart systems capable of self-adaption and self-regulation for short- and long-term production management,^{3} equation-free modeling of multi-scale/complex systems,^{4} inferring models of nonlinear coupled dynamical systems,^{5} reconstructing networks with stochastic dynamical processes,^{6} and extracting Koopman modes from data.^{7–10}

In the system identification community, a standard approach is to transform the model discovery problem into a regression problem and to design a good optimization algorithm to solve it.^{11,12} Several families of numerical algorithms exist. The first builds on ordinary least-squares regression. Examples include orthogonal reduction,^{13,14} the backward elimination scheme,^{15,16} the least-squares estimator,^{17} and the alternating conditional expectation algorithm.^{18–20} These approaches achieve good fitting accuracy, as described in Ref. 21, but require a relatively large amount of data for training. The second family is the sparse regression approach, in which a sparsity prior on the underlying governing equations is incorporated to tackle the outstanding issue of overfitting. By converting the discovery of differential equations into a sparse regression problem, Ref. 22, and later Brunton *et al.*,^{23} developed a sparse identification of nonlinear dynamics framework to extract ODEs from noisy measurement data. Rudy *et al.*^{24} and Schaeffer^{25} subsequently extended the method in Ref. 23 to recover PDEs from spatiotemporal data.

Regarding the development of sparse optimization, the sparse Bayesian learning (SBL) method, originally proposed in Ref. 26 and further developed in Refs. 27–29, adopts a Bayesian probabilistic framework to find sparse solutions. By treating the parameter as a random variable with a sparsity-inducing prior distribution determined by a set of hyperparameters, and then calculating the posterior distribution of the to-be-estimated parameter given the data, SBL has been shown to be more general and powerful in finding a maximally sparse solution. For instance, the maximum a posteriori (MAP) estimation with a fixed weight prior is a special case of SBL;^{28} the Lasso problem can be interpreted as the explicit MAP estimate with a Laplace prior.^{30,31} Specifically, Wipf *et al.*^{32} established the equivalence of the SBL cost function and the objectives in canonical MAP-based sparse methods (e.g., the Lasso) via a dual-space analysis.

In this work, we propose an SBL framework for data-driven PDE discovery. This framework integrates Bayesian inference and a sparsity-inducing prior distribution with the sparse regression method. In our derivation of the sparse Bayesian method, an iterative re-weighted $\ell_1$-minimization algorithm is derived to approximate the original $\ell_0$-minimization. Compared with PDE-FIND with Ridge Regression, which uses a thresholded $\ell_2$-minimization algorithm, $\ell_1$-minimization is known to pick sparser solutions and can therefore better approximate the solution to the original problem. Additionally, we analyze spatiotemporal data observed from an experiment on convection in an ethanol/water mixture in a long, narrow, annular container heated from below, and demonstrate that the proposed data-driven discovery method is able to automatically extract the governing PDEs from spatiotemporal binary-fluid convection data. This is distinct from previous works,^{24,33} in which the data used for discovering the PDEs were generated from numerical solutions to known PDEs. Our key empirical findings are the following:

- We demonstrate that an interpretable first-principles model can be discovered from inherently noisy experimental data. The proposed SBL method discovers a coupled complex Ginzburg–Landau equation (CGLE) using spatiotemporal binary-fluid convection data. Furthermore, the discovered values of various coefficients in the CGLE, such as the group velocity, the dispersion coefficients, the saturation parameter, and the coupling coefficient, agree well with the theoretical and experimental values provided in Refs. 19, 34, and 35. Moreover, the numerical solution to the discovered CGLE matches the observed experimental phenomenon.
- We demonstrate that the proposed SBL framework is capable of reconstructing a variety of prototypical PDEs, including the Navier–Stokes equation in fluid mechanics,^{36} the FitzHugh–Nagumo equation for nerve conduction,^{37} the Schrödinger equation in quantum mechanics,^{38} the sine-Gordon equation, and the Korteweg–de Vries equation. In comparison with the state-of-the-art PDE-FIND method in Ref. 24 and the Douglas–Rachford method in Ref. 25, the proposed algorithm is robust to measurement noise and is able to discover the underlying PDEs using a much smaller number of samples.

The rest of this paper is organized as follows: In Sec. II, a Bayesian framework for discovering PDEs from spatiotemporal data is presented. In Secs. III and IV, the proposed framework is applied to the discovery of PDE models using experimental and synthetic data, respectively. Conclusions and perspectives are drawn in Sec. V.

**Notations:** Throughout this paper, we denote vectors by lowercase letters, e.g., $x$, and matrices by uppercase letters, e.g., $X$. For a vector $x$ and a matrix $X$, we denote their transposes by $x^T$ and $X^T$, respectively. Furthermore, $x_i$ refers to the $i$th element of $x$. $\|x\|_2=\sqrt{x^Tx}$ and $\|x\|_1=\sum_{\ell}|x_\ell|$ denote the Euclidean norm and the $\ell_1$ norm, respectively. For a differentiable function $f:\mathbb{R}^n\to\mathbb{R}$, we call $\nabla f$ and $\partial^k f/\partial x_i^k$ (abbreviated as $f_{x_i}$) the derivative of $f$ and the $k$th partial derivative of $f$ with respect to $x_i$, respectively. For a convex function $f:\mathbb{R}^n\to\mathbb{R}$, we say a vector $v\in\mathbb{R}^n$ is a subgradient of $f$ at $x\in\mathbb{R}^n$ if, for all $y\in\mathbb{R}^n$, $f(y)\ge f(x)+v^T(y-x)$. The set of all subgradients at $x$ is called the subdifferential at $x$ and is denoted $\partial f(x)$.

## II. S^{3}d: A BAYESIAN FRAMEWORK TO DISCOVER PDES FROM SPATIOTEMPORAL DATA

### A. Model specification

For instance,^{22} consider that the preselected candidate terms include the state $u$ and the derivatives of $u$ up to the third order. Then, Eq. (2.2) can be rewritten in the following regression form:

Various numerical differentiation methods^{39–41} and symbolic or automatic differentiation^{42} can be used to estimate the derivatives in the dictionary matrix $\Phi$. In our framework, the polynomial approximation method is selected to estimate the derivatives from the noise-contaminated measurements. Indeed, the polynomial approximation method has been shown in Ref. 24 to be capable of effectively estimating derivatives from noisy data. For completeness, we explain the polynomial approximation method in detail in the Appendix.
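As a concrete illustration of local polynomial differentiation (the idea behind the derivative estimation of Ref. 24), the following is a minimal sketch; the window half-width, polynomial degree, and function name are our own choices, not the paper's:

```python
import numpy as np

def poly_derivative(u, x, order=1, deg=4, width=5):
    """Estimate d^order u / dx^order by least-squares fitting a degree-`deg`
    polynomial in a sliding window of `width` points on each side, then
    differentiating the fitted polynomial analytically at the window centre."""
    n = len(x)
    du = np.zeros(n - 2 * width)
    for i in range(width, n - width):
        lo, hi = i - width, i + width + 1
        # centre the abscissa at x[i] for numerical conditioning
        coeffs = np.polyfit(x[lo:hi] - x[i], u[lo:hi], deg)
        p = np.poly1d(coeffs)
        du[i - width] = p.deriv(order)(0.0)   # derivative of the fit at the centre
    return du
```

Because the polynomial is fitted in the least-squares sense over a window, random measurement noise is averaged out before differentiation, which is what makes this approach usable on noisy data.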

### B. Sparse Bayesian learning for S^{3}d

### C. Convergence guarantee of the sparse Bayesian learning algorithm

Input: $\Phi\in\mathbb{R}^{n\times m}$: design matrix; $y\in\mathbb{R}^{n}$: observation vector; $\tau$: tolerance

Output: $\theta^{(k+1)}$

1: initialize $\Gamma^{(0)}=I_m$, $k=0$;

2: repeat

3: $c^{(k)}=\operatorname{diag}[\Phi^{T}(\sigma^{2}I_n+\Phi\Gamma^{(k)}\Phi^{T})^{-1}\Phi]$;

4: $\theta^{(k+1)}\in\arg\min_{\theta}\{\|y-\Phi\theta\|^{2}+2\sigma^{2}\sum_i c_i^{(k)}|\theta_i|\}$;

5: $\gamma_j^{(k+1)}=|\theta_j^{(k+1)}|/c_j^{(k)}$, $\Gamma^{(k+1)}=\operatorname{diag}(\gamma^{(k+1)})$;

6: $k=k+1$;

7: until $L(\theta^{(k)},\gamma^{(k)})-L(\theta^{(k+1)},\gamma^{(k+1)})\le\tau$
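For readers who prefer code, the listing above can be sketched in Python roughly as follows. This is a simplified illustration, not the authors' implementation: the weighted $\ell_1$ subproblem in step 4 is solved here by plain coordinate descent, and the stopping rule uses the change in $\theta$ rather than the cost $L$; all function names are ours.

```python
import numpy as np

def weighted_lasso(Phi, y, weights, lam, n_iter=200):
    """Coordinate descent for  min_th ||y - Phi th||^2 + lam * sum_i weights[i]*|th_i|."""
    n, m = Phi.shape
    th = np.zeros(m)
    col_sq = (Phi ** 2).sum(axis=0)
    r = y - Phi @ th                            # running residual
    for _ in range(n_iter):
        for i in range(m):
            if col_sq[i] == 0.0:
                continue
            r += Phi[:, i] * th[i]              # remove coordinate i from residual
            rho = Phi[:, i] @ r
            thr = lam * weights[i] / 2.0        # soft-threshold level for this coordinate
            th[i] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[i]
            r -= Phi[:, i] * th[i]
    return th

def s3d_reweighted(Phi, y, sigma2, n_outer=10, tol=1e-8):
    """Sketch of the iterative re-weighted l1 loop in the listing above."""
    n, m = Phi.shape
    gamma = np.ones(m)                          # step 1: Gamma^(0) = I_m
    th = np.zeros(m)
    for _ in range(n_outer):
        S = sigma2 * np.eye(n) + Phi @ (gamma[:, None] * Phi.T)
        # step 3: c = diag[Phi^T S^{-1} Phi]
        c = np.einsum('ij,ji->i', Phi.T, np.linalg.solve(S, Phi))
        # step 4: weighted l1-penalized least squares
        th_new = weighted_lasso(Phi, y, c, 2.0 * sigma2)
        # step 5: gamma_j = |theta_j| / c_j
        gamma = np.abs(th_new) / np.maximum(c, 1e-12)
        if np.linalg.norm(th_new - th) < tol:   # simplified stopping rule
            th = th_new
            break
        th = th_new
    return th
```

On a well-conditioned sparse regression problem, the re-weighting quickly drives the weights of inactive columns up (their thresholds become prohibitive) and the weights of active columns down, so the support is recovered with very little shrinkage bias.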


## III. DISCOVERY OF COMPLEX GINZBURG–LANDAU EQUATIONS FROM BINARY-FLUID CONVECTION EXPERIMENT

In this section, we apply the proposed $S^3\text{d}$ method to discover the complex Ginzburg–Landau equation (CGLE) from a binary-fluid convection experiment. Traveling-wave (TW) convection in binary fluids is a well-established setting for studying the physical mechanisms of non-equilibrium pattern-forming systems.^{44} Substantial experiments on TW convection have unfolded many different dynamical states of TW. A major challenge in the scientific study of TW is to quantitatively understand and predict the underlying dynamics. First-principles models based on the CGLE and its variants^{45,46} have long been used to quantitatively explain experimental observations.

### A. Experimental data description

The experimental data are collected from an experiment on convection in an ethanol/water mixture in a long, narrow, annular container heated from below.^{19,34} The objective of the experiment is to probe the regime of counterpropagating wave-packet formation. As reported in Refs. 34, 47, and 48, a regular, small-amplitude traveling-wave (TW) state is observed in this experiment, consisting of pairs of quasilinear wave packets that propagate around the annular cell in opposite directions, referred to as “left” and “right.” In Fig. 2, we show the dynamics of the TW state for the bifurcation parameter $\epsilon$ (scaled by the characteristic time $\tau_0$ defined below): $\epsilon\tau_0^{-1}=1.77\times10^{-3}$, where the left-going TW rapidly moves to the left with small amplitudes [denoted by $A_L(x,t)$] and collides periodically with the right-going TW [denoted by $A_R(x,t)$].

The complex amplitudes are extracted from the signals of the annular photodiode array,^{49} which records the flow-pattern image. The sampling period of these data is $\Delta t=1.5625$, and the spatial sampling interval is $\Delta x=0.458167$. There are seven sets of data collected from the same experimental setup with the seven different bifurcation parameters $\epsilon$ shown in Table I. For each dataset, a total of $180\times760$ data points is used for model discovery. Because we use the complex amplitudes, unlike Ref. 19, where the real amplitudes and phases $a_{L,R}(x,t)$ and $\varphi_{L,R}(x,t)$, defined by $A_{L,R}(x,t)=a_{L,R}(x,t)e^{i\varphi_{L,R}(x,t)}$, are used, Eq. (2.4) takes a complex form, that is, $\tilde{y}=\tilde{\Phi}\tilde{\theta}$, where $\tilde{y},\tilde{\theta}$ are complex vectors and $\tilde{\Phi}$ is a complex matrix. We therefore consider a one-to-one mapping from the complex vector $\tilde{y}$ to a real vector $y$ and construct a corresponding real matrix $\Phi$.
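A standard construction for such a real embedding stacks the real and imaginary parts; the following minimal sketch (with our own function name, and not necessarily the paper's exact notation) shows one mapping of this form:

```python
import numpy as np

def complex_to_real_system(Phi_c, y_c):
    """Map the complex regression  y~ = Phi~ theta~  to an equivalent real
    system  y = Phi theta  with theta = [Re(theta~); Im(theta~)].
    This is one standard construction of such a one-to-one mapping."""
    Phi = np.block([[Phi_c.real, -Phi_c.imag],
                    [Phi_c.imag,  Phi_c.real]])   # real 2n x 2m matrix
    y = np.concatenate([y_c.real, y_c.imag])      # real 2n vector
    return Phi, y
```

With this mapping, any real solver (such as the sparse Bayesian algorithm above) can be applied directly, and the complex coefficients are recovered by pairing the two halves of the real solution.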

| Experiment label | Bifurcation parameter | Sample number | Sampling interval | Space interval | Number of points after truncation |
|---|---|---|---|---|---|
| cal06172/car06172 | γ = 9.32 × 10⁻³ | 988 × 180 | 1.5625 | 0.458167 | 228 |
| cal06182/car06182 | γ = 4.22 × 10⁻³ | 800 × 180 | 1.5625 | 0.458167 | 40 |
| cal06192/car06192 | γ = 1.77 × 10⁻³ | 1000 × 180 | 1.5625 | 0.458167 | 240 |
| cal06212/car06212 | γ = 6.38 × 10⁻³ | 1000 × 180 | 1.5625 | 0.458167 | 240 |
| cal06222/car06222 | γ = 12.07 × 10⁻³ | 1031 × 180 | 1.5625 | 0.458167 | 271 |
| cal06242/car06242 | γ = 14.03 × 10⁻³ | 1000 × 180 | 1.5625 | 0.458167 | 240 |
| cal06252/car06252 | γ = 16.28 × 10⁻³ | 1000 × 180 | 1.5625 | 0.458167 | 240 |


### B. Discovery of CGLE

We proceed with the discovery in two steps. First, the proposed $S^3\text{d}$ method determines the dominant physical terms from a large library of candidate terms. We construct the function dictionary using basic domain knowledge from fluid dynamics. Possible terms include those appearing in the Navier–Stokes equations (such as $A\cdot\nabla A$, $\triangle A$) or in the first-principles-derived complex Ginzburg–Landau equations (such as the higher-order nonlinear terms $A^2\partial_x A$, $|A|^2A$). In general, the amplitude itself and its derivatives up to some custom order $k$ should be included in order to properly model the physical system. Furthermore, to correctly model states of counterpropagating TW, coupling terms such as $|A_L|^2A_R$ and $|A_R|^2A_L$ must be added to account for the interactions between the left- and right-going TW. In this example, we take the third-order Volterra expansions of $A_{L,R}$ and $|A_{L,R}|$ and use the fourth-order compact Padé scheme^{50} for the derivative estimation; the Padé scheme is found to be much more robust to noise than the explicit fourth-order difference scheme. We investigate the performance of the $S^3\text{d}$ method on the last three sets of data in Table I, which have a larger signal-to-noise ratio (that is, $\epsilon\tau_0^{-1}\ge12.07\times10^{-3}$). The second step is to identify the coefficient of each dominant term, where all the data listed in Table I are used for the coefficient estimation. The candidate terms used include $A$, $|A|$, $A^2$, $|A|^2$, $A^3$, $A^2|A|$, $A|A|^2$, $|A|^3$, $A_{xx}$, $AA_x$, $AA_{xx}$, $A^2A_x$, $A^2A_{xx}$, $A^3A_x$, $A^3A_{xx}$, and the coupling term $|A_L|^2A$ in the case of the right-going TW.
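The candidate library listed above can be assembled, for one TW component, roughly as follows (a sketch only; the argument names and column ordering are our own, and the derivative arrays are assumed precomputed, e.g., with the Padé scheme):

```python
import numpy as np

def cgle_library(A, A_other, Ax, Axx):
    """Candidate terms for one TW component: powers of A and |A| up to third
    order, derivative terms, and the cross-coupling |A_other|^2 * A.
    All columns are complex-valued; `A_other` is the counterpropagating wave."""
    a = np.abs(A)
    cols = {'A': A, '|A|': a, 'A^2': A**2, '|A|^2': a**2,
            'A^3': A**3, 'A^2|A|': A**2 * a, 'A|A|^2': A * a**2, '|A|^3': a**3,
            'A_xx': Axx, 'A A_x': A * Ax, 'A A_xx': A * Axx,
            'A^2 A_x': A**2 * Ax, 'A^2 A_xx': A**2 * Axx,
            'A^3 A_x': A**3 * Ax, 'A^3 A_xx': A**3 * Axx,
            '|A_other|^2 A': np.abs(A_other)**2 * A}
    return np.column_stack(list(cols.values())), list(cols.keys())
```

The resulting complex dictionary can then be mapped to a real system (as described in Sec. III A) before the sparse solver is applied.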

In both theoretical^{51–53} and experimental^{47,48} works, the CGLE model has been shown to have the potential to qualitatively explain the weakly nonlinear dynamics near onset and to closely parallel the experimental observations. Here, we apply the sparse Bayesian learning method to make a quantitative comparison of the experimental data with the simplest CGLE model.

Based on the analysis above, we further apply the proposed $S^3\text{d}$ method to fine-tune the coefficients of the CGLE. For TW convection, most parameters in the CGLE (the coupling coefficient is an exception) have been measured in Ref. 35. Here, to present our identified results in an accessible form, we adopt the leave-one-out idea from machine learning to choose the regularization parameter $\lambda$ and obtain the best-fit coefficients. We accept the parameters that lead to the smallest fitting error on the test sets: $\mathrm{err}=\|Y-\Phi\omega\|_2/\|Y\|_2$. In Table II, we summarize the identified coefficients, from which it can be seen that the estimated coefficients mostly resemble the theoretical and experimental values (Refs. 19, 34, and 35 and references therein), except for some parameters that remain obscure in theory and experiment. In particular, we observe that for smaller bifurcation parameters, the estimated coupling coefficients share the same characteristic: $h\tau_0^{-1}$ for the right-going TW component $A_R$ is smaller than that for $A_L$, and the value of $hc_3\tau_0^{-1}$ is approximately $-150$. For the three datasets with the highest signal-to-noise ratio, however, we find that the estimated coefficients are unstable, which may arise from poor estimates of the derivatives. Additionally, it is interesting to note that all seven analyzed datasets have one thing in common, namely, the negativity of the coupling coefficients, which indicates a stabilizing nonlinear competition between the two TW and a slowing down of the TW group velocity; this is similar to Ref. 19 and agrees with the experimental observation.^{34}
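The held-out selection of $\lambda$ by the relative fitting error can be sketched as follows. A ridge fit stands in here for the sparse estimator, purely for illustration; the grid of candidate values and the function name are our own:

```python
import numpy as np

def select_lambda(Phi_tr, y_tr, Phi_te, y_te, lambdas):
    """Pick the regularisation parameter minimising the held-out relative
    error  err = ||y - Phi w||_2 / ||y||_2 .  A closed-form ridge solve is
    used as a simple stand-in for the sparse estimator."""
    m = Phi_tr.shape[1]
    best = (np.inf, None, None)
    for lam in lambdas:
        # penalised least-squares fit on the training split
        w = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(m), Phi_tr.T @ y_tr)
        # relative fitting error on the held-out split
        err = np.linalg.norm(y_te - Phi_te @ w) / np.linalg.norm(y_te)
        if err < best[0]:
            best = (err, lam, w)
    return best  # (err, lambda, coefficients)
```

Because the error is measured on data not used for fitting, this criterion penalizes both underfitting (too much regularization) and overfitting (too little), which is the point of the leave-one-out idea invoked above.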

Coefficients in the discovered complex Ginzburg–Landau equation.

| Experiment label | $s$ | $\epsilon\tau_0^{-1}$ | $\xi_0^2\tau_0^{-1}$ | $g\tau_0^{-1}$ | $h\tau_0^{-1}$ | $\epsilon\tau_0^{-1}c_0$ | $\xi_0^2c_1\tau_0^{-1}$ | $gc_2\tau_0^{-1}$ | $hc_3\tau_0^{-1}$ |
|---|---|---|---|---|---|---|---|---|---|
| cal06172cb | 0.5895 | 0.0065 | 0.3392 | −17.1619 | −6.4595 | 0.0443 | −0.2185 | −89.0851 | −139.7500 |
| car06172cb | 0.5982 | 0.0079 | 0.9210 | 17.7800 | −25.2965 | 0.0640 | −0.1243 | −206.3442 | −150.6041 |
| cal06182cb | 0.6268 | 0.0057 | 0.0000 | −78.0836 | −0.8076 | −0.0094 | −0.3774 | 0.0000 | −154.4844 |
| car06182cb | 0.6308 | 0.0081 | 0.0935 | −90.2362 | −35.0632 | −0.0059 | −0.4397 | −55.8157 | −157.3323 |
| cal06192cb | 0.5930 | 0.0054 | 0.0148 | −139.4494 | −2.5230 | −0.0066 | −0.6457 | −37.6346 | −152.7833 |
| car06192cb | 0.5916 | 0.0087 | 0.0158 | −198.6799 | −34.9115 | −0.0066 | −0.9189 | −68.1182 | −141.3105 |
| cal06212cb | 0.6063 | 0.0057 | 0.0639 | −45.0095 | −7.9602 | 0.0033 | −0.2732 | −35.3580 | −154.5495 |
| car06212cb | 0.6131 | 0.0042 | 0.0202 | −26.1553 | −33.0555 | 0.0019 | −0.1489 | −17.7879 | −168.7832 |
| cal06222cb | 0.5741 | 0.0173 | 1.7492 | 40.1657 | −15.1882 | 0.0281 | −0.0424 | −379.5853 | −136.5903 |
| car06222cb | 0.5771 | 0.0127 | 1.3000 | 29.9928 | −16.6945 | 0.0114 | −0.0894 | −272.3181 | −161.5083 |
| cal06242cb | 0.5509 | 0.0138 | 1.0701 | 16.2302 | −17.3846 | 0.0555 | −0.1505 | −234.8376 | −144.0718 |
| car06242cb | 0.5597 | 0.0150 | 1.1368 | 13.1359 | −4.5687 | 0.0567 | −0.2108 | −237.9784 | −149.1559 |
| cal06252cb | 0.5634 | 0.0199 | 1.1690 | 2.1061 | −5.2084 | 0.0867 | −0.2690 | −267.3739 | −169.0035 |
| car06252cb | 0.5481 | 0.0131 | 1.1075 | 22.3429 | −3.9619 | 0.0794 | −0.1245 | −234.6182 | −179.6777 |


### C. Model validation

^{19}that is,

## IV. DISCOVERY OF PROTOTYPICAL PDES

Table III demonstrates that the $S^3\text{d}$ method has successfully discovered the sine-Gordon equation, the Korteweg–de Vries equation, the FitzHugh–Nagumo equation, the Schrödinger equation, and the Navier–Stokes equation. Due to the page limit, we only present the discovery results for the Navier–Stokes equation.

| PDE | Noise case | Points | Identified PDE | Mean(err) ± STD(err) |
|---|---|---|---|---|
| Sine-Gordon equation | No noise | 10 | $u_{tt}=0.9999u_{xx}-0.9986\sin(u)$ | 0.0732% ± 0.0893% |
| | With noise | 50 | $u_{tt}=0.9934u_{xx}-0.9986\sin(u)$ | 0.3999% ± 0.3738% |
| Korteweg–de Vries equation | No noise | 10 000 | $u_t=-0.000484u_{xxx}-0.999246uu_x$ | 0.0857% ± 0.0146% |
| | With noise | 18 750 | $u_t=-0.000474u_{xxx}-0.974795uu_x$ | 2.3186% ± 0.2856% |
| FitzHugh–Nagumo equation | No noise | 10 000 ($u$), 10 000 ($\omega$) | $u_t=-0.1989u+1.1991u^2-1.0010u^3+1.0023u_{xx}-1.0005\omega$; $\omega_t=-0.000990\omega+0.001998u$ | 0.2952% ± 0.3502% |
| | With noise | 18 000 ($u$), 27 000 ($\omega$) | $u_t=-0.1983u+1.1812u^2-0.9611u^3+1.0025u_{xx}-0.9950\omega$; $\omega_t=-0.001000\omega+0.001998u$ | 1.0232% ± 1.3728% |
| Schrödinger equation | No noise | 10 000 | $u_t=0.3iu_{xx}+3.3333iu\lvert u\rvert^2-3.3333iu$ | 0.0011% ± 0.0007% |
| | With noise | 10 000 | $u_t=0.3013iu_{xx}+3.2946iu\lvert u\rvert^2-3.3186iu$ | 0.6788% ± 0.4183% |
| Navier–Stokes equation | No noise | 10 000 | $\omega_t=0.0101\omega_{xx}+0.0100\omega_{yy}-1.0014u\omega_x-0.9914v\omega_y$ | 0.3830% ± 0.3781% |
| | With noise | 20 000 | $\omega_t=0.0101\omega_{xx}+0.0097\omega_{yy}-0.9988u\omega_x-0.9996v\omega_y$ | 1.0306% ± 1.5734% |


### A. Navier–Stokes equation

In this example with the NS equations,^{36,56} the flow field is described by four field quantities, each of which needs to be measured. We follow Ref. 56 to solve Eq. (4.1) with an initial condition on $x\times y\in[0,2\pi]\times[0,2\pi]$.

The dictionary matrix $\Phi$ consists of $m=60$ basis functions. The underlying dynamics is described by three state variables, $u,v,\omega$. Thus, our dictionary of basis functions includes the linear combinations of the variables $u,v,\omega,u_x,v_x,u_{xx},v_{xx}$. With the snapshots $U,V,W$, we use an explicit difference scheme to approximate the first- and second-order derivatives in the noise-free case. The discovery results show that our method performs strongly on the noise-free dataset, with a small MSE and a small STD. We enumerate the re-sampling region in Table III.
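A toy version of the dictionary assembly for the vorticity dynamics might look like the following (central differences on a 1D slice, covering only a small illustrative subset of the $m=60$ terms; the function and column names are ours):

```python
import numpy as np

def build_dictionary(w, u, v, dx):
    """Assemble a small dictionary matrix for the vorticity equation on a
    1D slice: columns are candidate terms evaluated at interior grid points,
    using explicit central differences for the derivatives."""
    wx  = (w[2:] - w[:-2]) / (2 * dx)                 # first derivative, O(dx^2)
    wxx = (w[2:] - 2 * w[1:-1] + w[:-2]) / dx**2      # second derivative, O(dx^2)
    ui, vi, wi = u[1:-1], v[1:-1], w[1:-1]            # align to interior points
    cols = [wi, wxx, ui * wx, vi * wx, ui * wi, vi * wi]
    names = ['w', 'w_xx', 'u*w_x', 'v*w_x', 'u*w', 'v*w']
    return np.column_stack(cols), names
```

Each column is a candidate right-hand-side term; stacking evaluations over many grid points and times yields the tall matrix $\Phi$ to which the sparse solver is applied.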

In the noisy case, we use the fourth-order compact Padé scheme with the $S^3\text{d}$ method. We add $1\%$ Gaussian noise to the original snapshots; the resulting RMSE is $0.4823$ for $u$, $0.4338$ for $v$, and $1.7111$ for $\omega$. We apply the POD method used in Refs. 57 and 58 to de-noise the data. We reshape the snapshot $U$ into a $10\,000\times1001$ matrix $\bar{U}$, and do the same for $V$ and $W$. By projecting the data onto the 33 POD modes obtained with a fixed energy threshold of $99.99\%$, we generate a new snapshot $\tilde{U}\in\mathbb{R}^{10\,000\times1001}$ from the matrix $\bar{U}$: $\tilde{U}=\Psi A$, where $\Psi$ consists of the 33 POD modes and $A$ contains the corresponding POD coefficients. We then reverse the reshaping to recover snapshots $\bar{U},\bar{V},\bar{W}\in\mathbb{R}^{100\times100\times1001}$. The computed RMSE is $0.0056$ for $u$, $0.0051$ for $v$, and $0.0412$ for $\omega$. Special care is needed in the Padé scheme: with the de-noised dataset, we approximate the first-order derivatives at each grid point and then compute the second-order derivatives using the Padé scheme. After the dictionary matrix is computed, we correctly discover the NS equation, as shown in Table III.
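The POD de-noising step can be sketched via the SVD as follows. This is a generic implementation of the energy-threshold projection described above, not the exact code of Refs. 57 and 58:

```python
import numpy as np

def pod_denoise(U, energy=0.9999):
    """Project the snapshot matrix U (space x time) onto the leading POD
    modes capturing a fraction `energy` of the variance (mirroring the
    99.99% threshold used above). Returns the de-noised snapshots and the
    number of retained modes."""
    Psi, s, Vt = np.linalg.svd(U, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)          # cumulative energy fraction
    r = int(np.searchsorted(cum, energy)) + 1     # smallest rank reaching the threshold
    A = Psi[:, :r].T @ U                          # POD coefficients
    return Psi[:, :r] @ A, r                      # low-rank (de-noised) snapshots
```

Because small-scale measurement noise spreads its energy thinly over many trailing modes, truncating at a high energy threshold removes most of the noise while retaining the coherent flow structures.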

### B. Comparison with SOTA methods on the prototypical examples from PDE-FIND^{24}

| Example | Identified results ($S^3$d) | No. of samples ($S^3$d) | MSE(err) ± STD(err) ($S^3$d) | No. of samples (PDE-FIND) | MSE(err) ± STD(err) (PDE-FIND) |
|---|---|---|---|---|---|
| Korteweg–de Vries (single-soliton) | $u_t=-4.9999u_x$ ($c=5$); $u_t=-0.9997u_x$ ($c=1$) | 750 | 0.0028% ± 0.0%; 0.0303% ± 0.0% | 12 800 | 0.3745% ± 0.0%; 0.0820% ± 0.0% |
| | $u_t=-6.0011uu_x-1.0005u_{xxx}$ | 1500 | 0.0346% ± 0.0239% | 25 600 | 2.5931% ± 1.3601% |
| Korteweg–de Vries (double-soliton) | $u_t=-5.9909uu_x-0.9995u_{xxx}$ | 9000 | 0.0995% ± 0.0742% | 102 912 | 0.9572% ± 0.2322% |
| | $u_t=-5.9250uu_x-0.9867u_{xxx}$ | 16 600 | 1.2894% ± 0.0548% | 85 432 | 7.4727% ± 4.9306% |
| Burgers' equation | $u_t=-0.9999uu_x+0.1000u_{xx}$ | 2000 | 0.0051% ± 0.0054% | 25 856 | 0.1595% ± 0.0608% |
| | $u_t=-1.0011uu_x+0.1001u_{xx}$ | 5000 | 0.0878% ± 0.0323% | 19 116 | 1.9655% ± 1.0000% |
| Quantum harmonic oscillator | $u_t=0.5000iu_{xx}-1.0000i\frac{x^2}{2}u$ | 2000 | 0.0090% ± 0.0082% | 205 312 | 0.2486% ± 0.0128% |
| | $u_t=0.4996iu_{xx}-1.0005i\frac{x^2}{2}u$ | 2000 | 0.0693% ± 0.0269% | 187 452 | 9.6889% ± 6.9705% |
| Nonlinear Schrödinger equation | $u_t=0.4998iu_{xx}+1.0000i\lvert u\rvert^2u$ | 4500 | 0.0185% ± 0.0204% | 256 512 | 0.0473% ± 0.0147% |
| | $u_t=0.5034iu_{xx}+0.9987i\lvert u\rvert^2u$ | 7200 | 0.4096% ± 0.3934% | 236 652 | 3.0546% ± 1.2193% |


## V. CONCLUSION AND DISCUSSION

This work proposes a framework to discover PDEs from experimental observations and synthetic spatiotemporal data. The proposed $S^3\text{d}$ method discovers the dynamics underlying a state of traveling-wave convection from experimental data, namely, the simplest complex Ginzburg–Landau equations. Additionally, many canonical PDEs are re-discovered from synthetic spatiotemporal data. The merit of the proposed method is its ability to freely construct a model class from candidate functions and to automatically select the key ones that reproduce the observed spatiotemporal patterns. Benefiting from sparsity, the inferred PDEs are parsimonious and accurate, enabling interpretability. Across the various examples, we observe that the method also robustly handles different types of noise and measurement data of small sample size. We expect the $S^3\text{d}$ method to be useful for modeling spatiotemporal dynamics from experimental data. This framework, as demonstrated through numerous examples, could accelerate the discovery of new laws and stimulate physical explanations for the discovered equations, leading to the discovery of the underlying mechanisms.

Although the dynamics considered in this work are Eulerian dynamics described by PDEs, $ S 3d$, as a general method, can also infer ODEs and static functional relations using the datasets in Refs. 24 and 59. $ S 3d$ thus unifies results on the discovery of natural laws.^{5,59} Future work will focus on three extensions of our method toward a wider range of practical applications. First, the current method relies on estimating temporal and spatial derivatives of the measured state variables from data, whereas in some cases the system variables are not directly observable. Second, the current method can only reconstruct equations that are linear in the parameters; we will extend it to models that are nonlinear in the parameters. Finally, we assume that the constructed library is large enough to contain a sparse representation of the dynamics underlying the dataset. When no prior knowledge of the governing equation is available, however, the constructed library may be insufficient; in such cases, genetic symbolic regression performs much better. Integrating genetic symbolic regression and sparse regression into a unified framework could overcome this limitation.

## ACKNOWLEDGMENTS

The authors acknowledge financial support from the National Natural Science Foundation of China under Grant Nos. 92167201 and 61903148.

We wish to thank Dr. Paul Kolodner for useful discussion and allowing us to use the experimental data.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Ye Yuan:** Conceptualization (lead); Data curation (equal); Formal analysis (lead); Funding acquisition (lead); Investigation (lead); Methodology (lead); Project administration (lead); Resources (equal); Software (equal); Supervision (lead); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). **Xiuting Li:** Data curation (equal); Formal analysis (lead); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing – review & editing (equal). **Liang Li:** Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal). **Frank Jiang:** Formal analysis (equal); Investigation (equal); Software (equal); Validation (equal). **Xiuchuan Tang:** Formal analysis (equal); Investigation (equal); Validation (equal). **Fumin Zhang:** Formal analysis (equal); Project administration (equal); Supervision (equal). **Jorge Goncalves:** Formal analysis (equal); Project administration (equal); Writing – review & editing (equal). **Henning U. Voss:** Data curation (equal); Investigation (equal); Supervision (equal). **Han Ding:** Funding acquisition (equal); Project administration (equal); Supervision (equal). **Jürgen Kurths:** Data curation (equal); Methodology (equal); Project administration (equal); Resources (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

All synthetic data and codes used in this paper are publicly available on GitHub at https://github.com/HAIRLAB/S3d, Ref. 60.

## APPENDIX: POLYNOMIAL APPROXIMATION
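As a minimal illustration of polynomial approximation for derivative estimation from noisy samples (a sketch under our own assumptions, not the paper's exact procedure; the function name, window size, and polynomial degree below are all hypothetical choices), one can fit a local polynomial around a query point and differentiate the fit analytically, which smooths noise better than direct finite differences.

```python
import numpy as np

def poly_derivative(x, y, x0, degree=4, half_width=25):
    # Fit a local polynomial to the samples nearest x0, then
    # differentiate the fitted polynomial analytically at x0.
    i0 = np.argmin(np.abs(x - x0))
    lo, hi = max(0, i0 - half_width), min(len(x), i0 + half_width + 1)
    coeffs = np.polyfit(x[lo:hi], y[lo:hi], degree)
    return np.polyval(np.polyder(coeffs), x0)

# Noisy samples of sin(x); the true derivative is cos(x).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 400)
y = np.sin(x) + 0.01 * rng.standard_normal(x.size)

print(poly_derivative(x, y, np.pi / 2))  # compare against cos(pi/2) = 0
print(poly_derivative(x, y, 1.0))        # compare against cos(1)
```

The window half-width and polynomial degree trade off bias against noise amplification; in practice they would be tuned to the sampling density and noise level of the measurements.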

## REFERENCES

*System Identification: Theory for the User*

*Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification*

*Statistical Learning with Sparsity: The Lasso and Generalizations*

*2012 IEEE 51st Annual Conference on Decision and Control (CDC)* (IEEE, 2012), pp. 2334–2339.

*Advances in Neural Information Processing Systems* (The MIT Press, 2002), pp. 383–389.

*Advances in Neural Information Processing Systems* (The MIT Press, 2006), pp. 1059–1066.

*Advances in Neural Information Processing Systems* (The MIT Press, 2002), pp. 697–704.

*Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems*

*Computational Partial Differential Equations Using MATLAB*

*Advances in Neural Information Processing Systems* (The MIT Press, 2008), pp. 1625–1632.

*Lectures in Applied Mathematics* (American Mathematical Society, Providence, RI, 1974), Vol. 15, p. 237.

*Proceedings of the NATO Advanced Research Workshop on Nonlinear Evolution of Spatio-temporal Structures in Dissipative Continuous Systems* (

*Computational Science and Engineering*

*Computational Science and Engineering*

*et al*. (

*Github.*