This paper investigates how models of spatiotemporal dynamics in the form of nonlinear partial differential equations can be identified directly from noisy data using a combination of sparse regression and weak formulation. Using the 4th-order Kuramoto-Sivashinsky equation for illustration, we show how this approach can be optimized in the limits of low and high noise, achieving accuracy that is orders of magnitude better than what existing techniques allow. In particular, we derive the scaling relation between the accuracy of the model, the parameters of the weak formulation, and the properties of the data, such as its spatial and temporal resolution and the level of noise.
In recent years, data-driven discovery of mathematical models of spatially extended systems described by nonlinear partial differential equations (PDEs) has emerged as a promising alternative to more traditional modeling approaches. Existing approaches to model discovery such as sparse regression have several major weaknesses, however. Most notably, they break down for data with high levels of noise and have to be tuned empirically to produce meaningful results, making them ill-suited for analyzing experimental data. We show how these weaknesses can be addressed using a weak formulation of the model PDE. The weak formulation has substantial freedom that makes it extremely powerful and flexible, but the question arises of how this freedom can be used to robustly obtain the most accurate model. This question is addressed here for the first time.
I. INTRODUCTION
Partial differential equations (PDEs) provide a natural description for the temporal evolution of spatially extended systems in various fields of science and engineering. Historically and practically important examples include wave equations arising in many areas of physics, the Schrödinger equation in quantum mechanics, the Navier-Stokes equations in fluid dynamics, and reaction-diffusion equations used to model physical, chemical, or biological systems. In the past, models of such systems were almost always constructed from first principles or using a suitable empirical approach. However, in recent years, a data-driven paradigm for learning the dynamics has emerged, which leverages the modern prevalence of data and computational power to create models when the underlying governing laws have eluded first-principles derivation.
Many indirect methods for learning the dynamics that do not require a PDE have been proposed. Notable examples include equation-free modeling,1 artificial neural networks,2–4 dynamic mode decomposition5 and Koopman operator approaches,6 balanced truncation,7 and resolvent-based analysis.8 While these techniques can provide an economical approximate description of the dynamics, this is done at the cost of losing the mathematical structure that affords physical intuition or interpretability. Symbolic regression, which was originally used to derive nonlinear ordinary differential equations describing low-dimensional systems,9,10 offers an enticing alternative by allowing construction of exact models and discovery of conservation laws. The genetic algorithms used in these earlier studies are however computationally expensive, preventing application of this approach to high-dimensional systems. Thus, the recent emergence of a sparse regression approach for model discovery11–15 has made a significant impact. Applied to spatially extended systems, this approach allows data-driven discovery of governing equations in the form of PDEs by evaluating a library of candidate terms containing partial derivatives at a large number of points and using a regularized regression procedure to compute the coefficients of each term and select a parsimonious model.
Sparse regression has proven computationally efficient and capable of reconstructing numerous canonical PDEs,14,16 but it faces serious difficulties when used for analysis of experimental data. One complication is that the proper choice of parsimonious model is often unclear. In many implementations, it relies on a manual Pareto analysis to balance model accuracy and complexity13 or on an automatic but complex thresholding procedure (e.g., sequential threshold ridge regression14) that tends to be sensitive to the choice of parameters. More importantly, existing sparse regression methods often suffer from low accuracy even in the absence of noise and completely break down at noise levels characteristic of realistic applications. This is because they inherently require explicit numerical evaluation of partial derivatives of the data, which is a notably ill-conditioned problem.
In this paper, we present a weak formulation of the sparse regression problem that eliminates this fundamental issue. We also suggest a simple thresholding procedure that reliably identifies the correct form of the governing PDE even in the presence of very high noise. Finally, we explore how this flexible and robust approach can be optimized and tuned to the properties of the underlying data set to maximize accuracy. This paper has the following structure. Section II describes our approach and the system used to test it, and our results are presented in Sec. III. Sections IV and V contain the discussion and conclusions.
II. METHODS
We consider the problem of using the data to identify a parsimonious mathematical model in the form of a PDE

$$\sum_{n=1}^{N} c_n F_n = 0, \qquad (1)$$

where each term $F_n$ in the sum is a function of $u$ and its partial derivatives in space and time with constant coefficients $c_n$. In most applications, the form of the basis functions $F_n$ can be restricted based on physical considerations, such as symmetries, conservation laws, etc.15,17 Typically, the $F_n$ are taken to be products of powers of independent variables ($x$, $t$) and dependent variables ($u$ and its various derivatives), although their form can be arbitrary in theory. Generally, in order to allow for a flexible model form, one should start with $N$ sufficiently large. Our goal is to determine the coefficients $c_n$ for the terms that should be present in the model while eliminating the dynamically insignificant, and thus likely spurious, terms, yielding a sparse solution where, with few exceptions, $c_n = 0$.
Sparse regression aims to convert the PDE (1) into a tractable (and, ideally, robust) linear algebra problem. Conventionally, this is done by evaluating all of the terms in the PDE at a random collection of points using finite differences,17,18 spectral methods,11,12 or polynomial approximation.14,15 All of these approaches are extremely sensitive to noise, especially when high-order derivatives are present. We will instead pursue a weak formulation of the problem, which can be obtained by multiplying (1) by a weight function $w_j$ and then integrating the result over a domain $\Omega_k$. Repeating the process for $K$ distinct combinations of weight functions $w_j$ and integration domains $\Omega_k$ yields the linear system

$$Q\mathbf{c} = 0, \qquad (2)$$

where $\mathbf{c} = [c_1, \dots, c_N]^T$ and $Q$ is a “library” matrix, with each column $n$ consisting of the integrals of the function $F_n$ over all combinations of weights $w_j$ and domains $\Omega_k$.
Note that there is an extra degree of freedom in (2) corresponding to the normalization of $\mathbf{c}$. Conventionally, this is dealt with by assuming that $c_1 \neq 0$, setting $c_1 = 1$, and solving the overdetermined system that corresponds to the choice $F_1 = \partial_t u$ using least squares or some regularized version of it.14 This is, however, not always a valid assumption: it is usually unknown a priori whether any given temporal derivative should be included in the PDE at all, whereas in this case a particular term is forced into the model. Moreover, even if this term should be present in the model, the regression effectively assumes that the time derivative was computed without error, which reduces the practical accuracy of the procedure.
We will, therefore, not make the assumption that the model has the form of an evolution equation and will consider the linear problem (2) in its most general form. The normalization of $\mathbf{c}$ can be fixed by adding an extra row with arbitrary nonzero elements to $Q$, after which the resulting Eq. (2) can be solved by ordinary least squares. A more elegant solution, pursued in the present study, is to instead compute $\mathbf{c}$ as the right singular vector of $Q$ corresponding to the smallest singular value. Note that this corresponds to the solution of a constrained least squares problem for $\mathbf{c}$,

$$\min_{\mathbf{c}} \|Q\mathbf{c}\| \quad \text{subject to} \quad \|\mathbf{c}\| = 1. \qquad (3)$$
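To make this step concrete, here is a minimal sketch (ours, not the authors' published code) of the constrained least squares solve via the singular value decomposition; the helper name solve_constrained_lsq is hypothetical.

```python
import numpy as np

def solve_constrained_lsq(Q):
    """Minimize ||Q c|| subject to ||c|| = 1.

    The minimizer is the right singular vector of Q associated with the
    smallest singular value; the minimum itself equals that singular value.
    """
    _, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return Vt[-1], s[-1]   # (c, residual ||Q c||)

# Usage: K weak-form rows, N candidate terms
rng = np.random.default_rng(0)
Q = rng.standard_normal((200, 10))
c, eta = solve_constrained_lsq(Q)
assert np.isclose(np.linalg.norm(c), 1.0)
assert np.isclose(np.linalg.norm(Q @ c), eta)
```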
Once a suitable sparse solution has been obtained by further constraining the problem, the resulting parsimonious model can be rewritten in the form of an evolution equation by solving for a term such as $\partial_t u$ (or $\partial_t^2 u$ for a wave equation).
To obtain a parsimonious model, we employ an iterative procedure to eliminate unnecessary terms from (1). At each step $i$, singular value decomposition is used to obtain the solution $\mathbf{c}^{(i)}$ given the matrix $Q^{(i)}$, and the residual $\eta_i = \|Q^{(i)}\mathbf{c}^{(i)}\|$ is computed. We then find the term with the smallest $|c_n^{(i)}|$ and construct $Q^{(i+1)}$ by eliminating the corresponding column from $Q^{(i)}$. The corresponding term is eliminated from the model if $\eta_{i+1} < \gamma\,\eta_i$, where $\gamma$ is some fixed constant. The iteration terminates at step $i$ if $\eta_{i+1} \geq \gamma\,\eta_i$, yielding a parsimonious model. We find that this method compares favorably to alternatives such as sequential threshold ridge regression14 as it robustly eliminates spurious terms without requiring extremely careful choice of parameters. Moreover, the sparsification parameter has a simple interpretation: $\gamma - 1$ is the maximum acceptable relative increase in the residual resulting from discarding a single library term.
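A sketch of this greedy elimination loop, reusing solve_constrained_lsq (and numpy as np) from the previous sketch; the value passed for gamma would be a placeholder, not necessarily the one used in the study.

```python
def sparsify(Q, gamma):
    """Iteratively drop the library column with the smallest |c_n|,
    accepting each deletion only while the residual grows by less than a
    factor gamma (so gamma - 1 bounds the relative increase per term)."""
    active = list(range(Q.shape[1]))
    c, eta = solve_constrained_lsq(Q)
    while len(active) > 1:
        weakest = int(np.argmin(np.abs(c)))       # candidate for removal
        trial = active[:weakest] + active[weakest + 1:]
        c_new, eta_new = solve_constrained_lsq(Q[:, trial])
        if eta_new >= gamma * eta:                # residual jump too large
            break                                 # parsimonious model found
        active, c, eta = trial, c_new, eta_new
    return active, c                              # surviving columns, coefficients
```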
We illustrate the advantages of our approach by applying it to the Kuramoto-Sivashinsky equation,19,20

$$\partial_t u + a\,u\,\partial_x u + b\,\partial_x^2 u + c\,\partial_x^4 u = 0, \qquad (4)$$

which has posed a significant challenge in past studies of sparse regression11,14 because it contains a fourth-order partial derivative that is difficult to evaluate numerically with adequate accuracy. Here, $a$, $b$, and $c$ are all constants, although our approach can easily be extended even to the case when these coefficients are functions of time and/or space, as discussed below. Since this is a scalar equation in one spatial and one temporal dimension, we use scalar weight functions $w(x,t)$. If we denote the terms in the model (4) by $F_1 = \partial_t u$, $F_2 = u\,\partial_x u$, $F_3 = \partial_x^2 u$, and $F_4 = \partial_x^4 u$, then the entries of the library matrix are

$$Q_{kn} = \int_{\Omega_k} w\, F_n\, d\Omega, \qquad (5)$$
where $d\Omega = dx\,dt$. The key feature of the weak formulation is that it can almost always be used to completely eliminate, or at least reduce the order of, the derivatives acting on the noisy data $u$ by integrating by parts. In our particular case,

$$\begin{aligned} \int_{\Omega_k} w\,\partial_t u\; d\Omega &= -\int_{\Omega_k} (\partial_t w)\, u\; d\Omega, \\ \int_{\Omega_k} w\, u\,\partial_x u\; d\Omega &= -\frac{1}{2}\int_{\Omega_k} (\partial_x w)\, u^2\; d\Omega, \\ \int_{\Omega_k} w\,\partial_x^2 u\; d\Omega &= \int_{\Omega_k} (\partial_x^2 w)\, u\; d\Omega, \\ \int_{\Omega_k} w\,\partial_x^4 u\; d\Omega &= \int_{\Omega_k} (\partial_x^4 w)\, u\; d\Omega, \end{aligned} \qquad (6)$$

under the assumption that $w$ and its first three partial derivatives with respect to $x$ vanish on the boundary $\partial\Omega_k$. In our implementation, we use the composite trapezoidal rule to evaluate the integrals numerically.
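As an illustration, below is a sketch of how one row of the library could be assembled for the four Kuramoto-Sivashinsky terms, with every derivative already shifted onto the weight. The helpers trapz2d and library_row are hypothetical names of ours, and the arrays w_t, w_x, w_xx, w_xxxx are assumed to hold the analytic derivatives of the weight sampled on the local grid.

```python
import numpy as np

def trapz2d(f, dx, dt):
    """Composite trapezoidal rule on a uniform 2D grid (x: axis 0, t: axis 1)."""
    wx = np.ones(f.shape[0]); wx[[0, -1]] = 0.5
    wt = np.ones(f.shape[1]); wt[[0, -1]] = 0.5
    return dx * dt * np.einsum('i,j,ij->', wx, wt, f)

def library_row(u, w_t, w_x, w_xx, w_xxxx, dx, dt):
    """Weak-form row [u_t, u u_x, u_xx, u_xxxx]: only u itself is sampled."""
    return np.array([
        -trapz2d(w_t * u, dx, dt),           # ∫ w ∂t u    = -∫ (∂t w) u
        -0.5 * trapz2d(w_x * u**2, dx, dt),  # ∫ w u ∂x u  = -½ ∫ (∂x w) u²
        trapz2d(w_xx * u, dx, dt),           # ∫ w ∂x²u    =  ∫ (∂x²w) u
        trapz2d(w_xxxx * u, dx, dt),         # ∫ w ∂x⁴u    =  ∫ (∂x⁴w) u
    ])
```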
Note that although this particular PDE features constant coefficients, terms with variable coefficients can be treated in a similar manner. For instance, suppose that the coefficient of the term $F_n$ is a function $b(x)$ of $x$ that can be expanded in some (finite) basis $\{\phi_m(x)\}$ as

$$b(x) = \sum_m b_m\, \phi_m(x), \qquad (7)$$

with some constants $b_m$. Then,

$$\int_{\Omega_k} w\, b\, F_n\; d\Omega = \sum_m b_m\, q_{km}, \qquad (8)$$

where

$$q_{km} = \int_{\Omega_k} (\phi_m w)\, F_n\; d\Omega \qquad (9)$$

and the derivatives contained in $F_n$ can be shifted onto the product $\phi_m w$ by integration by parts, exactly as before. Sparse regression for a model including such a term would then simply require expanding the library $Q$ to include additional columns with entries $q_{km}$. In this case as well, no derivatives of the noisy data $u$ are used in finding the elements of $Q$.
Although, in principle, integration domains of any shape can be used, here we will only consider rectangular domains of a fixed size,

$$\Omega_k = [x_k - H_x,\, x_k + H_x] \times [t_k - H_t,\, t_k + H_t], \qquad (10)$$

where the centers $(x_k, t_k)$ of the rectangles are chosen randomly. Similarly, there are many possible choices for the weight functions satisfying the boundary conditions on $\partial\Omega_k$; we focus on functions of the form

$$w = \left(\bar{x}^2 - 1\right)^{p_x} \left(\bar{t}^2 - 1\right)^{p_t} e^{\pm i\pi m_x \bar{x}\, \pm\, i\pi m_t \bar{t}}, \qquad (11)$$

where $\bar{x} = (x - x_k)/H_x$ and $\bar{t} = (t - t_k)/H_t$ are nondimensionalized independent variables and $p_x$, $p_t$, $m_x$, and $m_t$ are integers. Note that there are four weight functions (corresponding to the four different choices of the signs in the exponentials) for each pair of nonzero $m_x$ and $m_t$. The integrals (5) are all of the form

$$Q_{kn} = H_x H_t \int_{-1}^{1}\!\!\int_{-1}^{1} G_n\, e^{\pm i\pi m_x \bar{x}\, \pm\, i\pi m_t \bar{t}}\; d\bar{x}\, d\bar{t}, \qquad (12)$$
where

$$G_n = \left(\bar{x}^2 - 1\right)^{p_x} \left(\bar{t}^2 - 1\right)^{p_t} F_n, \qquad (13)$$

so the integrals are proportional to the coefficients $\tilde{G}_n$ of the two-dimensional Fourier series for the products $G_n$. Although $u$ is not periodic on $\Omega_k$, the functions $G_n$ are (they vanish on the boundary $\partial\Omega_k$). Moreover, assuming $u$ is smooth enough inside $\Omega_k$, $G_n$ vanishes on $\partial\Omega_k$ together with its first $p_x - 1$ derivatives in space and $p_t - 1$ derivatives in time, so its periodic extension is correspondingly smooth. The Fourier coefficients then decay according to $\tilde{G}_n = O\!\big(|m_x|^{-p_x - 1}\,|m_t|^{-p_t - 1}\big)$. The powers $p_x$ and $p_t$, therefore, control the width of the Fourier spectrum of the entries in the library $Q$, while the choice of $m_x$ and $m_t$ allows us to tune the frequencies of the weights to the spectral properties of the data. The convergence rate of the Fourier series turns out to control the accuracy with which the integrals are evaluated using data that are available only on a discrete grid. For simplicity, we will assume that the same weight functions are integrated on every domain. It is possible to use either weight functions involving only a single pair of frequencies (i.e., fixed $m_x$ and $m_t$) or a range of frequencies in space and/or time.
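Since the integrands pair derivatives of $w$ with the undifferentiated data, the required weight derivatives can be generated symbolically. Below is a sketch using sympy, based on the form (11) as reconstructed above; the exponents and frequencies are placeholder values (note that $p_x \geq 4$ is needed for the Kuramoto-Sivashinsky terms, so that $w$ and its first three $x$-derivatives vanish on the boundary).

```python
import sympy as sp

xb, tb = sp.symbols('xbar tbar', real=True)   # nondimensional coordinates
Hx, Ht = sp.symbols('H_x H_t', positive=True)
px, pt, mx, mt = 4, 2, 1, 1                   # placeholder parameters

w = (xb**2 - 1)**px * (tb**2 - 1)**pt * sp.exp(sp.I * sp.pi * (mx*xb + mt*tb))

# Chain rule: d/dx = (1/H_x) d/dxbar, d/dt = (1/H_t) d/dtbar
w_t = sp.diff(w, tb) / Ht
w_x = sp.diff(w, xb) / Hx
w_xx = sp.diff(w, xb, 2) / Hx**2
w_xxxx = sp.diff(w, xb, 4) / Hx**4

# Compile to fast numpy callables over the local grid
to_fn = lambda e: sp.lambdify((xb, tb, Hx, Ht), e, 'numpy')
w_t_fn, w_x_fn, w_xx_fn, w_xxxx_fn = map(to_fn, (w_t, w_x, w_xx, w_xxxx))
```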
To test our sparse regression approach, we computed a solution of the Kuramoto-Sivashinsky equation, using the integrator described in Ref. 14 to generate data on a physical domain with dimensions $L_x$ and $L_t$. The numerical integration generated data with fine spatial and temporal resolution, which was then downsampled to a coarser spatial resolution $\Delta x$ and temporal resolution $\Delta t$. Unless noted otherwise, the results presented below are for fixed reference values of $\Delta x$ and $\Delta t$. For reference, the solution has a correlation length $\ell_c$ and correlation time $\tau_c$. To test the effects of noise, uncorrelated Gaussian noise with standard deviation $\sigma = \epsilon\,\sigma_u$ was added to the data for various choices of the noise level $\epsilon$, where $\sigma_u$ is the sample standard deviation of $u$ on the whole domain. (We verified that the results do not change if the uncorrelated noise is sampled from a different distribution with zero mean and the same standard deviation.)
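For reference, the preprocessing amounts to two one-liners; a sketch with hypothetical helper names, assuming u is stored as a 2D array indexed by (x, t):

```python
import numpy as np

def downsample(u, fx, ft):
    """Keep every fx-th spatial and every ft-th temporal sample."""
    return u[::fx, ::ft]

def add_noise(u, eps, rng):
    """Add uncorrelated Gaussian noise with std eps * std(u)."""
    return u + eps * u.std() * rng.standard_normal(u.shape)
```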
To test the ability of the algorithm to eliminate spurious terms, in addition to the terms present in the Kuramoto-Sivashinsky equation (4), we also included six spurious terms in our library, one of which represents a hypothetical forcing. The corresponding integrals were rewritten using integration by parts to remove derivatives acting on $u$, as described previously. In Sec. III, we quantify the performance of our sparse regression approach using two key metrics: how well the algorithm can discriminate between the essential and spurious terms and how accurately it can determine the coefficients of the essential terms. Since the data were generated using a known model, we know which terms are essential [those contained in the PDE (4)]. If the reference model is unavailable, ensemble regression15 may be used instead to help distinguish essential terms from spurious ones.
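Putting the pieces together, the full library can be assembled by sampling random domains of a fixed size and looping over precomputed weight-derivative arrays. The sketch below reuses the hypothetical library_row helper from above and, for brevity, is limited to the four Kuramoto-Sivashinsky terms, whereas the library used in the study also contains the spurious candidates.

```python
import numpy as np

def build_library(u, dx, dt, weight_derivs, n_domains, rng):
    """One row of Q per (domain, weight) pair.

    weight_derivs: list of (w_t, w_x, w_xx, w_xxxx) arrays, all sharing
    the shape of the fixed-size integration-domain grid.
    """
    nx, nt = weight_derivs[0][0].shape
    Nx, Nt = u.shape
    rows = []
    for _ in range(n_domains):
        i = rng.integers(0, Nx - nx + 1)        # random domain placement
        j = rng.integers(0, Nt - nt + 1)
        u_loc = u[i:i + nx, j:j + nt]
        for (w_t, w_x, w_xx, w_xxxx) in weight_derivs:
            rows.append(library_row(u_loc, w_t, w_x, w_xx, w_xxxx, dx, dt))
    # complex-valued rows carry two independent real constraints each
    Q = np.array(rows)
    return np.vstack([Q.real, Q.imag])
```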
III. RESULTS
As discussed previously, the elements of the library matrix $Q$ are given by the Fourier coefficients of the different terms included in the generalized model [windowed by the envelope on each domain $\Omega_k$]; hence, knowledge of the Fourier spectrum of the data is crucial for an optimal choice of the size of the integration domains and the weight functions. The power spectrum (or, more precisely, the absolute value of the Fourier coefficients) of the noiseless data on the entire physical domain is shown in Fig. 1. In space, the spectrum is sharply peaked around a dominant wave number $k_0$ and decays exponentially at high wave numbers. In time, the spectrum is peaked at zero frequency and decays as a power law.
FIG. 1. The power spectrum over (a) space and (b) time, normalized so that the maximum is 1. The black dots show the spectrum of the original data. The other symbols correspond to the spectra of the data multiplied by envelopes (with different choices of $p_x$ or $p_t$) on a “typical” integration domain [i.e., averaged over 1000 uniformly distributed choices of the domain center $(x_k, t_k)$]. The spatial and temporal frequencies correspond to $k = \pi m_x / H_x$ and $\omega = \pi m_t / H_t$ for the windowed data.
Having characterized the data, we turn to the investigation of how the performance of our algorithm depends on the choice of various parameters. Since the number of parameters is quite large, instead of exploring the entire parameter space, we focus on the dependence on one or two parameters at a time, with the remaining parameters staying fixed. Specifically, the noise level is fixed to 3%, and we use the following near-optimal parameters in the sparse regression. The dimensions of the integration domain are fixed at near-optimal values of $H_x$ and $H_t$ (quoted in the discussion of Fig. 8 below). This choice corresponds to an equal number of grid points in both directions, $2H_x/\Delta x = 2H_t/\Delta t$. Unless noted otherwise, we use a single set of weights with fixed $p_x$, $p_t$, $m_x$, and $m_t$, and the sparsification parameter $\gamma$ is likewise held fixed. We generally use every combination of 4 weight functions over 50 integration domains, so that the total number of library rows is $K = 200$. To characterize the stochastic effects, for each set of parameters, we used an ensemble of trials featuring different random distributions of the integration domains and realizations of noise.
First, we tested the ability of the method to reconstruct the correct form of the PDE (4) for various values of $\gamma$ with all other parameters fixed at their near-optimal values. Our iterative regression procedure proved very robust for a fairly wide range of values of $\gamma$. In particular, at a noise level of 30%, it performed perfectly across this entire range, with the reconstructed model containing no missing or spurious terms in all of the trials. For the highest noise level considered here (100%), we found perfect performance for intermediate values of $\gamma$. In some fraction of the trials, spurious terms appeared at lower $\gamma$ and missing terms at higher $\gamma$, as shown in Fig. 2. For reference, without the benefit of the weak formulation, sparse regression failed14 to correctly reconstruct the lambda-omega reaction-diffusion system, which is only second-order, for noise levels as low as 1%.
FIG. 2. Fraction of identified models with spurious (circles) or missing (squares) terms at the maximum noise level (100%) as a function of the sparsification parameter $\gamma$.
The accuracy of regression (i.e., model identification) was quantified by computing the relative error in each parameter of the Kuramoto-Sivashinsky equation,

$$\Delta_n = \left|\frac{\tilde{c}_n - c_n}{c_n}\right|, \qquad (14)$$

where $c_n$ and $\tilde{c}_n$ are the true and estimated values of the model coefficients, respectively, for $n = 2, 3, 4$ [i.e., the coefficients $a$, $b$, and $c$ in (4)]. We normalize the estimated parameters so that $\tilde{c}_1 = 1$, matching the coefficient of the $\partial_t u$ term. In all of the following figures, we plot the estimated mean value of $\Delta_n$ with 95% error bars, where all of the parameters in the regression procedure are held at their near-optimal values stated previously unless noted otherwise.
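For completeness, the error metric is a one-liner once the coefficient vectors are normalized by the $\partial_t u$ coefficient (a sketch of ours; index 0 is assumed to hold the $\partial_t u$ column):

```python
import numpy as np

def parameter_errors(c_est, c_true):
    """Relative errors of the model coefficients, with both vectors
    normalized so that the u_t coefficient (index 0) equals one."""
    a = np.asarray(c_est) / c_est[0]
    b = np.asarray(c_true) / c_true[0]
    return np.abs((a[1:] - b[1:]) / b[1:])
```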
In particular, Fig. 3 shows the accuracy of regression as a function of $\epsilon$ for two different choices of the data resolution (double and half the reference resolution). It is worth noting that, for noiseless data at the higher of the two resolutions, the average relative error is extremely small for all of the parameters. Moreover, even at substantial levels of noise, the error remains more than three orders of magnitude smaller than what had been achieved in previous studies.14 The results are very similar for all three parameters; as this is generally the case, in subsequent figures, we only show the generally largest error $\Delta_4$, which corresponds to the term involving the highest-order derivative. We find two distinct regimes. At higher noise levels, the error in evaluating the library matrix entries is due primarily to the averaged effect of noise. Applying the central limit theorem, we find that the relative error in the library entries scales as

$$\delta \sim \epsilon \left(\frac{\Delta x\, \Delta t}{4 H_x H_t}\right)^{1/2}. \qquad (15)$$
FIG. 3. Parameter errors as a function of the noise level $\epsilon$. The circles, triangles, and squares correspond to $\Delta_2$, $\Delta_3$, and $\Delta_4$, respectively, and the empty and filled symbols indicate results for data with double resolution and half resolution, respectively. The dashed lines show the predicted scaling.
At low noise levels, the parameter accuracy is controlled by numerical error, which has two different sources. The first source is a numerical error in the data itself, which is due to the finite accuracy of the integrator that “solves” the Kuramoto-Sivashinsky equation. This source dominates for smaller $\Delta x$ and $\Delta t$. For experimental data, this source would correspond to systematic error. For larger $\Delta x$ and $\Delta t$, the parameter inaccuracy is mainly due to the error in computing the library matrix entries based on data that are available on a discrete grid. Suppose we want to use numerical quadratures to evaluate an integral

$$I = \int_{-H}^{H} f(x)\, dx, \qquad (16)$$

where $f \in C^p$ (i.e., $f$ has $p$ continuous derivatives) and $f^{(j)}(\pm H) = 0$ for all $j < p$. Then, for the composite trapezoidal rule on a grid with spacing $h$, the relative error associated with the discretization can be estimated using exact Euler-Maclaurin formulas21 and is found to scale as $(h/2H)^{p+2}$ for even $p$ [or $(h/2H)^{p+1}$ for odd $p$], where a characteristic value of the derivative $f^{(p)}$ on the interval is used.
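This scaling is easy to verify numerically. The sketch below (ours, under the stated assumptions) integrates f(x) = (1 − x²)^p on [−1, 1], for which f and its first p − 1 derivatives vanish at the endpoints, with the composite trapezoidal rule and recovers the predicted order p + 2 for even p:

```python
import math
import numpy as np

def trapz_rel_error(p, n):
    """Relative trapezoid error for f(x) = (1 - x^2)^p on [-1, 1], n panels."""
    x = np.linspace(-1.0, 1.0, n + 1)
    f = (1.0 - x**2)**p
    h = 2.0 / n
    approx = h * (f.sum() - 0.5 * (f[0] + f[-1]))
    # Exact integral of (1 - x^2)^p over [-1, 1]
    exact = 2.0**(2*p + 1) * math.factorial(p)**2 / math.factorial(2*p + 1)
    return abs(approx - exact) / exact

for p in (2, 4):
    e1, e2 = trapz_rel_error(p, 64), trapz_rel_error(p, 128)
    print(f"p = {p}: observed order {math.log2(e1 / e2):.2f}, predicted {p + 2}")
```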
Generalizing this result to two dimensions (with the derivatives shifted onto the weight by integration by parts, as before), we find an estimate of the relative discretization error for an element of the library matrix that involves a temporal derivative of order $\beta_t$ and/or spatial derivative of order $\beta_x$:

$$\delta \sim \left(\frac{\Delta x}{2 H_x}\right)^{\alpha_x} + \left(\frac{\Delta t}{2 H_t}\right)^{\alpha_t}, \qquad (17)$$

where $\alpha_x = p_x - \beta_x + 2$ and $\alpha_t = p_t - \beta_t + 2$ (for even $p_x - \beta_x$ and $p_t - \beta_t$; the exponents are smaller by one in the odd case). It is easy to check that, due to the conditions on $p_x$ and $p_t$, we always have $\alpha_x, \alpha_t \geq 2$, as it should be for the trapezoidal rule. The Kuramoto-Sivashinsky equation features terms that all involve derivatives, with the lowest order being one and the highest being four; hence, the exponent $\alpha_x$ ranges between $p_x - 2$ and $p_x + 1$. Therefore, the scaling

$$\delta \sim \left(\frac{\Delta x}{2 H_x}\right)^{p_x - 2} \qquad (18)$$

dominates for lower $\Delta x$, while the scaling

$$\delta \sim \left(\frac{\Delta x}{2 H_x}\right)^{p_x + 1} \qquad (19)$$

dominates for higher $\Delta x$.
The error in the coefficients can be found using perturbation theory. Let $Q_0$ be the library matrix evaluated using a continuous noiseless solution, so that $Q_0 \mathbf{c}_0 = 0$ exactly (we assume that $\mathbf{c}_0$ corresponds to the parsimonious model). In the presence of measurement noise and/or discretization error, the error in evaluating each entry of the library matrix is proportional to $\delta$, so

$$Q = Q_0 + \delta E \qquad (20)$$

for some matrix $E$ whose entries are distributed as white Gaussian noise. Note that the entries of $E$ are $O(1)$. The entries of $Q_0$ have a more complicated scaling that is determined by the Fourier spectrum of the data (i.e., exponential in space, power law in time). Specifically, we find (cf. Fig. 4)

$$\bar{q} \sim e^{-c_x H_x}\, H_t^{-c_t}, \qquad (21)$$

where $\bar{q}$ denotes the characteristic magnitude of the entries of $Q_0$ and $c_x$ and $c_t$ are some positive constants. To leading order in $\delta$, the least squares solution to (2) is given by

$$\mathbf{c} = \mathbf{c}_0 - \delta\, Q_0^{+} E\, \mathbf{c}_0, \qquad (22)$$

where $Q_0^{+}$ is the Moore-Penrose pseudoinverse of $Q_0$. Since the elements of $E$ can be considered uncorrelated, we have $\Delta \propto K^{-1/2}$ for $K \gg 1$ and

$$\Delta \sim \frac{\delta}{\sqrt{K}\,\bar{q}}, \qquad (23)$$

where the numerator and denominator describe the scaling of the entries of $\delta E$ and $Q_0$, respectively. Following from (21),

$$\Delta \sim \frac{\delta}{\sqrt{K}}\, e^{c_x H_x}\, H_t^{c_t}, \qquad (24)$$

with the same positive constants $c_x$ and $c_t$. For low $\epsilon$, $\delta$ is dominated by the discretization error (17) and, therefore, $\Delta$ is independent of $\epsilon$. For high $\epsilon$, $\delta$ is given by (15), so combining (23) and (15) we find

$$\Delta \sim \frac{\epsilon}{\sqrt{K}} \left(\frac{\Delta x\,\Delta t}{4 H_x H_t}\right)^{1/2} e^{c_x H_x}\, H_t^{c_t}. \qquad (25)$$
The predicted scaling of $\Delta$ with $\epsilon$ in both regimes is consistent with the results shown in Fig. 3. In particular, we find that the effect of changing the resolution of the data is quite minor at high $\epsilon$, where $\Delta \propto (\Delta x\, \Delta t)^{1/2}$ according to (25). At low $\epsilon$, the effect is much stronger: according to (17), the error decreases rapidly with decreasing grid spacing. The dependence of the scaling (17) on $\Delta x$ and $\Delta t$ is further confirmed by Fig. 5, which shows results for noiseless data. When the temporal resolution is varied, we observe the scaling law corresponding to (18) over the entire range we examined. When the spatial resolution is varied, the parameter error scales according to (18) for small $\Delta x$ and (19) for large $\Delta x$, which correspond to the two limiting cases. We should also note that the accuracy remains very good even for rather coarse grids, with spacings approaching the correlation length and time. Thus, the method is suitable for fairly sparse data.
FIG. 4. Scaling of the first four columns of the library $Q$ with the size of the integration domain in the (a) spatial and (b) temporal directions. The columns correspond to the terms $\partial_t u$ (squares), $u\,\partial_x u$ (circles), $\partial_x^2 u$ (triangles), and $\partial_x^4 u$ (diamonds).
FIG. 5. Parameter error as a function of the resolution of noiseless data; the two symbol sets correspond to varying the spatial and the temporal resolution. The dashed lines show the predicted scaling.
As illustrated in Fig. 6, we also observe the scaling of $\Delta$ with $K$ predicted by (25), $\Delta \propto K^{-1/2}$. This scaling is expected to break down when the total area of the integration domains exceeds the area of the physical domain, due to the loss of statistical independence between the data on different integration domains, which leads to an increased linear dependence of the rows of the library matrix $Q$. We can expect the error to asymptote to

$$\Delta_\infty \sim \Delta(K_s) \qquad (26)$$

for $K \gg K_s$, where

$$K_s = \frac{L_x L_t}{4 H_x H_t} \qquad (27)$$

is the area ratio. For the reference set of parameters, saturation did not occur over the range of $K$ we tested. To more easily observe the saturation effect, we set $m_x = m_t = 0$, so that only one weight function is used and the number of integration domains equals $K$ (rather than $K/4$ for nonzero $m_x$ and $m_t$). Furthermore, we reduce the size of the physical domain in both $x$ and $t$, so that $K_s$ is relatively small. As Fig. 6 illustrates, for large $K$, the parameter accuracy indeed asymptotes to a constant.
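The $K^{-1/2}$ behavior can be reproduced with a toy Monte-Carlo experiment on a synthetic library (a sketch of ours, unrelated to the Kuramoto-Sivashinsky data): build a matrix with an exact null vector, perturb it with white noise, and watch the error of the recovered singular vector.

```python
import numpy as np

rng = np.random.default_rng(1)
N, delta = 6, 1e-3
c0 = rng.standard_normal(N)
c0 /= np.linalg.norm(c0)
A = rng.standard_normal((400, N))
Q0 = A - np.outer(A @ c0, c0)              # every row orthogonal to c0

for K in (50, 100, 200, 400):
    errs = []
    for _ in range(200):
        Q = Q0[:K] + delta * rng.standard_normal((K, N))
        _, _, Vt = np.linalg.svd(Q)
        c = Vt[-1] * np.sign(Vt[-1] @ c0)  # fix the sign ambiguity
        errs.append(np.linalg.norm(c - c0))
    # roughly constant across K if the error scales as delta / sqrt(K)
    print(K, np.mean(errs) * np.sqrt(K) / delta)
```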
FIG. 6. Parameter error as a function of the number of library rows $K$. Only the weight function with $m_x = m_t = 0$ is used, and the physical domain size is reduced in both dimensions. The dashed lines show the predicted scaling.
The scaling described by (26) can also be observed in the dependence of $\Delta$ on the size of the physical domain (and hence $K_s$) with all other parameters fixed. This dependence is quite important, since it determines how much data need to be collected to identify the model with meaningful precision. As Fig. 7 illustrates, choosing the physical domain to be just double the size of the (optimal) integration domain in both directions (which corresponds to $K_s = 4$) already yields a rather acceptable accuracy when only one weight function is used. When $m_x$ and $m_t$ are nonzero, accurate reconstruction is possible even if $L_x L_t$ is only slightly greater than $4 H_x H_t$.
FIG. 7. Parameter error as a function of the area ratio $K_s$ for fixed $H_x$ and $H_t$. Squares correspond to fixing $L_t$ and varying $L_x$; circles correspond to fixing $L_x$ and varying $L_t$. The dashed line shows the predicted scaling.
Next, we consider how the error in the estimated coefficients depends on the choice of the integration domain size. Figure 8 shows the dependence of the error on the size of the integration domains for two different choices of the weight functions. In panel (a), the temporal dimension is fixed to 75 and the spatial dimension is taken to vary, and in panel (b), the spatial dimension is fixed to 14.73 with the temporal dimension varying. In both cases, we find that there is a clearly defined optimal domain size; moreover, the optimal values of $H_x$ and $H_t$ remain approximately the same even if we vary the size of the other dimension or the choice of weight functions. For small $H_x$ and/or $H_t$, the error is large because (a) the integration domain is too small to effectively average out the influence of noise and (b) the numerical quadrature error is large [both (15) and (17) increase as $H_x$ and/or $H_t$ decrease]. For large $H_x$ and $H_t$, we enter the regime described by (23), which predicts that the error should grow exponentially in $H_x$ and as a power of $H_t$. Indeed, this is exactly what we observe in Fig. 8. Based on (21), it appears that the optimal choice of $H_x$ and $H_t$ corresponds to the crossover between these two regimes. Our numerical results suggest that the optimum lies roughly an order of magnitude above the correlation length and time of the data.
FIG. 8. Parameter error as a function of the (a) spatial and (b) temporal dimensions of the integration domains when only the weight function with $m_x = m_t = 0$ is used (squares) and for the optimal choice of $m_x$ and $m_t$ (circles).
Finally, let us address the optimal choice of the frequencies $k = \pi m_x / H_x$ and $\omega = \pi m_t / H_t$ appearing in the weight functions (11). Figure 9 shows the effect of varying either $k$ or $\omega$ with all other parameters fixed at their reference values. Specifically, we plot $\Delta_4$ vs $k$ and $\omega$. (Note that when $k$ or $\omega$ is 0, the number of distinct weight functions is halved, so we correspondingly double the number of integration domains to keep the number of rows in the library constant.) One could assume that the optimal values would be given by the dominant frequencies of the original data, i.e., the dominant wave number $k_0$ and the dominant temporal frequency, which is zero. According to Fig. 1, windowing the data broadens the peaks but leaves both dominant frequencies roughly the same. Unfortunately, it turns out that we cannot use the spectra to exactly predict the optimal frequencies. However, choosing the frequencies based on the spectra still produces reasonably good accuracy (within a small factor of the optimal result).
FIG. 9. Parameter error as a function of (a) the wave number $k$ and (b) the frequency $\omega$ of the weight functions.
These results suggest that using weight functions with a combination of different frequencies may be more robust and/or accurate. To test this hypothesis, we considered the case in which weight functions with a range of frequencies in space or time were included, with the total number of library rows $K$ held fixed. However, this approach yielded a decrease in the accuracy, as the broader choice of weight functions did not compensate for the attendant decrease in the number of integration domains. This suggests that the optimal strategy is to use a large number of integration domains while keeping the frequencies of the weight functions fixed.
IV. DISCUSSION
It is worth considering what modifications may need to be made to reconstruct models other than the Kuramoto-Sivashinsky equation. The choice of the weight functions plays a critical role in determining the accuracy of the model identification. Recall that, under the assumption that the data themselves are sufficiently smooth, the behavior of $w$ at the boundary of the integration domains determines the accuracy with which the elements of the library matrix $Q$ are evaluated using numerical quadratures. Our choice of the form (11) was influenced primarily by the order of the partial derivatives appearing in the model, which determines the lowest possible values of the exponents $p_x$ and $p_t$. Indeed, weight functions of the form (11), with appropriate choices of the exponents, were found to work well for a variety of PDE models.
A more interesting question arises for models admitting nonsmooth solutions, such as Burgers’ equation, which develops shocks. When the data or their derivatives are discontinuous, the scaling (17) that we have derived does not hold for integrals over integration domains that contain the discontinuity (for instance, if $u$ itself is discontinuous, then the products $G_n$ are discontinuous as well). To preserve the high accuracy associated with the large scaling exponents $\alpha_x$ and $\alpha_t$, we have two choices. The first option is to restrict the sizes and positions of the integration domains such that they never include the discontinuities. In this case, we can still choose weight functions of the form (11). The second option is to modify the weight functions such that the smoothness of the products (13) is maintained across the discontinuity. For instance, if the position of the discontinuity can be parametrized by the equation $x = \xi(t)$, then, for the integration domains containing the discontinuity, one may choose weight functions of the form

$$w' = \left[\bar{x} - \bar{\xi}(\bar{t})\right]^{s}\, w, \qquad (28)$$

where the exponent $s$ is sufficiently large.
V. CONCLUSIONS
We have introduced a robust and flexible approach to data-driven discovery of models in the form of nonlinear PDEs. The approach uses a weak formulation, coupled with a novel sparse regression procedure, to obtain a parsimonious description. We have demonstrated its capability to identify PDEs, even those with high-order derivatives, from extremely noisy data with unprecedented accuracy. For instance, with 1% noise, we were able to reduce the error in estimating the parameters of the 4th-order Kuramoto-Sivashinsky equation from the 50% reported previously14 by more than three orders of magnitude. Furthermore, whereas correct identification of the functional form of the underlying PDE has been far from guaranteed at any noise level using past approaches, our algorithm was able to reconstruct the Kuramoto-Sivashinsky equation accurately in all trials from data with a noise level of 100%.
This impressive performance is achieved by shifting the partial derivatives from the data onto a known smooth weight function using integration by parts, thus avoiding the large errors incurred by repeated numerical differentiation. Our method also proved to be well adapted to sparse data, maintaining low errors for grid resolutions only a few times finer than the correlation length/time. Such reliability and high accuracy in the presence of noisy or sparse data are indispensable for the analysis of experimental data. Notably, even in the absence of noise, our results compare very favorably with those of previous studies11,14 because the discretization error of the algorithm can be made extremely small: for the Kuramoto-Sivashinsky equation, the relative error in all parameters can easily be reduced to the level set by the numerical accuracy of the data themselves. It is also important to mention that the computational cost of our algorithm is comparable to that of existing sparse regression methods.
We also derived the scaling laws that describe the accuracy of the regression as a function of the parameters used in the algorithm and the properties of the data. These scaling laws can be used to fully exploit the flexibility of the weak formulation approach by tuning its various parameters. In particular, the size of the input used by the regression can be controlled by choosing both the number of different integration domains and the number of different weight functions. We have shown that the number of integration domains plays a much more important role than the number of weight functions: the best results can be obtained by using a set of weight functions with a fixed shape (frequency and envelope) and a large number of integration domains. Furthermore, we have determined the optimal shape of the weights and the optimal size of the integration domains. The latter turned out to be determined by the correlation length and time describing the data (with the size roughly an order of magnitude larger than these characteristic scales). We have also shown that, although the error can be reduced further by using data on ever-larger physical domains, satisfactory results can be obtained for physical domains that are just a factor of two larger than the optimal integration domain in each dimension.
ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation (NSF) under Grant No. CMMI-1725587. D.G. gratefully acknowledges the support of the Letson Undergraduate Research Scholarship.
The reference data and the codes used in this study are freely accessible from the GitHub repository at https://github.com/pakreinbold/PDE_Discovery_Weak_Formulation.