Numerous physics theories are rooted in partial differential equations (PDEs). However, increasingly intricate physics equations, especially those lacking analytic solutions or closed forms, have impeded the further development of physics. Computationally solving PDEs by classic numerical approaches suffers from the trade-off between accuracy and efficiency and is not applicable to the empirical data generated by unknown latent PDEs. To overcome this challenge, we present KoopmanLab, an efficient module of the Koopman neural operator (KNO) family, for learning PDEs without analytic solutions or closed forms. Our module consists of multiple variants of the KNO, a kind of mesh-independent, neural-network-based PDE solver developed following dynamical system theory. The compact variants of the KNO can accurately solve PDEs with small model sizes, while the large variants of the KNO are more competitive in predicting highly complicated dynamic systems governed by unknown, high-dimensional, and non-linear PDEs. All variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier–Stokes equation and the Bateman–Burgers equation in fluid mechanics) and ERA5 (i.e., one of the largest high-resolution global-scale climate datasets in earth physics). These demonstrations suggest the potential of KoopmanLab to be a fundamental tool in diverse physics studies related to equations or dynamic systems.
I. INTRODUCTION
A. The rise of partial differential equation solvers
The absence of analytic solutions of various important PDEs in science and engineering naturally calls for the rapid development of computational solvers, which attempt to approximate a parametric counterpart $F_\theta \simeq F$, parameterized by $\theta$, to derive the equation solution.1,3,4 To date, the joint efforts of physics, mathematics, and computer science have given birth to two mainstream families of PDE solvers:5
The first family of solvers are classic numerical ones. Typical instances of these solvers include finite element (FEM),6 finite difference (FDM),7 and finite volume (FVM)8 methods. In general, these methods discretize space and time domains following specific mesh designs and solve parameterized PDEs on meshes by certain iterative algorithms. Specifically, FEM subdivides the original domain into a set of sub-domains defined by a collection of element equations and recombines these element equations to derive the global solution.6 FDM approximates derivatives as finite differences measured on local values.7 FVM transforms the original problem into a series of surface flux calculations on local volumes.8
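As a minimal illustration of the finite difference idea (a sketch of our own, with hypothetical grid and viscosity values rather than a method drawn from Refs. 6–8), the snippet below advances the one-dimensional heat equation $\partial_t u = \nu\, \partial_x^2 u$ with an explicit central-difference scheme:

```python
import numpy as np

# Explicit finite-difference scheme for the 1D heat equation u_t = nu * u_xx
# on [0, 1] with periodic boundaries; grid size and viscosity are illustrative.
nx, nu = 128, 0.01
dx = 1.0 / nx
dt = 0.2 * dx**2 / nu  # satisfies the explicit stability bound dt < dx^2 / (2 nu)

x = np.linspace(0.0, 1.0, nx, endpoint=False)
u = np.sin(2 * np.pi * x)  # initial condition

for _ in range(1000):
    # Central difference approximates the second derivative on the mesh.
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u = u + dt * nu * u_xx
```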
The second family of solvers are neural-network-based ones. With a pursuit of accelerating PDE solving and improving the applicability on real data, three kinds of neural-network-based solvers have been proposed:
One kind of solvers discretize domains $D$ and $T$ into $x$ and $y$ meshes and approximate a finite-dimensional, mesh-dependent solution operator $F_\theta$ by a parameterized neural network between finite Euclidean spaces, i.e., $F_\theta: \mathbb{R}^{x} \times \Theta \rightarrow \mathbb{R}^{y}$ (e.g., see Refs. 9–11). Given an arbitrary input $\gamma_t$, the trained neural network can function as a solution operator to predict $\gamma_{t+\tau} = F_\theta(\gamma_t)$ for a certain time difference $\tau$.
Another kind of solvers directly parameterize the equation solution $\gamma$ as a neural network, i.e., $\gamma_\theta: D \times T \times \Theta \rightarrow \mathbb{R}$ (e.g., see Refs. 12–15). These solvers are mesh-independent and accurate in learning a given PDE because they can directly transform an arbitrary domain and parameter setting to the target equation solution $\gamma_\theta$.
The last kind of solvers, including neural operators, attempt to parameterize a mesh-independent, infinite-dimensional solution operator with neural networks, i.e., $F_\theta: \Phi \times \Theta \rightarrow \Gamma$ (e.g., see Refs. 5 and 16–21). These mesh-independent solvers can be flexibly implemented on different discretization schemes and only need to be trained once for a given PDE family. The equation solution of different instances of the PDE family can be generated by a computationally reusable forward pass of the network,5,19 which can be further accelerated by the fast Fourier transform.19 Representative demonstrations of this kind of solver are the Fourier neural operator19 and its variants (e.g., the adaptive Fourier neural operator22 and FourCastNet23,24). These frameworks not only solve PDEs with known expressions but can also predict complex dynamic systems governed by unknown PDEs on real datasets (e.g., the climate system23,24).
B. The limitation of previous partial differential equation solvers
Although substantial progress has been accomplished by existing PDE solvers from various perspectives, there remain critical challenges in this booming direction.
In practice, the mesh-dependent property of classic numerical solvers implies an inevitable trade-off between computational accuracy and efficiency, i.e., fine-grained meshes ensure accuracy, yet coarse-grained meshes are favorable for efficiency.19,25 However, in many cases, the applications of PDE solving (e.g., numerical weather forecasting26,27) require timely and accurate computation. To ensure both accuracy and speed, each computation in the downstream applications supported by classic numerical solvers frequently costs large amounts of computing resources. In cases with limited computing power, a significant time delay may occur. Moreover, all numerical solvers require explicit definitions of the target PDEs as a priori knowledge and are less applicable to predicting real data generated by unknown PDEs.19
C. Our contributions to partial differential equation solvers
In this paper, we build on the Koopman neural operator (KNO), one of our latest works,31 to develop an efficient module for PDE solving and overcome the limitation in characterizing the long-term behaviors of complicated PDE families. As a study on computational physics programs, our research makes the following contributions compared with our previous work.31
First, we generalize the original KNO to four kinds of variants. Beyond the original KNO, these differentiated variants offer more possibilities for data-specific and task-oriented solver designs. Specifically, the compact variants of the KNO, realized by multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), can accurately solve PDEs with small model sizes. The large variants of the KNO, implemented on Vision Transformers, can predict highly intricate dynamic systems governed by unknown, high-dimensional, and non-linear PDEs (e.g., the climate system).
Second, we propose KoopmanLab, a PyTorch module of the Koopman neural operator family, as a self-contained and user-friendly platform for PDE solving. All necessary tools, such as those for data loading, model construction, parameter manipulation, output visualization, and performance quantification, are offered in a unified manner to support customized applications.
Third, we offer comprehensive validation of the proposed module on representative datasets, including those generated by important PDEs in fluid mechanics (e.g., the Navier–Stokes equation and the Bateman–Burgers equation) or obtained by global meteorological recording research (e.g., the atmospheric, land, and oceanic climate fields in the ERA5 dataset).32 By measuring accuracy, quantifying efficiency, and comparing all KNO variants with other state-of-the-art alternatives (e.g., the Fourier neural operator19 and FourCastNet23,24), we suggest the potential of our module to serve as an ideal choice for PDE solving and dynamic system prediction.
II. THE INITIAL VERSION OF KOOPMAN NEURAL OPERATOR
A. The original Koopman neural operator: Objective
B. The original Koopman neural operator: Mathematics
Numerous previous studies have pursued characterizing the Koopman operator by machine-learning-based approaches. Some approaches are highly practical but limited to the autonomous case [e.g., the case in Eq. (12)].45–48 Other approaches are more general in application but critically depend on a priori knowledge about the eigenvalue spectrum (e.g., the numbers of real and complex eigenvalues) of the Koopman operator to deal with the continuous spectrum problem.49
In practice, a balance should be reached between mathematical completeness and computational practicality. An ideal Koopman-operator-based PDE solver should fit both autonomous and non-autonomous cases and limit the dependence on a priori knowledge as much as possible (even though these restraints inevitably reduce mathematical completeness). To explore such a balance, we introduced the Koopman neural operator (KNO), a flexible approach, in our previous work.31
As for the case where the Koopman operator has a continuous spectrum (e.g., it has no eigenvalues), there is no finite invariant subspace to support computational approximation. Such an ill-posed situation remains for future exploration.
C. The original Koopman neural operator: Convergence
D. The original Koopman neural operator: Computation
In Ref. 31, we have proposed an architecture to implement the original Koopman neural operator with neural networks. The details of the architecture design are presented below:
- Part 1: Observation. An encoder [e.g., a single non-linear layer with an activation function in the original Koopman neural operator] serves as the observation function to transform $\phi_t$, an arbitrary input of the PDE family (e.g., $\phi_t$ can be directly set as $\gamma_t$), into an observable $\hat{g}(\phi_t)$,

  $\hat{g}: \phi_t \mapsto \hat{g}(\phi_t).$  (52)

See Fig. 1 for illustrations.
- Part 2: Fourier transform. Similar to the Fourier neural operator,19 the original Koopman neural operator also utilizes the Fourier transform during the iterative update of the Green function in Eq. (5). Given $\hat{g}(\phi_t)$, we derive the Fourier transform $\mathcal{F} \circ \hat{g}(\phi_t)$, where we truncate the Fourier series at $\omega$, a maximum frequency,

  $\mathcal{F} \circ \hat{g}(\phi_t)(\xi) = \mathbb{1}_{[0,\omega]}(\xi) \int_{D} \hat{g}(\phi_t)(x)\, e^{-2\pi i \langle x, \xi \rangle}\, \mathrm{d}x.$  (53)

Note that $\mathbb{1}_{A}(a)$ denotes the indicator function [i.e., $\mathbb{1}_{A}(a) = 1$ if $a \in A$, otherwise $\mathbb{1}_{A}(a) = 0$]. Computationally, the above transform is implemented by the fast Fourier transform. For convenience, we mark

  $\hat{g}_F(\phi_t) = \mathcal{F} \circ \hat{g}(\phi_t)$  (54)

as the transformed result of $\hat{g}(\phi_t)$. Different from Ref. 19, our main motivation for using the truncated Fourier transform is to extract the low-frequency information (i.e., main system components) of the represented equation solution. Certainly, frequency truncation inevitably causes the loss of high-frequency information (i.e., high-frequency perturbations or edges). In the original Koopman neural operator, Part 5 is designed to complement the lost high-frequency information. See Fig. 1 for more details.
- Part 3: Hankel representation and offline Koopman operator. Once we have derived $\hat{g}_F(\phi_t)$ for every $t \in \{0, \epsilon, \ldots, (m-1)\epsilon\}$, a Hankel matrix $\hat{H}$ of $\hat{g}_F(\phi_t)$ will be generated following $\delta \in \mathbb{N}^{+}$, a dimension of delay-embedding (note that $m$ is the number of all accessible samples),

  $\hat{H} = \begin{bmatrix} \hat{g}_F(\phi_0) & \cdots & \hat{g}_F(\phi_{(m-\delta)\epsilon}) \\ \vdots & \ddots & \vdots \\ \hat{g}_F(\phi_{(\delta-1)\epsilon}) & \cdots & \hat{g}_F(\phi_{(m-1)\epsilon}) \end{bmatrix}.$  (55)

We train an $o \times o$ linear layer $\hat{K}$ to learn a neural network representation of the Koopman operator following Eqs. (32) and (33), where the finite observable space is spanned by the columns of $\hat{H}$. The learned $\hat{K}$ can be used to predict the future state of $\hat{g}_F(\phi_t)$ as

  $\hat{K}^{r} \left[\hat{g}_F(\phi_{t-(\delta-1)\epsilon}), \ldots, \hat{g}_F(\phi_{t})\right]^{T} = \left[\hat{g}_F(\phi_{t-(\delta-1)\epsilon+r\epsilon}), \ldots, \hat{g}_F(\phi_{t+r\epsilon})\right]^{T},$  (56)

where the notation $T$ denotes the transpose of a matrix. See Fig. 1 for illustrations.
- Part 4: Inverse Fourier transform. After $\hat{g}_F(\phi_{t+\epsilon})$ is predicted in Part 3, it is transformed from the Fourier space back to the observation space by an inverse Fourier transform,

  $\mathcal{F}^{-1} \circ \hat{g}_F(\phi_{t+\epsilon})(x) = \int \hat{g}_F(\phi_{t+\epsilon})(\xi)\, e^{2\pi i \langle x, \xi \rangle}\, \mathrm{d}\xi,$  (57)

where $x \in D$. For convenience, we mark

  $\hat{g}_{FI}(\phi_{t+\epsilon}) = \mathcal{F}^{-1} \circ \hat{g}_F(\phi_{t+\epsilon}).$  (58)

See Fig. 1 for instances of Part 4.
- Part 5: High-frequency information complement. In the original Koopman neural operator, we use a convolutional layer to extract the high-frequency components of $\hat{g}(\phi_t)$ because convolutional layers can amplify high-frequency components according to Fourier analysis.57 Therefore, we train a convolutional layer $C$ on the outputs of Part 1 to extract their high-frequency information. As a complement of Parts 2–4, the convolutional layer realizes a forward prediction of high-frequency information,

  $\hat{g}_{C}(\phi_{t+\epsilon}) = C \circ \hat{g}(\phi_t).$  (59)

See Fig. 1 for illustrations.
- Part 6: Inverse observation. Once the two future states, $\hat{g}_{FI}(\phi_{t+\epsilon})$ and $\hat{g}_{C}(\phi_{t+\epsilon})$, are predicted by Parts 2–4 and Part 5, they are unified in a linear manner,

  $\hat{g}(\phi_{t+\epsilon}) = \hat{g}_{FI}(\phi_{t+\epsilon}) + \hat{g}_{C}(\phi_{t+\epsilon}).$  (60)

Given $\hat{g}(\phi_{t+\epsilon})$, a non-linear decoder [e.g., a single non-linear layer with an activation function in the original neural operator] is trained to approximate the inverse of the observation function,

  $\hat{g}^{-1}: \hat{g}(\phi_{t+\epsilon}) \mapsto \phi_{t+\epsilon},$  (61)

and derive

  $\gamma_{t+\epsilon} = \hat{g}^{-1} \circ \hat{g}(\phi_{t+\epsilon})$  (62)

as the target state of the equation solution in the original space. See Fig. 1 for illustrations.
The one-unit architecture of the original Koopman neural operator is visualized in Fig. 1. A multi-unit architecture can be readily constructed by cascading the copy of Parts 2–5 multiple times.
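To make the data flow of Parts 1–6 concrete, the following is a minimal PyTorch sketch of a one-unit, KNO-style forward pass written for illustration: the encoder and decoder are single non-linear layers, the $o \times o$ Koopman layer acts $r$ times on the truncated Fourier modes, and the delay-embedding of Eq. (55) is omitted for brevity. It schematizes the architecture described above and is not the KoopmanLab source code.

```python
import torch
import torch.nn as nn

class KNOUnit1d(nn.Module):
    """Schematic one-unit KNO (Parts 1-6 of Sec. II D, simplified)."""
    def __init__(self, o=32, f=16, r=4):
        super().__init__()
        self.f, self.r = f, r                                      # kept modes, operator power
        self.encoder = nn.Sequential(nn.Linear(1, o), nn.Tanh())   # Part 1
        self.koopman = nn.Linear(o, o, bias=False)                 # Part 3
        self.conv = nn.Conv1d(o, o, kernel_size=3, padding=1)      # Part 5
        self.decoder = nn.Sequential(nn.Linear(o, 1), nn.Tanh())   # Part 6

    def forward(self, gamma_t):                   # gamma_t: (batch, mesh, 1)
        g = self.encoder(gamma_t)                 # observable, (batch, mesh, o)
        # Part 2: Fourier transform truncated at the first f modes, Eq. (53)
        g_F = torch.fft.rfft(g, dim=1)[:, : self.f, :]
        # Part 3: apply the learned Koopman operator r times in Fourier space
        for _ in range(self.r):
            g_F = self.koopman(g_F.real) + 1j * self.koopman(g_F.imag)
        # Part 4: inverse transform; truncated modes are zero-padded
        g_low = torch.fft.irfft(g_F, n=g.size(1), dim=1)
        # Part 5: convolutional complement of high-frequency information
        g_high = self.conv(g.transpose(1, 2)).transpose(1, 2)
        # Part 6: linear unification and decoding back to the solution space
        return self.decoder(g_low + g_high)

# Example: predict one step ahead for 8 samples on a 128-point mesh.
pred = KNOUnit1d()(torch.randn(8, 128, 1))
```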
III. THE KOOPMAN NEURAL OPERATOR FAMILY
Beyond the original Koopman neural operator (KNO),31 we generalize it to four kinds of variants to fit in with different application demands.
A. The compact KNO sub-family: Definition
Our proposed KoopmanLab module offers user-friendly tools to customize an MLP-KNO. A customized instance is presented below.
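For instance, a two-dimensional MLP-KNO can be instantiated along the following lines. This is a sketch: the call pattern and the argument names (o for the operator size, f for the number of kept frequency modes, and r for the power of the Koopman operator) reflect our reading of the KoopmanLab repository and should be checked against the released source.

```python
import torch
import koopmanlab as kp

device = torch.device("cuda:0")
# o, f, and r are the hyper-parameters discussed in Sec. III B; the
# argument names below are assumptions, not a verbatim API listing.
MLP_KNO_2D = kp.model.koopman(backbone="KNO2d", autoencoder="MLP",
                              o=32, f=10, r=8, t_in=10, device=device)
MLP_KNO_2D.compile()
```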
Similarly, a CNN–KNO can be customized using the following code, where we present one-dimensional and two-dimensional versions.
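A corresponding sketch for the CNN–KNO, again under the assumption that the constructor arguments follow the pattern above, reads as follows.

```python
import torch
import koopmanlab as kp

device = torch.device("cuda:0")
# One-dimensional version (argument names remain our assumption).
CNN_KNO_1D = kp.model.koopman(backbone="KNO1d", autoencoder="Conv1d",
                              o=16, f=16, r=12, t_in=1, device=device)
CNN_KNO_1D.compile()
# Two-dimensional version, e.g., for the Navier-Stokes data of Sec. III B.
CNN_KNO_2D = kp.model.koopman(backbone="KNO2d", autoencoder="Conv2d",
                              o=32, f=10, r=8, t_in=10, device=device)
CNN_KNO_2D.compile()
```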
B. The compact KNO sub-family: Validation
To validate the proposed compact KNO sub-family in PDE solving tasks, we design mesh-independent and long-term prediction tasks on representative PDEs (e.g., the two-dimensional Navier–Stokes equation58 and the one-dimensional Bateman–Burgers equation59). The numerical datasets of these two equations are provided by Ref. 19.
In Fig. 2(a), we validate the mesh-independent property of the proposed compact KNO sub-family, adopting the same setting used in our earlier work.31 The mesh-independent property, as suggested by Refs. 5 and 16–21, arises from the fact that a neural operator is expected to learn the solution operator of an entire PDE family rather than be limited to a concrete parameterized instance. Specifically, we conduct the experiments on the data of the one-dimensional Bateman–Burgers equation associated with different mesh granularity conditions (i.e., spatial resolutions of meshes). Different versions of the compact KNO sub-family are defined by changing hyper-parameters [e.g., the operator size o, the frequency mode number f, and the power r of the Koopman operator in Eq. (56)]. These models are trained on 1000 randomly selected samples with the lowest spatial resolution and conduct a 1-s forward prediction on 200 samples associated with different resolutions. The batch size is set as 10, the learning rate is initialized as 0.001 and halved every 100 epochs, and fixed weights of prediction and reconstruction are used in Eq. (64). All models are trained by the Adam optimizer61 on a single NVIDIA Titan V graphics processing unit (GPU). As shown in Fig. 2(a), the prediction errors of all versions of the compact KNO sub-family remain constant across different spatial resolutions, suggesting the capacity of the compact KNO sub-family to be mesh-independent. Mesh-independence is important for PDE solving because it allows one to train a neural-network-based PDE solver on data with low spatial resolution and directly apply the solver to data with high spatial resolution, which helps balance accuracy and efficiency in PDE solving. In our earlier work,31 one can further see a detailed comparison between the original KNO and the FNO19 in the mesh-independent prediction task, where the original KNO outperforms the FNO with a much smaller model size (e.g., a size of 5 × 10³ for the KNO and a size of 2 × 10⁷ for the FNO). Other neural operator models, such as the graph neural operator (GNO)5 and the multipole graph neural operator (MGNO),62 are not considered because they have been demonstrated to be less accurate than the FNO, as reported by Ref. 19.
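For reference, the training recipe above can be reproduced with standard PyTorch utilities. The sketch below uses a dummy model and synthetic data as placeholders, and the loss weights lambda_p and lambda_r are illustrative values, not the settings of Eq. (64).

```python
import torch

class DummyKNO(torch.nn.Module):
    """Stand-in for a compact KNO returning (prediction, reconstruction)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(128, 128)
    def forward(self, x):
        return self.net(x), x

model = DummyKNO()
loader = [(torch.randn(10, 128), torch.randn(10, 128)) for _ in range(5)]
lambda_p, lambda_r = 1.0, 1.0  # illustrative weights for Eq. (64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate every 100 epochs, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
mse = torch.nn.MSELoss()

for epoch in range(500):
    for xx, yy in loader:
        pred, recon = model(xx)
        loss = lambda_p * mse(pred, yy) + lambda_r * mse(recon, xx)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```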
In Figs. 2(b)–2(e), we validate the compact KNO sub-family by a long-term prediction task designed on the two-dimensional Navier–Stokes equation datasets with viscosity coefficients ν = 10⁻³ [Fig. 2(b)] and ν = 10⁻⁴ [Fig. 2(c)]. A down-sampling scaling factor of 2 is defined to generate the datasets with 2¹² grids. For comparison, a one-unit FNO is defined following the default setting introduced in Ref. 19. A 40-time-interval prediction task is conducted on the dataset with ν = 10⁻³, where models are trained on 1000 samples and tested on 200 samples. Similarly, a more challenging 10-time-interval prediction task is conducted on the dataset with ν = 10⁻⁴, in which models are trained on 8000 samples and tested on 200 samples. Figures 2(b) and 2(c) report the prediction performance of all models as a function of increasing prediction duration. Figures 2(d) and 2(e) visualize predicted instances and errors in the cases with ν = 10⁻³ [Fig. 2(d)] and ν = 10⁻⁴ [Fig. 2(e)] during testing. All experimental results suggest the strong potential of the compact KNO sub-family in characterizing the long-term evolution of PDE solutions. Combining these results with the model sizes measured in Table I, we suggest that the compact KNO sub-family realizes a better balance between accuracy and efficiency because a KNO variant with a smaller model size can still outperform the FNO significantly. Meanwhile, although the KNO costs more time in training and testing than the FNO, it is significantly faster than the pseudo-spectral method,60 a representative numerical approach for simulating fluids. Because the datasets of the two-dimensional Navier–Stokes equation58 and the one-dimensional Bateman–Burgers equation59 used in our experiments are generated by the pseudo-spectral method60 following Ref. 19, our results suggest the potential of a trained KNO model to achieve an acceleration of multiple orders of magnitude while reproducing the outputs of the pseudo-spectral method60 with small errors.
| Models | Settings | Parameter number | Training time cost | Testing time cost |
| --- | --- | --- | --- | --- |
| FNO19 | Default settings, one-unit | 233 897 | 16.386 s/epoch | 0.0567 s/sample |
| MLP-KNO | ⋯ | 206 538 | 89.082 s/epoch | 0.2389 s/sample |
| MLP-KNO | ⋯ | 526 026 | 64.964 s/epoch | 0.1634 s/sample |
| MLP-KNO | ⋯ | 464 170 | 86.740 s/epoch | 0.2380 s/sample |
| CNN–KNO | ⋯ | 206 538 | 89.928 s/epoch | 0.2360 s/sample |
| CNN–KNO | ⋯ | 526 026 | 67.814 s/epoch | 0.1736 s/sample |
| CNN–KNO | ⋯ | 464 170 | 89.042 s/epoch | 0.2287 s/sample |
| Pseudo-spectral method60 | Calculates ⋯ | ⋯ | ⋯ | 511.09 s/sample |
C. The ViT-KNO sub-family: Definition
Different from the compact KNO sub-family, the ViT-KNO sub-family is proposed for dealing with more intricate situations (here, ViT stands for Vision Transformer63). Numerous applications of PDE solving (e.g., global climate forecasting) require the solver to capture the underlying patterns of ultra-large datasets that may be related to certain unknown PDEs. Meanwhile, there may exist multiple variables of interest that are governed by a group of coupled PDEs. To fit in with these situations, we follow the main idea of the compact KNO sub-family to develop a kind of transformer-based PDE solver. The mechanism underlying the proposed ViT-KNO sub-family is not completely the same as Eqs. (52)–(62) because some mathematical details are modified to improve model applicability on noisy real datasets. We suggest the benefits of our modifications based on an experiment on ERA5, one of the largest datasets of global atmospheric, land, and oceanic climate fields.32 Nevertheless, more in-depth mathematical analyses of these modifications remain for future studies.
Let us consider a case where there exist h coupled variables, {γ1, …, γh}, defined on domain D × T. The dynamics of these variables are governed by a group of PDEs with unknown expressions. The objective is to learn the equation solutions of these latent PDEs such that the dynamics of {γ1, …, γh} can be accurately characterized.
The architecture of the ViT-KNO sub-family consists of seven parts. Below, we present a detailed computational implementation of each part.
- Part 1: Observation. Similar to the encoder design in the compact KNO sub-family, an encoder component is implemented in ViT-KNO to serve as the observation function and transform each $\gamma_s(t)$ into an observable for each $s \in \{1, \ldots, h\}$. Specifically, the encoder is realized by the token embedding layer in the Vision Transformer (ViT).63 Given a joint input $\gamma_t = (\gamma_1(t), \ldots, \gamma_h(t))$, we first transform it into a three-dimensional token tensor $\Phi_t$ by a convolutional layer $C_e$ and an arbitrary non-linear transform $Q$,

  $\Phi_t = Q \circ C_e(\gamma_t), \quad \Phi_t \in \mathbb{R}^{u \times v \times l},$  (72)

where domain $D$ is reorganized into $u \times v$ patches (i.e., tokens). The patch is a kind of square and non-overlapping macro-mesh. If domain $D$ has already been discretized into multiple meshes, then the size of a patch equals the number of meshes it covers. Parameter $l$ denotes a customized embedding dimension, which is not necessarily the same as $h$. In real cases, the definition of the non-linear transform $Q$ can be selected according to different application demands. The derived tensor $\Phi_t$ denotes the joint representation of $\{\gamma_1(t), \ldots, \gamma_h(t)\}$. See Fig. 3 for illustrations.
- Part 2: Fourier transform. Similar to the adaptive Fourier neural operator,23,24,64 a truncated Fourier transform is applied on the first two dimensions of $\Phi_t$ to derive the Fourier series of each embedded variable $s \in \{1, \ldots, l\}$,

  $\mathcal{F} \circ \Phi_t(\xi_u, \xi_v, s) = \sum_{a=1}^{u} \sum_{b=1}^{v} \Phi_t(a, b, s)\, e^{-2\pi i \left(\frac{a \xi_u}{u} + \frac{b \xi_v}{v}\right)},$  (73)

where the series is truncated at a maximum frequency $\omega$,

  $\mathbb{1}_{[0,\omega]}(\xi_u, \xi_v)\, \mathcal{F} \circ \Phi_t(\xi_u, \xi_v, s).$  (74)

For convenience, we mark

  $\Phi_F(t)(\xi_u, \xi_v, s) = \mathbb{1}_{[0,\omega]}(\xi_u, \xi_v)\, \mathcal{F} \circ \Phi_t(\xi_u, \xi_v, s),$  (75)

in which $s \in \{1, \ldots, l\}$. Similar to the compact KNO sub-family, frequency truncation leads to the loss of high-frequency information. In the ViT-KNO sub-family, Part 5 is designed for complementing high-frequency information. See Fig. 3 for details.
- Part 3: Koopman-operator-associated component. After deriving $\Phi_F(t)$ for every sampled $t$, a Koopman-operator-associated component is designed to act on the third dimension of every token in $\Phi_F(t)$ and realize the iterative dynamics

  $\Phi_F(t+\epsilon) = \hat{K} \circ S \circ \Phi_F(t),$  (76)

in which the Koopman operator $\hat{K}$ is learned by a linear transform and the layer $S$ is constructed by a non-linear activation function $\eta$ acting on a linear layer $W$,

  $S = \eta \circ W.$  (77)

Although $S$ is not a part of the original Koopman neural operator,31 including it can efficiently enhance the capacity of this component to characterize intricate large-scale data. In KoopmanLab, the leaky rectified linear unit (Leaky ReLU)65 is suggested as a default choice of $S$, which can also reduce to the ReLU function as a special case. See Fig. 3 for illustrations of Part 3.
- Part 4: Inverse Fourier transform. Once $\Phi_F(t+\epsilon)$ is derived in Part 3, the inverse Fourier transform is applied on it to transform back to the observation space,

  $\mathcal{F}^{-1} \circ \Phi_F(t+\epsilon)(a, b, s) = \sum_{\xi_u, \xi_v} \Phi_F(t+\epsilon)(\xi_u, \xi_v, s)\, e^{2\pi i \left(\frac{a \xi_u}{u} + \frac{b \xi_v}{v}\right)},$  (78)

based on which we can define

  $\Phi_{FI}(t+\epsilon) = \mathcal{F}^{-1} \circ \Phi_F(t+\epsilon).$  (79)

See instances in Fig. 3.
- Part 5: High-frequency information complement. The same as in the compact KNO sub-family, a component for complementing high-frequency information is included in ViT-KNO. This component is also realized by a convolutional layer $C$ that acts on the outputs of Part 1 to learn the dynamics of high-frequency information,

  $\Phi_{C}(t+\epsilon) = C \circ \Phi_t.$  (80)

See Fig. 3 for illustrations.
- Part 6: Variable coupling. Given the two predicted states, $\Phi_{FI}(t+\epsilon)$ and $\Phi_{C}(t+\epsilon)$, derived by Parts 2–4 and Part 5, we combine them in a linear form,

  $\Phi'(t+\epsilon) = \Phi_{FI}(t+\epsilon) + \Phi_{C}(t+\epsilon).$  (81)

Because ViT-KNO is designed to learn multi-variate systems governed by unknown coupled PDEs, we need to characterize the coupling relation among variables. Because we lack a priori knowledge about these underlying PDEs, we suggest capturing the coupling relation by optimizing a non-linear layer $M$,

  $\Phi_{t+\epsilon} = M \circ \Phi'(t+\epsilon).$  (82)

Following the idea of the adaptive Fourier neural operator,23,24,64 we use the Gaussian error linear unit (GELU) as the activation function in this non-linear layer. See Fig. 3 for illustrations.
- Part 7: Inverse observation. Given $\Phi_{t+\epsilon}$, a decoder is implemented to function as the inverse of the observation function. Similar to the compact KNO sub-family, two kinds of decoders are included in our proposed KoopmanLab module,

  $\gamma_{t+\epsilon} = W_d \circ \Phi_{t+\epsilon}$  (83)

or

  $\gamma_{t+\epsilon} = C_d \circ \Phi_{t+\epsilon},$  (84)

in which $W_d$ and $C_d$ denote linear and convolutional layers, respectively. These two kinds of decoder designs distinguish two variants of the ViT-KNO sub-family. See Fig. 3 for illustrations.
Several computational tricks can be considered in application. First, a LASSO regularization66 can be included to improve the robustness and sparsity of the Koopman operator $\hat{K}$ in Eq. (76). This trick has been demonstrated to be effective in the adaptive Fourier neural operator23,24,64 and is applicable to the ViT-KNO sub-family as well. Second, the transformer architecture supports a parallel design of the ViT-KNO sub-family. Specifically, the third dimension of the output of Part 1 can be subdivided into multiple parts [e.g., the embedding dimension $l$ is subdivided into $k$ parts]. Then, Parts 2–6 are copied $k \times j$ times, where each group of $j$ copies is organized into a sequence. Each sequence of $j$ copies is referred to as a head in the transformer; it processes a corresponding part of the output of Part 1 and shares parameters during optimization (see Fig. 3). Computationally, parameters $k$ and $j$ are referred to as the number and the depth of heads. The processed outputs of these $k$ parallel heads are unified by Part 7 to derive the final prediction result. Both tricks are included in our proposed KoopmanLab to improve computational efficiency, and a schematic sketch of both is given below.
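The sketch below illustrates both tricks with names of our own choosing: koopman_layer stands for the linear layer realizing $\hat{K}$ in Eq. (76), and the penalty weight is an arbitrary example value.

```python
import torch

# Trick 1: LASSO (L1) regularization of the learned Koopman operator.
koopman_layer = torch.nn.Linear(48, 48, bias=False)  # realizes K-hat
task_loss = torch.tensor(0.0)  # stands in for the prediction loss
loss = task_loss + 1e-4 * koopman_layer.weight.abs().sum()

# Trick 2: splitting the embedding dimension l into k parallel heads.
# A token tensor of shape (batch, u, v, l) becomes k slices of size l // k,
# each processed by its own sequence of j copies of Parts 2-6.
batch, u, v, l, k = 2, 90, 180, 768, 16
phi = torch.randn(batch, u, v, l)
heads = phi.reshape(batch, u, v, k, l // k)
```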
Our KoopmanLab module supports customizing ViT-KNO frameworks. Below, we present an instance of ViT-KNO with a multi-layer perceptron as the decoder.
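The sketch below mirrors the default settings of Sec. III D (patch size 8 × 8, 16 heads with a depth of 12, embedded dimension l = 768, and f = 32 kept modes); the constructor name and argument names are our reading of the KoopmanLab interface and should be checked against the released source.

```python
import torch
import koopmanlab as kp

device = torch.device("cuda:0")
ViT_KNO = kp.model.koopman_vit(decoder="MLP", resolution=(1440, 720),
                               patch_size=(8, 8), in_chans=20, out_chans=20,
                               head_num=16, head_depth=12, embed_dim=768,
                               modes=32, device=device)
ViT_KNO.compile()
```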
Similarly, a ViT-KNO whose decoder is a convolutional layer can be defined as the following:
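Under the same assumptions about the interface, the convolutional-decoder variant differs only in the decoder argument.

```python
import torch
import koopmanlab as kp

device = torch.device("cuda:0")
ViT_KNO_conv = kp.model.koopman_vit(decoder="Conv2d", resolution=(1440, 720),
                                    patch_size=(8, 8), in_chans=20,
                                    out_chans=20, head_num=16, head_depth=12,
                                    embed_dim=768, modes=32, device=device)
ViT_KNO_conv.compile()
```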
Note that some detailed model parameters are not covered by the above codes because they are highly coupled during computation or less important in our theoretical derivations. Users are advised to adjust them after loading the source code of ViT-KNO.
D. The ViT-KNO sub-family: Validation
To validate the proposed ViT-KNO sub-family, we implement a large-scale experiment on ERA5, one of the largest high-resolution datasets of global-scale multi-variate climate fields.32 This dataset has been extensively applied in weather forecasting tasks (e.g., see FourCastNet23,24), ensuring the reproducibility and comparability of our results.
Twenty important climate variables are considered in our research, including the mean sea level pressure (MSLP), relative humidity at 500 hPa (R500), relative humidity at 850 hPa (R850), surface pressure (SP), 2 m temperature (T2M), temperature at 500 hPa (T500), temperature at 850 hPa (T850), total column water vapor (TCWV), the 10 m u-component of wind (U10), the u-component of wind at 500 hPa (U500), the u-component of wind at 850 hPa (U850), the u-component of wind at 1000 hPa (U1000), the 10 m v-component of wind (V10), the v-component of wind at 500 hPa (V500), the v-component of wind at 850 hPa (V850), the v-component of wind at 1000 hPa (V1000), the geopotential at 50 hPa (Z50), the geopotential at 500 hPa (Z500), the geopotential at 850 hPa (Z850), and the geopotential at 1000 hPa (Z1000).
In a long-term prediction task, we test a ViT-KNO whose decoder is a multi-layer perceptron and whose encoder contains a head structure with a depth of 4 to function as the mapping Q in Eq. (72). This model is provided as a default choice of ViT-KNO in the KoopmanLab module. For comparison, we implement the official version of FourCastNet provided in Refs. 23 and 24. Limited by computing resources (FourCastNet is reported to be trained on 3808 NVIDIA A100 GPUs with numerous computational optimization tricks23), we are unable to reproduce the memory-consuming optimization tricks introduced in Refs. 23 and 24 (e.g., the two-stage training procedure). Therefore, we compare ViT-KNO and FourCastNet in a fair setting where no extra optimization trick is applied to the training of either model.
Given the samples of initial conditions, the implemented ViT-KNO and FourCastNet are required to predict the future states of all 20 climate variables after t ∈ {6, 12, 18, …, 240} h. The training dataset includes the samples recorded from 1979 to 2015. The validation dataset includes the samples recorded during 2016 and 2017. The test dataset includes the samples recorded in 2018. The spatial resolution of all samples is set as 1440 × 720. All data are pre-processed in a standard manner following Refs. 23, 24, and 32, where a Z-transform is applied to normalize all variables. The training of our ViT-KNO and FourCastNet is implemented in a multi-GPU environment (128 GPUs with 16 GB of memory each). The testing of the trained ViT-KNO and FourCastNet is implemented on a single 16-GB GPU.
Key model settings are summarized below. For ViT-KNO, the batch size is set as 64, the learning rate is 5 × 10⁻⁴ and updated by a cosine annealing approach, the patch size is set as 8 × 8, the number of heads is 16, the depth of heads is 12, the embedded dimension l in Eq. (72) is 768, the relative weights of prediction and reconstruction in the loss function are λp = 0.8 and λr = 0.2, and the number of kept low-frequency modes after the fast Fourier transform is f = 32. For FourCastNet, all model settings are adopted from the official version.23,24 The defined ViT-KNO model has 73 905 428 parameters in total, while the official version of FourCastNet23,24 has 74 691 840 parameters. The optimizer of both models is FusedAdam, an Adam optimizer61 that supports mixed-precision and distributed training. All 20 climate variables are learned and predicted together rather than separately. Note that land-form information is not provided to either model; both models are required to learn the climate variables with no additional information. The maximum number of available epochs for training is set as 150 to explore when these models converge, which costs about 46.25 h in our environment.
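As a reference for the optimization setting, the cosine annealing schedule can be set up with standard PyTorch utilities; FusedAdam (a mixed-precision, distributed optimizer from NVIDIA Apex) is swapped for torch's Adam here so the sketch runs without extra dependencies, and the linear layer is a stand-in for the ViT-KNO model.

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder for the ViT-KNO model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# Decay the learning rate by cosine annealing over the 150 available epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)

for epoch in range(150):
    # ... one training epoch over the 1979-2015 samples ...
    scheduler.step()
```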
In Fig. 4, we visualize several instances of the climate fields predicted by ViT-KNO after 70 epochs of training, accompanied by the corresponding true values. High consistency can be seen between these ground truths and their predicted counterparts derived by ViT-KNO. Quantitatively, the prediction accuracy of each climate variable during testing is measured by the anomaly correlation coefficient (ACC) in Figs. 5 and 6, where we compare ViT-KNO and FourCastNet after 70 and 115 epochs of training, respectively. According to the same prediction task reported in Refs. 23 and 24, the trained ViT-KNO significantly outperforms the baseline state-of-the-art deep learning models for weather forecasting proposed in Ref. 67. According to our observations in Figs. 5 and 6, ViT-KNO generally achieves a higher testing accuracy than FourCastNet, and its advantage over FourCastNet widens as the prediction duration increases. Moreover, the implemented FourCastNet seems to over-fit the training dataset as the training epoch number increases, while ViT-KNO does not (one can observe this phenomenon by comparing the model performances reported in Figs. 5 and 6). These results suggest the potential of ViT-KNO to become a competitive alternative to FourCastNet, especially in learning tasks involving long-term and non-linear dynamics.
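For reference, the ACC used in Figs. 5 and 6 follows the standard meteorological definition, i.e., the correlation between predicted and observed anomalies relative to a climatological mean; a minimal unweighted version (operational scores usually add latitude weighting) reads as follows.

```python
import numpy as np

def acc(pred, truth, climatology):
    """Unweighted anomaly correlation coefficient between two fields."""
    p = pred - climatology   # forecast anomaly
    t = truth - climatology  # observed anomaly
    return (p * t).sum() / np.sqrt((p ** 2).sum() * (t ** 2).sum())
```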
Moreover, the time costs of a single prediction by ViT-KNO and FourCastNet are observed as ≃0.851 972 9 and ≃0.734 301 49 s, respectively, on a single 16-GB GPU. Compared with classic numerical weather forecasting systems (e.g., the ECMWF integrated forecasting system), whose prediction inevitably requires a multi-GPU environment (e.g., more than 1000 NVIDIA Selene nodes, where each node consists of 8 NVIDIA A100 GPUs),68–70 ViT-KNO is orders of magnitude faster in application (e.g., the integrated forecasting system L91 18 km model is expected to cost about 9840 node seconds for prediction on an NVIDIA Selene node70). Therefore, our ViT-KNO has the potential to serve as a unit in ensemble weather forecasting frameworks to realize efficient prediction of global weather. Although we cannot directly implement the ECMWF integrated forecasting system for comparison due to limited computing resources, one can refer to Refs. 23 and 24 for a comparison between FourCastNet and the ECMWF integrated forecasting system, which may provide an indirect verification of the competitive performance of the ViT-KNO model. More detailed comparisons remain for future exploration.
IV. CONCLUSION
In this paper, we have presented KoopmanLab, an efficient module of the Koopman neural operator family, for solving partial differential equations. The models included in this module, such as the compact KNO sub-family and the ViT-KNO sub-family, are provided with mathematical foundations, computational designs, and validations in solving concrete PDEs or predicting intricate dynamic systems governed by unknown coupled PDEs. All models are suggested to be competitive with other state-of-the-art approaches in the corresponding tasks. Compared with classic numerical and neural-network-based PDE solvers, the proposed KNO variants achieve significant acceleration, more robust mesh-independence, higher generalization capacity on changed conditions, more flexibility in characterizing latent PDEs with unknown forms, and a better balance between accuracy and efficiency. Therefore, we suggest the potential of KoopmanLab to be applied in diverse downstream tasks related to PDE solving. Users can download this module via https://github.com/Koopman-Laboratory/KoopmanLab.73
Several important questions remain for future studies. First, one may consider more specialized computational optimization of the models in KoopmanLab (e.g., multi-stage training as suggested in Refs. 23 and 24 or multi-objective balancing by Pareto theory71,72). Second, one can explore a more detailed comparison between the ViT-KNO sub-family and the ECMWF integrated forecasting system under equivalent hardware conditions. Third, one can analyze the errors of our models caused by the potential continuous spectrum of the Koopman operator or the absence of the ergodic property in real cases.
ACKNOWLEDGMENTS
Y.T. and P.S. appreciate the support from the Artificial and General Intelligence Research Program of Guo Qiang Research Institute at Tsinghua University (Grant No. 2020GQG1017), the Tsinghua University Initiative Scientific Research Program, and the Huawei Innovation Research Program (Grant No. TC20221109044). X.H. was supported by the National Natural Science Foundation of China (Grant No. 42125503) and the National Key Research and Development Program of China (Grant Nos. 2022YFE0195900, 2021YFC3101600, 2020YFA0608000, and 2020YFA0607900).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Wei Xiong: Conceptualization (supporting); Data curation (lead); Investigation (equal); Methodology (supporting); Software (lead); Validation (lead); Visualization (supporting); Writing – review & editing (supporting). Muyuan Ma: Data curation (supporting); Investigation (supporting); Methodology (supporting); Software (equal); Validation (equal); Visualization (equal). Xiaomeng Huang: Data curation (equal); Project administration (supporting); Resources (equal); Validation (equal); Writing – review & editing (equal). Ziyang Zhang: Resources (supporting). Pei Sun: Conceptualization (equal); Funding acquisition (lead); Project administration (lead); Resources (equal); Supervision (equal); Writing – review & editing (equal). Yang Tian: Conceptualization (lead); Formal analysis (lead); Methodology (lead); Software (supporting); Supervision (supporting); Validation (lead); Writing – original draft (lead); Writing – review & editing (equal).
DATA AVAILABILITY
The code and data that support the findings of this study are available via https://github.com/Koopman-Laboratory/KoopmanLab.73