Hyena Neural Operator for Partial Differential Equations

Numerically solving partial differential equations typically requires fine discretization to resolve the necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving partial differential equations through neural operators. Neural operators are neural network architectures that learn mappings between function spaces and can solve partial differential equations from data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter parameterized by a multilayer perceptron. The Hyena operator has sub-quadratic complexity and uses a state space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different instances of a partial differential equation. To measure how effective the layers are in solving partial differential equations, we conduct experiments on the Diffusion-Reaction equation and the Navier-Stokes equation. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning the solution operator of partial differential equations. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator


Introduction
Numerical modeling of partial differential equations (PDEs) plays a crucial role in engineering, as PDEs serve as fundamental tools for representing and analyzing various physical phenomena.
They find application in diverse areas, such as fluid dynamics, gas dynamics, electrical circuitry, heat transfer, and acoustics, enabling us to model and understand these phenomena effectively. PDEs provide a framework for understanding complex systems by describing the relationships between various quantities that change over time and space. They are widely used in science and engineering to make predictions, optimize designs, and analyze data.
Traditional numerical solvers for partial differential equations (PDEs) are often costly because they rely on methods that require a fine discretization of the problem domain. Numerous techniques in deep learning have been proposed to address the computational complexity of numerical solvers and to forecast fluid properties. 1][12][13] Neural operators are designed to operate on function representations and enable the learning of operators directly from data. Compared to traditional solvers, they alleviate the need for fine discretization and, once trained, can be used to infer the solution of different instances within a family of PDEs. One of the earliest neural operators proposed was the DeepONet. 14 It consists of a branch network responsible for processing the input functions and learning the action of the operator, along with a trunk network that learns the function bases for the solution function space. Wang et al. 15 further improved the performance of DeepONets by introducing an improved architecture and training methods. MIONet 16 extends DeepONet to problems involving multiple input functions. In addition to DeepONet, another group of methods 17,18 leverages a learnable kernel integral to approximate the target operator. A notable instance is the Fourier Neural Operator 19 (FNO), which utilizes the Fourier transform to learn the convolution kernel integral in the frequency domain. The Fourier neural operator has been further adapted to various forms (Tran et al. 20, Guibas et al. 21, Li et al. 22).
Other than the Fourier domain, the wavelet domain has also been explored (Tripura and Chakraborty 23, Gupta et al. 24). Cao 25 draws the connection between a softmax-free attention and two different types of integral and proposes an attention-based operator learning framework. Li et al. 26 further expand the work on attention by proposing to propagate the solution in latent space with a cross-attention mechanism and relative positional encoding. 27 Various previous works 19,24,25 have shown that the capability of capturing global interaction is crucial to prediction accuracy. Non-local learnable modules such as spectral convolution 19, attention 25 or dilated convolution 28 are better at learning complex time-evolving dynamics, which local learnable modules like the residual neural network (ResNet) 29 often fail to model. State space models (SSMs) are a type of recurrent model that can be viewed as a long-context convolution. They effectively extend the receptive field to the whole input sequence and have the potential to learn and model the complex non-local interactions present in PDE data.
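The state space models are represented by the following equations: , its variants 32,33 and Gated State Space (GSS) 34. A later work, Hungry Hungry Hippo (H3) 35, was proposed to address the limitations of prior SSM layers, specifically targeting two key drawbacks: their inability to recall previous tokens in the sequence and their expensive computational cost. H3 solves associative recall by including an additional gate and a short convolution obtained via a shift SSM. It also proposes FlashConv, a fast and efficient algorithm for training and inferring SSMs. It works by using a fused block FFT algorithm to compute the convolutions in the SSM, which significantly reduces the training and inference time. Recent work Hyena 36 further extends H3 and incorporates implicit filter parametrization, advancing the accuracy and efficiency of SSM-based models, which have achieved state-of-the-art performance across benchmarks like LRA. 37 This work presents a novel deep-learning architecture for learning PDE solutions called Hyena Neural Operator (HNO), which utilizes long convolutions and an element-wise multiplicative gating mechanism. HNO employs an encoder-decoder architecture with a latent-marching strategy 26. We demonstrate that HNO has competitive performance against the Fourier Neural Operator on various numerical benchmarks.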
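The view of an SSM as a long convolution can be checked numerically. The sketch below assumes a common discrete-time form, $x_{t+1} = A x_t + B u_t$, $y_t = C x_t + D u_t$ with $x_0 = 0$; this notation, the single-input single-output setting, and all function names are illustrative assumptions rather than the formulation used in the cited works. Unrolling the recurrence gives a causal convolution with kernel $h_0 = D$, $h_k = C A^{k-1} B$ for $k \geq 1$.

```python
import numpy as np

def ssm_recurrent(A, B, C, D, u):
    """Run x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t, starting from x_0 = 0."""
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        y[t] = C @ x + D * u_t
        x = A @ x + B * u_t
    return y

def ssm_kernel(A, B, C, D, length):
    """Convolution kernel of the same SSM: h_0 = D, h_k = C A^{k-1} B for k >= 1."""
    h = np.empty(length)
    h[0] = D
    v = B.copy()
    for k in range(1, length):
        h[k] = C @ v
        v = A @ v
    return h

def causal_conv(h, u):
    """y_t = sum_{k=0}^{t} h_k u_{t-k}, truncated to the input length."""
    return np.convolve(h, u)[: len(u)]

rng = np.random.default_rng(0)
d, L = 4, 32                               # state size and sequence length (arbitrary)
A = 0.3 * rng.standard_normal((d, d))      # scaled down to keep the state bounded
B = rng.standard_normal(d)
C = rng.standard_normal(d)
D = 0.5
u = rng.standard_normal(L)

y_rec = ssm_recurrent(A, B, C, D, u)
y_conv = causal_conv(ssm_kernel(A, B, C, D, L), u)
```

The kernel has the same length as the input, so every output position can depend on every earlier input position, which is the global receptive field referred to above.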

Method
Hyena Neural Operator
The Hyena operator can be characterized as a repetition of two sub-quadratic operations: an implicit long convolution $h$ (meaning that the Hyena filters are implicitly parameterized by the output of a feed-forward network) and a multiplicative component-wise control of the (projected) input. Hyena first computes $N + 1$ learnable projections1 of the input: , which is similar to the query/key/value projections in a standard attention mechanism. The next step is to compute the convolution filters, which are implicitly parametrized [38][39][40] and modulated via a window function. Concretely, the value of the filter $h$ at the $t$-th location is given by:

$$h_t = \psi(t) \cdot \mathrm{FFN}(\gamma(t)),$$

where $\psi(\cdot)$ is a window function that decays exponentially with respect to $t$: $\psi(t) = \exp(-\alpha t)$, with $\alpha$ controlling the decay speed, FFN denotes the feed-forward network equipped with a sine activation function, and $\gamma(\cdot)$ is a positional encoding function: with $K$ as a hyperparameter and $L$ being the length of the input sequence. The implicit filter decouples the parameter size of the filter from its valid receptive field. The sine activation function together with the positional encoding function allows the filter to learn high-frequency patterns 41, whereas the exponential decaying function enables the learned filter to focus on

$$z^{n+1}_t = \xi^{n}_t \odot \mathcal{K}(h^{n}, z^{n})_t, \quad n = 1, \ldots, N, \qquad (4)$$

where $\mathcal{K}$ denotes the convolution operation: $\mathcal{K}(h, u)_t = (h * u)_t = \sum_{n=1}^{L} h_{t-n} u_n$, $\odot$ denotes element-wise multiplication, and $N$ is a hyperparameter. If we view the input sequence as the sampling of a function on the discretization grid $\{x_t\}_{t=1}^{L}$, then (4) can be viewed as an approximation to the integral transform: $z^{n+1}(x_t) = \xi^{n}(x_t) \int_{\Omega} h^{n}(x_t - y)\, z^{n}(y)\, dy$, where the functions are iteratively updated by a kernel integral and an instance-based weight value $\xi^{n}(x_t)$. The spectral convolution layer in FNO can be viewed as a special case of (4) with the filter's values explicitly parameterized and no instance-based weight.
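The following NumPy sketch illustrates the two ingredients described above: a filter produced by a sine-activated feed-forward network and damped by an exponential window, and a gated recurrence that alternates long convolutions with element-wise multiplication. It is a minimal illustration under stated assumptions, not the authors' implementation. The positional encoding (normalized time plus `K` cosine/sine pairs), the normalization of $t$ to $[0, 1)$ inside the window, the two-layer network, the random weights standing in for learned ones, and all function names are assumptions. The short convolution applied to the projections in the full architecture is omitted.

```python
import numpy as np

def positional_encoding(L, K):
    """gamma(t): normalized time plus K cosine/sine pairs (assumed form)."""
    t = np.arange(L) / L
    k = np.arange(1, K + 1)
    angles = 2.0 * np.pi * t[:, None] * k[None, :]
    return np.concatenate([t[:, None], np.cos(angles), np.sin(angles)], axis=1)

def implicit_filter(L, K, W1, W2, alpha):
    """h_t = psi(t) * FFN(gamma(t)) with a sine activation inside the FFN."""
    gamma = positional_encoding(L, K)            # (L, 2K + 1)
    h = np.sin(gamma @ W1) @ W2                  # (L, D)
    window = np.exp(-alpha * np.arange(L) / L)   # psi(t) = exp(-alpha * t)
    return window[:, None] * h

def fft_causal_conv(h, u):
    """Causal convolution along axis 0 in O(L log L), zero-padded to length 2L."""
    L = u.shape[0]
    H = np.fft.rfft(h, n=2 * L, axis=0)
    U = np.fft.rfft(u, n=2 * L, axis=0)
    return np.fft.irfft(H * U, n=2 * L, axis=0)[:L]

def hyena_operator(z, gates, filters):
    """Apply z <- xi * conv(h, z) once per (gate, filter) pair."""
    for xi, h in zip(gates, filters):
        z = xi * fft_causal_conv(h, z)
    return z

rng = np.random.default_rng(0)
L, D, K, H, N = 64, 3, 4, 16, 2                  # illustrative sizes only
u = rng.standard_normal((L, D))

# N + 1 linear projections of the input; random matrices stand in for learned weights.
projections = [u @ rng.standard_normal((D, D)) for _ in range(N + 1)]
z1, gates = projections[0], projections[1:]

filters = [
    implicit_filter(L, K,
                    rng.standard_normal((2 * K + 1, H)),
                    rng.standard_normal((H, D)) / H,
                    alpha=3.0)
    for _ in range(N)
]
out = hyena_operator(z1, gates, filters)
```

The filter network's parameter count depends only on `K`, `H` and `D`, not on `L`, which is the decoupling of parameter size from receptive field noted in the text. The FFT-based convolution is what keeps the cost sub-quadratic in the sequence length.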
Encoder
The encoder is composed of three main components: an input embedding layer that takes in the input function's sampling and lifts the input features into high-dimensional encodings $u^{(0)}$, and multiple layers of the Hyena operator followed by feed-forward networks. The output from each Hyena layer is aggregated and then passed on to the projection layer, which projects the output from the Hyena layers to a latent embedding. The latent embeddings are passed through a series of Hyena layers, and the output from these layers is once again aggregated and passed to the decoder. The update protocol inside each Hyena operator block is: where Hyena(·) denotes the Hyena operator and Norm(·) denotes the layer normalization layer 42.
Decoder
To generate the solution, the decoder utilizes the input coordinates and the output obtained from the encoder. The first layer is a random Fourier projection layer 41,43. Incorporating the random Fourier projection alleviates the inherent spectral bias found in coordinate-based neural networks 38,41. Following the Fourier projection, the latent encoding $u^{(L)}$, along with the learned encoding of positions $p^{(0)}$, is fed into the cross-attention module inspired by Li et al. 26. Finally, the decoder outputs the prediction by taking the result of the cross-attention module, passing it through the Hyena operator, and then applying a feed-forward network. The decoder process can be described as follows:

Training settings
The overall training framework of this work shares similarities with previous data-driven models focused on operator learning. We used the Adam optimizer 44 and a CosineAnnealing scheduler 45 with a decay rate of 1e-8. The dropout rate was set to 0.03 inside the feed-forward layers of the Hyena operator. Unless stated otherwise, we trained the models for 500 epochs with an initial learning rate of $1 \times 10^{-4}$. We use the GELU 46 activation. To train the model on 2D Navier-Stokes data, we employ a curriculum strategy that gradually increases the number of prediction time steps, following Li et al. 26. Instead of forecasting all upcoming states until the end of the specified time horizon $T$, we initially limit the duration to a fraction $\gamma$ (around $\gamma \approx 0.5$) and then gradually grow the time duration as training progresses. In this approach, the network is trained to predict the states $u_{t_0}, u_{t_1}, \ldots, u_{\gamma T}$. We found that this strategy worked better than asking the model to predict the whole sequence at once: it improves stability and leads to slightly faster convergence.
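The curriculum described above can be sketched as a function that maps the current epoch to the number of time steps the model is asked to predict. The text specifies only the starting fraction (about 0.5) and that the horizon grows during training. The linear growth, the `ramp_fraction` parameter, and the function name below are hypothetical choices made for illustration.

```python
def curriculum_horizon(epoch, total_epochs, T, gamma0=0.5, ramp_fraction=0.5):
    """Number of time steps to predict at a given epoch.

    Starts at gamma0 * T and grows linearly to T over the first
    ramp_fraction of training (assumed schedule, not from the paper).
    """
    ramp_epochs = max(1, int(ramp_fraction * total_epochs))
    progress = min(1.0, epoch / ramp_epochs)
    frac = gamma0 + (1.0 - gamma0) * progress
    return max(1, min(T, round(frac * T)))

# Horizon at a few epochs for a 500-epoch run with T = 20 target steps.
horizons = [curriculum_horizon(e, 500, 20) for e in (0, 125, 250, 499)]
```

Inside a training loop, the returned value would be used to truncate the target sequence for that epoch, so that the loss is computed only on the first `curriculum_horizon(...)` predicted states.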

Numerical Experiments
We assess our model's performance on two benchmark problems: the 2D Navier-Stokes equation and the 1D Diffusion-Reaction equation. To ensure a comprehensive assessment, we compare the performance of our model with that of the state-of-the-art neural operator, namely the Fourier neural operator. Detailed insights into our model's architecture for different problems are available in Appendix A.

1D Diffusion-Reaction
We used the dataset provided by PDEBench 47, a benchmark for SciML. The data consist of a one-dimensional diffusion-reaction type PDE that combines a diffusion process and a rapid evolution from a source term. 48 The equation is expressed as: $u(0, x) = u_0(x), \quad x \in (0, 1)$.
We evaluate the performance of the Fourier neural operator and the Hyena neural operator for $\nu = 0.5$ and $\nu = 2.0$ at different resolutions. We provide the condition at the initial time step, and the model predicts the solution at the final time step. Table 1 shows that the Hyena neural operator consistently performs better than FNO for different values of $\nu$ at varying resolutions.

Navier-Stokes Equation
The Navier-Stokes equations are among the most important equations in physics. They are a fundamental description of the motion of fluids: complex, nonlinear equations that dictate the dynamics of various fluid flows, including turbulent phenomena. The equation in velocity form can be written as: where $f$ is the external force, $\nu$ represents kinematic viscosity, $p$ is the pressure term and $u$ is the velocity vector. The problem studied in this work follows the previous work of Li et al. 19, where the target is to predict the vorticity $\omega = \partial u_y / \partial x - \partial u_x / \partial y$ given a fixed time horizon $T$ and the initial value $\omega_0$ sampled from a Gaussian random field. The dataset is generated on a 256 × 256 grid and sub-sampled to 64 × 64 for training and testing. By applying the curriculum strategy to train on the time-dependent data, the model was able to learn the solution more efficiently and converge slightly faster.
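As an illustration of the vorticity definition and the 256 × 256 to 64 × 64 sub-sampling mentioned above, the sketch below computes $\omega$ from a velocity field with central differences and then keeps every fourth grid point. The periodic domain, the finite-difference stencil, the strided sub-sampling, and the analytic test field are assumptions made for this example. The dataset itself follows Li et al. 19 and is not regenerated here.

```python
import numpy as np

def vorticity(ux, uy, dx, dy):
    """omega = d(uy)/dx - d(ux)/dy using periodic central differences.

    Arrays are indexed [i, j] with x along axis 0 and y along axis 1.
    """
    duy_dx = (np.roll(uy, -1, axis=0) - np.roll(uy, 1, axis=0)) / (2.0 * dx)
    dux_dy = (np.roll(ux, -1, axis=1) - np.roll(ux, 1, axis=1)) / (2.0 * dy)
    return duy_dx - dux_dy

def subsample(field, factor):
    """Keep every `factor`-th point along both spatial axes."""
    return field[::factor, ::factor]

# Analytic velocity field on a periodic [0, 2*pi)^2 grid, used only as a check:
# ux = sin(x) cos(y), uy = -cos(x) sin(y) has vorticity 2 sin(x) sin(y).
n = 256
x = np.arange(n) * (2.0 * np.pi / n)
X, Y = np.meshgrid(x, x, indexing="ij")
ux = np.sin(X) * np.cos(Y)
uy = -np.cos(X) * np.sin(Y)
dx = dy = 2.0 * np.pi / n

omega = vorticity(ux, uy, dx, dy)
omega_coarse = subsample(omega, 4)
```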

Conclusion
In this study, we present the Hyena neural operator, a subquadratic state-space model for learning the solution of PDEs. The data-controlled linear operator demonstrated promising performance and achieved competitive outcomes when compared to alternative approaches.
Future work for HNO includes downsampling the high-resolution data in latent space by using a contracting-expanding architecture such as U-Net. 49 Other directions include using tokenized equations to learn physically relevant information 50 and improve the HNO further.
SSMs. It works by using a fused block FFT algorithm to compute the convolutions in the SSM, which significantly reduces the training and inference time. Recent work Hyena 36 further extends H3 and incorporates implicit filter parametrization, advancing the accuracy and efficiency of SSM-based models, which have achieved state-of-the-art performance across benchmarks like LRA. 37 This work presents a novel deep-learning architecture for learning PDE solutions called Hyena Neural Operator (HNO), which utilizes long convolutions and an element-wise multiplicative gating mechanism. Hyena Neural Operator (HNO) employs an Encoder-Decoder

Figure 1: Hyena Neural Operator architecture. Given the initial observation and the grid, the encoder layer encodes it to a latent embedding, which is an input to the latent Hyena layers. The latent output from the Hyena layers, the Fourier projection, and the grid are given as input to the cross-attention module. The resultant values are once again passed through Hyena layers, and the output solution is obtained following an MLP layer.

Figure 2: Hyena architecture. The input to the Hyena operator is first projected to a width defined by the order and input dimension. The projections are first passed through a short filter and then through filters generated on the fly. Inside the Hyena filter, the data is processed in three steps: first the positional encoding, second the implicit filter, and lastly the exponential modulation.

Figure 3: Visualization of the time evolution of the 1D Diffusion-Reaction equation. Black dotted lines denote the model's output. The blue line denotes the initial condition given as input to the models.
Fig. 3 shows the time evolution of the equation. The models have been trained for 200 epochs with a batch size of 20.

with a batch size of 4.
For the size of each Navier-Stokes dataset, NS2-full contains 9800/200 (train/test) samples; each of the remaining datasets contains 1000/200 samples. On a complex equation like Navier-Stokes, the Hyena neural operator significantly outperforms the Fourier neural operator when tested on viscosities $\nu = 10^{-3}, 10^{-4}, 10^{-5}$ with varying $T$, on both the large and the small datasets. For a viscosity such as $\nu = 10^{-5}$, where the flow is more complicated than at the other viscosities, the Hyena operator can keep up with temporal changes owing to its ability to capture global interactions with long convolutions.

Table 2: Relative L2 norm for the Navier-Stokes equation benchmark with a fixed resolution of 64 × 64. Bold indicates best performance.
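For readers unfamiliar with the metric, a relative L2 error is commonly computed as the norm of the prediction error divided by the norm of the reference solution, averaged over test samples. The sketch below assumes that convention and a leading batch axis. The exact reduction used for the reported numbers is not stated in the text, and the function name is hypothetical.

```python
import numpy as np

def relative_l2(pred, target):
    """Mean over samples of ||pred - target||_2 / ||target||_2.

    Both arrays have shape (batch, ...); trailing axes are flattened per sample.
    """
    pred = pred.reshape(pred.shape[0], -1)
    target = target.reshape(target.shape[0], -1)
    num = np.linalg.norm(pred - target, axis=1)
    den = np.linalg.norm(target, axis=1)
    return float(np.mean(num / den))

# A prediction that is uniformly 10% too large has a relative error of 0.1.
target = np.ones((2, 64, 64))
err = relative_l2(1.1 * target, target)
```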