Fourier neural operators (FNOs) are invariant with respect to the size of input images, and thus images of any size can be fed into FNO-based frameworks without any modification of the network architecture, in contrast to traditional convolutional neural networks. Leveraging this advantage of FNOs, we propose a novel deep-learning framework for classifying images of varying sizes. In particular, we train the proposed network simultaneously on multi-sized images. As a practical application, we consider the problem of predicting the label (e.g., permeability) of three-dimensional digital porous media. To construct the framework, an intuitive approach is to connect FNO layers to a classifier using adaptive max pooling. First, we show that this approach is only effective for porous media of fixed sizes, whereas it fails for porous media of varying sizes. To overcome this limitation, we introduce our approach: instead of using adaptive max pooling, we use static max pooling with the size of the channel width of the FNO layers. Since the channel width of the FNO layers is independent of the input image size, the introduced framework can handle multi-sized images during training. We show the effectiveness of the introduced framework and compare its performance with the intuitive approach through the example of classifying three-dimensional digital porous media of varying sizes.
I. INTRODUCTION AND MOTIVATION
Since 2020, neural operators have gained extensive popularity, specifically with two versions of graph neural operators (Li et al., 2020b; 2020c) and Fourier neural operators (FNOs) (Li et al., 2020a; Kovachki et al., 2024). In this article, our attention is on FNOs. From a computer science perspective, regular FNOs fall into the category of supervised deep learning frameworks, necessitating a large volume of labeled data for training. FNOs have demonstrated their proficiency in input–output mapping across various industrial and scientific applications such as incompressible flows (Li et al., 2022b; Bonev et al., 2023; Peng et al., 2024; Choubineh et al., 2023; Lyu et al., 2023; Gupta and Brandstetter, 2022; and Peng et al., 2023a), wave equations (Zhu et al., 2023a; Zou et al., 2023; and Yang et al., 2023), thermal fields (Zhao et al., 2024; Hao et al., 2023), carbon storage and sequestration (Wen et al., 2022; Jiang et al., 2023b), and other areas (Peng et al., 2023b; You et al., 2022; Kontolati et al., 2023; Zhu et al., 2023b; Hua and Lu, 2023; White et al., 2023a; Li et al., 2021; Pathak et al., 2022; Rahman et al., 2022b; 2022a; Yang et al., 2022; Li et al., 2022a; Maust et al., 2022; Zhao et al., 2022; Renn et al., 2023; Xiong et al., 2023; Chen et al., 2023; Huang et al., 2023; Poels et al., 2023; White et al., 2023b; Thodi et al., 2023; Zhao et al., 2023; Tran et al., 2023; Lee, 2022; Brandstetter et al., 2023; Li et al., 2023; Majumdar et al., 2023; Jiang et al., 2023a; Lehmann et al., 2024; Subramanian et al., 2024; Fanaskov and Oseledets, 2024; Lanthaler, 2021; and Azzizadenesheli et al., 2023). From a computer vision perspective, these are framed as segmentation problems, where an input image, such as the geometry of an airfoil, is mapped to another image, for instance, the velocity field around that airfoil. An analogous area in computer vision is classification, where an input image is mapped, for example, to a name or number. While FNOs have potential in classification tasks, to our knowledge, only a limited amount of research has been conducted in this application (Johnny et al., 2022; Xi et al., 2022; and Kabri et al., 2023).
Johnny et al. (2022) used the FNO architecture for classifying images in the CIFAR-10 dataset, which contains ten different classes; however, they trained the network only on images with a fixed size of 32 × 32 pixels. Additionally, Kabri et al. (2023) examined the FNO architecture for image classification. Although they tested images of various sizes (e.g., 28 × 28 pixels, 112 × 112 pixels, etc.), they trained and then tested the network separately for each size, assessing its performance on the corresponding size. Xi et al. (2022) utilized the FNO architecture for hyperspectral remote-sensing image classification. Their dataset comprised images of various sizes, including 512 × 614 pixels, 610 × 340 pixels, and 512 × 217 pixels. However, they adjusted all images to a fixed size by adding patches. Consequently, although they employed the FNO architecture, in practice, they limited their analysis to images of a uniform size. In the current study, we narrow our focus to classification problems. More specifically, we consider the problem of predicting the permeability of three-dimensional digital porous media, which vary in size, as a benchmark test case.
FNOs are invariant with respect to the size of input images, and this characteristic ensures that images of varying sizes can be processed by FNO-based deep learning frameworks without requiring any architectural alterations. Note that regular convolutional neural networks (CNNs) lack this feature (Goodfellow et al., 2016). Building on this strength of FNOs, we introduce a deep learning framework for training a network simultaneously on images of varying sizes for a classification problem. To construct this framework, the FNO layers must be connected to a classifier, which is commonly a multilayer perceptron (MLP). An intuitive way to set this up would be to link the FNO layers to the classifier via adaptive max pooling. Considering the application of permeability prediction of three-dimensional porous media, our machine-learning experiments show that this intuitive approach works well only for porous media of fixed sizes. Pivoting from this, we propose our novel approach: rather than using adaptive max pooling, we implement static max pooling with the size of the channel width of the FNO layers. Given that the size of the channel width of the FNO layers is independent of the size of the input images, our proposed framework can be efficiently trained on various image sizes at once (see Figs. 1 and 2).
To explain, at a high level, the difference between using adaptive max pooling (see Fig. 2) and static max pooling (see Fig. 1), let us consider, for example, a three-dimensional image fed as an input to the deep learning framework. For both pooling methods, at the framework's outset, the FNO layers lift the input image from its three-dimensional space to a higher-dimensional space, determined by the size of the channel width of the FNO layers. In the case of adaptive max pooling, after the FNO layer operations, the outcome is eventually dropped back into the initial three-dimensional space with the same size as the input image. This array then serves as the input of the adaptive max pooling, and the output of the adaptive pooling is in turn the input of the classifier. In the case of static max pooling, before the FNO layers drop the output, we implement static max pooling, which functions within the high-dimensional space and pools with the size of the channel width of the FNO layers. The resulting output from this pooling then becomes the classifier's input. A more detailed exploration of these concepts is provided in Sec. II.
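To make the static-pooling idea concrete, the following minimal sketch (illustrative shapes only; a random tensor stands in for the FNO-layer output, and the variable names are ours, not the paper's implementation) shows how a static max pool over the spatial axes yields a global-feature vector whose length equals the channel width for every input size:

```python
import numpy as np

# Illustrative sketch: "lifted" stands in for the FNO-layer output in the
# high-dimensional (channel) space; the real network computes it with
# learned Fourier layers, which are omitted here.
rng = np.random.default_rng(0)
width = 64                                       # channel width of the FNO layers
for n in (40, 48, 56):                           # the three training sizes
    lifted = rng.normal(size=(width, n, n, n))   # shape: (width, n, n, n)
    global_feature = lifted.max(axis=(1, 2, 3))  # static max pool over space
    assert global_feature.shape == (width,)      # length is independent of n
```

Because the pooled vector always has length equal to the channel width, the same classifier head can follow regardless of the input resolution.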
The study of the physical and geometric features of porous media is important in diverse scientific and industrial areas such as digital rock physics (Andrä et al., 2013a; 2013b), membrane systems (Liang and Fletcher, 2023), geological carbon storage (Blunt et al., 2013), and medicine (Kumeria, 2022; Das et al., 2018). Deep learning frameworks have been widely used for predicting the permeability of porous media (Meng et al., 2023; Xie et al., 2023; Kashefi and Mukerji, 2023; 2021; Liu et al., 2023; Hong and Liu, 2020; Wu et al., 2018; Tembely et al., 2020; Masroor et al., 2023; and Sun et al., 2023), but, to the best of our knowledge, all these frameworks were trained on fixed-size porous media. Note that training the proposed network to predict the permeability of porous media of varying sizes comes with an exclusive challenge compared to training a network on conventional images for the purpose of classifying them by their names (like those of cats and dogs). For conventional images, one possible solution for handling images of different sizes is to equalize them by adding mini patches to the smaller images. Nevertheless, this solution is inapplicable to the porous media problem: adding mini patches to porous media can alter their physical properties, such as permeability. For instance, adding mini patches around a porous medium simulates sealing it with wall boundaries, which prohibits flow within its pore spaces, resulting in a permeability of zero. Additionally, the inherently three-dimensional nature of porous media introduces another layer of complexity compared to two-dimensional conventional images. We summarize the contributions of our work in the following bullet points:

We propose a novel deep learning framework for image classification.

The proposed framework leverages Fourier neural operators, which are invariant to the size of input images.

Specifically designed to train simultaneously on images of multiple sizes, the framework can effectively classify images of varying sizes.

This is an important feature for applications where input images naturally vary in size. We demonstrate its application specifically for threedimensional images.
The remainder of this article is organized as follows. We introduce and discuss the concept of Fourier neural operators for image classification in Sec. II, starting with the traditional strategy of adaptive max pooling, followed by our novel approach of static max pooling in the high dimension of the Fourier space channel. A brief review of the theoretical aspects of FNOs is given in Sec. II C. Data generation and the training methodologies are, respectively, presented in Secs. III and IV. In Sec. V, we provide results and discussion, including a comparison between the traditional strategy and our novel approach. Moreover, we present a sensitivity analysis covering the number of Fourier modes, the channel width of the discrete Fourier space, the number of FNO units, and the effect of activation functions and average pooling. The generalizability of the deep learning model is discussed in this section as well. Finally, we summarize the work and present insight into future directions in Sec. VI.
II. FOURIER NEURAL OPERATORS FOR IMAGE CLASSIFICATION
A. Our novel approach: Static max pooling in channel width of FNO layers
B. Intuitive approach: Adaptive max pooling in 3D spatial space
In this subsection, we explain the intuitive approach (see Fig. 2). Drawing parallels to our approach elaborated in Sec. II A, we begin by considering the input porous medium, which is a three-dimensional matrix represented by $A_{n\times n\times n}$. All operations outlined in Sec. II A are applied to $A_{n\times n\times n}$ until the network obtains the matrix $B^{l}_{\text{width}\times n\times n\times n}$ at an intermediate step, as depicted in Fig. 2. As the next step, we drop (as an inverse of the lifting operator explained in Sec. II A) the matrix $B^{l}_{\text{width}\times n\times n\times n}$ from the high-dimensional space to the default space by means of a fully connected network. This transformation results in the matrix $Z_{n\times n\times n}$. At this juncture, we use adaptive three-dimensional max pooling, a functionality available in deep learning platforms such as PyTorch (Paszke et al., 2019) and TensorFlow (Abadi et al., 2015). To ensure a fair comparison between the traditional approach and our novel approach, we keep the size of the global feature vector consistent across both approaches. To this end, the output of the adaptive max pooling is tailored to yield a vector of size width. The resulting vector represents the global features of the input images.
Note that because the size of the matrix $Z_{n\times n\times n}$ depends on the size of the input image (i.e., n), the pooling must be adaptive, as we plan to train the network simultaneously on input images of varying sizes (e.g., $A_{40\times 40\times 40}$, $A_{48\times 48\times 48}$, and $A_{56\times 56\times 56}$). Subsequent to the adaptive max pooling, the global feature vector is connected to a classifier. The features and architecture of this classifier are precisely the same as those elucidated in Sec. II A.
To close this subsection, note that the main difference between static max pooling and adaptive max pooling can be articulated as follows. In static max pooling, the kernel size and stride are constant, whereas in adaptive max pooling, they are not constant and are computed based on the input size. For further details and formulations, one may refer to the TensorFlow (Abadi et al., 2015) and PyTorch (Paszke et al., 2019) handbooks.
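The adaptive rule can be sketched in one dimension as follows (this mirrors a common implementation of adaptive max pooling; the exact indexing inside PyTorch or TensorFlow may differ in detail):

```python
import math

def adaptive_max_pool_1d(xs, out_size):
    # The window boundaries are computed from the input length, so the
    # effective kernel size and stride vary with len(xs); in static pooling
    # they would instead be fixed constants.
    n = len(xs)
    out = []
    for i in range(out_size):
        start = math.floor(i * n / out_size)
        end = math.ceil((i + 1) * n / out_size)
        out.append(max(xs[start:end]))
    return out

# The output length is fixed at out_size regardless of the input length:
assert len(adaptive_max_pool_1d(list(range(40)), 4)) == 4
assert len(adaptive_max_pool_1d(list(range(56)), 4)) == 4
assert adaptive_max_pool_1d([1, 5, 3, 2], 2) == [5, 3]
```

The same windowing logic, applied along all three spatial axes, produces the fixed-size output of the three-dimensional adaptive pooling used in the intuitive approach.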
C. A brief review of theoretical aspects of Fourier neural operators
We focused on the technical aspects and computer implementation of FNO layers in Secs. II A and II B. The theoretical aspects of FNO layers have already been extensively explained and discussed in the literature (Li et al., 2020a). In this subsection, we briefly review the theory behind FNO layers and highlight some important features.
III. DATA GENERATION
To generate synthetic data for examining the deep learning framework under investigation in this study, we consider cubic porous medium domains with length L along each side, spatial correlation length $l_c$, and porosity $\varphi$ (the ratio of pore space to the total volume of a porous medium). We use the truncated Gaussian algorithm (Lantuéjoul, 2002; Le Ravalec-Dupin et al., 2004) to generate synthetic porous media. In practice, we create three-dimensional cubic arrays of dimension $n\times n\times n$, populated with random numbers drawn from a normal distribution with a mean of 0.0 and a standard deviation of 1.0. Subsequently, we filter the arrays with a three-dimensional Gaussian smoothing kernel with a standard deviation of 5.0 and a filter size commensurate with a spatial correlation length ($l_c$) of 17. We then subject the arrays to a binarization process via a thresholding number such that the porosity ($\varphi$) of the resulting arrays lies within the range of [0.125, 0.200]. We use MATLAB to handle the above-described steps. We set L to $n\times \delta x$, where $\delta x$ represents the length of each side of a pixel in the porous media; we set $\delta x$ to 0.003 m. We generate porous media of three different sizes by considering three values for n, namely, $n_1=40$, $n_2=48$, and $n_3=56$. In this way, each cubic porous medium can be characterized by its size as n^{3} (e.g., 40^{3}, 48^{3}, and 56^{3}). For each n, we generate 1250 samples. We randomly split the generated data corresponding to each size into three categories: training (80%, i.e., 1000 samples), validation (10%, i.e., 125 samples), and test (10%, i.e., 125 samples). Hence, there are 3750 samples in total: 3000 for training, 375 for validation, and 375 for testing. Figure 3 exhibits a few examples of the generated synthetic data.
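The generation pipeline above can be sketched in Python as follows. This is a sketch, not the authors' MATLAB code: the quantile-based threshold and the `truncate` value (chosen so the filter spans 17 voxels for a standard deviation of 5.0) are our assumptions about how the described steps could be realized.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_porous_medium(n, phi, seed=0):
    """Truncated-Gaussian-style synthetic cube: random normal field, 3-D
    Gaussian smoothing (std 5.0; truncate=1.6 gives a 17-voxel filter),
    then binarization at the threshold that yields porosity phi."""
    rng = np.random.default_rng(seed)
    field = rng.normal(0.0, 1.0, size=(n, n, n))       # mean 0.0, std 1.0
    field = gaussian_filter(field, sigma=5.0, truncate=1.6)
    # Threshold at the phi-quantile so that a fraction phi of voxels are pores
    threshold = np.quantile(field, phi)
    return (field < threshold).astype(np.uint8)        # 1 = pore, 0 = grain

cube = make_porous_medium(40, phi=0.15)
assert cube.shape == (40, 40, 40)
assert abs(float(cube.mean()) - 0.15) < 0.01           # porosity near the target
```

Repeating this for $n \in \{40, 48, 56\}$ with porosities sampled from [0.125, 0.200] reproduces the kind of multi-sized dataset described above.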
IV. TRAINING
V. RESULTS AND DISCUSSION
A. General analysis
As illustrated in Fig. 4, the success of our approach is evident in the R^{2} score of 0.968 09 obtained for the test set (i.e., 375 samples). Additionally, Fig. 5 showcases the R^{2} scores for the test set individualized for each cubic size (i.e., 40^{3}, 48^{3}, and 56^{3}). As can be seen in Fig. 5, the R^{2} scores obtained are 0.968 30, 0.969 78, and 0.966 07, respectively, for the cubic digital porous media of sizes 40^{3}, 48^{3}, and 56^{3}. The R^{2} scores for the three different sizes all remain at an excellent level, demonstrating that our FNO-based framework is robust and not overfitted to any specific size. Regarding the speedup achieved using the proposed deep learning framework, it predicts the permeability of the test set in approximately 18 s on our graphics processing unit (GPU) machine. In contrast, computing the permeability of the same data with our in-house lattice Boltzmann method code, developed in the C++ programming language, takes approximately 27 min on a single Intel(R) Core processor with a clock rate of 2.30 GHz. As a result, the average speedup factor we have accomplished is approximately 90 compared to our conventional numerical solver. It is important to mention that the reported speedup factor is highly dependent on the efficiency of the numerical solver and the computing power. For instance, our numerical solver is a custom code that operates on a single central processing unit; modern commercial applications (e.g., COMSOL and GeoDict) are significantly faster than our C++ code.
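Performance throughout this section is reported via the R^{2} (coefficient of determination) score. A minimal reference implementation, equivalent to the standard definition used by common libraries, is:

```python
def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot. A score of 1 means a perfect fit, 0 matches
    # a predictor that always outputs the mean, and negative values are worse
    # than predicting the mean.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

assert r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 1.0   # perfect fit
assert r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]) == 0.0   # mean-only predictor
```

The possibility of negative values is what makes a score such as −0.426 32 (reported later for the intuitive approach on multi-sized data) meaningful: it is worse than predicting the dataset mean.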
B. Number of Fourier modes in each dimension
Our deep learning experiments demonstrate that there is a critical interplay between the number of modes (i.e., $m_{\max,1}$, $m_{\max,2}$, and $m_{\max,3}$) set in the proposed FNO framework and the tendency for overfitting during the training procedure. Accordingly, setting the number of modes beyond 2 leads to a severe divergence between the training and validation losses. This fact can be observed in Fig. 6 when we set $m_{\max,1}=7$, $m_{\max,2}=7$, and $m_{\max,3}=7$ or $m_{\max,1}=10$, $m_{\max,2}=10$, and $m_{\max,3}=10$. The reported results indicate that the number of modes plays a critical role in the generalization of the FNO model. A further survey of the influence of the number of modes in the FNO configuration is performed by varying the number of modes in all three principal directions from 2 to 10, and the obtained R^{2} scores are tabulated in Table I. Accordingly, the optimal mode configuration for avoiding overfitting is 2, as the divergence between the validation and training losses is minimized. Consequently, a careful selection of the number of modes in the FNO units is necessary to make the deep learning framework robust and reliable for the image classification application. The consequence of this scenario is observable in Fig. 7, where we plot the R^{2} score for the test set for the choices of $m_{\max,1}=m_{\max,2}=m_{\max,3}=4$, $m_{\max,1}=m_{\max,2}=m_{\max,3}=7$, and $m_{\max,1}=m_{\max,2}=m_{\max,3}=10$. In all of these cases, the R^{2} scores obtained for the prediction of the permeability of the porous media in the test set are less than 0.4.
Table I. R^{2} scores of the test set for different numbers of Fourier modes in each dimension.

Number of modes in each dimension    2         3         4         5         6         7         8         9         10
R^{2} score                          0.968 09  0.154 16  0.267 57  0.263 61  0.233 25  0.387 89  0.404 33  0.317 73  0.288 39
We perform two other experiments. In the first one, we set only one mode (e.g., $m_{\max,3}$) to 10 (i.e., $m_{\max,3}=10$) and the other two modes to 2 (i.e., $m_{\max,1}=2$ and $m_{\max,2}=2$). In the second one, we set two modes (e.g., $m_{\max,2}$ and $m_{\max,3}$) to 10 and the remaining mode to 2 (i.e., $m_{\max,1}=2$). The outputs of these two experiments are illustrated in Fig. 8. As can be seen in Fig. 8, the resulting R^{2} scores of the test set are 0.222 98 and 0.347 28, respectively, for the first and second experiments. Accordingly, we conclude that increasing even one mode beyond 2 drastically degrades the performance of the proposed FNO framework for the current application. Hence, the main challenge of working with the proposed network is its high sensitivity to the number of modes. As discussed in this subsection, changing the number of modes in even one dimension leads to overfitting of the network on the training data and a lack of efficiency in predicting the test data, consequently resulting in a lack of generalizability.
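To ground what "keeping m modes per dimension" means, the sketch below truncates the real 3-D Fourier transform of a field to its lowest m frequencies per axis. It illustrates only the truncation: an FNO would additionally multiply the retained modes by learned complex weights, which are omitted here, and the handling of the symmetric frequency corners is simplified.

```python
import numpy as np

def truncate_modes(x, m):
    """Zero out all but the lowest m Fourier modes per dimension of a real
    3-D field (truncation only; the FNO's learned weights are omitted)."""
    xf = np.fft.rfftn(x)
    mask = np.zeros(xf.shape)
    # Low-frequency corners; the last (rfft) axis stores only non-negative
    # frequencies, while the first two axes also need the negative side.
    mask[:m, :m, :m] = 1
    mask[-m:, :m, :m] = 1
    mask[:m, -m:, :m] = 1
    mask[-m:, -m:, :m] = 1
    return np.fft.irfftn(xf * mask, s=x.shape)

x = np.random.default_rng(0).normal(size=(40, 40, 40))
y = truncate_modes(x, 2)
assert y.shape == x.shape   # resolution is preserved; content is smoothed
```

With m = 2, only the smoothest large-scale variations survive, which is consistent with the observation that small mode counts act as a strong regularizer for this task.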
C. Channel width of FNOs
We further analyze the impact of different channel widths on the performance of the introduced deep learning framework. Based on our machine learning experiments, the R^{2} scores obtained for channel widths of 8, 27, 64, and 125 are 0.499 04, 0.816 18, 0.968 15, and 0.944 57, respectively. When the channel width decreases from 64 to 27 or to 8, a significant drop in the R^{2} score is observed. Notably, increasing the channel width beyond 64 to 125 also leads to a slight decrease in the precision of permeability predictions.
As discussed in Sec. II C, the choice of channel width directly determines the number of trainable parameters, which are 30 897, 163 783, 828 673, and 3 096 531 for the respective channel widths. Moreover, the channel width also determines the size of the max pooling, which is the size of the global feature vector. Hence, optimizing the channel width is critical: a small channel width leads to poor performance, whereas a large channel width imposes high computational costs and memory allocation without necessarily a significant performance improvement.
D. Number of FNO units
We investigate the effect of varying the number of FNO units (see Fig. 1). Deep learning experiments are conducted using one, two, three, four, and five units to assess the impact on the performance of the introduced FNO framework. By computing the R^{2} score across the test set, we find no significant improvement in the prediction accuracy beyond three units. The R^{2} scores for the FNO configurations with one, two, three, four, and five units are, respectively, 0.827 67, 0.917 03, 0.968 13, 0.967 59, and 0.978 18. Hence, adding more units (beyond three) and making the network deeper does not have a remarkable effect on the prediction accuracy. However, the number of trainable parameters and, consequently, the computational cost and required GPU memory escalate as FNO units are added. For example, 820 353, 824 513, 828 673, 832 833, and 836 993 are, respectively, the numbers of trainable parameters of the model with one, two, three, four, and five FNO units.
E. Activation functions
We give particular attention to the effect of the choice of activation function on the predictive ability of our FNO model. In the primary setup, we configure all layers to employ the ReLU activation function, except the last layer of the classifier, where we utilize a sigmoid function. We implement two alternative setups. In the first one, we alter the activation function of the last layer to ReLU; this configuration results in a drastic reduction in the R^{2} score of the test set, regardless of whether the output permeability is normalized between 0 and 1. In the second setup, we replace the activation function in all layers with sigmoid. As a consequence of this setup, a slight decrease in performance is observed, as an R^{2} score of 0.914 78 is obtained for the test set. Note that the training procedure becomes slower in this setup, as the derivative of the sigmoid function results in a more complicated computation graph than that of the ReLU function.
F. Static max pooling vs static average pooling
Within the context of capturing global features in the proposed FNO-based framework, we explore the efficacy of implementing static average pooling as an alternative to static max pooling. Our machine learning experiment yields an R^{2} score of 0.944 78 in this case, demonstrating a marginal decline in network performance compared to static max pooling. As supported by the literature (Qi et al., 2017a; 2017b; Kashefi et al., 2021; Kashefi and Mukerji, 2022; and Kashefi et al., 2023), max pooling is preferred over average pooling for classification tasks. Our finding shows a similar pattern for the introduced FNO-based framework.
G. Generalizability
In this subsection, we assess the generalization ability of the proposed FNO-based framework. Note that the concept of generalizability in the context of the present work refers to the network's ability to predict the permeability of cubic porous media of unseen sizes. As discussed in Sec. IV, the network was initially trained on porous media with cubic geometries of sizes 40^{3}, 48^{3}, and 56^{3}. To examine the network's capacity to generalize, we predict the permeability of porous media of sizes 36^{3}, 44^{3}, 52^{3}, and 60^{3}, with 375 cubes for each size, using our pretrained FNO-based framework. Figure 9 shows a few examples of these synthetic data, generated for the purpose of examining the network's generalizability. As shown in Fig. 10, a slight decline is observed in the accuracy of permeability predictions for porous media of unseen sizes. However, the obtained R^{2} scores remain in an excellent range: 0.931 85, 0.911 24, 0.915 00, and 0.908 44 for the porous media sizes of 36^{3}, 44^{3}, 52^{3}, and 60^{3}, respectively. As another observation, the performance of our approach is marginally higher in predicting the permeability of unseen porous media with smaller cubic sizes. As highlighted in Fig. 10, the R^{2} score for porous media of size 36^{3} is greater than that for size 44^{3}; a similar pattern holds when we compare porous media of sizes 52^{3} and 60^{3}. This can be attributed to the fact that, for smaller sizes, the fixed-size global feature vector, whose length equals the channel width, encodes the features of smaller cubes more effectively. As a last comment in this subsection, a potential strategy to enhance the network's generalizability could involve expanding the training dataset to include more than the initial three geometry sizes.
H. Comparison with intuitive approach
1. Classification of fixed-sized images
For the comparison between the proposed approach (see Fig. 1) and the intuitive approach (see Fig. 2), we consider the problem of predicting the permeability of porous media with a fixed cubic size. Specifically, we consider a size of 48^{3}; similar outputs are observed for other sizes. To ensure a fair comparison, both methodologies are investigated under similar conditions. Specifically, both methods are set to have an approximately equal number of trainable parameters (i.e., 828 738 for the intuitive strategy and 828 673 for our approach). Accordingly, the size of the global feature vector is 64 in both methods. All other parameters, such as the number of modes in each direction, the number of FNO units, and the classifier architecture and size, are the same in both methods and are set as listed in Sec. IV (i.e., the training section).
Our results demonstrate that both methods perform proficiently, with R^{2} scores of 0.993 48 and 0.973 60 over the test set for the intuitive approach (see Fig. 2) and the proposed approach (see Fig. 1), respectively. The evolution of the loss function for the training and validation sets indicates convergence after approximately 3000 epochs. This deep learning experiment confirms an approximately equivalent computational cost between the two approaches. Hence, when the image size of the training data is fixed, both strategies are effective for the defined image classification task, and there is no significant advantage of one method over the other, according to our analysis. As a last point in this subsection, we note that one may also use static max pooling in the architecture of the traditional approach, since the size of the porous media is fixed in this experiment. Based on our results, the performance does not change.
2. Classification of multi-sized images
In this subsection, we compare the performance of the proposed approach (see Fig. 1) with the intuitive approach (see Fig. 2) in predicting the permeability of porous media of varying sizes. For a fair comparison, we use the same training, validation, and test sets described in Sec. III. The evolution of both training and validation losses is depicted in Fig. 11, which indicates a divergence between the training and validation losses for the network used in the intuitive approach, which suffers from overfitting, whereas this is not the case for the proposed approach. The superiority of the proposed approach is also evident in the R^{2} scores obtained for the test set: 0.968 09 for the proposed approach and −0.426 32 for the intuitive approach. The negative R^{2} score of the intuitive approach demonstrates that its model makes worse predictions than a model that simply predicts all outputs as the mean value of the dataset. Note that changing hyperparameters, such as the number of modes, the channel width, and the number of FNO layers, does not improve the model of the intuitive approach.
This flaw stems from two causes. First, using the intuitive approach, the network captures the global feature after dropping the cubes back into the original space, while the trainable parameters of the network are mainly defined in the Fourier space. Second, the window sizes of the adaptive max pooling change with the size of the input cubic porous medium. Together, these lead to a misrepresentation of the global features of cubes of different sizes when the network predicts the permeability of the validation and test sets. Note that in Sec. V H 1, we showed that the intuitive approach worked well when it was trained on porous media of a fixed size; however, our machine learning experiments illustrate that, with varying sizes, the global features of cubes of different sizes are amalgamated. In contrast, our approach uses static max pooling consistent with the channel width of the Fourier neural operators before transitioning back to the original space. This enables the capture of global features prior to changing spaces.
VI. SUMMARY AND FUTURE OUTLOOKS
In this research study, we introduced a novel deep learning framework based on Fourier neural operators for classifying images of different sizes (see Fig. 1). Because Fourier neural operators are resolution-invariant, they have the potential to be used for the task of multi-sized image classification. To reach this goal, Fourier neural operators must be connected to a classifier, ideally using a pooling operator. To this end, we proposed the novel idea of implementing a static max pooling operator, which functions in a high-dimensional space with the size of the Fourier channel width. We showed the efficiency and robustness of this framework by predicting the permeability of three-dimensional digital porous media with three different sizes of 40^{3}, 48^{3}, and 56^{3}. We explored the effect of key parameters such as the number of Fourier modes in each dimension, the channel width of the discrete Fourier space, the activation functions in different layers, and the number of Fourier units. Additionally, we showed that while the network was trained only on porous media of sizes 40^{3}, 48^{3}, and 56^{3}, it could successfully predict the permeability of porous media of sizes 36^{3}, 44^{3}, 52^{3}, and 60^{3}, indicating its generalizability. Moreover, we demonstrated that implementing an adaptive max pooling (see Fig. 2), as an intuitive approach for connecting the FNO layers to the classifier, failed to perform when predicting the permeability of porous media of varying sizes. Note that the adaptive max pooling operates in the spatial space and had to be adaptive to handle input images of varying sizes.
As a future research direction, we aim to adapt the current architecture and extend its capabilities to conventional two-dimensional image classification. In contrast to the problem of permeability prediction, this reduces the dimensionality of the problem to two. Additionally, given that standard datasets for image classification are usually large, we anticipate improved generalizability of the proposed framework. As another research direction, we would like to examine our deep learning framework on real data rather than synthetic data. We aim to expand our work to a variety of porous media, including biological tissues and fuel cells.
ACKNOWLEDGMENTS
Financial support from the Shell-Stanford collaborative project on digital rock physics is gratefully acknowledged. Additionally, the first author would like to thank Professor Gege Wen at Imperial College London for her helpful guidance and discussion on the software engineering aspects of this study. Furthermore, we are grateful to the reviewers for their valuable feedback.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Ali Kashefi: Conceptualization (lead); Formal analysis (lead); Methodology (lead); Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Tapan Mukerji: Formal analysis (supporting); Funding acquisition (lead); Investigation (supporting); Methodology (supporting); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (supporting).
DATA AVAILABILITY
The Python code for the threedimensional problems is available on the following GitHub repository at https://github.com/AliStanford/FNOMultiSizedImages.