Gastrointestinal disorders are among the most prevalent disorders worldwide. Capsule endoscopy is considered an effective modality for diagnosing such disorders, especially in the small intestinal region. The aim of this work is to leverage the potential of deep convolutional neural networks for the automated classification of gastrointestinal abnormalities from capsule endoscopy images. We developed a deep learning architecture, GastroNetV1, an automated classifier that detects abnormalities in capsule endoscopy images. The gastrointestinal abnormalities considered are ulcerative colitis, polyps, and esophagitis. The curated dataset consists of 6000 images with ground-truth labeling. The input image is automatically classified as ulcerative colitis, a polyp, esophagitis, or a normal condition by a web-based application built around the trained algorithm. The classifier produced 99.2% validation accuracy, 99.3% specificity, 99.3% sensitivity, and an AUC of 0.991. These results exceed those of state-of-the-art systems. Hence, GastroNetV1 could be used to identify different gastrointestinal abnormalities in capsule endoscopy images, which will, in turn, improve healthcare quality.

The disorders related to the digestive system are called gastrointestinal (GI) disorders. These disorders have an economic and social impact on society. The most common intestinal disorders are irritable bowel syndrome (IBS), GERD, ulcers, polyps, colorectal cancer (CRC), gastroenteritis, diverticular disease, celiac disease (CD), inflammatory bowel disease (IBD), gastric cancer, hemorrhoids, Crohn’s disease, and pancreatitis.1,2 Ulcerative colitis (UC) is an IBD that causes inflammation in the digestive tract. The condition starts in the rectum and progresses proximally and continuously along the colon. Among the world’s developing regions, India has the highest prevalence, with 9.3 cases of IBD and 5.4 cases of ulcerative colitis per 100 000 persons. Reports indicate that the occurrence rate of UC remains consistent between males and females from childhood through adulthood. Around 10% of patients with UC have a first-degree relative with the same disease, and UC is more prevalent than Crohn’s disease.3,4 Patients with longstanding or extensive UC have an increased risk of progressing to CRC.5 A gastric polyp is an abnormal growth of tissue projecting from the gastric mucosal layer. In general, the rate of polyps appears to have increased, as indicated by a higher prevalence in large series.6 Removing polyps reduces the incidence of and mortality due to colorectal cancer. Colonoscopy is the favored screening tool because it permits direct examination of the colorectal mucosa and removal of polyps with malignant potential.7 Esophagitis refers to inflammation of or injury to the esophageal mucosa. The most common cause is gastroesophageal reflux, which leads to erosive esophagitis. One-third of patients with esophagitis may have a normal-appearing esophagus. Other findings include esophageal wrinkles, strictures, and mucosal rings (trachealization), while medication-induced esophagitis has an estimated incidence of 3.9 per 100 000 persons per year, with a mean age of 41.5 years at diagnosis.8

Traditional imaging and endoscopic procedures have aided clinicians in examining and diagnosing the human GI tract. Basic radiology methods, such as the barium swallow test and x-ray fluoroscopy, visualize and examine the upper gastrointestinal tract.9 Invasive and non-surgical procedures, such as upper endoscopy, gastroscopy, colonoscopy, and ultra-high magnification endocytoscopy, are used to identify bleeding, assess ulcer severity, collect biopsies, and identify tumors.10 One problem with these methods is that they can cause a duodenal hematoma, infection, abdominal pain, or a tear in the colon wall during diagnosis. Balloon enteroscopy examines colon polyps or areas of bleeding in the gastrointestinal (GI) tract. Some complications associated with this method are sedation-related events, perforation, and a potential risk of ileus (transient slowing of the bowel).11 The disadvantage of conventional endoscopy methods, such as colonoscopy or esophagogastroduodenoscopy, is the inability to image and examine the small intestine; this can be overcome by capsule endoscopy, in which the patient swallows a capsule that takes pictures of the entire GI tract. Capsule endoscopy requires limited preparation and no anesthesia, is painless, and is a better imaging modality for diagnosing GI abnormalities.12

The use of artificial intelligence has greatly impacted the field of health care as well, where it serves as a tool for automated diagnosis. Machine learning and deep learning algorithms use large samples of medical data obtained via capsule endoscopy to train the network, and the trained network then detects clinical abnormalities in the GI tract automatically. The objectives of the proposed work are as follows: (i) to develop a deep learning model for the automated classification of GI abnormalities from wireless capsule endoscopy (WCE) images, leveraging public domain databases, and (ii) to design a web-based application that aids gastroenterologists by visualizing algorithm outputs and recommendations.

The remainder of this paper is organized as follows: Sec. II reviews the state-of-the-art research done in this field. Section III details the methodologies and materials of the proposed work. Results of the proposed work are presented in Sec. IV, and a detailed discussion comparing the results with those of other similar studies is presented in Sec. V.

Several studies have been conducted in this area of research for identifying pathologies from WCE images using state-of-the-art image processing, machine learning, and deep learning methods. This section describes the various methods studied. Wang et al.13 proposed a systematic evaluation and optimization method for automatically detecting ulcers from wireless capsule endoscopy images using deep convolutional neural networks on a vast dataset collected from more than 30 hospitals and 100 medical examination centers. Their Second Glance (SecG) detection system diagnoses ulcers automatically. Barash et al.14 proposed a method to grade the severity of ulcers in video-capsule images of Crohn’s disease patients using an ordinal neural network solution. A deep learning algorithm automates the grading, and the PillCam Crohn’s Capsule (PCC) Medtronic dataset was used. Saito et al.15 developed and tested a convolutional neural network to automatically detect protruding lesions of various types from wireless capsule endoscopy images. The data were collected using PillCam SB2 and analyzed using STATA. They constructed an AI system using a Single Shot MultiBox Detector (SSD), with the stochastic gradient descent method fine-tuning all layers of the CNN. Ghosh and Chakareski16 developed a computer-aided diagnostic (CAD) tool for the automated analysis of small intestinal abnormalities, such as bleeding. Convolutional layers used the VGG16 network, and a softmax classifier used the decoder output feature maps for pixel-wise classification. Yuan et al.17 proposed a two-stage, fully automated computer-aided detection system to detect ulcers from WCE images using a saliency max-pooling method along with the Locality-constrained Linear Coding (LLC) method. In this method, a Support Vector Machine (SVM) classified 170 ulcer images and 170 normal images.

Raut et al.18 developed a segmentation and classification algorithm for the automated diagnosis of gastrointestinal tract disorders using capsule endoscopy images. The DeepLabV3+ algorithm segments the gastrointestinal tract, and the LeNet algorithm classifies abnormalities in the segmented image. Ribeiro et al.19 developed a deep learning model for classifying small-bowel cleansing in capsule endoscopy images. The task is to classify the capsule endoscopy image as excellent (>90% visible mucosa), satisfactory (50%–90%), or unsatisfactory (<50%). The model was trained using 12 950 images obtained from clinical centers in Portugal.

Chung et al.20 used a no-code platform to develop a deep learning model for classifying gastrointestinal abnormalities using capsule endoscopy images. The abnormalities considered for this task were blood, inflamed, vascular, and polypoid conditions. The deep learning classifier was developed without code using the Neuro-T platform, and around 37 307 images from 24 capsule endoscopy videos were used for training. Mascarenhas et al.21 developed a convolutional neural network for detecting blood and mucosal lesions occurring in the colon using capsule endoscopy images. The CNN model, a pre-trained Xception network adapted using a transfer learning technique, was trained on 9005 images (3075 normal, 3115 blood, and 2815 lesion images) obtained from 124 examinations. Useful works on images other than gastrointestinal images are also considered in our study to correlate the techniques employed: Wang et al.22 proposed an EfficientNet-based U-Net for the segmentation of fundus images, and the contributions of Gopatoti and Vijayalakshmi23 on chest x-ray image classification are also relevant.

The proposed workflow of the automated diagnosis of GI abnormalities from capsule endoscopy images using a CNN is shown in Fig. 1. The image provided by the user undergoes appropriate image processing. The CNN trained on capsule endoscopy images then automatically classifies the processed image into one of four classes (normal, ulcerative colitis, polyps, or esophagitis).

FIG. 1.

Pictorial representation of the designated workflow.

In this work, we propose a deep learning model, i.e., GastroNetV1, which is based on EfficientNetV2B2 architecture. Figure 2 represents the detailed architecture diagram of the proposed model used in this work.

FIG. 2.

Architecture diagram of GastroNetV1.

The EfficientNet scaling method uses fixed scaling coefficients to adjust the network’s width, depth, and resolution. In this family of architectures, the output of one convolution layer is also fed as input to a convolution layer appearing later in the network.24 This connection alleviates the degradation of information through the network and made very deep CNNs feasible, while also reducing the expense of the convolution operation: it consumes less time than standard convolution and is efficient, with no nonlinearities introduced after the convolution operation. EfficientNetV2B2, which belongs to the EfficientNet family, is a convolutional neural network architecture. In addition to EfficientNetV2B2, several standard CNN architectures, such as VGG16, VGG19, and ResNet101, are considered for comparison. The proposed architecture (GastroNetV1) combines additional layers with the existing EfficientNetV2B2 backbone, and these layers improve the performance of the backbone. The additional layers consist of convolution layers for generating heat maps (with the last convolution layer used to compute the gradients), global average pooling (a superior alternative to the flatten layer), a dense layer with 256 neurons using the ReLU activation function, batch normalization, a Gaussian noise layer with a standard deviation of 0.15, and a dropout layer with a drop rate of 0.15. These layers provide robustness and help avoid overfitting.
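A minimal Keras sketch of this head follows. The 224 × 224 input resolution, the 256-filter width of the extra convolution block, and the layer name "gradcam_target" are illustrative assumptions; the dense-layer width, noise level, and dropout rate come from the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained EfficientNetV2B2 backbone without its classification top.
base = tf.keras.applications.EfficientNetV2B2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

inputs = layers.Input(shape=(224, 224, 3))
x = base(inputs)
# Extra convolution block; its output is used to compute Grad-CAM gradients.
x = layers.Conv2D(256, 3, padding="same", activation="relu",
                  name="gradcam_target")(x)
x = layers.GlobalAveragePooling2D()(x)   # used instead of a Flatten layer
x = layers.Dense(256, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.GaussianNoise(0.15)(x)        # robustness to input perturbations
x = layers.Dropout(0.15)(x)              # regularization against overfitting
outputs = layers.Dense(4, activation="softmax")(x)  # four classes

model = tf.keras.Model(inputs, outputs, name="GastroNetV1")
```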

The goal of the filtering action is to cancel noise while preserving the integrity of edge and detail information; nonlinear approaches generally provide more satisfactory results than linear techniques.25 Batch Normalization (BN) and its variants have successfully combatted the covariate shift induced during the training of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features of extraneous variables or multiple distributions.26 The dropout approach randomly shuts down feature detectors during the training phase. Spectral dropout regularization enables the network to become invariant to “noisy” spectral components by randomly selecting only the most important basis vectors for signal reconstruction.27

The proposed work uses the Kvasir public dataset of wireless capsule endoscopy images curated from the source.28 The dataset includes polyps, ulcerative colitis, esophagitis, and normal conditions. The dataset (Table I) consists of 6000 images with ground-truth labels, showing anatomical landmarks, pathological findings, or endoscopic procedures in the GI tract.29,30 The anatomical landmarks are the Z-line, pylorus, and cecum, while the pathological findings include esophagitis, polyps, and ulcerative colitis.

TABLE I.

The WCE image dataset used in this study.

Class name           Total images (before augmentation)   Total images (after augmentation)
Normal               1500                                 3000
Ulcerative colitis   1500                                 3000
Polyps               1500                                 3000
Esophagitis          1500                                 3000
Total                6000                                 12 000

Data augmentation is an important operation in developing an image classifier. It is especially helpful in the medical field, where collecting large volumes of data is tedious. Data augmentation strengthens a dataset by applying specified transformations to the existing data to generate new samples. In this work, horizontal flipping was applied to the capsule endoscopy images, resulting in 12 000 images.
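A minimal tf.data sketch of this flip-based doubling is shown below; the function name and the (image, label) dataset structure are assumptions for illustration.

```python
import tensorflow as tf

def augment_with_flips(dataset: tf.data.Dataset) -> tf.data.Dataset:
    """Return the original images plus horizontally flipped copies,
    doubling the dataset size (6000 -> 12 000 in this work)."""
    flipped = dataset.map(
        lambda image, label: (tf.image.flip_left_right(image), label),
        num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.concatenate(flipped)
```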

In this work, we used the cross-entropy loss function and the Adam optimizer. The cross-entropy loss function, also known as log loss, relates the probabilities of the ground truth to those of the model prediction.31 Adam, short for adaptive moment estimation, is one of the most popular optimizers among deep learning researchers and can greatly aid the classification process.32 The model used early stopping to detect convergence during training: training stops when there is no further improvement in the performance metrics of the trained model. All the models were trained for 25 epochs with 225 steps per epoch and a batch size of 16.
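This training configuration might be expressed in Keras roughly as follows, building on the model sketch above. The monitored quantity and patience of the early-stopping callback are assumptions; the optimizer, loss, learning rate, and reported metrics follow the text.

```python
import tensorflow as tf

# Compile with the Adam optimizer, cross-entropy loss, and the metrics
# reported in this work (accuracy, AUC, precision, recall).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy",
             tf.keras.metrics.AUC(name="auc"),
             tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="recall")])

# Early stopping halts training when validation performance plateaus;
# the patience value here is an assumed setting.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)
```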

The evaluation of the classification process uses sensitivity, specificity, accuracy, AUC, and loss. In data analysis, we use the following parameters: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). These classification values are useful in calculating various metrics. TP is the count of ground truth and model predictions, being positive. FP is the count of ground truth that is negative, but the model prediction is positive. TN is the count of the model prediction and ground truth that is negative, and FN is the count of the ground truth that is positive, but the prediction is negative.
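For reference, these counts define the reported metrics in the standard way:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```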

A physician cannot be expected to work inside an integrated development environment (IDE) to make a diagnosis; consequently, a tool that can assist a physician in the diagnosis is required. We have created a web-based application that automatically classifies capsule endoscopy images into one of the following classes: normal, ulcerative colitis, polyps, or esophagitis. The web application uses Python’s Streamlit library and classifies input capsule endoscopy images automatically, assisting the physician in diagnosing the illness. Figure 3 represents the user interface design of the developed web application.
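A minimal Streamlit sketch of such an interface is shown below; the saved-model file name, the input size, and the class ordering are assumptions.

```python
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

CLASSES = ["Normal", "Ulcerative colitis", "Polyps", "Esophagitis"]

st.title("GastroNetV1: Capsule Endoscopy Classifier")
model = tf.keras.models.load_model("gastronetv1.h5")  # hypothetical saved model

uploaded = st.file_uploader("Upload a capsule endoscopy image",
                            type=["jpg", "jpeg", "png"])
if uploaded is not None:
    # Prepare the image and show it alongside the predicted class.
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    st.image(image, caption="Input image")
    batch = np.asarray(image, dtype=np.float32)[np.newaxis, ...]
    probs = model.predict(batch)[0]
    st.write(f"Predicted class: {CLASSES[int(np.argmax(probs))]}")
```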

FIG. 3.

Design of the interactive web-based user interface.

The system was implemented in Python (version 3.6.7) with the Keras module from TensorFlow (version 2.8.0) as the backend. It ran on the Google Colab IDE with 8 GB of RAM and a 12 GB NVIDIA K80 GPU. The classification process employed the Keras module from TensorFlow: pre-trained versions of the models were obtained from the TensorFlow Keras applications package, and additional layers were appended to these pre-trained models to obtain the final model.

This section details the implementation and results obtained for the proposed GastroNetV1, a CNN based system for detecting abnormalities in the GI tract from WCE images.

The first step of the proposed work is training the CNN based GastroNetV1 model using the WCE data described in Table I. Four classes (normal, ulcerative colitis, polyps, and esophagitis) are considered in the dataset, each consisting of 1500 WCE images before augmentation. We then applied the image augmentation techniques described in the methodology section to enrich the data size. A training–validation split ratio of 0.86 was used for model development (Table II). The model was trained on the Google Colab platform, with the training and validation image sets organized into separate directories and a batch size of 16.
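A sketch of this directory-based loading is given below; the directory names and target image size are assumptions, while the batch size (16) and categorical labels come from the text.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One generator per split; each subdirectory name becomes a class label.
datagen = ImageDataGenerator()
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=16,
    class_mode="categorical", shuffle=True)
val_gen = datagen.flow_from_directory(
    "data/val", target_size=(224, 224), batch_size=16,
    class_mode="categorical", shuffle=False)
```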

TABLE II.

Training and validation datasets.

Class                    Total images used for training   Total images used for validation
Normal (0)               2600                             400
Ulcerative colitis (1)   2600                             400
Polyps (2)               2600                             400
Esophagitis (3)          2600                             400

The default learning rate for the Adam optimizer was set to 1 × 10−3. The model was trained for 25 epochs using the training images and evaluated on the training and validation image datasets. The performance plots for the different classification metrics were generated and hosted in the front-end design of the web app. Table III presents a comparative analysis of the performance metrics of the several CNN backbones trained on the capsule endoscopy images, for training and validation, respectively. The best results are in boldface.
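Putting the earlier sketches together, the training call with these reported settings might look as follows (the generators and callback are the assumed objects defined in the sketches above):

```python
# 25 epochs with 225 steps per epoch, as reported in the text; early
# stopping may end training sooner if validation loss stops improving.
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=25,
    steps_per_epoch=225,
    callbacks=[early_stop])
```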

TABLE III.

Comparison of evaluation metrics of various CNN architectures used for training and validation. The best results are emphasized in bold.

CNN backbone           Accuracy   AUC     Precision   Recall
Training
VGG19                  0.967      0.998   0.969       0.966
ResNet50               0.983      0.999   0.983       0.983
ResNet101              0.980      0.999   0.981       0.980
EfficientNetV2B1       0.984      0.999   0.984       0.983
Proposed GastroNetV1   0.987      0.999   0.987       0.987
Validation
VGG19                  0.963      0.999   0.964       0.961
ResNet50               0.980      0.999   0.981       0.980
ResNet101              0.976      0.999   0.978       0.976
EfficientNetV2B1       0.985      0.999   0.985       0.985
Proposed GastroNetV1   0.992      0.991   0.993       0.993

GastroNetV1 was trained for 45 epochs using the Adam optimizer and the cross-entropy loss function and was determined to be the best model based on its performance on the training and validation datasets. Figure 4 shows the performance plots for the proposed model (GastroNetV1), and Table IV compares the proposed model’s accuracy with that of existing models.

FIG. 4.

Performance plots. (a) Accuracy, (b) precision, (c) recall, (d) AUC, and (e) error plot of the proposed model for training and validation datasets.
TABLE IV.

Comparison of the validation accuracy of standard works with that of our work highlighted in boldface.

Similar research studies    Architecture used          Accuracy (%)
Li & Meng33                 MLP                        88.0
Wang et al.34               CNN (ResNet34)             90.1
Raut et al.18               DeepLabV3+ and LeNet       99.1
Ghosh et al.16              CNN (AlexNet and SegNet)   94.4
Tsuboi et al.35             CNN                        95.0
Hassan et al.36             SVM                        99.0
Ribeiro et al.19            CNN                        92.1
Chung et al.20              CNN                        98.0
Proposed GastroNetV1        CNN (hybrid)               99.2

Machine learning based recognition of medical images has reduced workload while attaining specific levels of accuracy.37–41 CT reconstruction for endoscopic tasks and other medical examinations, especially using machine learning models, is gaining attention.42–44

In the subsequent step, we developed a web-based system with a user-friendly interface that allows the user to browse and select WCE images from the local system or any other storage media, either local or on a cloud server. The system then runs inference on the input image with the developed model (GastroNetV1) and displays the predicted class. An option is also available for patients to obtain a basic understanding of the pathology predicted by the system.

Figure 5 shows the input WCE image and the corresponding Grad-CAM (gradient-weighted class activation mapping) heat map predicted and generated by the proposed GastroNetV1 for the respective intestinal disorder, together with the classified outcome. The heat map varies from blue to red, where blue means “lowest focus” and red means “highest focus.” The developed web-based system also provides descriptive suggestions about the predicted pathology.
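A sketch of how such a Grad-CAM heat map can be computed is shown below; the layer name follows the assumed "gradcam_target" convolution from the earlier architecture sketch, and the input is assumed to be a preprocessed float image.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, layer_name="gradcam_target"):
    """Return a heat map in [0, 1] for the model's predicted class."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_idx = int(tf.argmax(preds[0]))
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_out)      # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # per-channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalize to [0, 1]
```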

FIG. 5.

Input image and the corresponding heat map given by the GastroNetV1: (a) normal (without any pathology), (b) polyp, (c) ulcerative colitis, and (d) esophagitis.

The WCE procedure is one of the important modalities for screening the small intestinal region of the digestive tract. The procedure may produce more than 100 000 images, requiring a significant effort from the GI specialist to manually analyze them and identify pathologies. Hence, the proposed research aimed to develop a CNN based model (GastroNetV1) for identifying problematic images and presenting them to the GI specialist to enable quick decisions.

The proposed work demonstrates successful training on WCE images, and the model performed well on the training and validation sets without signs of overfitting (Table III). The proposed method surpasses the performance of current state-of-the-art algorithms: the GastroNetV1 model produced an accuracy, precision, and recall of 99.2%, 99.3%, and 99.3%, respectively (Table III). Similar results have been achieved by other researchers. Li and Meng developed an MLP neural network for detecting abnormal regions in WCE images that produced a specificity of 87.8%.33 Raut et al. developed a DeepLabV3+ algorithm that segments the gastrointestinal tract, with the LeNet algorithm classifying abnormalities in the segmented image, producing 99.1% accuracy, 98.8% precision, and 99.1% recall.18

Ribeiro et al. developed a deep learning model to classify WCE images as excellent (>90% visible mucosa), satisfactory (50%–90%), or unsatisfactory (<50%), trained on 12 950 images obtained from clinical centers in Portugal.19 Chung et al. used a no-code platform to develop a deep learning model for classifying GI abnormalities, such as blood, inflamed, vascular, and polypoid conditions. The model was trained on 37 307 images from 24 capsule endoscopy videos and produced an accuracy of 98%, a precision of 89%, and a recall of 97%.20 Several CNN based models have been developed by various research groups for detecting abnormalities in the GI tract from WCE images: Mascarenhas et al. proposed an algorithm for detecting blood and mucosal lesions occurring in the colon,21 and Tsuboi et al. developed a model for extracting specific features and quantities to correctly distinguish the location of angioectasia.35 The segmentation model proposed by Li et al. for CT images used the depth path module, an approach relevant to our study.45 Alaskar et al. developed a pre-trained GoogLeNet network to recognize ulcer images by adding four new layers to the standard architecture, namely, a dropout layer with a dropout probability of 0.5, a fully connected layer, a softmax layer, and a classification-output layer; in their experiments, a total of 144 layers were used to build GoogLeNet, and the developed model achieved an accuracy of 85%.46 The accuracy of our model on the validation dataset (99.15%) is better than that of their work.

Hassan and Haque developed a texture-feature-descriptor-based algorithm that operates on the Normalized Gray Level Co-occurrence Matrix (NGLCM) of the magnitude spectrum of the images to detect bleeding and non-bleeding areas. The NGLCM was constructed to extract features from each log-transformed magnitude spectrum, and SVM classifiers with linear, polynomial, and Radial Basis Function (RBF) kernels were used. Among these, the linear SVM produced the highest specificity of 98.95%.36 The specificity produced by the proposed work (99.25%) is better than that of the work of Hassan and Haque.

Ghosh et al. developed computer-aided diagnostic (CAD) tools for the automated detection of small intestinal abnormalities, such as bleeding. A VGG16 architecture provides the first 13 convolutional layers, and a softmax classifier is fed by the decoder output feature maps for pixel-wise classification. The first 22 layers of a pre-trained AlexNet architecture are used, with the last three layers designed according to the application: one fully connected layer with two neurons for the two categories (bleeding and non-bleeding), one softmax layer, and one classification output layer. The network was trained and tested on a single NVIDIA GeForce GTX 1070. In the SegNet architecture, a VGG16 network serves as the first 13 convolutional layers, five max-pooling and five upsampling layers are used, and a softmax classifier is again fed by the decoder output feature maps for pixel-wise classification. Their method produced a 98.49% F1-score and a 97.51% sensitivity.16 Similarly, Padmavathi et al. proposed a classification method based on LeNet-5, which achieved 99.12% accuracy on the publicly available Kvasir-V2 dataset.48 However, the proposed work outperformed their results by achieving a sensitivity of 99.25%.

Wang et al. proposed the Second Glance (SecG) detection framework for the automatic detection of ulcers using a deep convolutional network. The architecture used was RetinaNet with a second-glance refinement stage built on ResNet34 and ResNet50 backbones. By utilizing two CNN backbones and two anchor settings, they effectively obtained four distinct primary detectors: RT34 and RT50 signify RetinaNet with ResNet34 and ResNet50, respectively, while A1 and A2 refer to the anchor settings. This method produced a specificity of 90.48%.34 Overall, compared to the results of the other researchers, the proposed GastroNetV1 model demonstrated superior performance in detecting GI abnormalities (Table IV). Similarly, in our earlier study, we developed a U-Net-based polyp segmentation model for colonoscopy images, which produced an accuracy of 97.1%.47

An interactive web-based user interface has been developed for deploying the proposed model (GastroNetV1), as shown in Fig. 3. The model performs inference on the selected image, and the predicted result is presented on the display. This feature makes it easier for the GI specialist to find the problematic frames/images in the WCE video of a given human subject.

The proposed model was trained on only severe gastrointestinal abnormalities, namely, esophagitis, polyps, and ulcerative colitis, along with normal class data. The user interface offers additional features: viewports for displaying the input image and the predicted pseudo-color map overlaid on the input image; a text box showing the predicted pathology; and another text box showing a description of, or basic details about, the pathology. These features help the GI specialist analyze the data much faster and create reports effectively, and the tool can automatically detect diseases or disorders in the GI tract in a cost-effective manner. The results produced for the test data given to the system are shown in Fig. 5, and Tables III and IV present the results obtained while building and validating the GastroNetV1 model. However, the work has a limitation: the number of pathologies considered for model building is limited to three. To create a more effective model, more pathologies and more varied datasets need to be considered. Furthermore, more information could be extracted from the WCE images to enhance the process with improved accuracy. Future research directions include combining the proposed model with image enhancement algorithms and convolutional neural networks to improve accuracy and reduce the misclassification rate, yielding a better classification tool.

A CNN based model, GastroNetV1, was successfully developed to classify four classes of GI tract conditions (normal, ulcerative colitis, polyps, and esophagitis) using WCE data obtained from public sources, and an accuracy of more than 99% was obtained. Furthermore, an interactive web-based user interface has been developed to run the GastroNetV1 model on input data selected through the interface and to present the predicted outcome in visual form with the associated details in a consolidated manner. Hence, the trained GastroNetV1 model will help gastroenterologists make diagnostic decisions faster and in a cost-effective manner.

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through Large Research Project under Grant No. RGP2/179/45.

The authors have no conflicts to disclose.

S. Rajkumar: Investigation (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). C.S. Harini: Resources (equal); Software (equal); Validation (equal); Visualization (equal); Writing – review & editing (equal). Jayant Giri: Project administration (equal); Resources (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). V.A. Sairam: Data curation (equal); Formal analysis (equal); Methodology (equal); Resources (equal); Writing – review & editing (equal). Naim Ahmad: Data curation (equal); Formal analysis (equal); Resources (equal); Software (equal); Supervision (equal); Writing – review & editing (equal). Ahmed Said Badawy: Resources (equal); Software (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). G.K. Krithika: Project administration (equal); Resources (equal); Software (equal); Supervision (equal). P. Dhanusha: Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Validation (equal). G.E Chandrasekar: Formal analysis (equal); Project administration (equal); Validation (equal); Visualization (equal). V. Sapthagirivasan: Resources (equal); Software (equal); Supervision (equal); Validation (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. National Library of Medicine, https://medlineplus.gov/ency/article/007447.htm.
2. S. Fukudo, T. Okumura, M. Inamori, Y. Okuyama, M. Kanazawa, T. Kamiya, K. Sato et al., “Evidence-based clinical practice guidelines for irritable bowel syndrome 2020,” J. Gastroenterol. 56(3), 193–217 (2021).
3. T. Kobayashi, B. Siegmund, C. Le Berre et al., “Ulcerative colitis,” Nat. Rev. Dis. Primers 6(1), 74 (2020).
4. D. C. Baumgart and W. J. Sandborn, “Crohn’s disease,” The Lancet 380(9853), 1590–1605 (2012).
5. L. Du and C. Ha, “Epidemiology and pathogenesis of ulcerative colitis,” Gastroenterol. Clin. North America 49(4), 643–654 (2020).
6. F. A. Angarita, A. E. Feinberg, S. M. Feinberg et al., “Management of complex polyps of the colon and rectum,” Int. J. Colorectal Dis. 33(2), 115–129 (2018).
7. S. G. Vitale, S. Haimovich, Laganà et al., “Endometrial polyps. An evidence-based diagnosis and management guide,” Eur. J. Obstet. Gynecol. Reprod. Biol. 260, 70–77 (2021).
8. C. Antunes and A. Sharma, “Esophagitis,” in StatPearls [Internet] (StatPearls Publishing, 2021).
9. M. P. Hiorns, “Gastrointestinal tract imaging in children: Current techniques,” Pediatr. Radiol. 41(1), 42–54 (2011).
10. R. Singh, D. Sathananthan, W. Tam, and A. Ruszkiewicz, “Endocytoscopy for diagnosis of gastrointestinal neoplasia: The expert’s approach,” Video J. Encycl. GI Endosc. 1(1), 18–19 (2013).
11. P. B. F. Mensink, J. Haringsma, T. Kucharzik et al., “Complications of double balloon enteroscopy: A multicenter survey,” Endoscopy 39(07), 613–615 (2007).
12. T. Nakamura and A. Terano, “Capsule endoscopy: Past, present, and future,” J. Gastroenterol. 43(2), 93–99 (2008).
13. S. Wang, Y. Xing, L. Zhang et al., “A systematic evaluation and optimization of automatic detection of ulcers in wireless capsule endoscopy on a large dataset using deep convolutional neural networks,” Phys. Med. Biol. 64(23), 235014 (2019).
14. Y. Barash, L. Azaria, S. Soffer et al., “Ulcer severity grading in video capsule images of patients with Crohn’s disease: An ordinal neural network solution,” Gastrointest. Endosc. 93(1), 187–192 (2021).
15. H. Saito, T. Aoki, K. Aoyama et al., “Automatic detection and classification of protruding lesions in wireless capsule endoscopy images based on a deep convolutional neural network,” Gastrointest. Endosc. 92(1), 144–151.e1 (2020).
16. T. Ghosh and J. Chakareski, “Deep transfer learning for automated intestinal bleeding detection in capsule endoscopy imaging,” J. Digital Imaging 34(2), 404–417 (2021).
17. Y. Yuan, J. Wang, B. Li, and M. Q. H. Meng, “Saliency based ulcer detection for wireless capsule endoscopy diagnosis,” IEEE Trans. Med. Imaging 34(10), 2046–2057 (2015).
18. V. Raut, R. Gunjan, V. V. Shete, and U. D. Eknath, “Gastrointestinal tract disease segmentation and classification in wireless capsule endoscopy using intelligent deep learning model,” Comput. Methods Biomech. Biomed. Eng.: Imaging Visualization 11(3), 606–622 (2023).
19. T. Ribeiro, M. J. Mascarenhas Saraiva, J. Afonso, P. Cardoso, F. Mendes, M. Martins, A. P. Andrade, H. Cardoso, M. Mascarenhas Saraiva, J. Ferreira, and G. Macedo, “Design of a convolutional neural network as a deep learning tool for the automatic classification of small-bowel cleansing in capsule endoscopy,” Medicina 59(4), 810 (2023).
20. J. Chung, D. J. Oh, J. Park, S. H. Kim, and Y. J. Lim, “Automatic classification of GI organs in wireless capsule endoscopy using a no-code platform-based deep learning model,” Diagnostics 13(8), 1389 (2023).
21. M. Mascarenhas, T. Ribeiro, J. Afonso, J. P. Ferreira, H. Cardoso, P. Andrade, M. P. Parente, R. N. Jorge, M. Mascarenhas Saraiva, and G. Macedo, “Deep learning and colon capsule endoscopy: Automatic detection of blood and colonic mucosal lesions using a convolutional neural network,” Endosc. Int. Open 10(02), E171–E177 (2022).
22. J. Wang, X. Li, and Y. Cheng, “Towards an extended EfficientNet-based U-Net framework for joint optic disc and cup segmentation in the fundus image,” Biomed. Signal Process. Control 85, 104906 (2023).
23. A. Gopatoti and P. Vijayalakshmi, “MTMC-AUR2CNet: Multi-textural multi-class attention recurrent residual convolutional neural network for COVID-19 classification using chest X-ray images,” Biomed. Signal Process. Control 85, 104857 (2023).
24. M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning (PMLR, 2019), pp. 6105–6114.
25. F. Russo, “A method for estimation and filtering of Gaussian noise in images,” IEEE Trans. Instrum. Meas. 52(4), 1148–1154 (2003).
26. M. Lu, Q. Zhao, J. Zhang, K. M. Pohl, L. Fei-Fei, J. C. Niebles, and E. Adeli, “Metadata normalization,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, 2021), pp. 10912–10922.
27. S. H. Khan, M. Hayat, and F. Porikli, “Regularization of deep neural networks with spectral dropout,” Neural Networks 110, 82–90 (2019).
28. The KVASIR-Capsule Dataset, https://osf.io/dv2ag/.
29. K. Pogorelov, K. R. Randel, C. Griwodz et al., “KVASIR: A multi-class image dataset for computer-aided gastrointestinal disease detection,” in MMSys’17: Proceedings of the 8th ACM on Multimedia Systems Conference (ACM, 2017), pp. 164–169.
30. See https://www.kaggle.com/datasets/francismon/curated-colon-dataset-for-deep-learning for more information about the “Curated WCE Colon Disease Dataset for Deep Learning Algorithms.”
31. S. Rajaraman, G. Zamzmi, and S. K. Antani, “Novel loss functions for ensemble-based medical image classification,” PLoS One 16(12), e0261307 (2021).
32. S. Bock, J. Goppold, and M. Weiß, “An improvement of the convergence proof of the ADAM-optimizer,” arXiv:1804.10587 (2018).
33. B. Li and M. Q. H. Meng, “Computer-based detection of bleeding and ulcer in wireless capsule endoscopy images by chromaticity moments,” Comput. Biol. Med. 39(2), 141–147 (2009).
34. S. Wang, Y. Xing, L. Zhang, H. Gao, and H. Zhang, “Deep convolutional neural network for ulcer recognition in wireless capsule endoscopy: Experimental feasibility and optimization,” Comput. Math. Methods Med. 2019, 1.
35. A. Tsuboi, S. Oka, K. Aoyama et al., “Artificial intelligence using a convolutional neural network for automatic detection of small-bowel angioectasia in capsule endoscopy images,” Dig. Endosc. 32(3), 382–390 (2020).
36. A. R. Hassan and M. A. Haque, “Computer-aided gastrointestinal hemorrhage detection in wireless capsule endoscopy videos,” Comput. Methods Programs Biomed. 122(3), 341–353 (2015).
37. Z. Jiang, X. Han, C. Zhao, S. Wang, and X. Tang, “Recent advance in biological responsive nanomaterials for biosensing and molecular imaging application,” Int. J. Mol. Sci. 23(3), 1923 (2022).
38. B. He, Q. Lu, J. Lang, H. Yu, C. Peng, P. Bing, S. Li, Q. Zhou, Y. Liang, and G. Tian, “A new method for CTC images recognition based on machine learning,” Front. Bioeng. Biotechnol. 8, 897 (2020).
39. T. Sun, J. Lv, X. Zhao, W. Li, Z. Zhang, and L. Nie, “In vivo liver function reserve assessments in alcoholic liver disease by scalable photoacoustic imaging,” Photoacoustics 34, 100569 (2023).
40. C. Zhang, H. Ge, S. Zhang, D. Liu, Z. Jiang, C. Lan, L. Li, H. Feng, and R. Hu, “Hematoma evacuation via image-guided para-corticospinal tract approach in patients with spontaneous intracerebral hemorrhage,” Neurol. Therapy 10(2), 1001–1013 (2021).
41. J. Hu, T. Xu, H. Shen, Y. Song, J. Yang, A. Zhang, H. Ding, N. Xing, Z. Li, L. Qiu, L. Ma, Y. Yang, Z. Feng, Z. Du, W. He, Y. Sun, J. Cai, Q. Li, Y. Chen, S. Yang, M. Mei, S. Luo, K. Liao, Y. Zhang, Y. He, Y. He, B. Peng, M. Xiao, and P. A. S. C. Chongqing, “Accuracy of gallium-68 pentixafor positron emission tomography–computed tomography for subtyping diagnosis of primary aldosteronism,” JAMA Network Open 6(2), e2255609 (2023).
42. H. Mo, X. Li, B. Ouyang, G. Fang, and Y. Jia, “Task autonomy of a flexible endoscopic system for laser-assisted surgery,” Cyborg Bionic Syst. 2022, 9759504.
43. C. Yang, D. Sheng, B. Yang, W. Zheng, and C. Liu, “A dual-domain diffusion model for sparse-view CT reconstruction,” IEEE Signal Process. Lett. 31, 1279 (2024).
44. L. Yin, L. Wang, S. Lu, R. Wang, Y. Yang, B. Yang, S. Liu, A. AlSanad, S. A. AlQahtani, Z. Yin, X. Li, X. Chen, and W. Zheng, “Convolution-transformer for image feature extraction,” Comput. Model. Eng. Sci. 0, 1 (2024).
45. W. Li, Y. Cao, S. Wang, and B. Wan, “Fully feature fusion based neural network for COVID-19 lesion segmentation in CT images,” Biomed. Signal Process. Control 86, 104939 (2023).
46. H. Alaskar, A. Hussain, N. Al-Aseem et al., “Application of convolutional neural networks for automated ulcer detection in wireless capsule endoscopy images,” Sensors 19(6), 1265 (2019).
47. R. Sadagopan, S. Ravi, S. V. Adithya, and S. Vivekanandhan, “PolyEffNetV1: A CNN based colorectal polyp detection in colonoscopy images,” Proc. Inst. Mech. Eng., Part H 237(3), 406–418 (2023).
48. P. Padmavathi, J. Harikiran, and J. Vijaya, “Effective deep learning based segmentation and classification in wireless capsule endoscopy images,” Multimed. Tools Appl. 82(30), 47109–47133 (2023).