The image captioning task is regarded as an active problem in computer vision research. Its focal aim is to generate accurate descriptions of the content of an input image. State-of-the-art automatic image captioning systems typically rely on the Encoder-Decoder architecture, and only a few exploit the features resulting from the object detection task to maximize the accuracy of caption generation. In this work we introduce a new attention-guided Encoder-Decoder based captioning approach that utilizes two types of features: a) deep visual features extracted from an EfficientNetV2 model pre-trained on the ImageNet dataset, and b) object features extracted from a YOLOv7 model pre-trained on the MSCOCO dataset. Additionally, we compute a new object-feature-driven measure called the Priority Factor, used to rank objects based on their prominence in input images. The proposed approach is evaluated on the well-known MSCOCO dataset. Its empirical performance is measured using eight metrics, and the results demonstrate the effectiveness of adding our schema (Priority Factor) to the object features, leading to minor improvements in the BLEU-1, BLEU-2, BLEU-3, and BLEU-4 evaluation metrics. The approach also outperforms four state-of-the-art approaches on various evaluation metrics such as BLEU-1, BLEU-2, BLEU-3, and SPICE.
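The abstract describes ranking detected objects by prominence via a Priority Factor, but does not give its formula. As a minimal illustrative sketch only, assume prominence combines the detector's confidence with the object's relative size in the image; the names `Detection`, `priority_factor`, and `rank_objects` are hypothetical and not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # object class predicted by the detector (e.g. from YOLOv7)
    confidence: float  # detection confidence in [0, 1]
    box_area: float    # bounding-box area in pixels

def priority_factor(det: Detection, image_area: float) -> float:
    """Illustrative priority score: confidence weighted by the object's
    relative area in the image. This combination is an assumption, not
    the formula from the paper."""
    return det.confidence * (det.box_area / image_area)

def rank_objects(detections: list[Detection], image_area: float) -> list[Detection]:
    """Sort detections by priority factor, most prominent first."""
    return sorted(detections,
                  key=lambda d: priority_factor(d, image_area),
                  reverse=True)

# A large, confident "dog" outranks a small "ball" despite the ball's
# slightly higher confidence.
dets = [Detection("ball", 0.9, 500.0), Detection("dog", 0.8, 50000.0)]
ranked = rank_objects(dets, image_area=640 * 480)
print([d.label for d in ranked])  # → ['dog', 'ball']
```

Such a score would let the decoder attend to the most salient objects first when generating the caption.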
THE SECOND INTERNATIONAL CONFERENCE ON SCIENTIFIC RESEARCH AND INNOVATION 2023 (2ICSRI2023)
25–26 August 2023
Cincinnati, USA
Research Article | February 10 2025
A deep learning based approach for image captioning: Exploiting priority factor to enhance accuracy and relevance
Ali S. Haleem a)
Software Department Information Technology, University of Babylon, Hilla, Iraq
a)Corresponding author: [email protected]
Israa H. Ali b)
Software Department Information Technology, University of Babylon, Hilla, Iraq
AIP Conf. Proc. 3169, 030019 (2025)
Citation
Ali S. Haleem, Israa H. Ali; A deep learning based approach for image captioning: Exploiting priority factor to enhance accuracy and relevance. AIP Conf. Proc. 10 February 2025; 3169 (1): 030019. https://doi.org/10.1063/5.0254437