An objective function is one of the key strategies for fitting a model in machine learning. The goal of this paper is to study the objective function and observe its application in convolutional neural networks (CNNs). Various CNN architectures have been proposed that achieve high accuracy by applying a suitable objective function. We use the CNN framework as the lens through which to explain the contents of each architecture. To obtain a good model, every CNN uses an objective function as the measure of closeness between the model's output on the training dataset and the ground-truth data. To extract critical features, many scholars have also proposed pre-trained CNN models that attain high accuracy with a compact model. One common ablation study in CNN research is a reformulation of the objective function, which is often expressed as a matrix operation and known as a loss function (for example, cross-entropy). The result of this research is a survey of CNN architecture models tailored to many different objects, reviewed in terms of architecture, formulation, filters, and dense layers, with the aim of achieving good feature extraction in the form of feature maps. Many parameters can be observed at every step of a CNN. The impact of this review is a set of baseline models that can serve as a starting point for developing new CNN architectures and comparing them against those baselines.
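To make the role of the objective function concrete, the sketch below shows a minimal cross-entropy loss in NumPy, measuring the closeness between one-hot ground-truth labels and a model's predicted class probabilities. This is an illustrative example, not code from the surveyed architectures; the function name and the toy arrays are our own assumptions.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Two samples, three classes: predictions close to the labels give a low loss.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
loss = cross_entropy(y_true, y_pred)
```

During training, a CNN adjusts its weights to minimize exactly this kind of quantity, so a lower loss indicates that the network's predictions are closer to the ground truth.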
