This paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as a feature selection technique for choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. The paper also discusses the significance test used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. The performance of ACO-KNN was measured using precision, recall, and F-score, and the results were validated using parametric statistical significance tests. The evaluation showed a statistically significant improvement of ACO-KNN over the baseline algorithms. In addition, the experimental results indicate that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer reviews.
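As an illustration of the evaluation measures named above, the following minimal sketch computes precision, recall, and F-score for one class from true and predicted labels. It is not the authors' implementation: the function name `precision_recall_f1`, the label strings, and the toy data are assumptions made for this example only.

```python
def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Return (precision, recall, F-score) for the `positive` class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score as the harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy labels (made up for this sketch, not taken from the paper's datasets).
y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "neg", "neg"]
precision, recall, f_score = precision_recall_f1(y_true, y_pred)
print(precision, recall, f_score)
```

In a comparison such as the one described, these scores would be computed per algorithm and per dataset, and the resulting paired scores would then be passed to a significance test.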
