Text preprocessing is a crucial stage in natural language processing (NLP). In traditional machine learning pipelines, text is usually cleaned first, for example by removing stopwords, emojis, and symbols, and then standardized through lowercasing, stemming, and lemmatization. Although this helps training with limited resources and in less time, it often discards meaningful signals, such as the cues that carry sentiment. This research uses a state-of-the-art deep learning approach to sentiment analysis based on word embeddings, which has been shown to outperform traditional machine learning approaches. We collected datasets from social media, namely Twitter and YouTube, and tested various text preprocessing techniques to determine which are most effective for Indonesian sentiment analysis. The results show that applying emoji conversion, multi-word grouping of negation stopwords, slang-word and abbreviation conversion, and typo correction can significantly increase the model's accuracy. In contrast, stemming does not improve the accuracy of the sentiment analysis model.
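The contrast between "cleaning" preprocessing and meaning-preserving preprocessing can be illustrated with a minimal Python sketch. All lookup tables here (`EMOJI_MAP`, `SLANG_MAP`, `NEGATIONS`, `STOPWORDS`, `VOCAB`) are tiny hypothetical examples, not the dictionaries used in this study, and `difflib.get_close_matches` merely stands in for whatever typo-correction method the authors applied. Stemming is deliberately omitted, in line with the finding that it did not help.

```python
import re
from difflib import get_close_matches

# Hypothetical, tiny lookup tables for illustration only.
EMOJI_MAP = {"😊": "senang", "😡": "marah", "👍": "bagus"}
SLANG_MAP = {"gak": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang"}
NEGATIONS = {"tidak", "bukan", "belum", "jangan"}
STOPWORDS = {"ini", "itu", "yang", "tidak", "dan"}  # note: contains a negation word
VOCAB = ["bagus", "jelek", "banget", "senang", "marah", "film", "ini", "yang", "tidak"]


def naive_clean(text):
    # "Traditional" cleaning: lowercase, keep only ASCII letters (drops emojis
    # and symbols), then remove stopwords. The negation "tidak" is lost.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def convert_emoji(text):
    # Replace each known emoji with a space-padded word so it survives tokenization.
    for emo, word in EMOJI_MAP.items():
        text = text.replace(emo, f" {word} ")
    return text


def normalize_token(tok):
    # Expand slang/abbreviations first, then try to fix typos against VOCAB.
    tok = SLANG_MAP.get(tok, tok)
    if tok in VOCAB:
        return tok
    match = get_close_matches(tok, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else tok


def group_negations(tokens):
    # Join a negation word with the following token ("tidak bagus" -> "tidak_bagus")
    # so the negation is not discarded by later stopword removal.
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            out.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def preprocess(text):
    # Emoji conversion -> lowercase/tokenize -> slang + typo normalization
    # -> negation grouping -> stopword removal (assumed order).
    tokens = re.findall(r"[a-z]+", convert_emoji(text).lower())
    tokens = [normalize_token(t) for t in tokens]
    tokens = group_negations(tokens)
    return [t for t in tokens if t not in STOPWORDS]


print(naive_clean("Film ini tidak bagus 😡"))  # ['film', 'bagus']
print(preprocess("Film ini gak bags bgt 😡"))  # ['film', 'tidak_bagus', 'banget', 'marah']
```

The naive pipeline turns a negative review into what looks like a positive one ("film", "bagus"), while the meaning-preserving pipeline keeps both the negation and the emoji's sentiment as tokens that a word-embedding model can learn from.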
