Sentiment analysis (SA) is the study of people's emotions and attitudes toward a particular topic. It is useful for monitoring and analyzing social media text to gauge public opinion. Although SA has been applied to monolingual text in English and in other languages such as Hindi, Chinese, and French, far fewer works address the Malay language, and fewer still address mixed-language text such as Malay-English (also known as Manglish). Beyond the text of comments and posts from websites and social media, the emoji used by internet users can provide further insight into how they truly feel about a topic. Our work focuses on Malay-English mixed-language comments and posts that express how Malaysians feel about the daily new cases of Covid-19 in Malaysia. We propose a neural network framework that performs SA on the languages used by Malaysians, namely Malay, English, and Malay-English, while also taking into account the emoji used by internet users. The data were pre-processed to remove noise and then transformed into word vector representations using a word embedding technique. The framework trains and tests a bidirectional Long Short-Term Memory (biLSTM) neural network on the mixed-language textual data together with emoji analysis. For comparison, several machine learning models and a Long Short-Term Memory (LSTM) network with word vectorization were also evaluated. Compared with machine learning models such as Naïve Bayes and Logistic Regression and with the LSTM neural network, the proposed biLSTM with tuned hyper-parameters achieved the highest accuracy on Malay-English mixed-language text, 76.6%, with a macro F1-score of 69.6%.
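The abstract reports both accuracy and macro F1-score, and the two can differ noticeably when sentiment classes are imbalanced. The sketch below illustrates how each metric is computed in general; it is not the authors' evaluation code. The label names (`neg`, `neu`, `pos`) and the ten toy predictions are hypothetical and chosen only to show the effect.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy data: the majority class is "neg".
y_true = ["neg"] * 6 + ["neu"] * 2 + ["pos"] * 2
y_pred = ["neg"] * 6 + ["neg", "neu"] + ["neg", "pos"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy data the classifier gets every majority-class sample right but misses half of each minority class, so macro F1 comes out lower than accuracy. A gap in the same direction between the reported 76.6% and 69.6% would be consistent with uneven class sizes, although the abstract itself does not state the class distribution.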
