Text preprocessing is a crucial stage in natural language processing (NLP). In traditional machine learning pipelines, text is often preprocessed by cleaning it, for example by removing all stopwords, emojis, and symbols, and then standardizing it through lowercasing, stemming, and lemmatization. Although this often allows training with minimal resources and in less time, it can also discard meaning that matters, such as the sentiment a text expresses. This research uses a state-of-the-art deep learning approach to sentiment analysis with word embedding, which has been shown to outperform the traditional machine learning approach. We collect datasets from the social media platforms Twitter and YouTube, and then test various text preprocessing techniques on them to find those that are most effective for Indonesian sentiment analysis. The results show that treating the dataset with emoji conversion, multi-word grouping of negated stopwords, conversion of slang words and abbreviations, and typo correction can significantly increase the model’s accuracy. Stemming, by contrast, does not help increase the accuracy of the sentiment analysis model.
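As a rough illustration of three of the preprocessing steps named in the abstract (emoji conversion, slang and abbreviation conversion, and multi-word grouping of negations), the following Python sketch may help. It is not the authors' implementation: the lookup tables `EMOJI_TO_TEXT`, `SLANG_TO_STANDARD`, and `NEGATION_WORDS`, the function names, and the underscore used to join a negation to the next word are all assumptions made for this example, and typo correction is left out.

```python
import re

# Hypothetical lookup tables: the dictionaries used in the paper are not
# given in the abstract, so these tiny examples are assumptions.
EMOJI_TO_TEXT = {"😊": "senang", "😡": "marah"}
SLANG_TO_STANDARD = {"gak": "tidak", "ga": "tidak", "yg": "yang"}
NEGATION_WORDS = {"tidak", "bukan", "jangan", "belum"}


def convert_emojis(text):
    """Replace each known emoji with a descriptive word instead of deleting it."""
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    return text


def normalize_tokens(tokens):
    """Map slang words and abbreviations to their standard forms."""
    return [SLANG_TO_STANDARD.get(token, token) for token in tokens]


def group_negations(tokens):
    """Join a negation word with the following token into one multi-word token."""
    grouped = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATION_WORDS and i + 1 < len(tokens):
            grouped.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped


def preprocess(text):
    """Lowercase, convert emojis, tokenize, normalize slang, group negations."""
    text = convert_emojis(text.lower())
    tokens = re.findall(r"\w+", text)
    tokens = normalize_tokens(tokens)
    return group_negations(tokens)


print(preprocess("Filmnya gak bagus 😡"))
# → ['filmnya', 'tidak_bagus', 'marah']
```

In this sketch the negation stays attached to the word it modifies ("tidak_bagus", "not good") and the emoji survives as a word, so a downstream embedding can still see the sentiment cues that plain stopword and emoji removal would discard.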
7 December 2023
PROCEEDINGS OF THE 4TH TARUMANAGARA INTERNATIONAL CONFERENCE OF THE APPLICATIONS OF TECHNOLOGY AND ENGINEERING (TICATE) 2021
5–6 August 2021
Jakarta, Indonesia
Research Article
Text preprocessing for optimal accuracy in Indonesian sentiment analysis using a deep learning model with word embedding
Raden Budiarto Hadiprakoso 1,a)
Hermawan Setiawan 1,b)
Ray Novita Yasa 1,c)
1 Politeknik Siber dan Sandi Negara, Jl. H. USA, Ciseeng, Bogor, Indonesia
a) Corresponding author: [email protected]
AIP Conf. Proc. 2680, 020050 (2023)
Citation
Raden Budiarto Hadiprakoso, Hermawan Setiawan, Ray Novita Yasa, Girinoto; Text preprocessing for optimal accuracy in Indonesian sentiment analysis using a deep learning model with word embedding. AIP Conf. Proc. 7 December 2023; 2680 (1): 020050. https://doi.org/10.1063/5.0126116