"Kata terikat" represent a commonly utilized word class in Indonesian journalistic articles, yet their usage is often erroneous. "Kata terikat" are categorized into three types based on their division: words that are connected, separated by a space, and divided by a hyphen (-). This presents an opportunity for the automation of detection and error checking of "kata terikat" usage. The Rabin-Karp algorithm is employed for the detection of "kata terikat" due to their varied patterns, and the Random Forest algorithm is applied for the classification and correction of incorrectly used "kata terikat". The dataset used for this research is accommodated by Tribun News in form of nearly 1000 samples of journalistic article. The research conducted reveals that the "kata terikat" correction model achieved an accuracy of 86.24%. Three rounds of testing were carried out using 10, 20, and 40 journalistic articles from the Tribun News dataset, yielding accuracies of 85.71%, 91.67%, and 86.67%, respectively.
