The Ministry of Villages, Disadvantaged Regions and Transmigration (PDTT), in collaboration with the National Development Planning Agency and the Central Statistics Agency, issued the 2015 Village Potential data (Podes 2015), covering 74,093 villages with 42 indicators/attributes and village status as the class label. The Podes 2015 dataset, however, suffers from class imbalance. Class imbalance can degrade a classifier's performance: when the minority class is much smaller than the majority class, predictions tend to favor the majority class. The gradient boosted decision tree (GBDT) method performs well on classification tasks because it combines simple parameterized functions with "weak" results (high prediction errors) into a highly accurate ensemble. However, GBDT cannot be applied well to problems with small data distributions; it requires large datasets, on which complex interactions can be modeled simply. These problems can be addressed with methods that balance the classes and improve accuracy. AdaBoost is a boosting method that can balance classes by weighting instances according to their classification errors, thereby changing the data distribution, while SMOTE is a well-known method for handling class imbalance: it balances the dataset by synthesizing new minority-class samples as convex combinations of neighboring minority-class instances. Experiments were carried out by applying the AdaBoost method to the gradient boosted decision tree to obtain optimal results and a good level of accuracy. The proposed method, the SMOTE technique on a gradient boosted decision tree with AdaBoost, achieved an accuracy of 88.91% and a classification error of 11.09%, whereas the naïve Bayes algorithm achieved only 40.16% accuracy and 59.84% classification error. The kappa measurements support the same conclusion. Thus, in determining village status, the SMOTE technique combined with gradient boosted decision trees and AdaBoost is shown to solve the class imbalance problem, achieve high accuracy, and reduce the classification error rate.
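A minimal sketch of the pipeline described above, assuming scikit-learn (>= 1.2 for the `estimator` parameter of `AdaBoostClassifier`) and imbalanced-learn are available. The Podes 2015 data is not reproduced here, so a synthetic imbalanced dataset with 42 features stands in for it, and all hyperparameters are illustrative rather than the paper's tuned values.

```python
# Sketch: SMOTE oversampling, then AdaBoost over a GBDT weak learner,
# evaluated with accuracy, classification error, and Cohen's kappa.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 42-attribute village dataset with a skewed label
# (not the real Podes 2015 data).
X, y = make_classification(n_samples=5000, n_features=42, n_informative=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)

# SMOTE is fitted on the training split only, so synthetic minority samples
# never leak into the evaluation set.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# AdaBoost reweights misclassified samples at each round; here its weak
# learner is a shallow gradient boosted decision tree ensemble.
model = AdaBoostClassifier(
    estimator=GradientBoostingClassifier(n_estimators=50, max_depth=3),
    n_estimators=10, random_state=42)
model.fit(X_res, y_res)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"accuracy:             {acc:.4f}")
print(f"classification error: {1 - acc:.4f}")
print(f"kappa:                {cohen_kappa_score(y_test, y_pred):.4f}")
```

Resampling only the training split reflects the usual practice with SMOTE: oversampling before the split would let synthetic points derived from test instances leak into training and inflate the reported accuracy.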
