Errors in datasets are increasingly common. One such error is incomplete data for an attribute, commonly known as a missing value, which affects the results of analyses conducted by researchers. One way to address this issue is imputation, a method that fills in a missing value by replacing it with a plausible value based on information in the dataset. This study deals with missing values in the albumin attribute of hepatic data by using K-Nearest Neighbor (KNN) imputation, which computes a weighted mean estimate over a predetermined number K of nearest observations. K is thus the number of closest observations; this study uses K=3, K=5, K=7, K=9, and K=15. The accuracy of the imputation is evaluated with the Mean Square Error (MSE). Based on the results obtained in this study, the best accuracy of program calculations is obtained when K=7 and the best MSE is achieved when K=15.
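The procedure described above can be illustrated with a minimal sketch. It is not the study's program: the inverse-distance weighting, the Euclidean distance, the function names `knn_impute` and `mse`, and the toy rows are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def knn_impute(rows, target, k):
    """Fill missing values (None) in column `target`.

    Assumption: each estimate is an inverse-distance weighted mean of the
    target values of the k nearest complete rows, with Euclidean distance
    computed over the remaining columns.
    """
    complete = [r for r in rows if r[target] is not None]
    imputed = []
    for row in rows:
        if row[target] is not None:
            imputed.append(list(row))
            continue
        # Distance from the incomplete row to every complete row,
        # ignoring the target column itself.
        dists = []
        for other in complete:
            d = math.sqrt(sum((a - b) ** 2
                              for i, (a, b) in enumerate(zip(row, other))
                              if i != target))
            dists.append((d, other[target]))
        dists.sort(key=lambda pair: pair[0])
        nearest = dists[:k]
        # A small constant avoids division by zero for identical rows.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        estimate = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
        filled = list(row)
        filled[target] = estimate
        imputed.append(filled)
    return imputed


def mse(true_values, estimates):
    """Mean Square Error between known values and imputed estimates."""
    return sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)


# Hypothetical toy rows: [feature_1, feature_2, albumin]; not data from the study.
toy_rows = [
    [1.0, 1.0, 4.0],
    [1.0, 3.0, 2.0],
    [5.0, 5.0, 3.0],
    [1.0, 2.0, None],  # albumin is missing here
]
filled_rows = knn_impute(toy_rows, target=2, k=2)
print(filled_rows[3][2])
```

To compare values of K as the study does, one would hide known albumin values, impute them for each K, and compute `mse` between the hidden values and the estimates. In practice the features would usually be scaled first so that no single attribute dominates the distance.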
