Type 2 diabetes is a global health issue and one of the leading causes of death. Recent findings indicate that the disease can be categorized into multiple sub-clusters, a step towards precision medicine. In this paper, we analyze and compare two approaches to data reduction, i.e. with and without principal component analysis (PCA), applied to standardized and normalized data. Data preparation was performed first. The model was then developed and validated by plotting the elbow method and silhouette width graphs. Normalized data with two principal components (PC = 2) gives the best clustering visualization, the lowest within-cluster sum of squares (WCSS) score (195.41) and the highest silhouette score (0.3491), compared with both standardized data and standardized data (PC = 2), for which the reported scores are 23518.82 (WCSS) and 0.1976 (silhouette). We conclude that integrating PCA with k-means clustering yields a lower WCSS score and a higher silhouette score.
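The pipeline summarized above (scaling, optional PCA to two components, k-means, then WCSS for the elbow method and the silhouette score for validation) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the paper's dataset and software are not described here, so `make_toy_data` generates hypothetical synthetic rows; "normalized" is assumed to mean per-feature min-max scaling to [0, 1] and "standardized" per-feature z-scoring; PCA is computed by power iteration on the covariance matrix; and k-means uses k-means++ seeding with 10 restarts. All function names are illustrative, and `math.dist` requires Python 3.8+.

```python
# Illustrative sketch only; not the authors' code. Data and parameter choices are assumptions.
import math
import random

def standardize(X):
    """Z-score each column: subtract its mean, divide by its (population) std."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def normalize(X):
    """Min-max scale each column into [0, 1] (assumed meaning of 'normalized')."""
    cols = list(zip(*X))
    lows, spans = [min(c) for c in cols], [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in X]

def pca(X, n_components=2, iters=300, seed=0):
    """Project centred data onto its leading PCs (power iteration with deflation)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    means = [sum(c) / n for c in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    # Sample covariance matrix (d x d).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
def nearest(p, cents):
    return min(range(len(cents)), key=lambda c: sqdist(p, cents[c]))

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    cents = [list(rng.choice(X))]
    while len(cents) < k:
        d2 = [min(sqdist(p, c) for c in cents) for p in X]
        cents.append(list(rng.choices(X, weights=d2)[0]))
    return cents

def kmeans(X, k, n_init=10, iters=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids, WCSS) of the best of n_init runs."""
    rng, best = random.Random(seed), None
    for _ in range(n_init):
        cents = kmeanspp_init(X, k, rng)
        for _ in range(iters):
            labels = [nearest(p, cents) for p in X]
            groups = [[p for p, lab in zip(X, labels) if lab == c] for c in range(k)]
            # Move each centroid to its cluster mean (keep it in place if the cluster empties).
            new = [[sum(col) / len(g) for col in zip(*g)] if g else cents[c]
                   for c, g in enumerate(groups)]
            if new == cents:
                break
            cents = new
        labels = [nearest(p, cents) for p in X]
        wcss = sum(sqdist(p, cents[lab]) for p, lab in zip(X, labels))
        if best is None or wcss < best[2]:
            best = (labels, cents, wcss)
    return best

def silhouette(X, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b), Euclidean; needs >= 2 clusters."""
    scores = []
    for i, p in enumerate(X):
        by_cluster = {}
        for j, q in enumerate(X):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        a = sum(own) / len(own) if own else 0.0
        b = min(sum(ds) / len(ds) for ds in by_cluster.values())
        scores.append((b - a) / max(a, b) if own else 0.0)  # singleton cluster: s(i) = 0
    return sum(scores) / len(scores)

def make_toy_data(seed=42):
    """HYPOTHETICAL data: 3 groups x 30 rows, 4 features on very different scales."""
    rng = random.Random(seed)
    centres = [(1, 50, 0.1, 200), (5, 80, 0.5, 400), (9, 20, 0.9, 300)]
    return [[rng.gauss(m, 0.05 * (m + 1)) for m in c] for c in centres for _ in range(30)]

if __name__ == "__main__":
    X = make_toy_data()
    for name, scaled in (("standardized", standardize(X)), ("normalized", normalize(X))):
        for label, data in ((name, scaled), (name + " + PCA(2)", pca(scaled))):
            elbow = [round(kmeans(data, k)[2], 2) for k in range(1, 7)]  # WCSS for k = 1..6
            labels, _, _ = kmeans(data, 3)
            print(f"{label:22s} WCSS={elbow} silhouette(k=3)={silhouette(data, labels):.4f}")
```

Running the script prints, for each of the four variants, the WCSS for k = 1 to 6 (the values one would plot for the elbow method) and the silhouette score at k = 3. WCSS is expressed in the units of the transformed features, so it is most informative when comparing different k on the same data, as in the elbow plot; the silhouette score always lies in [-1, 1] regardless of feature scale.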
