Across the globe, diabetes is recognized as one of the many causes of death, especially in developing countries, where treatment is often lacking, particularly in the early stages of the disease. This study classifies the presence of diabetes within the community, thus contributing to the existing technology within the healthcare system. Our findings can help doctors predict the presence of diabetes accurately and alert patients to seek early treatment. Four data mining algorithms are used in this study, comprising both single and ensemble classifiers. The two single classifiers are a decision tree and logistic regression, while the two ensemble classifiers are random forest and stacking. These classifiers were chosen because they are efficient and perform well. This research uses the PIMA diabetes dataset because it is publicly available. Stratified cross-validation is used so that each fold preserves the class proportions of the full dataset, giving a more reliable estimate of model performance. The ensemble classifiers show better or similar testing results compared to the single classifiers. Data visualisation reveals two important features.
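
To illustrate what stratified cross-validation does, the following is a minimal sketch in plain Python, not the authors' implementation. The function name `stratified_k_fold`, the choice of five folds, and the label counts (500 negative, 268 positive) are assumptions made only for illustration; the abstract does not state the number of folds or how the split was produced.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep roughly
    the same class ratio as the full label list (hypothetical helper)."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Illustrative, assumed labels: 0 = no diabetes, 1 = diabetes.
labels = [0] * 500 + [1] * 268
folds = stratified_k_fold(labels, k=5)

for number, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    # Every sample outside the current fold is used for training.
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
    print(f"fold {number}: train={len(train_idx)} test={len(test_idx)} "
          f"positive rate={positive_rate:.3f}")
```

Each fold serves once as the test set while the remaining folds form the training set. Because the positive rate stays close to that of the full dataset in every fold, the scores of the single and ensemble classifiers can be compared on folds with the same class balance.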
