Classifying applicants with good credit risk is essential for the future development of the banking industry. Conversely, applicants with bad credit risk can cause banks, as financial institutions that provide credit, to experience difficulties. The exposure to bad credit risk can be large if the credit application acceptance system misclassifies applicants who carry a bad risk. This paper investigates two statistical learning models for reliably classifying applicants with good credit risk: multinomial naïve Bayes and the ID3 decision tree. These nonparametric models require that all predictor variables be categorical, so the proposed strategy discretizes the numeric variables using the Sturges frequency distribution table. The results, validated using k-fold cross-validation resampling, show that the strategy is very appropriate, especially for the ID3 decision tree model, which produces a classification with accuracy, sensitivity, precision, and F1-score all greater than 90%; only the specificity is lower, at 83.23%. Moreover, sensitivity, the performance measure that prioritizes credit risk, reaches almost 100%, namely 99.87%. The multinomial naïve Bayes method also performs well: all of its performance measures exceed 85% except specificity. These performance measures indicate that the ID3 decision tree method is better than the multinomial naïve Bayes method for classifying credit risk.
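The two preprocessing and evaluation steps named above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the function names are invented here, equal-width intervals are assumed for the Sturges binning, and the metric formulas are the standard confusion-matrix definitions of the measures the abstract reports.

```python
import math

def sturges_bins(n):
    """Sturges' rule: number of class intervals k = ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n))

def discretize(values):
    """Map numeric values to categorical bin indices using equal-width
    intervals whose count comes from the Sturges frequency table."""
    k = sturges_bins(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Clamp the maximum value into the last bin (index k - 1).
    return [min(int((v - lo) / width), k - 1) for v in values], k

def binary_metrics(tp, fp, tn, fn):
    """Performance measures reported in the abstract, computed from
    confusion-matrix counts for the good-risk (positive) class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1
```

For example, a sample of 100 observations yields ceil(1 + log2(100)) = 8 class intervals, after which every predictor is categorical and can be fed to multinomial naïve Bayes or ID3.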
