Delivering defect-free products has always been a challenge in the software industry. Limited testing criteria are considered an important factor behind the faults that remain in a developed system. During the planning phase of software development, faults and effort lie in the future and can only be predicted; nevertheless, forecasting them is critical for saving time, effort, and budget. Unsupervised and semi-supervised classification techniques have been reported to produce more accurate results when past information is unavailable. To reduce the manual intervention of experts in identifying modules, the authors propose an automatic software tool with a semi-supervised feature, based on a self-organizing map, that detects labels using a reduced map size. The main contribution of the study is three scenarios that integrate the proposed clustering with regression-based classification. Fusing clustering and regression improves the capability of the prediction model in the presence of heterogeneous data. Feature subset selection is also considered and compared experimentally. Combining feature selection with the proposed technique provides more flexibility in choosing a significant set of attributes.
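The general idea of pairing self-organizing-map clustering with a regression-based classifier can be illustrated with a minimal sketch. It is not the authors' tool: the two-unit map, the toy metric values, the rule that the unit with the larger summed metrics is treated as fault-prone, and the use of logistic regression as the regression step are all assumptions made for illustration, as are the function names.

```python
import math
import random

def best_unit(units, x):
    """Index of the map unit closest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))

def train_som(rows, n_units=2, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D self-organizing map (needs n_units >= 2)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    lo = [min(r[d] for r in rows) for d in range(dim)]
    hi = [max(r[d] for r in rows) for d in range(dim)]
    # Spread the initial units evenly between the per-feature min and max.
    units = [[lo[d] + (k / (n_units - 1)) * (hi[d] - lo[d]) for d in range(dim)]
             for k in range(n_units)]
    order = list(range(len(rows)))
    for t in range(epochs):
        decay = 1.0 - t / epochs
        lr = lr0 * decay                          # shrinking learning rate
        sigma = max(0.1, (n_units / 2) * decay)   # shrinking neighbourhood
        rng.shuffle(order)
        for i in order:
            x = rows[i]
            bmu = best_unit(units, x)
            for j, w in enumerate(units):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units

def pseudo_label(units, rows):
    """Assumed heuristic: the unit with the largest summed metrics is fault-prone."""
    risky = max(range(len(units)), key=lambda j: sum(units[j]))
    return [1 if best_unit(units, x) == risky else 0 for x in rows]

def fit_logistic(rows, labels, epochs=500, lr=0.5):
    """Logistic regression fitted by batch gradient descent on the pseudo-labels."""
    dim, n = len(rows[0]), len(rows)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for d in range(dim):
                gw[d] += (p - y) * x[d]
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Return 1 (fault-prone) when the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical, already-normalized module metrics (e.g. size, complexity).
modules = [[0.10, 0.10], [0.15, 0.20], [0.20, 0.10], [0.10, 0.25],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.95], [0.75, 0.80]]
units = train_som(modules)
pseudo_labels = pseudo_label(units, modules)
weights, bias = fit_logistic(modules, pseudo_labels)
```

Calling `predict(weights, bias, metrics)` on a new module's normalized metrics then gives a fault-prone or not-fault-prone decision without any manually labelled history. On real data the labelling rule and map size would need validation against known faults.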
