The winsorized tree is a modified tree-based classifier that detects and handles outliers in every node while the tree is being constructed. It avoids the tedious construction process of a classical tree: branch splitting and pruning proceed concurrently, so the constructed tree does not grow bushy. This mechanism is controlled by the proposed algorithm. In a winsorized tree, the data are screened for outliers. If an outlier is detected, its value is neutralized by winsorizing. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to find a significant stopping criterion that halts further splitting before the tree overfits. The results of an experiment on the Pima Indian dataset showed that a node can produce its final successor nodes (leaves) once it has reached the range of 70% in information gain.
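The two per-node operations named in the abstract, winsorizing outliers and measuring information gain, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the abstract does not state the outlier rule, so the sketch assumes Tukey-style fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR, a linearly interpolated quantile, and entropy-based information gain for a binary split. The function names `quantile`, `winsorize`, `entropy`, and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, k=1.5):
    """Clamp values outside the assumed fences [Q1 - k*IQR, Q3 + k*IQR].

    The fence rule is an assumption; the abstract does not specify it.
    """
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the non-empty child nodes.
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children
```

For example, with the values `[1, 2, 3, 4, 100]` the assumed fences are -1 and 7, so `winsorize` pulls 100 back to 7 and leaves the other values unchanged. A node in such a tree would winsorize its feature values first, then evaluate candidate splits with `information_gain`; how the measured gain maps onto the 70% range reported in the abstract is not specified there and is left open here.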
