Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning


dc.contributor.author Wang, T
dc.contributor.author Qin, Z
dc.contributor.author Jin, Z
dc.contributor.author Zhang, S
dc.date.accessioned 2011-02-07T06:22:05Z
dc.date.issued 2010-07
dc.identifier.citation Journal of Systems and Software, 2010, 83 (7), pp. 1137 - 1147
dc.identifier.issn 0164-1212
dc.identifier.other C1 en_US
dc.identifier.uri http://hdl.handle.net/10453/13481
dc.description.abstract Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, they face a significant challenge in applied settings: they can generate good results on training data but often fail to produce an optimal model when applied to unseen data in real-world applications, a problem known as data over-fitting. This paper addresses data over-fitting with three simple and efficient strategies for the TCSDT (test cost-sensitive decision tree) method: feature selection, smoothing and threshold pruning. Feature selection is used to pre-process the data set before the TCSDT algorithm is applied. Smoothing and threshold pruning are used within the TCSDT algorithm before the class probability estimate is calculated for each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing data over-fitting. © 2010 Elsevier Inc. All rights reserved.
dc.language eng
dc.relation.isbasedon 10.1016/j.jss.2010.01.002
dc.title Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning
dc.type Journal Article
dc.parent Journal of Systems and Software
dc.journal.volume 83
dc.journal.number 7 en_US
dc.publocation UK en_US
dc.identifier.startpage 1137 en_US
dc.identifier.endpage 1147 en_US
dc.cauo.name FEIT.School of Systems, Management and Leadership en_US
dc.conference Verified OK en_US
dc.for 0806 Information Systems
dc.for 0803 Computer Software
dc.personcode 020030
dc.personcode 999567
dc.personcode 100789
dc.percentage 50 en_US
dc.classification.name Computer Software en_US
dc.classification.type FOR-08 en_US
dc.edition en_US
dc.custom en_US
dc.date.activity en_US
dc.location.activity en_US
dc.description.keywords Classification
dc.description.keywords Cost-sensitive learning
dc.description.keywords Over-fitting
pubs.embargo.period Not known
pubs.organisational-group /University of Technology Sydney
pubs.organisational-group /University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group /University of Technology Sydney/Faculty of Engineering and Information Technology/School of Systems, Management and Leadership
pubs.organisational-group /University of Technology Sydney/Strength - Quantum Computation and Intelligent Systems
utslib.copyright.status Closed Access
utslib.copyright.date 2015-04-15 12:17:09.805752+10
pubs.consider-herdc true
utslib.collection.history Closed (ID: 3)
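
The abstract says that smoothing is applied before the class probability estimate is calculated for each decision tree leaf, but this record does not say which smoothing scheme the paper uses. The sketch below is an illustration only: it assumes the common Laplace correction and a minimum expected cost decision rule, and the function names and the cost matrix are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def laplace_leaf_probabilities(counts: Sequence[int]) -> List[float]:
    """Laplace-corrected class probabilities for one leaf.

    counts[j] is the number of training examples of class j in the leaf.
    Assumption: Laplace correction; the paper's own scheme may differ.
    """
    total = sum(counts)
    num_classes = len(counts)
    return [(c + 1) / (total + num_classes) for c in counts]


def min_cost_class(probs: Sequence[float],
                   cost: Sequence[Sequence[float]]) -> int:
    """Return the class with the lowest expected misclassification cost.

    cost[i][j] is the (hypothetical) cost of predicting class i
    when the true class is j.
    """
    expected = [
        sum(p * cost[i][j] for j, p in enumerate(probs))
        for i in range(len(cost))
    ]
    return min(range(len(expected)), key=expected.__getitem__)


# A pure leaf with only three examples, all of class 0.
counts = [3, 0]
# Missing a true class 1 costs 10; a false alarm costs 1.
cost = [[0.0, 10.0],
        [1.0, 0.0]]

raw = [c / sum(counts) for c in counts]        # unsmoothed frequencies
smoothed = laplace_leaf_probabilities(counts)  # less extreme estimate

raw_choice = min_cost_class(raw, cost)
smoothed_choice = min_cost_class(smoothed, cost)
```

With so few examples in the leaf, the unsmoothed estimate treats class 1 as impossible, while the smoothed estimate leaves enough probability on class 1 for the costly error to change the decision. This is one way smoothing can limit over-fitting to small leaves.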
