Cost-sensitive classification with inadequate labeled data

Wang, T; Qin, Z; Zhang, S; Zhang, C

Publication Type: Journal Article
Citation: Information Systems, 2012, 37 (5), pp. 508 - 516
Issue Date: 2012-07-01

Closed Access

	Filename	Description	Size	Format
	2011001315OK.pdf		371.19 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, T	en_US
dc.contributor.author	Qin, Z	en_US
dc.contributor.author	Zhang, S	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2012-07-01	en_US
dc.identifier.citation	Information Systems, 2012, 37 (5), pp. 508 - 516	en_US
dc.identifier.issn	0306-4379	en_US
dc.identifier.uri	http://hdl.handle.net/10453/22205
dc.description.abstract	Learning cost-sensitive models from datasets with few labeled examples and plentiful unlabeled data is a practical and challenging problem, because labeled data are sometimes difficult, time-consuming and/or expensive to obtain. To address this problem, this paper proposes two strategies, based on Expectation Maximization (EM), for learning a cost-sensitive classifier from training data that contain both labeled and unlabeled examples. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example from the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating misclassification costs into the probability estimation process. Extensive experiments were conducted to evaluate the performance of the methods, and the results show that, when only a small number of labeled training examples are available, CS-EM outperforms the competing methods on the majority of the selected UCI data sets across different cost ratios, especially when the cost ratio is high. © 2011 Elsevier Ltd. All rights reserved.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP0985456
dc.relation.ispartof	Information Systems	en_US
dc.relation.isbasedon	10.1016/j.is.2011.10.009	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Cost-sensitive classification with inadequate labeled data	en_US
dc.type	Journal Article
utslib.citation.issue	5	en_US
utslib.citation.volume	37	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	5	en_US
pubs.publication-status	Published	en_US
pubs.volume	37	en_US
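The abstract describes Direct-EM only at a high level: EM builds a semi-supervised classifier, and each test example then receives the label with the lowest expected misclassification cost under the model's class probabilities. The sketch below illustrates that two-stage idea under assumptions not stated in this record: a Gaussian naive Bayes base model, a cost matrix `cost[i, j]` giving the cost of predicting class `i` when the true class is `j`, at least one labeled example per class, and hypothetical helper names (`semi_supervised_em`, `min_expected_cost_labels`, and so on). It is not the authors' implementation, and it does not attempt CS-EM, whose cost-aware probability estimation is not specified here.

```python
import numpy as np

# Assumption: Gaussian naive Bayes as the base model; all names are hypothetical.


def gaussian_log_likelihood(X, means, variances):
    """log p(x | class k) under a diagonal-Gaussian (naive Bayes) model; shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :],
        axis=2,
    )


def posterior(X, priors, means, variances):
    """Class probabilities P(class | x) for each row of X; shape (n, K)."""
    log_joint = np.log(priors)[None, :] + gaussian_log_likelihood(X, means, variances)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # avoid overflow in exp
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)


def m_step(X, resp, var_floor=1e-6):
    """Re-estimate priors, means and variances from (soft) class memberships resp."""
    weights = resp.sum(axis=0)  # effective number of examples per class
    priors = weights / weights.sum()
    means = (resp.T @ X) / weights[:, None]
    variances = (resp.T @ X**2) / weights[:, None] - means**2
    return priors, means, np.maximum(variances, var_floor)


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=20):
    """EM over labeled + unlabeled data; assumes every class has >= 1 labeled example."""
    resp_lab = np.eye(n_classes)[y_lab]  # labeled rows keep fixed one-hot memberships
    params = m_step(X_lab, resp_lab)     # initialise from the labeled data alone
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        resp_unl = posterior(X_unl, *params)                      # E-step
        params = m_step(X_all, np.vstack([resp_lab, resp_unl]))  # M-step
    return params


def min_expected_cost_labels(probs, cost):
    """Pick argmin_i sum_j P(j | x) * cost[i, j] for each example."""
    expected = probs @ cost.T  # expected[n, i] = expected cost of predicting class i
    return expected.argmin(axis=1)


# Hypothetical usage on synthetic data: two well-separated 2-D classes, 10 labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)), rng.normal(2.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
lab = np.r_[0:5, 200:205]
unl = np.setdiff1d(np.arange(len(X)), lab)
params = semi_supervised_em(X[lab], y[lab], X[unl], n_classes=2)
probs = posterior(X, *params)
cost = np.array([[0.0, 5.0],   # predicting 0 when the truth is 1 costs 5
                 [1.0, 0.0]])  # predicting 1 when the truth is 0 costs 1
labels = min_expected_cost_labels(probs, cost)
```

With this cost matrix, class 1 is predicted whenever P(1 | x) exceeds 1/6 rather than 1/2, so the decision step is cost-sensitive even though the EM stage in this sketch ignores costs entirely.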

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/22205