Feature selection with biased sample distributions

Kamal, AHM; Zhu, X; Pandya, A; Hsu, S

Publication Type: Conference Proceeding
Citation: 2009 IEEE International Conference on Information Reuse and Integration, IRI 2009, 2009, pp. 23-28
Issue Date: 2009-11-17

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Kamal, AHM	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Pandya, A	en_US
dc.contributor.author	Hsu, S	en_US
dc.date.issued	2009-11-17	en_US
dc.identifier.citation	2009 IEEE International Conference on Information Reuse and Integration, IRI 2009, 2009, pp. 23 - 28	en_US
dc.identifier.isbn	9781424441167	en_US
dc.identifier.uri	http://hdl.handle.net/10453/19195
dc.relation.ispartof	2009 IEEE International Conference on Information Reuse and Integration, IRI 2009	en_US
dc.relation.isbasedon	10.1109/IRI.2009.5211613	en_US
dc.title	Feature selection with biased sample distributions	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
dc.location.activity	Las Vegas, USA	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Feature selection concerns the problem of selecting a number of important features (with respect to the class labels) in order to build accurate prediction models. Traditional feature selection methods, however, fail to take sample distributions into consideration, which may lead to poor predictions for minority class examples. Because of the sophistication and cost of the data collection process, many applications, such as biomedical research, commonly face biased data collections in which one class of examples (e.g., diseased samples) is significantly smaller than the other classes (e.g., normal samples). In these applications, the minority class examples, such as diseased samples, credit card fraud, and network intrusions, make up only a small portion of the data collection but deserve full attention for accurate prediction. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify important features from biased data collections. Experimental comparisons with the ReliefF method on five datasets demonstrate the effectiveness of the proposed methods in selecting informative features from data with biased sample distributions.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/19195