Coordinating discernibility and independence scores of variables in a 2D space for efficient and accurate feature selection

Xie, J; Wang, M; Zhou, Y; Li, J

Publication Type: Conference Proceeding
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, 9773, pp. 116-127
Issue Date: 2016-01-01

Closed Access

	Filename	Description	Size	Format
	paper.pdf	Accepted Manuscript version	542.65 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xie, J	en_US
dc.contributor.author	Wang, M	en_US
dc.contributor.author	Zhou, Y	en_US
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, 9773 pp. 116 - 127	en_US
dc.identifier.isbn	9783319422961	en_US
dc.identifier.issn	0302-9743	en_US
dc.identifier.uri	http://hdl.handle.net/10453/103177
dc.description.abstract	© Springer International Publishing Switzerland 2016. Feature selection removes redundant and irrelevant features from the original feature set of the exemplars, so that a sparse and representative feature subset can be found for building a more efficient and accurate classifier. This paper presents a novel definition of the discernibility and independence scores of a feature, and then constructs a two-dimensional (2D) space with the feature’s independence as the y-axis and its discernibility as the x-axis to rank the features’ importance. The new method is named FSDI (Feature Selection based on Discernibility and Independence of a feature). The discernibility score of a feature measures its ability to distinguish instances from different classes. The independence score captures the redundancy of a feature: a feature with high independence is less redundant with respect to the other features. All features are plotted in the 2D space according to their discernibility and independence coordinates. The area of the rectangle defined by a feature’s discernibility and independence in the 2D space is used as the criterion for ranking the features’ importance. The top-k features, whose importance is much higher than that of the rest, are selected to form the sparse and representative feature subset for building an efficient and accurate classifier. Experimental results on 5 classical gene expression datasets demonstrate that the proposed FSDI algorithm selects the gene subset efficiently and achieves the best classification performance. The method offers a good solution to the bottleneck caused by the high time complexity of existing gene subset selection algorithms.	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.relation.isbasedon	10.1007/978-3-319-42297-8_12	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Coordinating discernibility and independence scores of variables in a 2D space for efficient and accurate feature selection	en_US
dc.type	Conference Proceeding
utslib.citation.volume	9773	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	9773	en_US

Abstract:

© Springer International Publishing Switzerland 2016. Feature selection removes redundant and irrelevant features from the original feature set of the exemplars, so that a sparse and representative feature subset can be found for building a more efficient and accurate classifier. This paper presents a novel definition of the discernibility and independence scores of a feature, and then constructs a two-dimensional (2D) space with the feature’s independence as the y-axis and its discernibility as the x-axis to rank the features’ importance. The new method is named FSDI (Feature Selection based on Discernibility and Independence of a feature). The discernibility score of a feature measures its ability to distinguish instances from different classes. The independence score captures the redundancy of a feature: a feature with high independence is less redundant with respect to the other features. All features are plotted in the 2D space according to their discernibility and independence coordinates. The area of the rectangle defined by a feature’s discernibility and independence in the 2D space is used as the criterion for ranking the features’ importance. The top-k features, whose importance is much higher than that of the rest, are selected to form the sparse and representative feature subset for building an efficient and accurate classifier. Experimental results on 5 classical gene expression datasets demonstrate that the proposed FSDI algorithm selects the gene subset efficiently and achieves the best classification performance. The method offers a good solution to the bottleneck caused by the high time complexity of existing gene subset selection algorithms.
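
The ranking idea in the abstract (place each feature at a discernibility/independence coordinate, then rank by the area of the resulting rectangle) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the two score definitions below are stand-in assumptions and not the authors' definitions: discernibility is taken as a Fisher-like ratio of between-class to within-class scatter, and independence as one minus the largest absolute Pearson correlation with any more discernible feature. The function names (discernibility, abs_corr, fsdi_rank, top_k) are hypothetical as well.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def discernibility(values, labels):
    """ASSUMED stand-in score: between-class scatter / within-class scatter."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        m = mean(group)
        between += len(group) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in group)
    return between / (within + 1e-12)  # small constant avoids division by zero


def abs_corr(a, b):
    """Absolute Pearson correlation of two columns, clamped to [0, 1]."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return min(1.0, abs(cov) / sqrt(va * vb))


def fsdi_rank(columns, labels):
    """Return (index, discernibility, independence, area) tuples, largest area first.

    columns is a list of feature columns, each a list with one value per sample.
    """
    dis = [discernibility(col, labels) for col in columns]
    n = len(columns)
    ranked = []
    for i in range(n):
        # Features that are more discernible than i (ties go to the lower index).
        stronger = [j for j in range(n)
                    if dis[j] > dis[i] or (dis[j] == dis[i] and j < i)]
        if stronger:
            # ASSUMED: independence = distance to the most similar stronger feature.
            ind = min(1.0 - abs_corr(columns[i], columns[j]) for j in stronger)
        else:
            # ASSUMED: the most discernible feature gets its largest distance.
            ind = max((1.0 - abs_corr(columns[i], columns[j])
                       for j in range(n) if j != i), default=1.0)
        ranked.append((i, dis[i], ind, dis[i] * ind))  # area of the rectangle
    ranked.sort(key=lambda r: r[3], reverse=True)
    return ranked


def top_k(columns, labels, k):
    """Indices of the k features with the largest rectangle area."""
    return [r[0] for r in fsdi_rank(columns, labels)[:k]]
```

Under these assumed definitions, a feature that nearly duplicates a more discernible one receives an independence close to zero, so its rectangle area collapses and it drops out of the top-k even though its discernibility is high. How k is chosen (the abstract speaks of features with "much higher importance than the rest") is left to the caller here.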

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/103177