Coordinating discernibility and independence scores of variables in a 2D space for efficient and accurate feature selection

Publisher:
Springer
Publication Type:
Conference Proceeding
Citation:
12th Intelligent Computing Methodologies Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, 9773 pp. 116 - 127
Issue Date:
2016-01-01
Full metadata record
Files in This Item:
Filename Description Size
paper.pdfAccepted Manuscript version542.65 kB
Adobe PDF
© Springer International Publishing Switzerland 2016.Feature selection is to remove redundant and irrelevant features from original ones of exemplars, so that a sparse and representative feature subset can be detected for building a more efficient and accurate classifier. This paper presents a novel definition for the discernibility and independence scores of a feature, and then constructs a two dimensional (2D) space with the feature’s independence as y-axis and discernibility as x-axis to rank features’ importance. This new method is named FSDI (Feature Selection based on Discernibility and Independence of a feature). The discernibility score of a feature is to measure the distinguishability of the feature to detect instances from different classes. The independence score is to measure the redundancy of a feature. All features are plotted in the 2D space according to their discernibility and independence coordinates. The area of the rectangular corresponding to a feature’s discernibility and independence in the 2D space is used as a criterion to rank the importance of the features. Top-k features with much higher importance than the rest ones are selected to form the sparse and representative feature subset for building an efficient and accurate classifier. Experimental results on 5 classical gene expression datasets demonstrate that our proposed FSDI algorithm can select the gene subset efficiently and has the best performance in classification. Our method provides a good solution to the bottleneck issues related to the high time complexity of the existing gene subset selection algorithms.
Please use this identifier to cite or link to this item: