Feature selection using hierarchical feature clustering

Publication Type: Conference Proceeding
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management, 2011, pp. 974-984
One of the challenges in data mining is the dimensionality of data, which is often very high in many domains, such as text categorization and bioinformatics. High dimensionality can cause many problems for traditional learning algorithms. Feature selection has been proposed to cope with this issue, and many feature selection algorithms have been developed. In this paper, we propose a new selection method that picks discriminative features using information measures. The main characteristic of our method is that the selection procedure works like hierarchical agglomerative feature clustering: each feature is initially treated as a cluster, and the between-cluster and within-cluster distances are measured by mutual information and the coefficient of relevancy, respectively. The final aggregated cluster is the selection result. Its members have minimal redundancy among themselves and maximal relevancy to the class labels. Simulation experiments on seven datasets show that the proposed method outperforms other popular feature selection algorithms in classification performance.
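The abstract describes selection as agglomeration that keeps redundancy (mutual information between features) low and relevancy to the class label high. The Python sketch below is a simplified, hypothetical illustration of that idea, not the authors' algorithm. It makes three assumptions. First, features are discrete-valued. Second, a single aggregated cluster is grown greedily with an mRMR-style "relevance minus mean redundancy" score. Third, plain mutual information I(f; C) stands in for the paper's coefficient of relevancy, which the abstract does not define. The names `mutual_information` and `agglomerative_select`, and the parameter `k` (target number of features), are illustrative only.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X; Y), in nats, of two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def agglomerative_select(features, labels, k):
    """Hypothetical sketch: grow one aggregated cluster until it holds k features.

    Relevancy of a feature: I(f; C), a stand-in for the paper's coefficient of relevancy.
    Redundancy with the cluster: mean I(f; g) over features g already in the cluster.
    """
    relevance = [mutual_information(f, labels) for f in features]
    # Seed the cluster with the single most relevant feature.
    cluster = [max(range(len(features)), key=lambda i: relevance[i])]
    remaining = sorted(set(range(len(features))) - set(cluster))

    def merge_score(i):
        redundancy = sum(mutual_information(features[i], features[j]) for j in cluster) / len(cluster)
        return relevance[i] - redundancy

    while len(cluster) < k and remaining:
        # Merge in the singleton cluster that adds the most relevance for the least redundancy.
        best = max(remaining, key=merge_score)
        cluster.append(best)
        remaining.remove(best)
    return cluster


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [x | y for x, y in zip(a, b)]  # toy class label: a OR b
    # Feature 1 is an exact copy of feature 0, so it is fully redundant with it.
    print(agglomerative_select([a, list(a), b], labels, k=2))
```

On this toy data the class depends equally on `a` and `b`. The sketch therefore returns one copy of `a` together with `b` rather than both copies of `a`. That is the redundancy-avoiding behaviour the abstract describes, shown here under the stated simplifications.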