Feature selection using hierarchical feature clustering

Liu, H; Wu, X; Zhang, S

Publisher: ACM
Publication Type: Conference Proceeding
Citation: CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management, 2011, pp. 974-984
Issue Date: 2011-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, H	en_US
dc.contributor.author	Wu, X	en_US
dc.contributor.author	Zhang, S	en_US
dc.contributor.editor	NA	en_US
dc.date	2011-10-24	en_US
dc.date.issued	2011-01	en_US
dc.identifier.citation	CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management, 2011, pp. 974 - 984	en_US
dc.identifier.uri	http://hdl.handle.net/10453/19185
dc.publisher	ACM	en_US
dc.relation.ispartof	CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management	en_US
dc.relation.ispartof	ACM international conference on Information and knowledge management	en_US
dc.relation.isbasedon	10.1145/2063576.2063716	en_US
dc.rights	© ACM 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management, 2011, pp. 974 - 984 http://doi.acm.org/10.1145/2063576.2063716	en_US
dc.title	Feature selection using hierarchical feature clustering	en_US
dc.type	Conference Proceeding
utslib.location	USA	en_US
utslib.location.activity	Glasgow, Scotland	en_US
utslib.for	0806 Information Systems	en_US
dc.location.activity	Glasgow, Scotland	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.consider-herdc	true	en_US
pubs.place-of-publication	USA	en_US
pubs.start-date	2011-10-24	en_US

Abstract:

One of the challenges in data mining is the high dimensionality of data, which is common in domains such as text categorization and bioinformatics. High dimensionality may cause difficulties for traditional learning algorithms. Feature selection has been put forward to cope with this issue, and many feature selection algorithms have been developed. In this paper we propose a new selection method that picks discriminative features using information measures. Its main characteristic is that the selection procedure works like hierarchical agglomerative feature clustering: each feature is considered as a cluster, and the between-cluster and within-cluster distances are measured by mutual information and the coefficient of relevancy, respectively. The final aggregated cluster is the selection result, with minimal redundancy among its members and maximal relevancy to the class labels. Simulation experiments on seven datasets show that the proposed method outperforms other popular feature selection algorithms in classification performance.
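
The abstract does not define the coefficient of relevancy or the exact merging rule, so the paper's procedure cannot be reproduced from this record. As a rough illustration of the general idea only (growing one cluster of features that are relevant to the class labels and not redundant with each other), the sketch below substitutes a simple greedy criterion in the style of mRMR: relevance is the mutual information between a feature and the labels, and redundancy is the average mutual information between a candidate and the features already selected. The function names, the scoring rule, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedy stand-in for the agglomerative selection (assumed criterion).

    columns: dict mapping feature name -> list of discrete values
    labels:  list of class labels
    k:       number of features to aggregate into the selected cluster
    """
    relevance = {f: mutual_information(v, labels) for f, v in columns.items()}
    # Seed the cluster with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(columns)):
        def score(f):
            # Average redundancy of candidate f with the current cluster.
            redundancy = sum(
                mutual_information(columns[f], columns[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        candidates = [f for f in columns if f not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): the label is determined jointly by a and b,
# a_copy duplicates a, and c is unrelated to the label.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
columns = {"a": a, "a_copy": list(a), "b": b, "c": c}

print(select_features(columns, labels, 2))  # ['a', 'b']
```

On this toy data the duplicate feature a_copy is as relevant as a but fully redundant with it, so the second pick is b. This mirrors the stated goal of minimal redundancy among members and maximal relevancy with the class labels, without claiming to match the paper's distance definitions.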

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/19185