CNFL: Categorical to Numerical Feature Learning for Clustering and Classification

Golinko, E; Sonderman, T; Zhu, X

Publication Type: Conference Proceeding
Citation: Proceedings - 2017 IEEE 2nd International Conference on Data Science in Cyberspace, DSC 2017, 2017, pp. 585 - 594
Issue Date: 2017-08-08

Closed Access

	Filename	Description	Size	Format
	08005535.pdf	Published version	237.02 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Golinko, E	en_US
dc.contributor.author	Sonderman, T	en_US
dc.contributor.author	Zhu, X	en_US
dc.date.issued	2017-08-08	en_US
dc.identifier.citation	Proceedings - 2017 IEEE 2nd International Conference on Data Science in Cyberspace, DSC 2017, 2017, pp. 585 - 594	en_US
dc.identifier.isbn	9781538615997	en_US
dc.identifier.uri	http://hdl.handle.net/10453/126793
dc.description.abstract	© 2017 IEEE. Categorical data exist in many domains, such as text data, gene sequences, or data from the Census Bureau. While such data are easy for human interpretation, they cannot be directly used by many classification methods, such as support vector machines and others, which require underlying data to be represented in a numerical format. To date, most existing learning methods convert categorical data into binary features, which may result in high dimensionality and sparsity. In this paper, we propose a method to convert categorical data into an arbitrary number of numerical features. Our method, named CNFL, uses simple matching to calculate proximity between instances, then uses an eigendecomposition to convert the proximity matrix into a low-dimensional space, which can be used to represent instances for classification or clustering. Experiments on 21 datasets demonstrate that numerical features learned from CNFL can effectively represent the original data for machine learning tasks.	en_US
dc.relation.ispartof	Proceedings - 2017 IEEE 2nd International Conference on Data Science in Cyberspace, DSC 2017	en_US
dc.relation.isbasedon	10.1109/DSC.2017.87	en_US
dc.title	CNFL: Categorical to Numerical Feature Learning for Clustering and Classification	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2017 IEEE. Categorical data exist in many domains, such as text data, gene sequences, or data from the Census Bureau. While such data are easy for human interpretation, they cannot be directly used by many classification methods, such as support vector machines and others, which require underlying data to be represented in a numerical format. To date, most existing learning methods convert categorical data into binary features, which may result in high dimensionality and sparsity. In this paper, we propose a method to convert categorical data into an arbitrary number of numerical features. Our method, named CNFL, uses simple matching to calculate proximity between instances, then uses an eigendecomposition to convert the proximity matrix into a low-dimensional space, which can be used to represent instances for classification or clustering. Experiments on 21 datasets demonstrate that numerical features learned from CNFL can effectively represent the original data for machine learning tasks.
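
The two steps named in the abstract (simple matching proximity, then an eigendecomposition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the full text is closed access, so the function names `simple_matching_similarity` and `embed`, the choice of the top-k eigenvectors, and the scaling of each eigenvector by the square root of its eigenvalue are all assumptions made here for illustration.

```python
import numpy as np


def simple_matching_similarity(rows):
    """Fraction of attributes on which each pair of instances agrees."""
    X = np.asarray(rows)
    n, m = X.shape
    # Integer-encode each column so the comparison below is purely numeric.
    codes = np.empty((n, m), dtype=np.int64)
    for j in range(m):
        _, inverse = np.unique(X[:, j], return_inverse=True)
        codes[:, j] = inverse.reshape(-1)
    # matches[i, j] = number of attributes where rows i and j hold the same value.
    matches = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)
    return matches / m


def embed(similarity, k):
    """Map an n x n symmetric similarity matrix to n x k numerical features.

    Assumption (not taken from the paper): keep the k largest eigenvalues and
    scale each eigenvector by the square root of its eigenvalue.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(similarity)  # ascending order
    top = np.argsort(eigenvalues)[::-1][:k]
    # Clip tiny negative values caused by floating-point round-off.
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


# Hypothetical toy data: four instances, three categorical attributes.
rows = [
    ["red", "small", "round"],
    ["red", "small", "square"],
    ["blue", "large", "square"],
    ["blue", "large", "round"],
]
S = simple_matching_similarity(rows)
Z = embed(S, k=2)  # one 2-dimensional numerical vector per instance
```

The rows of `Z` could then be passed to any numerical learner, such as a support vector machine or k-means. Note that the proximity matrix is n x n, so this sketch needs memory quadratic in the number of instances.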

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/126793