CNFL: Categorical to Numerical Feature Learning for Clustering and Classification
- Publication Type:
- Conference Proceeding
- Proceedings - 2017 IEEE 2nd International Conference on Data Science in Cyberspace, DSC 2017, 2017, pp. 585 - 594
- Issue Date:
© 2017 IEEE. Categorical data exist in many domains, such as text data, gene sequences, or data from the Census Bureau. While such data are easy for humans to interpret, they cannot be used directly by many classification methods, such as support vector machines, which require the underlying data to be represented in a numerical format. To date, most existing learning methods convert categorical data into binary features, which may result in high dimensionality and sparsity. In this paper, we propose a method to convert categorical data into an arbitrary number of numerical features. Our method, named CNFL, uses simple matching to calculate the proximity between instances, then uses an eigendecomposition to map the proximity matrix into a low-dimensional space, which can be used to represent instances for classification or clustering. Experiments on 21 datasets demonstrate that numerical features learned by CNFL can effectively represent the original data for machine learning tasks.
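The two steps named in the abstract (simple matching proximity, then eigendecomposition) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `simple_matching_similarity` and `cnfl_like_embedding`, the parameter `n_features`, and the choice to scale eigenvectors by the square root of their eigenvalues (as in kernel PCA or classical multidimensional scaling) are all assumptions made for illustration, and the paper may differ in these details.

```python
import numpy as np


def simple_matching_similarity(X):
    """Pairwise simple matching: fraction of attributes on which two rows agree."""
    X = np.asarray(X, dtype=str)
    # Encode each categorical column as integer codes so rows compare cheaply.
    codes = np.column_stack(
        [np.unique(col, return_inverse=True)[1] for col in X.T]
    )
    # matches[i, j, a] is True when rows i and j share the value of attribute a.
    matches = codes[:, None, :] == codes[None, :, :]
    return matches.mean(axis=2)


def cnfl_like_embedding(X, n_features=2):
    """Hypothetical helper: numerical features from the top eigenpairs of S."""
    S = simple_matching_similarity(X)
    # S is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_features]
    # Assumption: scale each eigenvector by sqrt(eigenvalue); clip rounding noise.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy categorical data: three instances described by two attributes.
X = [["red", "small"], ["red", "small"], ["blue", "large"]]
Z = cnfl_like_embedding(X, n_features=2)  # one row of numerical features per instance
```

Each row of `Z` can then be passed to any learner that expects numerical input, such as a support vector machine or k-means. Unlike one-hot encoding, the number of output columns is chosen by the caller and does not depend on how many distinct categories appear in the data.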