Coupled Attribute Similarity Learning on Categorical Data

Wang, C; Dong, X; Zhou, F; Cao, L; Chi, CH


Publication Type: Journal Article
Citation: IEEE Transactions on Neural Networks and Learning Systems, 2015, 26 (4), pp. 781-797
Issue Date: 2015-04-01


This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, C https://orcid.org/0000-0002-2890-0057	en_US
dc.contributor.author	Dong, X	en_US
dc.contributor.author	Zhou, F	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.contributor.author	Chi, CH	en_US
dc.date.issued	2015-04-01	en_US
dc.identifier.citation	IEEE Transactions on Neural Networks and Learning Systems, 2015, 26 (4), pp. 781 - 797	en_US
dc.identifier.issn	2162-237X	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121747
dc.relation.ispartof	IEEE Transactions on Neural Networks and Learning Systems	en_US
dc.relation.isbasedon	10.1109/TNNLS.2014.2325872	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Coupled Attribute Similarity Learning on Categorical Data	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	26	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	26	en_US

Abstract:

© 2012 IEEE. Attribute independence has been a major assumption in the limited research conducted on similarity analysis for categorical data, especially in unsupervised learning. However, in real-world data sources, attributes are more or less associated with each other through certain coupling relationships. Accordingly, recent works on attribute dependency aggregation have introduced the co-occurrence of attribute values to explore attribute coupling, but they present only a local picture when analyzing categorical data similarity. This is inadequate for deep analysis, and the computational complexity grows exponentially as the data scale increases. This paper proposes an efficient data-driven similarity learning approach that generates a coupled attribute similarity measure for nominal objects with attribute couplings, capturing a global picture of attribute similarity. It involves the frequency-based intra-coupled similarity within an attribute and the inter-coupled similarity based on value co-occurrences between attributes, as well as their integration at the object level. In particular, four measures are designed for the inter-coupled similarity; they calculate the similarity between two categorical values by considering their relationships with other attributes in terms of the power set, universal set, joint set, and intersection set. The theoretical analysis reveals the equivalent accuracy and superior efficiency of the measure based on the intersection set, particularly for large-scale data sets. Extensive experiments on data structure analysis and on clustering algorithms that incorporate the coupled dissimilarity metric show a significant performance improvement over state-of-the-art measures and algorithms on 13 UCI data sets, which is confirmed by statistical analysis.
The experimental results show that the proposed coupled attribute similarity is generic and can effectively and efficiently capture the intrinsic and global interactions within and between attributes, especially for large-scale categorical data sets. In addition, two new coupled categorical clustering algorithms, CROCK and CLIMBO, are proposed, and both outperform the original algorithms in terms of clustering quality on UCI data sets and bibliographic data.
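
To make the structure described in the abstract concrete, the following is a minimal Python sketch of a coupled similarity that combines a frequency-based intra-attribute term with an intersection-set inter-attribute term. The abstract does not give the formulas, so everything specific here is an assumption for illustration: the frequency ratio used for the intra-coupled term, the equal weighting of the other attributes, the product used to combine the two terms, and all function names. It should not be read as the paper's exact definitions.

```python
from collections import Counter

def intra_similarity(col, x, y):
    """Frequency-based similarity of two values of one attribute (assumed form)."""
    freq = Counter(col)
    fx, fy = freq[x], freq[y]
    denom = fx + fy + fx * fy
    return (fx * fy) / denom if denom else 0.0

def conditional_probs(col_j, col_k, v):
    """P(w | v): distribution of attribute-k values over rows where attribute j equals v."""
    counts = Counter(w for a, w in zip(col_j, col_k) if a == v)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def inter_similarity(data, j, x, y):
    """Co-occurrence similarity of values x and y of attribute j.

    For each other attribute k, sum min(P(w|x), P(w|y)) over the values w that
    co-occur with both x and y (the intersection set), then average over k
    (equal weights are an assumption).
    """
    cols = list(zip(*data))
    others = [k for k in range(len(cols)) if k != j]
    if not others:
        return 0.0
    total = 0.0
    for k in others:
        px = conditional_probs(cols[j], cols[k], x)
        py = conditional_probs(cols[j], cols[k], y)
        total += sum(min(px[w], py[w]) for w in px.keys() & py.keys())
    return total / len(others)

def coupled_value_similarity(data, j, x, y):
    """Combine the intra- and inter-coupled terms (product is an assumption)."""
    cols = list(zip(*data))
    return intra_similarity(cols[j], x, y) * inter_similarity(data, j, x, y)

def coupled_object_similarity(data, u, v):
    """Object-level similarity: sum of coupled value similarities over attributes."""
    return sum(coupled_value_similarity(data, j, u[j], v[j]) for j in range(len(u)))

# Toy data set with two categorical attributes (hypothetical values).
toy = [("a", "x"), ("a", "x"), ("b", "x"), ("b", "y")]
print(coupled_value_similarity(toy, 0, "a", "b"))
print(coupled_object_similarity(toy, ("a", "x"), ("b", "x")))
```

In this sketch, only the values that co-occur with both `x` and `y` are visited, which reflects the efficiency argument the abstract makes for the intersection-set measure: no power set or full value universe needs to be enumerated.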

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121747