PSCAN: Fast and exact structural graph clustering

Chang, L; Li, W; Lin, X; Qin, L; Zhang, W



Publication Type: Conference Proceeding
Citation: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 2016, pp. 253-264
Issue Date: 2016-06-22



This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chang, L	en_US
dc.contributor.author	Li, W	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Zhang, W	en_US
dc.date.issued	2016-06-22	en_US
dc.identifier.citation	2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 2016, pp. 253 - 264	en_US
dc.identifier.isbn	9781509020195	en_US
dc.identifier.uri	http://hdl.handle.net/10453/92750
dc.relation	http://purl.org/au-research/grants/arc/DE140100999
dc.relation	http://purl.org/au-research/grants/arc/DP160101513
dc.relation.ispartof	2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016	en_US
dc.relation.isbasedon	10.1109/ICDE.2016.7498245	en_US
dc.title	PSCAN: Fast and exact structural graph clustering	en_US
dc.type	Conference Proceeding
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2016 IEEE. In this paper, we study the problem of structural graph clustering, a fundamental problem in managing and analyzing graph data. Given a large graph G = (V, E), structural graph clustering assigns the vertices in V to clusters and also identifies the sets of hub vertices and outlier vertices, such that vertices in the same cluster are densely connected to each other while vertices in different clusters are loosely connected to each other. Firstly, we prove that the existing SCAN approach is worst-case optimal. Nevertheless, it is still not scalable to large graphs because it exhaustively computes structural similarity for every pair of adjacent vertices. Secondly, we make three observations about structural graph clustering, which present opportunities for further optimization. Based on these observations, we develop a new two-step paradigm for scalable structural graph clustering. Thirdly, following this paradigm, we present a new approach that aims to reduce the number of structural similarity computations. Moreover, we propose optimization techniques to speed up checking whether two vertices are structure-similar to each other. Finally, we conduct extensive performance studies on large real and synthetic graphs, which demonstrate that our new approach outperforms the state-of-the-art approaches by over one order of magnitude. Notably, for the Twitter graph with 1 billion edges, our approach takes 25 minutes while the state-of-the-art approach cannot finish even after 24 hours.
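
For orientation, the exhaustive baseline that the abstract refers to (computing structural similarity for every pair of adjacent vertices) can be sketched as follows. This is a minimal illustration and not the paper's method. It assumes the commonly used SCAN definitions, which the abstract itself does not state: the similarity of adjacent vertices u and v is |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|) over closed neighborhoods, a core is a vertex with at least mu neighbors (itself included) whose similarity is at least eps, a hub is an unclustered vertex adjacent to two or more clusters, and every other unclustered vertex is an outlier. The names `scan`, `eps`, `mu`, and `EDGES` are hypothetical and chosen for this example only.

```python
import math
from collections import defaultdict


def closed_neighborhoods(edges):
    """Map each vertex to its closed neighborhood N[v] (its neighbors plus v)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].update((u, v))
        nbrs[v].update((u, v))
    return nbrs


def structural_similarity(nbrs, u, v):
    """Assumed SCAN similarity: |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|)."""
    return len(nbrs[u] & nbrs[v]) / math.sqrt(len(nbrs[u]) * len(nbrs[v]))


def scan(edges, eps, mu):
    """Exhaustive baseline sketch: returns (cluster labels, hubs, outliers)."""
    nbrs = closed_neighborhoods(edges)
    # Similarity is evaluated for every adjacent pair; this is the cost the
    # abstract identifies as the scalability bottleneck.
    eps_nbrs = {
        u: {v for v in nbrs[u] if structural_similarity(nbrs, u, v) >= eps}
        for u in nbrs
    }
    cores = {u for u in nbrs if len(eps_nbrs[u]) >= mu}

    label = {}
    next_id = 0
    for seed in sorted(cores):  # assumes vertex ids are mutually comparable
        if seed in label:
            continue
        label[seed] = next_id
        stack = [seed]  # only core vertices are ever pushed and expanded
        while stack:
            u = stack.pop()
            for v in eps_nbrs[u]:
                if v not in label:
                    # A non-core (border) vertex joins the first cluster
                    # that reaches it; a simplification made in this sketch.
                    label[v] = next_id
                    if v in cores:
                        stack.append(v)
        next_id += 1

    hubs, outliers = set(), set()
    for u in nbrs:
        if u in label:
            continue
        touched = {label[v] for v in nbrs[u] if v in label}
        (hubs if len(touched) >= 2 else outliers).add(u)
    return label, hubs, outliers


# Toy graph: two 4-cliques {0,1,2,3} and {5,6,7,8}, vertex 4 bridging them,
# and vertex 9 attached only to vertex 8.
EDGES = [
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
    (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),
    (3, 4), (4, 5), (8, 9),
]
```

Calling `scan(EDGES, eps=0.7, mu=3)` on the toy graph places vertices 0 to 3 in one cluster and 5 to 8 in another, reports vertex 4 as a hub, and reports vertex 9 as an outlier. The sketch makes no attempt to reduce the number of similarity computations, which is the optimization the abstract attributes to the new approach.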

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/92750