Efficient structural node similarity computation on billion-scale graphs

Chen, X; Lai, L; Qin, L; Lin, X

Publisher: Springer Science and Business Media LLC
Publication Type: Journal Article
Citation: VLDB Journal, 2021, 30, (3), pp. 471-493
Issue Date: 2021-05-01

This item is open access.

The embargo period expires on 1 Feb 2022

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, X
dc.contributor.author	Lai, L
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062
dc.contributor.author	Lin, X
dc.date.accessioned	2022-01-31T03:56:31Z
dc.date.available	2022-01-31T03:56:31Z
dc.date.issued	2021-05-01
dc.identifier.citation	VLDB Journal, 2021, 30, (3), pp. 471-493
dc.identifier.issn	1066-8888
dc.identifier.issn	0949-877X
dc.identifier.uri	http://hdl.handle.net/10453/153932
dc.description.abstract	Structural node similarity is widely used in analyzing complex networks. Among structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms to compute role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, namely StructSim, to compute nodes’ role similarity. Under this framework, we first prove that StructSim is an admissible role similarity metric based on maximum matching. Because maximum matching is still too costly to scale, we then devise BinCount matching, which is not only efficient to compute but also guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index to query a single pair of nodes in O(k log D) time, where k is a small user-defined parameter and D is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies show that StructSim outperforms existing approaches in both effectiveness and efficiency when applied to compute structural node similarities on real-world graphs.
dc.language	en
dc.publisher	Springer Science and Business Media LLC
dc.relation	http://purl.org/au-research/grants/arc/DP180103096
dc.relation	http://purl.org/au-research/grants/arc/FT200100787
dc.relation.ispartof	VLDB Journal
dc.relation.isbasedon	10.1007/s00778-021-00654-9
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	This is a post-peer-review, pre-copyedit version of a Journal article published in the VLDB Journal. The final authenticated version is available online at: https://link.springer.com/article/10.1007/s00778-021-00654-9
dc.subject	0804 Data Format, 0805 Distributed Computing, 0806 Information Systems
dc.subject.classification	Information Systems
dc.title	Efficient structural node similarity computation on billion-scale graphs
dc.type	Journal Article
utslib.citation.volume	30
utslib.for	0804 Data Format
utslib.for	0805 Distributed Computing
utslib.for	0806 Information Systems
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2022-02-01T00:00:00+1000Z
dc.date.updated	2022-01-31T03:56:30Z
pubs.issue	3
pubs.publication-status	Published
pubs.volume	30
utslib.citation.issue	3

Abstract:

Structural node similarity is widely used in analyzing complex networks. Among structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms to compute role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, namely StructSim, to compute nodes’ role similarity. Under this framework, we first prove that StructSim is an admissible role similarity metric based on maximum matching. Because maximum matching is still too costly to scale, we then devise BinCount matching, which is not only efficient to compute but also guarantees the admissibility of StructSim. BinCount-based StructSim admits a precomputed index to query a single pair of nodes in O(k log D) time, where k is a small user-defined parameter and D is the maximum node degree. To build the index, we further devise an FM-sketch-based technique that can handle graphs with billions of edges. Extensive empirical studies show that StructSim outperforms existing approaches in both effectiveness and efficiency when applied to compute structural node similarities on real-world graphs.
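
Illustrative note (not part of the repository record): the abstract mentions an FM-sketch-based technique for building the index but does not describe it. As background only, the following is a minimal sketch of a generic Flajolet-Martin (FM) distinct-count sketch, assuming the textbook variant that averages the lowest-zero-bit position over several hash functions. The class name `FMSketch`, the parameter `num_maps`, and the choice of BLAKE2b as the hash are assumptions made for this example; none of it is taken from the paper, and it is not StructSim's index construction.

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis


def _hash(item, seed):
    """64-bit hash of `item`, salted by `seed` (illustrative choice of hash)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _lowest_set_bit(x, width=64):
    """Index of the least significant 1-bit of x (width - 1 if x is 0)."""
    if x == 0:
        return width - 1
    return (x & -x).bit_length() - 1


class FMSketch:
    """Generic FM sketch: one small bitmap per hash function (hypothetical helper)."""

    def __init__(self, num_maps=64):
        self.bitmaps = [0] * num_maps

    def add(self, item):
        # Each hash function sets a single bit; duplicates set the same bits.
        for seed in range(len(self.bitmaps)):
            self.bitmaps[seed] |= 1 << _lowest_set_bit(_hash(item, seed))

    def merge(self, other):
        """Sketch of the union of two sets: bitwise OR of the bitmaps."""
        if len(self.bitmaps) != len(other.bitmaps):
            raise ValueError("sketches must use the same number of bitmaps")
        merged = FMSketch(len(self.bitmaps))
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        """Approximate number of distinct items added so far."""
        if not any(self.bitmaps):
            return 0.0
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):  # position of the lowest zero bit
                r += 1
            total += r
        return (2 ** (total / len(self.bitmaps))) / PHI


if __name__ == "__main__":
    left, right = FMSketch(), FMSketch()
    for node in range(0, 600):
        left.add(node)
    for node in range(400, 1000):
        right.add(node)
    # The union holds 1000 distinct items; the estimate is approximate.
    print(round(left.merge(right).estimate()))
```

Two properties make sketches of this kind attractive for very large inputs: each sketch occupies a fixed number of machine words regardless of how many items are added, and sketches of two sets can be combined by bitwise OR without revisiting the underlying items. How the paper applies FM sketches to index construction is described in the full text, not in this record.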

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/153932