Scalable subgraph enumeration in MapReduce: a cost-oriented approach

Lai, L; Qin, L; Lin, X; Chang, L

Publication Type: Journal Article
Citation: VLDB Journal, 2017, 26 (3), pp. 421 - 446
Issue Date: 2017-06-01

Closed Access

	Filename	Description	Size	Format
	10.1007%2Fs00778-017-0459-4.pdf	Published Version	1.71 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Lai, L	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Lin, X https://orcid.org/0000-0003-2396-7225	en_US
dc.contributor.author	Chang, L	en_US
dc.date.issued	2017-06-01	en_US
dc.identifier.citation	VLDB Journal, 2017, 26 (3), pp. 421 - 446	en_US
dc.identifier.issn	1066-8888	en_US
dc.identifier.uri	http://hdl.handle.net/10453/124470
dc.relation	http://purl.org/au-research/grants/arc/DE140100999
dc.relation.ispartof	VLDB Journal	en_US
dc.relation.isbasedon	10.1007/s00778-017-0459-4	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Scalable subgraph enumeration in MapReduce: a cost-oriented approach	en_US
dc.type	Journal Article
utslib.citation.volume	3	en_US
utslib.citation.volume	26	en_US
utslib.for	0804 Data Format	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	26	en_US

Abstract:

© 2017, Springer-Verlag Berlin Heidelberg. Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs because they involve computationally intensive subgraph isomorphism operations. Thus, some recent research focuses on solving the problem using MapReduce. Nevertheless, existing MapReduce approaches do not scale to very large graphs, since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, in this paper, we propose a new algorithm, TwinTwigJoin, based on a left-deep-join framework in MapReduce, in which the basic join unit is a TwinTwig (an edge or two incident edges of a node). We show that in the Erdős–Rényi random graph model, TwinTwigJoin is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Finally, by aggregating equivalent nodes into a compressed node, we construct the compressed graph, upon which the subgraph enumeration is further improved. We conduct extensive performance studies on several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.
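
As a rough illustration of the join unit the abstract describes, the sketch below enumerates triangles in a small undirected graph by joining each two-edge TwinTwig (two edges sharing a node) with the edge set. It is a single-machine, in-memory toy and not the paper's MapReduce implementation or its optimal join plan; the function names `two_edge_twigs` and `triangles_by_join` and the sample edge list are assumptions made for this example.

```python
from itertools import combinations


def two_edge_twigs(adj):
    """Yield (center, u, w) for every pair of edges incident to `center`.

    This is the two-edge form of a TwinTwig: edges (center, u) and (center, w).
    Hypothetical helper, named for this sketch only.
    """
    for center, neighbours in adj.items():
        for u, w in combinations(sorted(neighbours), 2):
            yield center, u, w


def triangles_by_join(edges):
    """Enumerate triangles by joining two-edge twigs with single edges.

    The triangle pattern is treated as one two-edge twig joined with one
    edge; the join key is the pair of leaf nodes (u, w).
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    edge_set = {frozenset(e) for e in edges}
    found = set()
    for center, u, w in two_edge_twigs(adj):
        # Join condition: the two leaves of the twig must share a data edge.
        if frozenset((u, w)) in edge_set:
            # Sort the nodes so each triangle is reported once.
            found.add(tuple(sorted((center, u, w))))
    return sorted(found)


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]  # assumed toy data
    print(triangles_by_join(sample_edges))
```

Each triangle is discovered three times here, once per center node, and deduplicated by sorting; that redundancy is a simplification of this toy, not a description of how the paper handles duplicate matches.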

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/124470