Computing Connected Components with linear communication cost in Pregel-like systems

Feng, X; Chang, L; Lin, X; Qin, L; Zhang, W

Publication Type: Conference Proceeding
Citation: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 2016, pp. 85-96
Issue Date: 2016-06-22

This item is open access.
Full metadata record

Field	Value	Language
dc.contributor.author	Feng, X	en_US
dc.contributor.author	Chang, L	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Zhang, W	en_US
dc.date.issued	2016-06-22	en_US
dc.identifier.citation	2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 2016, pp. 85 - 96	en_US
dc.identifier.isbn	9781509020195	en_US
dc.identifier.uri	http://hdl.handle.net/10453/102952
dc.description.abstract	© 2016 IEEE. The paper studies two fundamental problems in graph analytics: computing Connected Components (CCs) and computing BiConnected Components (BCCs) of a graph. With the recent advent of Big Data, developing efficient distributed algorithms for computing CCs and BCCs of a big graph has received increasing interest. As with existing research efforts, in this paper we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to reduce the total communication costs from O(m × #supersteps) to O(m), for both computing CCs and computing BCCs. Moreover, the total computation costs of our techniques are smaller than those of the existing techniques in practice, though theoretically they are almost the same. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in total running time.	en_US
dc.relation.ispartof	2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016	en_US
dc.relation.isbasedon	10.1109/ICDE.2016.7498231	en_US
dc.title	Computing Connected Components with linear communication cost in pregel-like systems	en_US
dc.type	Conference Proceeding
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US

Abstract:

© 2016 IEEE. The paper studies two fundamental problems in graph analytics: computing Connected Components (CCs) and computing BiConnected Components (BCCs) of a graph. With the recent advent of Big Data, developing efficient distributed algorithms for computing CCs and BCCs of a big graph has received increasing interest. As with existing research efforts, in this paper we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to reduce the total communication costs from O(m × #supersteps) to O(m), for both computing CCs and computing BCCs. Moreover, the total computation costs of our techniques are smaller than those of the existing techniques in practice, though theoretically they are almost the same. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in total running time.
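
To make the O(m × #supersteps) communication cost mentioned in the abstract concrete, here is a minimal single-machine sketch of hash-min label propagation, a commonly used vertex-centric baseline for computing CCs. This is an illustrative assumption, not the decomposition-based paradigm the paper proposes, and it does not use any real Pregel API: the function name `hash_min_cc`, the edge-list input, and the message counter are all hypothetical. Each vertex starts with its own ID as its label, and in every superstep the vertices whose label changed send it to all neighbours, so a single superstep can cost up to 2m messages.

```python
from collections import defaultdict


def hash_min_cc(edges, num_vertices):
    """Simulate hash-min label propagation in synchronous supersteps.

    Hypothetical helper for illustration only (not the paper's method).
    Returns (labels, supersteps, messages_sent).
    """
    # Build an undirected adjacency list.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = list(range(num_vertices))   # each vertex starts as its own component
    active = set(range(num_vertices))    # vertices that must broadcast this superstep
    supersteps = 0
    messages = 0

    while active:
        supersteps += 1
        inbox = {}  # vertex -> smallest label received in this superstep
        # Send phase: every active vertex sends its current label to each neighbour.
        for u in active:
            for v in adj[u]:
                messages += 1
                if v not in inbox or labels[u] < inbox[v]:
                    inbox[v] = labels[u]
        # Receive phase: adopt a smaller label and stay active only if it changed.
        active = set()
        for v, smallest in inbox.items():
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)

    return labels, supersteps, messages


if __name__ == "__main__":
    # A path 0-1-2-3, an isolated vertex 4, and an edge 5-6.
    edges = [(0, 1), (1, 2), (2, 3), (5, 6)]
    labels, supersteps, messages = hash_min_cc(edges, 7)
    print(labels, supersteps, messages)
```

In this sketch the number of messages per superstep is bounded by the sum of vertex degrees, 2m, so the total is at most 2m × #supersteps. On long paths the smallest label needs roughly as many supersteps as the path length to reach the far end, which is the kind of behaviour that makes communication costs grow with #supersteps.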

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/102952