Distributed computing connected components with linear communication cost

Feng, X; Chang, L; Lin, X; Qin, L; Zhang, W; Yuan, L

Publication Type: Journal Article
Citation: Distributed and Parallel Databases, 2018, 36 (3), pp. 555-592
Issue Date: 2018-09-01

Closed Access

Filename	Description	Size
Feng2018_Article_DistributedComputingConnectedC.pdf	Published Version	1.74 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Feng, X	en_US
dc.contributor.author	Chang, L	en_US
dc.contributor.author	Lin, X https://orcid.org/0000-0003-2396-7225	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Zhang, W https://orcid.org/0000-0001-6572-2600	en_US
dc.contributor.author	Yuan, L	en_US
dc.date.issued	2018-09-01	en_US
dc.identifier.citation	Distributed and Parallel Databases, 2018, 36 (3), pp. 555 - 592	en_US
dc.identifier.issn	0926-8782	en_US
dc.identifier.uri	http://hdl.handle.net/10453/132764
dc.relation.ispartof	Distributed and Parallel Databases	en_US
dc.relation.isbasedon	10.1007/s10619-018-7232-6	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Distributed computing connected components with linear communication cost	en_US
dc.type	Journal Article
utslib.citation.volume	3	en_US
utslib.citation.volume	36	en_US
utslib.for	0804 Data Format	en_US
utslib.for	0805 Distributed Computing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Science/School of Life Sciences
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	36	en_US

Abstract:

© 2018, Springer Science+Business Media, LLC, part of Springer Nature. The paper studies three fundamental problems in graph analytics: computing the connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing the CCs, BCCs and ECCs of a big graph has received increasing interest. As with existing research efforts, we focus on the Pregel programming model, although the techniques may be extended to other programming models, including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in the graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than those of the existing techniques in practice, though theoretically almost the same. Moreover, we study the distributed computation of ECCs. We are the first to study this problem, and we propose an approach with O(m) total communication cost. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in total running time.
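
To make the O(m × #supersteps) communication cost mentioned in the abstract concrete, the following is a minimal single-machine sketch of min-label propagation (often called "hash-min"), a common vertex-centric way to compute CCs. It is an assumption for illustration only: the abstract does not name the baseline algorithm, and this is not the paper's graph-decomposition method. The function name `hash_min_cc` and the message counter are hypothetical; the sketch only simulates Pregel-style supersteps and counts the messages that would cross the network.

```python
from collections import defaultdict


def hash_min_cc(num_vertices, edges):
    """Simulate synchronous min-label propagation on an undirected graph.

    Returns (labels, supersteps, messages), where labels[v] is the smallest
    vertex id in v's connected component. Illustrative baseline only; this
    is NOT the decomposition-based approach proposed in the paper.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = list(range(num_vertices))   # every vertex starts with its own id
    active = set(range(num_vertices))    # vertices that must send this superstep
    supersteps = 0
    messages = 0

    while active:
        supersteps += 1
        # Send phase: each active vertex sends its label along every incident edge.
        inbox = defaultdict(list)
        for u in active:
            for v in adj[u]:
                inbox[v].append(labels[u])
                messages += 1
        # Receive phase: adopt the smallest label seen; stay active only if it changed.
        active = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)

    return labels, supersteps, messages


if __name__ == "__main__":
    # A path 0-1-2-3: labels need several supersteps to travel along the path.
    path_edges = [(0, 1), (1, 2), (2, 3)]
    labels, supersteps, messages = hash_min_cc(4, path_edges)
    print(labels, supersteps, messages)
    # Each superstep sends at most 2m messages, so the total is O(m * #supersteps).
    print(messages <= 2 * len(path_edges) * supersteps)
```

In this sketch each superstep may send a message over every edge in both directions, so total communication grows with the number of supersteps, which on a path is proportional to the path length. An approach with O(m) total communication, as claimed in the abstract, has to avoid resending along the same edges round after round.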

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/132764