Efficient triangle listing for billion-scale graphs

Zhang, H; Zhu, Y; Qin, L; Cheng, H; Yu, JX

Publication Type: Conference Proceeding
Citation: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 2016, pp. 813-822
Issue Date: 2016-01-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, H	en_US
dc.contributor.author	Zhu, Y	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Cheng, H	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 2016, pp. 813 - 822	en_US
dc.identifier.isbn	9781467390040	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127278
dc.relation.ispartof	Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016	en_US
dc.relation.isbasedon	10.1109/BigData.2016.7840674	en_US
dc.title	Efficient triangle listing for billion-scale graphs	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2016 IEEE. This paper addresses the classical triangle listing problem, which aims to enumerate every set of three vertices that are pairwise connected by edges. The problem has been studied intensively in internal and external memory, but it remains an open challenge in distributed environments, where multiple machines across a network can be used to achieve good performance and scalability. As one of the de facto computing models for distributed environments, MapReduce has been used in some existing triangle listing algorithms. However, these algorithms usually need to shuffle a huge amount of intermediate data, which seriously hinders their scalability on large-scale graphs. In this paper, we propose FTL, a new triangle listing algorithm in MapReduce. FTL uses a lightweight data structure to substantially reduce the intermediate data transferred during the shuffle stage, and it is equipped with multiple-round techniques that ease the burden on memory and network bandwidth when processing billion-scale graphs. We prove that the size of the intermediate data is bounded close to the number of triangles in the graph. To further reduce the shuffle size in each round, we also devise a compact data structure for storing the intermediate data, which can save up to 2/3 of the space. Extensive experimental results show that our algorithms outperform existing competitors by a factor of several on large real-world graphs.
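
To make the problem statement concrete, the following is a minimal single-machine sketch of triangle listing in Python. It is not the FTL algorithm from the paper and involves no MapReduce; it uses the well-known degree-ordering approach, in which each edge is oriented from its lower-ranked to its higher-ranked endpoint so that every triangle is reported exactly once. The function name `list_triangles` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def list_triangles(edges):
    """Return every triangle exactly once as a sorted tuple (u, v, w).

    Illustrative in-memory baseline only; not the paper's FTL algorithm.
    Vertex ids are assumed to be mutually comparable (e.g. integers).
    """
    # Build an undirected adjacency structure; sets drop duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id); ties are broken by vertex id.
    ordered = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {v: i for i, v in enumerate(ordered)}

    # Keep only neighbours with a higher rank, which orients every edge.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in out:
        for v in out[u]:
            # A common higher-ranked neighbour w closes the triangle (u, v, w).
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(list_triangles(sample))  # [(1, 2, 3), (2, 3, 4)]
```

Because only the lowest-ranked vertex of a triangle has both other vertices as higher-ranked neighbours, each triangle is found from exactly one starting vertex. A distributed algorithm has to achieve the same result without holding the whole adjacency structure on one machine, which is where the shuffle cost discussed in the abstract arises.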

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127278