Scalable big graph processing in MapReduce

Publication Type:
Conference Proceeding
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014, pp. 827 - 838
Issue Date:
Filename Description Size
Thumbnail[2014 SIGMOD] Scalable Big Graph Processing in MapReduce.pdfPublished version511.06 kB
Adobe PDF
Full metadata record
MapReduce has become one of the most popular parallel computing paradigms in cloud, due to its high scalability, reliability, and fault-tolerance achieved for a large variety of applications in big data processing. In the literature, there are MapReduce Class MRC and Minimal MapReduce ClassMMC to define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm to execute in MapReduce. However, neither of them is designed for big graph processing in MapReduce, since the constraints inMMCcan be hardly achieved simultaneously on graphs and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class SGC by relaxing some constraints inMMCto make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, EN join and NE join, using which a wide range of graph algorithms can be designed, including PageRank, breadth first search, graph keyword search, Connected Component (CC) computation, and Minimum Spanning Forest (MSF) computation. Remarkably, to the best of our knowledge, for the two fundamental graph problems CC and MSF computation, this is the first work that can achieve O(log(n)) MapReduce rounds with O(n + m) total communication cost in each round and constant memory consumption on each machine, where n and m are the number of nodes and edges in the graph respectively. We conducted extensive performance studies using two web-scale graphs Twitter-2010 and Friendster with different graph characteristics. The experimental results demonstrate that our algorithms can achieve high scalability in big graph processing. © 2014 ACM.
Please use this identifier to cite or link to this item: