Computing Connected Components with Linear Communication Cost in Pregel-like Systems

The paper studies two fundamental problems in graph analytics: computing Connected Components (CCs) and computing BiConnected Components (BCCs) of a graph. With the recent advent of Big Data, developing efficient distributed algorithms for computing CCs and BCCs of a big graph has received increasing interest. In line with the existing research efforts, in this paper we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to reduce the total communication costs from O(m × #supersteps) to O(m), for both computing CCs and computing BCCs. Moreover, the total computation costs of our techniques are smaller than those of the existing techniques in practice, though theoretically they are almost the same. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in terms of total running time.


Introduction
A graph G = (V, E) is commonly used to model data and their complex relationships in many real applications; for example, in social networks, information networks, and communication networks. Computing Connected Components (CCs) and computing BiConnected Components (BCCs) of a graph are two fundamental operations in graph analytics and are of great importance [5,12,20,22,34]. Given an undirected graph G, a CC of G is a maximal subgraph that is connected (i.e., every pair of vertices is connected by a path). A BCC of G is a maximal subgraph that remains connected after removing any single vertex. For example, the graph in Figure 1.1 has two CCs: the subgraphs induced by vertices {v 1 , . . . , v 9 } and by {v 10 , v 11 }, respectively; the left CC is further divided into two BCCs: the subgraphs induced by vertices {v 4 , v 5 , v 8 } and by {v 1 , v 2 , v 3 , v 6 , v 7 , v 8 , v 9 }, respectively.

Applications. Computing CCs is a key building block in processing large graphs. For example, CCs are basic structures for computing graph fractal dimensions when analyzing very large-scale web graphs [12]. Computing CCs also plays an important role in community detection in massive graphs by serving as a preprocessing step [7]. Computing BCCs is likewise important for analyzing large graphs. For example, BCCs can be used in the measurement study of topology patterns of large-scale community networks to measure their resilience to random failures [33]. Computing BCCs may also assist in identifying the set of articulation points of a graph, which typically belong to multiple BCCs. Identifying articulation points is important in many applications; for example, in social networks [10], distributed networks [19], bioinformatics [20], and wireless sensor networks [31].

Distributed Computation. Driven by many recent applications involving large-scale graphs, there is a very strong demand for distributed computing techniques to process such large-scale graphs.
For example, the topology of Facebook users is modeled as a graph with more than 1.4 billion vertices (i.e., users) and 0.4 trillion edges (i.e., relationships between users) in 2014 1 , and a snapshot of the web graph in 2012 has 0.98 billion web pages and 42.6 billion hyperlinks 2 .
It is well known that, on a single machine, CCs and BCCs of a graph can be computed by an in-memory algorithm in linear time (i.e., O(m)) [5,11] and by an external-memory algorithm with I/O cost O(m × log n × log log n) [18], where n and m are the number of vertices and the number of edges of the input graph, respectively. However, these techniques cannot be extended to distributed computation due to their sequential nature. Thus, developing novel, efficient distributed techniques for computing CCs and BCCs of a large-scale graph has received increasing interest recently (e.g., [21,22,23,34]). Most of the existing techniques are based on open-source implementations of the Pregel system [16], including Giraph [3], GPS [23], and Pregel+ [34]. In this paper, for ease of comparison with the existing techniques, we also present our techniques based on the Pregel system. Nevertheless, our techniques may be extended to other distributed systems, such as MapReduce [6], Spark [35], GraphLab [15], and GraphX [9].

Cost Estimation of Pregel Algorithms. Pregel is designed based on the Bulk Synchronous Parallel (BSP) model [32], with computation performed in a series of supersteps. Denote the total number of supersteps of a Pregel algorithm as #supersteps.
The cost of one superstep of a BSP (also Pregel) algorithm on p workers (a.k.a. cores) is (max_{i=1..p} w_i + max_{i=1..p} h_i × g + l) [32], where w_i is the cost of local computation at worker i, h_i is the number of messages sent or received by worker i, g is the ability of the communication network to deliver data, and l is the cost of a barrier synchronization; here, g and l are system-dependent parameters. Thus, the total running time of a Pregel algorithm is expected to be determined by the total data communication cost (H), the total computation cost (W), and #supersteps (i.e., W/p + (H/p) × g + l × #supersteps).

Existing Approaches. There are three existing algorithms for computing CCs in Pregel: hash-min, single-pivot, and S-V. The total costs for both data communication and computation of the three algorithms are O(m × #supersteps). hash-min [13,22] and single-pivot [23] adopt the same strategy of coloring vertices such that all vertices in a CC end up with the same color. Each vertex is initialized with a distinct color, and then, in each superstep, each vertex resets its color to the smallest one among its own color and its neighbors' colors. While initially developed for MapReduce, hash-min is adopted in [23] as a baseline algorithm to evaluate single-pivot in Pregel. Here, single-pivot proposes a heuristic to speed up the computation; it computes the first CC by conducting a BFS from a randomly selected vertex, while the other CCs are computed by a follow-up hash-min 3 . The #supersteps of both hash-min and single-pivot is O(δ), where δ is the largest diameter among CCs.
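To make the hash-min iteration concrete, the following is a minimal single-machine sketch of its superstep loop; this is an illustration only (not the distributed implementation), and all function and variable names are ours:

```python
# Minimal single-machine simulation of hash-min: in each "superstep",
# every vertex adopts the smallest color among its own and its
# neighbors' colors, so all vertices of a CC converge to one color.
def hash_min(n, edges):
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = list(range(n))            # each vertex starts with its own id
    supersteps, changed = 0, True
    while changed:                    # one loop iteration = one superstep
        changed = False
        new_color = color[:]
        for v in range(n):
            smallest = min([color[v]] + [color[u] for u in adj[v]])
            if smallest < new_color[v]:
                new_color[v] = smallest
                changed = True
        color = new_color
        supersteps += 1
    return color, supersteps

# Two CCs ({0, 1, 2} and {3, 4}) end up with colors 0 and 3.
colors, steps = hash_min(5, [(0, 1), (1, 2), (3, 4)])
```

Note how the number of iterations grows with the component diameter, which is exactly the O(δ) #supersteps behavior discussed above.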
The third approach, S-V [34], significantly extends the PRAM-based algorithm in [25]. The main idea is to use a star to span all vertices of a CC via the two phases below. Initially, it constructs a rooted forest by making each vertex u point to a neighbor vertex v as its parent such that u.id < v.id and v.id is maximized. Then, the first phase, shortcutting, connects every vertex u to its grandparent by an edge, replacing the edge from u to its current parent. The second phase, hooking, attaches the root of a star to a vertex v in another tree if there exists a vertex u in the star with v as its neighbor and u.id < v.id; the vertex v with the largest id is chosen if there are several such vertices. S-V iteratively alternates the two phases until each tree is a star and no merges occur 4 . The #supersteps of S-V is O(log n).
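The effect of the shortcutting phase can be sketched in isolation, under the simplifying assumption that hooking has already merged all trees of a CC into one (a single-machine illustration, with names of our choosing):

```python
# Sketch of S-V's shortcutting phase alone (hooking omitted):
# repeatedly replacing each vertex's parent with its grandparent
# flattens a rooted tree into a star within O(log n) rounds.
def shortcut_to_star(parent):
    rounds = 0
    while any(parent[v] != parent[parent[v]] for v in range(len(parent))):
        parent = [parent[parent[v]] for v in range(len(parent))]
        rounds += 1
    return parent, rounds

# A chain 0 <- 1 <- 2 <- ... <- 7 (vertex 0 is the root):
parent, rounds = shortcut_to_star([0, 0, 1, 2, 3, 4, 5, 6])
```

The distance to the root roughly halves each round, which is the source of the O(log n) bound.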
For computing BCCs in Pregel, Yan et al. [34] presented an algorithm, T-V, that extends the PRAM algorithm in [29] to convert the problem of computing BCCs to that of computing CCs, and then applies the above techniques (i.e., hash-min, single-pivot, or S-V) to compute CCs. The conversion process requires an additional O(log n) #supersteps [34].

Computing CCs. We conduct graph decomposition by growing BFS trees from randomly selected seed vertices. Clearly, two separately grown BFSs belong to the same CC if they share a common vertex. To ensure that each edge is visited at most twice in our algorithm, the unvisited neighbors of a vertex common to more than one BFS are extended to the next level in only one of the BFSs. For example, the graph in Figure 1.1 is decomposed into three subgraphs as shown in Figure 1.3(a), which are obtained by conducting BFS searches from v 1 , v 4 , and v 10 , respectively. g 1 and g 2 first overlap at {v 9 }; assuming v 9 is extended in g 2 , the edge (v 9 , v 7 ) will not be visited in g 1 . Finally, g 1 and g 2 are combined to form a CC of G. At each superstep, we advance each BFS one level further and randomly select new seed vertices whose cardinality increases exponentially with the supersteps. Since the number of BFSs (i.e., seed vertices) is only a small fraction of the number of vertices in a graph (e.g., ≤ 0.4%, see Figure 5.2(d) in Section 5), the last step (i.e., combining) can be achieved through the aggregator of Pregel systems at the master worker. We show that the total data communication cost and #supersteps of our approach are O(m) and O(log n), respectively; moreover, if the input graph is connected (or has a giant CC), then the #supersteps of our approach becomes O(min{δ, log n}) (or with high probability).

Computing BCCs. Suppose the input graph G is connected; the central idea is as follows. We first construct a BFS tree T of G.
Then, based on T, G may be treated as being decomposed into (m − n + 1) basic cycles {C 1 , . . . , C m−n+1 }, each of which is induced by a non-tree edge; that is, a basic cycle consists of a non-tree edge (u, v) and the paths from u and v to their nearest common ancestor in T. We can prove that each basic cycle is biconnected and that two biconnected subgraphs belong to the same BCC if they share a common edge. Therefore, our algorithm runs iteratively from the bottom to the root of T to identify the vertices at the current layer and their neighbors in the upper layer that are in the same BCC. For example, given the BFS tree depicted by solid edges in Figure 1.3(b), there are two basic cycles. Thus, at the bottom layer, the algorithm identifies {v 1 , v 2 , v 5 , v 6 }, {v 3 , v 6 , v 7 }, and {v 4 , v 7 } to be in the same BCCs, respectively. Then, the algorithm moves to the second bottom layer, and identifies that {v 1 , v 2 , v 5 , v 6 } and {v 3 , v 6 , v 7 } should be in the same BCC including v 8 , due to the common edge (v 6 , v 8 ).
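Using the two facts above (each basic cycle is biconnected, and biconnected subgraphs sharing an edge belong to the same BCC), the merging can be mimicked on a single machine by a union-find over edges; the sketch below is ours and only illustrates the idea, not the layer-by-layer Pregel algorithm:

```python
# Group edges into BCCs via basic cycles: every non-tree edge (u, v)
# of a BFS tree induces a cycle through the nearest common ancestor
# of u and v, and all edges of one basic cycle are unioned together.
from collections import deque

def bccs_from_basic_cycles(n, edges, root=0):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    # build a BFS tree rooted at `root`
    parent, queue = {root: root}, deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    depth = {root: 0}
    def d(v):
        if v not in depth:
            depth[v] = d(parent[v]) + 1
        return depth[v]
    def tree_path(u, v):              # tree edges on the paths u..nca..v
        path = []
        while u != v:
            if d(u) < d(v):
                u, v = v, u
            path.append(frozenset((u, parent[u])))
            u = parent[u]
        return path
    uf = {}                           # union-find over edges
    def find(x):
        uf.setdefault(x, x)
        while uf[x] != x:
            x = uf[x]
        return x
    tree_edges = {frozenset((v, parent[v])) for v in parent if v != parent[v]}
    for u, v in edges:
        e = frozenset((u, v))
        if e in tree_edges:
            continue
        for other in tree_path(u, v):  # all edges of one basic cycle...
            ra, rb = find(e), find(other)
            if ra != rb:
                uf[rb] = ra            # ...end up in the same group
    groups = {}
    for u, v in edges:
        groups.setdefault(find(frozenset((u, v))), set()).add(frozenset((u, v)))
    return list(groups.values())

# A triangle {0, 1, 2} plus a bridge (2, 3): the triangle's edges
# form one BCC, and the bridge forms a BCC on its own.
bccs = bccs_from_basic_cycles(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

Tree edges that lie on no basic cycle (bridges) remain singleton groups, matching the fact that each edge belongs to exactly one BCC.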
To speed up the computation, we propose a vertex labeling approach. The total data communication cost of our approach is O(m), and its #supersteps is O(log n + δ).

Contributions. Our main contributions are as follows.
• We develop a new paradigm for computing CCs and BCCs to reduce the total data communication cost.
• We propose a graph decomposition based approach for computing CCs of a graph with total data communication cost O(m) in O(log n) supersteps.
• We propose a vertex labeling approach for computing BCCs with total data communication cost O(m) in O(log n + δ) supersteps.
We conduct extensive performance studies on large-scale graphs, and show that our approaches have significantly smaller communication volume than the existing approaches and are one order of magnitude faster than the existing techniques.

Organization. A brief overview of related work immediately follows. In Section 2, we give preliminaries and our problem statement. The graph decomposition based paradigm and our approach for computing CCs are presented in Section 3, while our vertex labeling approach for computing BCCs is illustrated in Section 4. Section 5 presents our performance studies, and Section 6 concludes the paper.

Related Work. Related work is categorized as follows. 1) Computing CCs. Computing CCs by an in-memory algorithm on a single machine can be achieved in time linear in the input graph size by BFS or DFS [5]. PRAM algorithms were proposed in [1,25]. Algorithms based on MapReduce include [4,13,21,22], where the algorithm in [21] was developed independently of the Pregel version of S-V [34], and hash-min is used as a baseline to evaluate single-pivot in [23]. All these MapReduce and Pregel algorithms have total data communication cost O(m × #supersteps). As stated earlier, we propose a new approach in Pregel that reduces the total data communication cost to O(m).
2) Computing BCCs. Computing BCCs in the main memory of a single machine can be achieved in linear time based on DFS [11]. PRAM algorithms for computing BCCs were studied in [26,29]. The state-of-the-art algorithm T-V in Pregel [34] significantly extends the techniques in [29] to convert the problem of computing BCCs to that of computing CCs. In this paper, we develop a new approach whose total data communication cost is O(m), in contrast to the O(m × #supersteps) cost of [34].
3) Graph Decomposition. The paradigm of graph decomposition was studied in [2,17,27,28] to decompose a graph into subgraphs with designated properties. It aims to minimize either the number of subgraphs, the maximum radius among subgraphs, or the number of cross-partition/subgraph edges [2,17,27,28]. While our approach is also based on the paradigm of graph decomposition, the existing techniques are inapplicable since we target inherently different problems. 4) Other Distributed Graph Processing Systems. Besides the Pregel system [16] and its open-source implementations [23,30,34], other distributed graph processing systems include GraphLab [15], MapReduce [6], Spark [35], and GraphX [9]. Pregel and GraphLab are very similar to each other, though GraphLab supports both synchronous and asynchronous models and does not allow graph mutations [15]. MapReduce [6] is a general-purpose distributed data processing system, and has recently been shown to be able to process graphs [21]. Spark [35] is an in-memory distributed data processing system that improves upon MapReduce by keeping data in main memory. GraphX [9] is a system built on Spark to bridge the gap between graph processing systems and general-purpose data processing systems. While we present our techniques based on the Pregel system, they may be easily adapted to the above distributed systems, because they are based on the vertex-centric programming model, which can be easily implemented in the other systems.

Preliminaries
In this paper, we focus on an unweighted undirected graph G = (V, E) [8], where V is the set of vertices and E is the set of edges. Denote the number of vertices, |V|, and the number of edges, |E|, in G by n and m, respectively. Each vertex v ∈ V has a unique integer ID, denoted v.id. We denote an undirected edge between u and v by (u, v). Given a set V′ ⊆ V of vertices, the subgraph of G induced by V′ is defined as G[V′] = (V′, {(u, v) ∈ E | u, v ∈ V′}). In the following, for presentation simplicity, we refer to an unweighted undirected graph simply as a graph, and we refer to a BCC either by the set of vertices or by the set of edges in it. Note that a vertex (e.g., v 4 ) may belong to more than one BCC, while each edge belongs to a unique BCC.

Problem Statement. Given a large-scale graph G, in this paper we study the problem of computing, in a distributed manner, all connected components (CCs) of G and the problem of computing, in a distributed manner, all biconnected components (BCCs) of G.
In this paper, for ease of comparison with the existing techniques, we present our techniques based on the Pregel system, which is introduced below. Nevertheless, our techniques may be extended to other distributed systems, such as MapReduce [6], Spark [35], GraphLab [15], and GraphX [9].

The Pregel System
Pregel [16] is designed based on the Bulk Synchronous Parallel (BSP) model [32]. Initially, the vertices of the input graph are distributed across a cluster of workers, where all adjacent edges of a vertex reside in the same worker. Then, computation is performed in a series of supersteps; let #supersteps denote the total number of supersteps in a Pregel algorithm.
In a superstep, each active vertex invokes a user-defined function, compute(). The compute function running on a vertex v (1) performs computation based on v's current status and the messages v received from the previous superstep, (2) updates v's status, (3) sends new messages to other vertices to be received in the next superstep, and (4) may (optionally) make v vote to halt. A halted vertex is reactivated if it receives messages. A Pregel program terminates when all vertices have voted to halt and there are no messages in transit. Pregel is often regarded as a vertex-centric programming model, since it performs computation for each vertex based only on the local information of the vertex itself and the messages it receives.

Combiner and Aggregator. Pregel also extends BSP in several ways. Firstly, Pregel supports a combiner (i.e., a combine() function) to combine messages that are sent from vertices in one worker to the same vertex in another worker. Secondly, an aggregator is also supported. That is, in a superstep, every vertex can contribute values to the aggregator, and a specified rule aggregates these values at the master worker; the result of the aggregation is visible to all vertices in the next superstep. Thus, in Pregel, the master worker can act as a coordinator by conducting some computation for all workers. In this paper, we make use of the aggregator to design faster Pregel algorithms for computing CCs and BCCs.
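The superstep semantics described above can be modeled by a toy single-process loop; the interface below follows our description of Pregel, not any particular system's API:

```python
# Toy model of the Pregel superstep loop: each active vertex reads
# its inbox from the previous superstep, may send messages, and votes
# to halt; a halted vertex is reactivated when it receives a message.
def run_pregel(vertices, compute):
    inbox = {v: [] for v in vertices}
    active = set(vertices)
    supersteps = 0
    while active:
        outbox = {v: [] for v in vertices}
        for v in list(active):
            if compute(v, vertices[v], inbox[v], outbox):
                active.discard(v)      # the vertex voted to halt
        inbox = outbox
        active |= {v for v, msgs in inbox.items() if msgs}  # reactivation
        supersteps += 1
    return supersteps

# Example compute(): hash-min style minimum-id propagation.
def cc_compute(v, state, msgs, outbox):
    new = min([state["color"]] + msgs)
    if new < state["color"] or not state["started"]:
        state["color"], state["started"] = new, True
        for u in state["nbrs"]:
            outbox[u].append(new)
    return True                        # always vote to halt

graph = {v: {"color": v, "started": False, "nbrs": nbrs}
         for v, nbrs in {0: [1], 1: [0, 2], 2: [1]}.items()}
steps = run_pregel(graph, cc_compute)
```

The program terminates exactly when all vertices have voted to halt and no messages remain, mirroring the termination condition stated above.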

Computing CCs
In order to reduce the communication cost of computing CCs, we develop a new paradigm in Section 3.1, based on which we present a new approach in Section 3.2 while analyses are given in Section 3.3.

A Graph Decomposition based Paradigm
Reducing Communication Cost. The existing approaches for computing CCs have total (data) communication costs and total computation costs of O(m × #supersteps) (see Section 1). One thing to notice is that the total communication cost of single-pivot becomes O(m) (or with high probability) if the input graph is connected (or has one giant CC and other small CCs). For example, YH is such a graph in our experiments; the communication volume of single-pivot on YH is similar to that of our algorithm and is much smaller than those of hash-min and S-V (see Figure 5.1(c) in Section 5). However, a Pregel algorithm has an inevitable computation cost of O(n × #supersteps) for checking, at each superstep, whether each vertex is active. Consequently, the computation cost of single-pivot on YH, which has a large diameter, is very high (see Figure 5.1(b)) due to the large #supersteps (i.e., δ); this results in a high total running time (see Figure 5.1(a)). Therefore, #supersteps is also an important factor to optimize for Pregel algorithms. In this paper, we keep the computation cost and #supersteps of our algorithm matching the best of the existing algorithms.

Graph Decomposition based Paradigm. We develop a new graph decomposition based paradigm for computing CCs.
Given a graph G = (V, E), a graph decomposition of G is a set of subgraphs {g 1 = (V 1 , E 1 ), . . . , g l = (V l , E l )} of G, such that each subgraph g i satisfies a designated property (e.g., being connected) and ∪ l i=1 E i = E.

Algorithm 1: CC-Overview
Input: A graph G = (V, E) distributed across a set of workers
Output: The set of CCs of G
1 Compute a graph decomposition of G in a distributed manner;
2 Union the decomposed subgraphs into CCs;
3 return the set of CCs;

The framework is illustrated in Algorithm 1, which first decomposes an input graph into a set of connected subgraphs and then unions the decomposed subgraphs into CCs. During graph decomposition, we assign a unique color to all vertices in a decomposed subgraph, which is connected. Since subgraphs sharing common vertices belong to the same CC, a vertex may receive different colors, and the decomposed subgraphs having these colors belong to the same CC. We mark down, during graph decomposition, the subgraphs that should be unioned, and dedicate the last superstep to the union operation to obtain the correct CCs. Since the number of colors (i.e., the number of decomposed subgraphs) is usually only a small fraction of the number of vertices in a graph (e.g., ≤ 0.4%, see Figure 5.2(d)), the union operation can be achieved through the aggregator of Pregel systems at the master worker. For example, assume the graph in Figure 2.1 is decomposed into four connected subgraphs {g 1 , g 2 , g 3 , g 4 } as shown in Figure 3.1, and vertices in g 1 have color 1 while vertices in g 2 have color 7; then v 4 will receive colors 1 and 7, respectively, from subgraphs g 1 and g 2 . Therefore, the subgraph having color 1 (i.e., g 1 ) and the subgraph having color 7 (i.e., g 2 ) belong to the same CC, and we union g 1 and g 2 into a single CC. Similarly, we can obtain the other CC.

Our Algorithm
Following the paradigm in Algorithm 1, we propose a new approach for computing CCs, consisting of two phases: graph decomposition and subgraph union.

Graph Decomposition. We compute a graph decomposition by simultaneously conducting BFS searches starting from a set of seed vertices. When running a BFS, we label all visited vertices with the color of the BFS, which is the id of the BFS's seed vertex. Thus, a vertex may receive multiple colors, one from each BFS visiting it; we store all the received colors of a vertex v in v.cls, and set v's color, v.color, to be the first received color. Once a vertex is assigned a color, it propagates the color to all its neighbors. Therefore, the unvisited neighbors of a vertex are extended in only one BFS, because a vertex is assigned a color only once; this guarantees that each edge is visited at most twice during graph decomposition. When all edges of the graph have been visited (i.e., all BFSs terminate), each subgraph induced by vertices with the same color is a decomposed subgraph; note that, here, we say a vertex v has all the colors it received (i.e., all colors in v.cls).

Seed Vertex Selection. The largest diameter of the subgraphs obtained by graph decomposition, which is related to #supersteps, largely depends on the selection of seed vertices. For example, in Figure 2.1, assume the seed vertices are {v 1 , v 7 , v 8 , v 12 }; then we will obtain the graph decomposition in Figure 3.1, whose largest diameter is 2. However, if the seed vertices are chosen as {v 1 , v 2 , v 8 , v 9 }, then the obtained subgraphs will be different, and the largest diameter will be 3.
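A single-machine sketch of this decomposition is given below. It draws seeds from a random permutation under an exponentially growing budget (β = 2 here, one concrete schedule choice), and, for brevity, it stops once every vertex is colored rather than once all edges are visited; both simplifications and all names are ours:

```python
# Simultaneous level-synchronous BFSs with adaptively selected seeds:
# a vertex keeps the first color it receives (v.color) and records
# every color it sees (v.cls) for the later subgraph-union phase.
import random

def graph_decompose(n, edges, beta=2.0, seed=0):
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    pos = {v: i for i, v in enumerate(perm)}   # v.pos in the permutation
    color = {v: None for v in range(n)}        # v.color
    cls = {v: set() for v in range(n)}         # v.cls (all received colors)
    frontier, i = set(), 0
    while any(c is None for c in color.values()):
        nxt = set()
        for v in frontier:                     # advance BFSs one level
            for u in adj[v]:
                cls[u].add(color[v])
                if color[u] is None:
                    color[u] = color[v]        # first received color wins
                    nxt.add(u)
        for v in range(n):                     # adaptive seed selection
            if color[v] is None and pos[v] < beta ** i:
                color[v] = v                   # a BFS's color = its seed id
                cls[v].add(v)
                nxt.add(v)
        frontier, i = nxt, i + 1
    return color, cls

# Two components {0, 1, 2} and {3, 4}: colors never cross components.
color, cls = graph_decompose(5, [(0, 1), (1, 2), (3, 4)])
```

Because the seed budget β^i eventually exceeds n, the loop is guaranteed to terminate within about log_β n iterations even for a graph of isolated vertices.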
We propose a randomized adaptive approach to seed vertex selection by iteratively selecting seed vertices and conducting BFSs. That is, in each superstep, we advance the existing BFSs by one level (i.e., visit the neighbors of the currently visited vertices), and also start BFSs from the newly selected seed vertices. To bound #supersteps, we increase the number of seed vertices to be selected as the iteration proceeds; this is controlled by a parameter β > 1. Specifically, we randomly select β^i − β^{i−1} new vertices as potential seed vertices for superstep i > 0 (and 1 vertex for superstep 0). For each selected vertex, if it has already been visited by the existing BFSs, we do nothing; otherwise, it is treated as a new seed vertex and we start a new BFS from it.

Graph Decomposition Algorithm. The pseudocode of our graph decomposition algorithm is shown in Algorithm 2, denoted graph-decompose. We randomly permute all vertices of G by the approach in [24] (Line 1), such that random vertex selection is achieved by sequentially selecting vertices according to the permutation order. For each vertex v, the position of v in the permutation, v.pos, and the color of v, v.color, are initialized at Line 3. Then, we iterate until every vertex has been assigned a color and all edges have been visited (Lines 5-14). In a superstep, we perform the following computations for each vertex v ∈ V. If v already has a color (i.e., v.color ≠ nil), we just add the set of colors v received in this superstep to v.cls (Line 14). Otherwise, v.color = nil; in this case, if v receives colors in this superstep (i.e., v is being visited by the existing BFSs), then we assign the first color v received to be its color (Lines 8-10); otherwise, if v is selected by the random selection process (i.e., v.pos ≤ β^i ), then we start a new BFS from v (Lines 11-12). After v is assigned a color, we propagate its color to its neighbors (Line 13).

Subgraph Union.
In this phase, we union the subgraphs that belong to the same CC into a single subgraph by merging colors, based on the facts that each decomposed subgraph is connected and that subgraphs sharing a common vertex belong to the same CC.

Algorithm 3: merge-color
/* Slave workers send colors to the master worker */
1 for each vertex v with multiple received colors do
2 Contribute v.cls to the aggregator;
/* Master worker conducts the aggregation */
3 Let C be the set of all colors in received messages;
4 Initialize a disjoint-set data structure for C;
5 for each set of received colors cls do
6 Let c be the first color in cls;
7 for each color c′ ∈ cls do union(c, c′);
8 for each color c ∈ C do find(c);
/* Master worker sends aggregation to slaves */
9 Broadcast the set of parents of colors in the disjoint-set data structure (i.e., {(c, parent(c)) | c ∈ C}) to all slave workers;

Merging colors is achieved through the aggregator of Pregel systems, as shown in Algorithm 3. It first collects the sets of received colors of vertices at the master worker (Lines 1-2), then conducts the merging at the master worker (Lines 3-8), and finally sends the aggregated information back to the slave workers (Line 9). To process these merging operations efficiently, we adopt the union-find algorithm based on the disjoint-set data structure [5]. We organize the colors into a set of trees by storing the parent parent(c) of each color c, such that the root of each tree is the representative color of all colors in the tree. Initially, each color corresponds to a singleton tree (i.e., parent(c) = c, ∀c ∈ C) (Line 4). For each set of received colors, cls, we merge the first color c with every other color c′ ∈ cls (Lines 5-7); that is, we union the tree containing c and the tree containing c′. Finally, we apply the find operation to all colors (Line 8), such that parent(c) stores the root/representative color of the tree containing c. Note that union and find are the two standard operations of the union-find algorithm [5].
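The master-side merging can be sketched as follows; path halving and the smallest-id representative are our implementation choices (the algorithm only requires some representative per tree):

```python
# Union-find over colors: for every received color set cls, the first
# color is unioned with each remaining one, so afterwards every color
# maps to a single representative per CC.
def merge_colors(color_sets):
    parent = {}
    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]      # path halving
            c = parent[c]
        return c
    for cls in color_sets:
        first = cls[0]
        for c in cls[1:]:
            ra, rb = find(first), find(c)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # smallest id as root
    return {c: find(c) for c in parent}

# The running example: v4 received colors {1, 7}, v10 received {8, 12}.
mapping = merge_colors([[1, 7], [8, 12]])
```

The returned mapping is exactly the {(c, parent(c))} set that the master broadcasts back to the slave workers.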

Algorithm 4: update-color
Input: The set of (c, parent(c)) pairs received at slave workers

Now, we replace each color v.color by its representative color parent(v.color). This guarantees that all vertices in the same CC end up with the same color. The pseudocode is given in Algorithm 4, denoted update-color. For example, in Figure 3.2, v 4 adds 7 to its set of received colors, as shown in Figure 3.2(c). Note that, in this superstep, none of v 4 , v 5 , and v 6 propagates its newly received color (i.e., 7, 1, and 1, respectively) to its neighbors. At superstep 4, all edges have been visited by BFSs; thus, the graph decomposition ends, and the result is shown in Figure 3.2(d). The vertices v 4 , v 5 , v 6 , and v 10 have multiple received colors; thus, their sets of received colors are collected at the master worker for merging. That is, colors 1 and 7 are merged, and colors 8 and 12 are merged. Assume that the representative color of 1 and 7 is 1 and the representative color of 8 and 12 is 8; then, after update-color, the final colors of the vertices are as shown in Figure 3.2(e). We conclude that vertices {v 1 , . . . , v 7 } and {v 8 , . . . , v 12 } belong to two different CCs, respectively.

Correctness and Complexity Analysis
We prove the correctness and give complexity analyses of our approach.

Correctness. For any color c, let V c be the set of vertices with color c. Then, V c ∩ V c′ = ∅ for any c ≠ c′, because after update-color (Algorithm 4), every vertex has exactly one color. We prove the correctness of our approach by the following theorem.

Theorem 3.1: For each assigned color c in Algorithm 4, the subgraph induced by V c , G[V c ], is a maximal connected subgraph (thus, it is a connected component).

Proof Sketch: We first prove that G[V c ] is connected. It is immediate that after graph-decompose (i.e., Algorithm 2), G[V c ] is connected for each color c, because the colors of vertices are assigned by BFSs. Then, in merge-color (i.e., Algorithm 3), we union the subgraphs that share a common vertex into a single subgraph (i.e., we merge the received colors of a single vertex). Thus, after update-color (i.e., Algorithm 4), G[V c ] is connected. Now, we prove maximality by contradiction. Assume that there is a subgraph G[V c ] that is not a maximal connected subgraph; that is, there is a connected supergraph of G[V c ] in G, and hence an edge (u, v) with u ∈ V c and v ∉ V c . Because all edges are visited by graph-decompose, u and v must have the same color after update-color. This contradicts v ∉ V c .
Thus, the theorem holds.

Complexity Analyses. We now analyze the complexity of our approach in terms of #supersteps, total communication cost, and total computation cost.

Number of Supersteps. In the worst case, graph-decompose stops after at most log_β n (i.e., O(log n)) supersteps, because by that time all vertices would have been selected by the random seed vertex selection process. Moreover, if the input graph is connected (or has a giant CC), then graph-decompose terminates after at most O(δ) supersteps (or with high probability), because by that time all edges would have been visited by BFSs. In addition, merge-color and update-color take one superstep each. Thus, #supersteps is bounded by O(min{δ, log n}) (or with high probability) if the input graph is connected (or has a giant CC), and it is bounded by O(log n) in the worst case.

Total Communication Cost. The total communication cost of our approach is O(m), as follows. Firstly, graph-decompose traverses the input graph only once, visiting each undirected edge at most twice, and one message is generated for each visited edge. Secondly, the number of messages collected at the master worker in Algorithm 3 is at most Σ_{v∈V} |v.cls| ≤ 2 × m. Moreover, to reduce the communication cost, we also aggregate the messages locally at slave workers before sending them to the master worker, by an algorithm similar to Algorithm 3. Thus, the communication cost of merge-color is O(p × #colors), where p is the number of workers and #colors is the total number of colors generated.
#colors equals the number of selected seed vertices. To keep it small, we implement a heuristic that chooses the vertex with the maximum degree as the first seed vertex at superstep 0. Let ∆ be the maximum degree and d be the average degree. Then, the expected number of seed vertices selected at superstep i is (β^i − β^{i−1}) × (1 − n_i / n), where n_i is the expected number of vertices already visited by the existing BFSs. Since ∆ is large for real graphs due to the power-law graph model, n_i grows quickly and #colors is thus usually small; for example, in our experiments in Section 5, #colors is only a small fraction (i.e., ≤ 0.4%) of the number of vertices in a graph (see Figure 5.2(d)).

Computing BCCs
In this section, we propose a new vertex labeling technique, based on graph decomposition, to reduce the total communication cost for computing BCCs from O(m × #supersteps) to O(m). We first present the general idea in Section 4.1, and then illustrate our approach in Section 4.2, while analyses are given in Section 4.3. In the following, for ease of exposition we assume that the input graph is connected.

Vertex Labeling
Challenge. The main challenge of computing BCCs lies in the fact that a vertex may belong to several BCCs. Thus, an idea similar to that of the CC algorithms (i.e., labeling each vertex with a color such that each CC corresponds to the subgraph induced by vertices of the same color) does not work for BCCs. Nevertheless, each edge participates in exactly one BCC. Thus, the existing approaches compute BCCs by labeling edges instead [29,34]. To do so, they construct an auxiliary graph G′ for the input graph G by treating each edge in G as a vertex in G′; this reduces the problem of labeling edges in G to labeling vertices in G′. However, this process incurs O(m × #supersteps) communication cost.

Intuition of Our Approach. We propose a new approach for computing BCCs with O(m) communication cost by directly labeling vertices of the input graph. Our main idea is based on cycles in a graph.
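To see why the edge-labeling approaches are costly, the edge-to-vertex conversion can be sketched in its simplest, line-graph-style form; the actual auxiliary graph of [29,34] is defined differently, and this sketch only illustrates that G′ has m vertices, which is why per-superstep costs on G′ become O(m):

```python
# Build a graph G' whose vertices are the edges of G; two edge-vertices
# are adjacent when the corresponding edges share an endpoint.
from itertools import combinations

def edge_to_vertex_graph(edges):
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)
    gprime = {e: set() for e in edges}
    for es in incident.values():
        for e1, e2 in combinations(es, 2):
            gprime[e1].add(e2)
            gprime[e2].add(e1)
    return gprime

# A triangle: every edge-vertex of G' is adjacent to the other two.
gprime = edge_to_vertex_graph([(0, 1), (1, 2), (2, 0)])
```

Our approach avoids materializing any such m-vertex auxiliary graph by labeling the original vertices directly.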
For example, cycles that visit a vertex more than once are not simple cycles, while (v 1 , v 3 , v 4 , v 1 ) is a simple cycle. In the following, we refer to a simple cycle as a cycle for presentation simplicity.
We have the lemma below for BCCs based on cycles. Lemma 4.1: Two edges of G belong to the same BCC if and only if there is a cycle containing both edges. Proof Sketch: First, we prove that "if there is a cycle containing both edges, then the two edges belong to the same BCC" by contradiction. Assume there are two edges, e 1 and e 2 , in a cycle C that belong to two different BCCs, BCC 1 and BCC 2 , respectively. Then BCC 1 ∪ C (i.e., the subgraph induced by edges in BCC 1 and C) must be biconnected; this is because BCC 1 and C have at least two common vertices, and after removing any one vertex from BCC 1 ∪ C, both BCC 1 and C remain connected themselves and are connected together by the (at least one) remaining common vertex. This contradicts that BCC 1 is a maximal biconnected subgraph. Therefore, all edges in a cycle of G belong to the same BCC of G. Now, we prove that "if two edges belong to the same BCC, then there is a cycle containing both edges". Consider any two edges e 1 = (u 1 , u 2 ) and e 2 = (v 1 , v 2 ) belonging to the same BCC; there are two cases: 1) e 1 and e 2 share a common vertex, and 2) e 1 and e 2 have no common vertex. Case 1. Without loss of generality, assume the common vertex is u 1 = v 1 . After removing u 1 , there is still a path between u 2 and v 2 in the BCC; thus, there is a cycle in G containing both e 1 and e 2 . Case 2. There must exist two vertex-disjoint (except u 1 and v 1 ) simple paths, P 1 and P 2 , in the BCC between u 1 and v 1 [5]; thus P 1 ∪ P 2 is a cycle in the BCC containing u 1 and v 1 . If (u 1 , u 2 ) is not in P 1 ∪ P 2 , then we can construct a new cycle to include e 1 by conducting a BFS in the BCC from u 2 after removing u 1 , and terminating the BFS once it reaches a vertex that is in P 1 or P 2 ; without loss of generality, assume the BFS reaches a vertex u in P 1 . Then the new cycle, formed by removing the subpath of P 1 from u 1 to u, and adding edge (u 1 , u 2 ) and the path from u 2 to u, is a simple cycle in the BCC, and it contains edge e 1 and vertex v 1 .
Similarly, we can construct a simple cycle that contains both edges e 1 and e 2 . Thus, there is a cycle in G containing both e 1 and e 2 .
Therefore, the lemma holds. For example, in Figure 4.1(a), edges (v 2 , v 5 ) and (v 3 , v 6 ) belong to the same BCC because both appear in the cycle (v 1 , v 2 , v 5 , v 9 , v 6 , v 3 , v 1 ); edges (v 4 , v 7 ) and (v 8 , v 11 ) belong to different BCCs because there is no cycle containing both edges. Computing BCCs. Following Lemma 4.1, we can compute BCCs of a graph by enumerating cycles and building a relation among edges such that two edges are related if and only if they appear together in a cycle; the transitive closure of this relation then defines the BCCs of the graph. However, there can be an exponential number of cycles in a graph [5]. To make this idea of computing BCCs by enumerating cycles work, we reduce the number of cycles to m − n + 1 by considering only basic cycles. Definition 4.2: Given a spanning tree T of a graph G, we define the basic cycle for a non-tree edge (u, v) ∉ T as the cycle consisting of (u, v) and the unique path between u and v in T .
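The tree path that closes a basic cycle is easy to compute from parent pointers; the following is a small illustrative sketch (the function name and the toy tree below are our own, not from the paper):

```python
def basic_cycle(parent, u, v):
    """Vertices of the basic cycle of non-tree edge (u, v): the unique tree
    path u..v; the cycle is closed by the edge (u, v) itself. `parent` maps
    each vertex to its parent in a rooted spanning tree (root maps to itself)."""
    anc = []                      # u and all its ancestors up to the root
    x = u
    while True:
        anc.append(x)
        if parent[x] == x:
            break
        x = parent[x]
    anc_set = set(anc)
    path_v = []                   # v's side of the path, up to (excluding) the LCA
    y = v
    while y not in anc_set:
        path_v.append(y)
        y = parent[y]
    lca = y
    path_u = anc[:anc.index(lca) + 1]     # u .. lca
    return path_u + path_v[::-1]          # u .. lca .. v
```

Since each non-tree edge yields exactly one basic cycle, a graph with m edges and n vertices has exactly m − n + 1 of them, which is what makes enumeration affordable.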
In the following, we assume there is a spanning tree T . Based on basic cycles, we have the following theorem. Theorem 4.1: Given any two edges e and e′, they belong to the same BCC if and only if either 1) there is a basic cycle containing both e and e′ or 2) there is a chain of basic cycles, C 0 , . . . , C l , such that C i and C i+1 overlap on edges for every 0 ≤ i < l, and e ∈ C 0 and e′ ∈ C l . Proof Sketch: We prove the theorem by considering case 1 (i.e., there is a basic cycle containing e and e′) as a special case of case 2 (i.e., l = 0). (⇐=) Assume the overlap edge between C i and C i+1 is e i . Then, e and e 0 belong to the same BCC, e i and e i+1 belong to the same BCC for all 0 ≤ i ≤ l − 2, and e l−1 and e′ belong to the same BCC. Therefore, e and e′ belong to the same BCC. (=⇒) From Lemma 4.1, we know that if e and e′ belong to the same BCC, then there is a cycle containing both e and e′; we assume cycle C is the minimal one in terms of the number of non-tree edges among all such cycles. It has been proved in [14] that, for any cycle C, there is a set of basic cycles, C 0 , . . . , C l , such that C = C 0 ⊕ · · · ⊕ C l ; it is easy to verify that each such basic cycle corresponds to a non-tree edge in C. Here, we consider a cycle as the set of edges in the cycle, and C 1 ⊕ C 2 denotes the symmetric difference between C 1 and C 2 (i.e., (C 1 ∪ C 2 ) \ (C 1 ∩ C 2 )). Now, we construct a graph G c of these cycles: each vertex in G c corresponds to a cycle C i , and two vertices are connected by an edge if and only if the corresponding two cycles share at least one common edge. It is easy to prove that G c is connected (otherwise either C is not connected or it is not simple). Therefore, there exists a chain of basic cycles, C i 0 , . . . , C i l , such that C i j and C i j+1 overlap on edges for all 0 ≤ j < l, and C i 0 and C i l contain e and e′, respectively.
Consider the graph in Figure 4.1(a) with the spanning tree in Figure 4.1(b): e 1 = (v 11 , v 12 ) and e 2 = (v 12 , v 8 ) belong to the same BCC because they appear in the basic cycle C 1 = (v 8 , v 11 , v 12 , v 8 ). e 1 and e 3 = (v 12 , v 13 ) belong to the same BCC because there exists another basic cycle C 2 = (v 8 , v 12 , v 13 , v 8 ) such that e 1 ∈ C 1 , e 3 ∈ C 2 , and C 1 ∩ C 2 ≠ ∅. Labeling Vertices. From Theorem 4.1, we only need to enumerate basic cycles, which is affordable. To tackle the challenge stated at the beginning of this subsection, we propose a vertex labeling technique based on the lemmas below. Lemma 4.2: Given a rooted spanning tree T of a graph G, for each vertex u in G, the set of non-tree edges associated with u and the tree edge (u, p(u)) belong to the same BCC, where p(u) denotes the parent of u in T . Proof Sketch: This lemma directly follows from Theorem 4.1 and the fact that, for each non-tree edge e associated with u, there is a basic cycle containing both e and (u, p(u)). Thus, all these edges belong to the same BCC. Lemma 4.3: Given a rooted spanning tree T of a graph G, each BCC of G has a unique vertex that is closest to the root of T (we denote it the root of the BCC), and each vertex can be a non-root vertex in at most one BCC. Proof Sketch: We prove the first claim by contradiction. Assume there are at least two vertices of a BCC at the highest level (i.e., closest to the root), and let u and v be two of them. Then, there must be a path between u and v not using any vertices at higher levels than u and v (i.e., using only vertices inside the BCC). Moreover, there is also a path P between u and v using only vertices at higher levels than u and v (i.e., the path between u and v in T ). Therefore, there is a cycle containing P, which means that the vertices in P also belong to the BCC. Contradiction. Thus, the claim holds.
For the second claim, we also prove it by contradiction. Assume there exists a vertex v that is a non-root vertex of two BCCs, bcc 1 and bcc 2 . Then the tree edge (v, p(v)) from v to its parent p(v) in T must be in both bcc 1 and bcc 2 . This contradicts that each edge belongs to exactly one BCC. Thus, the claim holds.
Therefore, we can label vertices by colors; each vertex takes the color of the unique BCC in which it is a non-root vertex. The set of all vertices with the same color, together with their parents in the spanning tree, corresponds to a BCC. Alternatively, each edge can infer its color as follows: if it is a non-tree edge, its color is the color of either of its two end-points; otherwise, it is a tree edge and its color is the color of its child end-point. Consequently, each BCC of G corresponds to the set of edges with the same color. Naive Vertex Labeling Algorithm. Given a rooted spanning tree of a graph G, a naive vertex labeling algorithm is to enumerate all basic cycles (i.e., decompose G into a set of basic cycles). For each basic cycle C, we label all vertices in C, except the unique vertex that is closest to the root, with the same color. Note that, while labeling vertices in a basic cycle C, if some vertices in C have already been labeled (i.e., their colors have been set by other basic cycles), then we collect all colors of such vertices (except the unique vertex that is closest to the root) and merge these colors, similarly to merge-color and update-color in Section 3. However, this is time-consuming considering the expensive cost of enumerating all basic cycles and of relabeling vertices. We present an efficient vertex labeling algorithm in the next subsection.
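A sequential rendering of this naive algorithm may clarify the idea (this is our own sketch with a toy graph, not the paper's code; note that a vertex whose BCC is a single tree edge lies on no basic cycle and therefore stays unlabelled here):

```python
from collections import deque

def naive_bcc_labels(n, edges, root=0):
    """Naive labelling: enumerate every basic cycle of a BFS tree and give all
    its vertices except the one closest to the root one colour, merging
    colours when cycles share labelled vertices. Illustrative only."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)

    parent, depth = {root: root}, {root: 0}
    q = deque([root])
    while q:                              # build a BFS spanning tree
        u = q.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                q.append(w)

    cp = list(range(n))                   # disjoint-set over colours
    def find(c):
        while cp[c] != c:
            cp[c] = cp[cp[c]]; c = cp[c]
        return c

    color = [None] * n
    for u, v in edges:
        if parent[u] == v or parent[v] == u:
            continue                      # tree edge: no basic cycle
        pa, pb = [u], [v]                 # climb to the LCA from both ends
        a, b = u, v
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]; pa.append(a)
            else:
                b = parent[b]; pb.append(b)
        cyc = pa + pb[:-1][::-1]          # basic-cycle vertices
        top = min(cyc, key=lambda x: depth[x])   # unique vertex closest to root
        cols = {find(color[x]) for x in cyc if x != top and color[x] is not None}
        c0 = cols.pop() if cols else min(x for x in cyc if x != top)
        for c in cols:                    # merge colours of overlapping cycles
            cp[find(c)] = find(c0)
        for x in cyc:
            if x != top:
                color[x] = find(c0)
    return [find(c) if c is not None else None for c in color]
```

Even in this sequential form, the repeated cycle enumeration and relabelling are what make the naive algorithm expensive, which motivates the level-by-level algorithm of the next subsection.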

Our Algorithm
We propose an efficient approach for labeling all vertices in G by traversing the graph in a bottom-up fashion. We first make the input graph G a layered graph with respect to a BFS tree. Definition 4.3: Given a BFS tree T of a graph G, we make G a layered graph by assigning a level number to each vertex. The root vertex in T has level number 0, and every other vertex has a level number that is one plus that of its parent in T . For each vertex in a layered graph, we categorize its neighbors into upper-level neighbors (i.e., neighbors with smaller level numbers), same-level neighbors, and lower-level neighbors (i.e., neighbors with larger level numbers). For a vertex at level i, all its upper-level neighbors are at level (i − 1), and all its lower-level neighbors are at level (i + 1). For example, for v 4 in Figure 4.2, its upper-level neighbor is v 1 , its same-level neighbor is v 3 , and its lower-level neighbors are v 7 and v 8 . Note that each vertex except the root vertex has at least one upper-level neighbor: its parent in T , and possibly other vertices connected by non-tree edges.
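The layering and the neighbor categorization can be sketched as follows (a minimal sequential version in our own notation; the function name and the toy graph in the test are assumptions, not the paper's code):

```python
from collections import deque

def layer_graph(adj, root):
    """Layered graph per Definition 4.3: assign BFS level numbers, then split
    each vertex's neighbours into upper-, same-, and lower-level sets."""
    level = {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                q.append(w)
    split = {}
    for u in adj:
        split[u] = (
            {w for w in adj[u] if level[w] == level[u] - 1},  # upper-level
            {w for w in adj[u] if level[w] == level[u]},      # same-level
            {w for w in adj[u] if level[w] == level[u] + 1},  # lower-level
        )
    return level, split
```

Because levels come from a BFS, no edge can span more than one level, which is why the three categories are exhaustive.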
Given a layered graph, our algorithm labels vertices level-by-level in a bottom-up fashion based on the lemma below. Lemma 4.4: Given a layered graph, (rule-i) each vertex has the same color as its same-level neighbors; (rule-ii) if a vertex has at least two upper-level neighbors, then the vertex and all its upper-level neighbors have the same color. Proof Sketch: For rule-i, let the two same-level neighbors be u and v; then (u, v) is a non-tree edge, since BFS tree edges only connect vertices at consecutive levels. Thus, there is a basic cycle containing (u, v), (u, p(u)), and (v, p(v)).
Consequently, u and v should have the same color.
For rule-ii, without loss of generality, assume that a vertex u has two upper-level neighbors v and w. Then, there is a cycle containing (u, v), (u, w), (v, p(v)), and (w, p(w)), since neither v nor w can be the root of T (the graph G is layered according to a BFS tree). Therefore, u, v, and w should have the same color.
Consider the layered graph in Figure 4.2: v 3 and v 4 have the same color since they are same-level neighbors, and v 9 has the same color as v 5 and v 6 since v 5 and v 6 are two upper-level neighbors of v 9 . However, from Lemma 4.4, we do not know the relationship between the colors of v 2 and v 3 since they are neither directly connected nor connected through a common lower-level neighbor. From Theorem 4.1, we know that v 2 and v 3 should have the same color. We define a merge operation below so that we can apply Lemma 4.4 to label all vertices. Definition 4.4: Given a set S of vertices that are at the same level and have the same color, the merge operation merges all vertices in S into a single super-vertex.
For example, in Figure 4.2, vertices v 5 and v 6 are at the same level and have been assigned the same color due to the common lower-level neighbor v 9 . We merge them into a super-vertex, denoted v 5,6 . Now, v 5,6 has two upper-level neighbors, v 2 and v 3 ; thus, according to Lemma 4.4, they have the same color as v 5,6 (i.e., the color of v 5 and v 6 ). Therefore, we can continue this process to label all vertices. The Algorithm. Armed with Lemma 4.4 and the merge operation, we present our efficient vertex labeling algorithm in Algorithm 5, denoted BCC-Labeling. Firstly, we construct a layered graph (Line 1); that is, we conduct a BFS of G starting from a random vertex r and assign each vertex in G a BFS level number. Secondly, we compute CCs of the subgraph of G consisting of only the edges whose two end-points are at the same level, to label vertices according to rule-i (Line 2). Note that all vertices in the same CC (i.e., with the same color and at the same level) need to be merged into a super-vertex to iteratively apply rule-ii level-by-level. Instead of physically assigning all neighbors of vertices in a CC to the super-vertex, we store in v.sid the id of the super-vertex containing v, while neighbors are still kept at the individual vertices. The super-vertex id can be the id of any vertex in the super-vertex, and all vertices in the same super-vertex have the same super-vertex id. Then, we label vertices level-by-level in a bottom-up fashion. For vertices at the bottom level, we label all vertices in the same CC (indicated by v.sid) by a unique color (i.e., v.sid) (Line 4). After that, we iterate to label the remaining vertices level-by-level in two phases (Lines 6-26). (Phase 1) Propagate colors from vertices at level l to vertices at level l − 1 (Lines 6-11). In order to apply rule-ii in Lemma 4.4, we need to check whether a super-vertex has at least two upper-level neighbors.
Here, we ensure that vertices at level l belong to the same super-vertex (i.e., have the same sid) if and only if they have the same color; that is, u.sid = v.sid if and only if u.color = v.color. Thus, we use the vertex with id v.sid to check whether the super-vertex has at least two upper-level neighbors. However, instead of collecting all upper-level neighbors at the super-vertex, we only need to collect up to two neighbors from each vertex in the super-vertex (Line 7). We put these upper-level neighbors into a set S (Lines 9-10). If there are at least two upper-level neighbors (i.e., equivalently, |S| > 1), then we notify all vertices at level l with color v.color (i.e., in the super-vertex) to send their colors to their upper-level neighbors. (Phase 2) Assign colors to vertices at level l − 1 (Lines 12-26). Line 12 moves up one level (i.e., l ← l − 1); thus, in the following, we talk about vertices at level l. A vertex at level l may receive several colors propagated from vertices at level l + 1.
We merge colors such that only one color is assigned to each super-vertex at level l.
There are two cases for merging colors: 1) colors received by the same vertex should be merged; 2) colors received by vertices in the same CC (i.e., the same super-vertex) should be merged. Therefore, we send all colors received by vertices in the same CC to the super-vertex of the CC for merging (Lines 13-16). For each super-vertex, we assign one of the received colors to be its color (Lines 17-21) and this color is also assigned to all vertices in the super-vertex (i.e., CC) (Lines 22-23). If two super-vertices received the same color, then they should be merged into a single super-vertex according to the merge operation. This is achieved by the merge-color aggregator (Line 24), which shall be discussed shortly. Finally, we update the colors and super-vertex ids of vertices at level l (Lines 25-26); here, vertexID(c) denotes the super-vertex id of vertices with color c.
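To make the two phases concrete, the following is a sequential, single-machine sketch of the whole labeling (our own simplification with a toy graph; it is not the Pregel+ implementation and elides the message passing, using plain union-find in place of the aggregators):

```python
from collections import defaultdict, deque

def bcc_vertex_labels(n, edges, root=0):
    """Sequential sketch of the bottom-up labeling: BFS layering, rule-i
    same-level CCs as super-vertices, then level-by-level colour propagation
    with rule-ii and disjoint-set colour merging."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)

    level = [-1] * n                      # layered graph via BFS
    level[root] = 0
    q = deque([root])
    while q:
        u = q.popleft()
        for w in sorted(adj[u]):
            if level[w] < 0:
                level[w] = level[u] + 1
                q.append(w)

    svp = list(range(n))                  # union-find over vertices (rule-i)
    def vfind(x):
        while svp[x] != x:
            svp[x] = svp[svp[x]]; x = svp[x]
        return x
    for u in range(n):
        for w in adj[u]:
            if u < w and level[u] == level[w]:
                svp[vfind(u)] = vfind(w)
    sid = [vfind(v) for v in range(n)]    # super-vertex id per vertex

    cp = {}                               # union-find over colours
    def cfind(c):
        while cp[c] != c:
            cp[c] = cp[cp[c]]; c = cp[c]
        return c

    color = [None] * n
    maxl = max(level)
    for v in range(n):                    # bottom level: colour = sid
        if level[v] == maxl:
            cp.setdefault(sid[v], sid[v])
            color[v] = sid[v]

    for l in range(maxl, 0, -1):
        # Phase 1: group level-l vertices by current colour (= super-vertex);
        # propagate the colour upward only if >= 2 upper-level neighbours.
        groups = defaultdict(list)
        for v in range(n):
            if level[v] == l:
                groups[cfind(color[v])].append(v)
        received = defaultdict(set)
        for c, members in groups.items():
            uppers = {w for v in members for w in adj[v] if level[w] == l - 1}
            if len(uppers) >= 2:          # rule-ii
                for w in uppers:
                    received[w].add(c)
        # Phase 2: merge colours received by the same super-vertex at l - 1.
        by_sid = defaultdict(set)
        for v in range(n):
            if level[v] == l - 1:
                by_sid[sid[v]] |= received[v]
        for s, cols in by_sid.items():
            cols = {cfind(c) for c in cols}
            if cols:
                new = cols.pop()
                for c in cols:            # super-vertices sharing a colour merge
                    cp[cfind(c)] = cfind(new)
                new = cfind(new)
            else:
                cp.setdefault(s, s)       # no colour received: fresh colour
                new = s
            for v in range(n):
                if level[v] == l - 1 and sid[v] == s:
                    color[v] = new

    # Final update-color pass: replace every colour by its representative.
    return [cfind(c) if c is not None else None for c in color]
```

In the toy graph used below, vertices with the same final colour plus their upper-level neighbour form one BCC, matching Lemma 4.3.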
Algorithm 6: update-color
1 Let C be the set of all colors in the disjoint-set data structure;
2 for each color c ∈ C do Find(c);
3 for each vertex v ∈ V do v.color ← parent(v.color);
Note that, in the above bottom-up process, the colors assigned to vertices are only tentative, because some colors assigned to vertices at lower levels may be merged at higher levels. For example, in Figure 4.2, after labeling vertices at level 3, v 9 and v 10 have colors 9 and 10, respectively; the two colors are merged into a single color when labeling vertices at level 2 because v 6 receives both color 9 and color 10. However, we do not update the colors of vertices at lower levels during this bottom-up process; instead, we update the colors of all vertices in a final updating process (i.e., Line 27), which is presented in Algorithm 6. merge-color Aggregator. The aggregator for merging colors and super-vertices, denoted merge-color, is shown in Algorithm 7. It is similar to Algorithm 3. The only difference is that we also merge super-vertices here; that is, two super-vertices are merged together if their sets of colors overlap. Thus, for each color c in the disjoint-set data structure, we also assign a super-vertex id to c, denoted vertexID(c), which is the super-vertex id of vertices with color c. After running the aggregator, parent(c) denotes the representative color of c (i.e., color c should be replaced by parent(c)), and vertexID(parent(c)) denotes the super-vertex id of all vertices at the current level with color parent(c). Running Example. Consider the layered graph in Figure 4.2. In assigning colors to vertices at level 2, we first check whether a super-vertex at level 3 should propagate its color to its upper-level neighbors according to rule-ii in Lemma 4.4; if so, we propagate its color to its neighbors at level 2 (i.e., Phase 1). Here, the super-vertex of {v 11 , v 12 , v 13 } has only one upper-level neighbor, thus its color is not propagated.
Both v 9 and v 10 have two upper-level neighbors, thus their colors are propagated to their neighbors at level 2; at Phase 2 of level 2, the aggregator merges the two colors (parent(10) = 9). The intermediate states are shown in Figure 4.3(a)-(c). Similarly, we assign colors to vertices at level 1. At Phase 1, we propagate the colors of vertices at level 2 to their neighbors at level 1. Since the super-vertex {v 5 , v 6 , v 7 } has three upper-level neighbors (i.e., v 2 , v 3 , and v 4 ), its color is propagated to level 1; the result is shown in Figure 4.3(d). At Phase 2, super-vertices at level 1 collect their received colors and set their colors accordingly. The aggregator also conducts the merging, and the result is shown in Figure 4.3(e).
Finally, we update colors of all vertices, and the result is shown in Figure 4.3(f). BCCs can be identified through the colors of vertices; that is, the set of vertices with the same color and their upper-level neighbors corresponds to a BCC. There are three BCCs, induced by vertices {v 1 , . . . , v 7 , v 9 , v 10 }, {v 4 , v 8 }, and {v 8 , v 11 , v 12 , v 13 }, respectively.

Correctness and Complexity Analysis
We prove the correctness and give complexity analyses of our approach. Correctness. We prove the correctness of our approach by the following theorem. Theorem 4.2: Our vertex labeling approach labels all vertices, except the root vertex, of a BCC by a unique color. Proof Sketch: We first prove by induction that all vertices, except the root vertex, of a BCC are assigned the same color. Base Case: the vertices of the BCC are at two different levels. Then, either there is only one vertex at the lower level, or there are multiple vertices at the lower level and they are connected into one CC by same-level edges. In both cases, all the vertices at the lower level are assigned the same color. Inductive Case: assume the claim holds for any BCC spanning l different levels; we prove it for any BCC spanning (l + 1) different levels. Without loss of generality, assume there are two CCs, cc 1 and cc 2 , defined by the edges at level (l + 1). Then, all vertices in cc 1 are assigned a color a, and all vertices in cc 2 are assigned a color b. Moreover, a and b are propagated to vertices at level l; thus, they will be merged into a single color, which in turn finally updates the vertices at level (l + 1) to have the same color as the vertices at level l via update-color. Thus, the claim is true. Now, we prove that two vertices u and v that are non-root vertices of two different BCCs, bcc 1 and bcc 2 , have different colors. The reason is that the color of u is not propagated to the root of bcc 1 and the color of v is not propagated to the root of bcc 2 , and after removing the roots of bcc 1 and bcc 2 from the graph, u and v become disconnected. Thus, the color of u and the color of v will never meet at the same vertex and thus will not be merged. Consequently, the claim is true.
Thus, the theorem holds, and Algorithm 5 correctly computes all BCCs of a graph. Complexity Analyses. We now analyze the complexities of our approach in terms of #supersteps, total communication cost, and total computation cost.

Experiments
We conduct extensive performance studies to evaluate the efficiency of our graph decomposition based approaches for computing CCs and for computing BCCs. Regarding computing CCs, we evaluate the following algorithms: • S-V: the algorithm proposed in [34].
• T-V(single-pivot): the combination of the T-V algorithm in [34] with the CC computation algorithm single-pivot.
Since most of the existing algorithms in our tests have been implemented in Pregel+, an open-source C++ implementation of the Pregel system, we also implement our algorithms in Pregel+. Specifically, we implement single-pivot, GD-CC, GD-BCC, and T-V(single-pivot), while the other algorithms are already in Pregel+. Note that, although T-V(single-pivot) is not considered in [34], it is very natural to adapt the T-V algorithm in [34] to use single-pivot for computing CCs. We compile all algorithms with GNU GCC with the -O3 optimization. Our experiments are conducted on a cluster of up to 25 Amazon EC2 r3.2xlarge instances (i.e., machines) with enhanced networking. Each r3.2xlarge instance has 4 cores and 60GB RAM and runs 4 workers; thus, we have up to 100 workers in total. We evaluate the performance of all algorithms on both real and synthetic graphs as follows.

Experimental Results
Eval-I: Evaluating CC Computation Algorithms. The performance of the four CC computation algorithms on the four graphs is illustrated in Figure 5.1. Overall, our GD-CC approach performs the best and outperforms S-V, hash-min, and single-pivot by up to 56, 5.6, and 3.9 times, respectively (see Figure 5.1(a)). Eval-II: Influence of β on Our CC Computation Algorithm. It seems hard to give a theoretical result about the selection of β for our GD-CC algorithm. Thus, in this testing, we use experiments to select β, which decides how the number of selected seed vertices increases along with the supersteps. The results of running GD-CC with different β values are shown in Figure 5.2. In general, for small-diameter graphs, the total running time, communication volume, and #supersteps of GD-CC are not sensitive to β; this is because #supersteps on these graphs is O(δ), which is small (see Figure 5.2(c)).
On the large-diameter graph YH, the #supersteps of GD-CC, which is O(log_β n), decreases with larger β; however, the communication time increases with larger β due to selecting more seed vertices. As a result, the total running time of GD-CC first decreases and then increases, with β = 2 as the turning point. Moreover, the #colors aggregated at the master worker increases for all graphs when β becomes larger than 2 (see Figure 5.2(d)). Therefore, we set β = 2 for GD-CC. Note that when β = 2, the #colors aggregated at the master worker is only a small fraction (i.e., ≤ 0.4%) of the #vertices; thus, the merging of colors can be done through an aggregator at the master worker.

Conclusion
In this paper, we proposed graph-decomposition-based approaches for the distributed computation of CCs and BCCs of a graph. Unlike existing approaches, which have total data communication cost O(m × #supersteps), our new approaches have total data communication cost O(m). Moreover, the computation costs and #supersteps of our techniques are similar to (or even smaller than) those of the existing techniques, respectively. Experiments show that our approaches outperform existing approaches by generating much