Efficient Maximal Balanced Clique Enumeration in Signed Networks

The clique is one of the most fundamental models for cohesive subgraph mining in network analysis. Existing clique models mainly focus on unsigned networks. In the real world, however, many applications are modeled as signed networks with positive and negative edges. Because signed networks have properties that unsigned networks lack, the existing clique model is inapplicable to them. Motivated by this, we propose the balanced clique model, which builds on the most fundamental and dominant theory for signed networks, structural balance theory, and we study the maximal balanced clique enumeration problem, which computes all the maximal balanced cliques in a given signed network. We show that the maximal balanced clique enumeration problem is NP-Hard. A straightforward solution is to treat the signed network as two unsigned networks and leverage off-the-shelf techniques for unsigned networks. However, such a solution is inefficient for large signed networks. To address this problem, we first propose a new maximal balanced clique enumeration algorithm that exploits the unique properties of signed networks. Based on this algorithm, we devise two optimization strategies to further improve the efficiency of the enumeration. We conduct extensive experiments on large real and synthetic datasets. The experimental results demonstrate the efficiency, effectiveness and scalability of our proposed algorithms.


INTRODUCTION
With the proliferation of graph applications, research efforts have been devoted to many fundamental problems in analyzing graph data [15,28,36,37,39,49,53,55]. The clique is one of the most fundamental cohesive subgraph models in graph analysis; it requires that each pair of vertices be connected by an edge. Due to this completeness requirement, the clique model has many interesting cohesiveness properties: the distance between any two vertices in a clique is one, every single vertex in a clique forms a dominating set of the clique, and the diameter of a clique is one [38]. As a result, the clique model has wide application scenarios in social network mining, financial analysis and computational biology, and it has been extensively investigated for decades. Existing studies on cliques mainly focus on unsigned networks, i.e., networks in which all the edges share the same property [4,12,13,51]. Unfortunately, relationships between two entities in many real-world applications have completely opposite properties, such as friend-foe relationships between users in social networks [11,23], support-dissent opinions in opinion networks [25], trust-distrust relationships in trust networks [26] and partnership-antagonism in protein-protein interaction networks [35]. Modelling these applications as signed networks with positive and negative edges allows them to capture more sophisticated semantics than unsigned networks [1,5,10,26,32,33]. Consequently, existing studies on cliques, which ignore the sign associated with each edge, may be inappropriate for characterizing the cohesive subgraphs in a signed network, and there is an urgent need to define a clique model tailored for signed networks.
For signed networks, the most fundamental and dominant theory revealing their dynamics and construction is structural balance theory [1,5,10,11,18,19,26,32,33]. The intuition underlying structural balance theory can be described by the aphorisms: "The friend (resp. enemy) of my friend (resp. enemy) is my friend, the friend (resp. enemy) of my enemy (resp. friend) is my enemy". Specifically, a signed network G is structurally balanced if G can be split into two subgraphs such that the edges within the same subgraph are positive and the edges between the subgraphs are negative [18]. In a signed network, an imbalanced sub-structure is unstable and tends to evolve into a balanced state. Consider the graph G shown in Figure 1 (a). The negative edge between v_1 and v_2 makes G imbalanced. Observing G closely, we find that v_1 and v_2 have a mutual "friend" v_3 and mutual "enemies" v_4, v_5 and v_6. This means that v_1 and v_2 share more common ground than differences. According to structural balance theory, v_1 and v_2 tend to become allies as time goes by. G′ shown in Figure 1 (b) is the evolved balanced counterpart of G. In G′, the sign of the edge between v_1 and v_2 becomes positive. {v_1, v_2, v_3} and {v_4, v_5, v_6} form two alliances: the edges within the same alliance are positive and the edges connecting different alliances are negative. As illustrated in this example, structural balance reflects the key characteristics of signed networks.
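The two-sided split in the definition above can be tested mechanically. The following is a minimal sketch, not part of the paper: it assumes the network is given as a vertex list plus lists of positive and negative edges, and the function name `split_if_balanced` is hypothetical. It propagates side labels by breadth-first search, keeping the endpoints of a positive edge on the same side and the endpoints of a negative edge on opposite sides, and reports failure as soon as two constraints conflict.

```python
from collections import deque

def split_if_balanced(vertices, pos_edges, neg_edges):
    """Return (side_a, side_b) if the signed graph is structurally balanced, else None."""
    adj = {v: [] for v in vertices}
    for u, v in pos_edges:
        adj[u].append((v, 0))   # 0: endpoints must share a side
        adj[v].append((u, 0))
    for u, v in neg_edges:
        adj[u].append((v, 1))   # 1: endpoints must be on opposite sides
        adj[v].append((u, 1))
    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0         # free choice for each connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, flip in adj[u]:
                expected = side[u] ^ flip
                if v not in side:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return None  # conflicting constraints: not balanced
    side_a = {v for v in vertices if side[v] == 0}
    side_b = {v for v in vertices if side[v] == 1}
    return side_a, side_b
```

On an edge set modeled on the description of Figure 1 (the exact edges of the figure are an assumption here), the graph with a negative edge between v_1 and v_2 is rejected, while the version with that edge made positive yields the two alliances. For a disconnected graph the returned split is one of several valid ones.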
According to the above analysis, the clique model is a fundamental cohesive subgraph model in graph analysis and can be used in many applications, but it has no appropriate counterpart in signed networks. Meanwhile, the structure of a signed network is expected to be balanced according to structural balance theory. Motivated by this, we propose a maximal balanced clique model in this paper. Formally, given a signed network G, a maximal balanced clique C is a maximal subgraph of G such that (1) C is complete, i.e., every pair of vertices in C has an edge, and (2) C is balanced, i.e., C can be divided into two parts such that the edges within the same part are positive and the edges connecting the two parts are negative. This definition not only captures the essence of the clique model in unsigned networks but also guarantees that a detected clique is stable in signed networks. In this paper, we aim to devise efficient algorithms to enumerate all maximal balanced cliques in a given signed network.
Applications. Maximal balanced clique enumeration can be used in many applications, for example:
(1) Opinion leader detection in opinion networks. Opinion leaders are people who are active in a community and capture the most representative opinions in social networks [44]. Maximal balanced clique enumeration can be used to detect opinion leaders in opinion networks. In an opinion network, each vertex represents a user and there is a positive/negative edge between two vertices if one user supports/dissents from the other. A maximal balanced clique in an opinion network represents a group of users in which any two users hold an opinion about each other and which can be further divided into two subgroups such that users within a subgroup support each other and users in different subgroups dissent from each other. Since these users are actively involved in the opinion network (every two of them hold an opinion about each other) and have clear standpoints (they support everyone in the same group and dissent from everyone in the opposite group), the users in the maximal balanced cliques are good candidates for opinion leaders in the opinion network.
(2) Finding international alliance-rivalry groups. The international relationships between nations can be modeled as a signed network, where each vertex represents a nation and positive and negative edges indicate alliances and rivalries, respectively. Computing the maximal balanced cliques in such networks reveals hostile groups of allied forces, such as the Allied and Axis powers during World War II or the North Atlantic Treaty Organization and the Warsaw Pact during the Cold War [3,11]. The same approach extends to finding alliance-rivalry commercial groups among business organizations, such as {Pepsi, KFC} vs {Coke, McDonald} [21].
(3) Synonym and antonym group discovery. In a word network, each vertex represents a word and there is a positive edge between two synonyms and a negative edge between two antonyms [34]. In such signed networks, our model can discover synonym groups that are antonymous with each other, such as {interior, internal, intimate} and {away, foreign, outer, outside, remote}. These discovered groups may be further used in applications such as automatic question generation [24] and semantic expansion [22].
Contributions. In this paper, we make the following contributions:
(1) The first work to study the maximal balanced clique model. We formalize the balanced clique model in signed networks based on structural balance theory. To the best of our knowledge, this is the first work considering the structural balance of cliques in signed networks. We also prove the NP-Hardness of the problem.
(2) A new framework tailored for maximal balanced clique enumeration in signed networks. After investigating the drawbacks of the straightforward approach, we propose a new framework for maximal balanced clique enumeration. Our new framework enumerates the maximal balanced cliques on the signed network directly, and its memory consumption is linear in the size of the input signed network.
(3) Two effective optimization strategies to further improve the enumeration performance. We explore two optimization strategies, in-enumeration optimization and pre-enumeration optimization. The in-enumeration optimization avoids exploring unpromising vertices during the enumeration, while the pre-enumeration optimization prunes unpromising vertices and edges before the enumeration.
(4) Extensive performance studies on real and synthetic datasets. We conduct extensive experimental studies to evaluate the proposed algorithms on real and synthetic datasets, one of which contains 3 million vertices and 105 million edges. As shown in our experiments, the baseline approach only works on small datasets, while our approach completes the enumeration efficiently on both small and large datasets.
Outline. Section 2 reviews the related work. Section 3 provides preliminaries, including the definition of the balanced clique model and the problem statement. Section 4 introduces the baseline algorithm. Section 5 presents our new enumeration framework. Section 6 shows several optimization techniques. Section 7 reports the results of the experimental studies. Section 8 concludes the paper.

RELATED WORK
Signed network analysis. Signed network analysis has attracted much attention in the literature. In these works, the theories explaining the potential social dynamics in signed networks have been extensively studied. Among these theories, structural balance theory is the most fundamental and dominant one [58]. Structural balance theory was originally introduced in [19] and generalized to a graph formulation in [5,18]. After that, structural balance theory was developed extensively [1,10,26,32,33]. Among these works, it is interesting to mention that the authors of [32] model the evolution of a signed network and theoretically prove that the network evolves into a balanced clique when the mean value µ of the initial friendliness among the vertices satisfies µ ≤ 0. [58] provides a comprehensive survey on structural balance theory.
Besides theories on signed networks, a large body of literature on mining signed networks has emerged. Among these works, the one most closely related to ours is [27], in which an (α, k)-clique model is proposed. Given a signed network G, an (α, k)-clique is defined as a maximal clique C such that the negative degree of each vertex in C is not greater than k and the positive degree of each vertex in C is not less than αk. Compared with our model, the (α, k)-clique model only considers the number of positive and negative edges in the clique and ignores the structural balance of the clique entirely, which makes the (α, k)-clique model essentially different from our model. In [17], a k-balanced trusted clique model is proposed. A k-balanced trusted clique is defined as a clique with k vertices consisting of positive edges only. Although the k-balanced trusted clique model has a name similar to that of our model, it ignores the negative edges in the clique, which means the information carried by the negative edges is lost.
Community detection in signed networks is also related to our work. For example, [8,16,29-31,45] aim to find the antagonistic communities in a signed network. These works mainly focus on exploring several groups of dense subgraphs, and most of them do not have a clear structural definition of their community model, while our work aims to enumerate the clique structure in a signed network. Moreover, these solutions generally involve a complicated optimization procedure and therefore have difficulty handling large signed networks, while our proposed algorithm scales to enumerate all the maximal balanced cliques in large signed networks with hundreds of millions of edges, as verified in our experiments. A survey on signed network mining can be found in [46].
Clique on unsigned networks. The clique model is one of the most fundamental cohesive subgraph models. [4] proposes an efficient algorithm for maximal clique enumeration based on backtracking search. [2] is the first to consider the memory consumption during maximal clique enumeration. Based on [4], more efficient algorithms for maximal clique enumeration are investigated [12,13,47]. [12] proposes a novel branch pruning strategy, pivot pruning, which efficiently reduces the search space by ignoring the search process from the neighbors of the pivot. [57] studies the maximal biclique enumeration problem on bipartite graphs. It keeps growing the vertex set on one side and peeling the vertex set on the other side to enumerate the maximal bicliques. It also utilizes techniques to further improve the enumeration performance, such as choosing a vertex with small degree from the candidate set to reduce the search tree depth and pruning vertices which may produce non-maximal bicliques. These techniques for biclique enumeration inspire our techniques presented in Section 6.1. [14] reviews recent advances in maximal clique enumeration. Based on the clique, other cohesive subgraph models have also been studied recently, such as k-core [43], k-truss [9,20], k-edge connected component [52,54,59] and (r, s)-nuclei [40,41]. Note that our balanced clique model is different from the existing cohesive subgraph models on unsigned networks and cannot be handled well by the existing works. If we consider only the positive edges in the signed network and use traditional methods for community detection on unsigned networks, the results ignore the negative edges and half of the meaningful information in the signed network is lost.

PROBLEM STATEMENT
In this paper, we consider an undirected and unweighted signed network G = (V, E^+, E^-), where V denotes the set of vertices, E^+ denotes the positive edges and E^- denotes the negative edges connecting the vertices in G. We denote the number of vertices and the number of edges by n and m, respectively, i.e., n = |V| and m = |E^+| + |E^-|. For simplicity, we omit G in the notations if the context is self-evident.
Definition 3.2. (Maximal Balanced Clique) Given a signed network G = (V, E^+, E^-), a maximal balanced clique C is a maximal subgraph of G that satisfies the following constraints:
• Complete: C is complete, i.e., every pair of vertices in C has an edge.
• Balanced: C is balanced, i.e., it can be split into two sub-cliques C_L and C_R such that the edges within C_L and within C_R are positive and the edges between C_L and C_R are negative.
In this paper, we aim to enumerate all maximal balanced cliques in a given signed network. Since many real applications require that the number of vertices in C_L and C_R is not less than a fixed threshold, we add a size constraint on |C_L| and |C_R| s.t. |C_L| ≥ k and |C_R| ≥ k. With the size constraint, users can control the size of the returned maximal balanced cliques based on their specific requirements. We formalize the studied problem as follows:
Problem Statement. Given a signed network G and an integer k, maximal balanced clique enumeration (MBCE) computes all the maximal balanced cliques C in G s.t. |C_L| ≥ k and |C_R| ≥ k for C.
Problem Hardness. The MBCE problem is NP-Hard, which can be proved from the NP-Hardness of the maximal clique enumeration problem [6,42]. Given an unsigned network G = (V, E), we transform G into a signed network G′ as follows: we first keep all the vertices of G in G′ and all the edges of G as positive edges in G′; then, we add a new vertex v to G′ and connect v to all the other vertices in G′ with negative edges. Clearly, each maximal clique C in G corresponds to a maximal balanced clique {{v}, C} in G′ (assuming k = 1), and vice versa, which means the maximal clique enumeration problem in G can be reduced to the MBCE problem in G′. As the maximal clique enumeration problem is NP-Hard [6,42], our problem is also NP-Hard.
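The reduction used in the hardness argument is short enough to sketch in code. The snippet below is illustrative only: it assumes edges are stored as `frozenset` pairs, and the names `reduce_to_signed`, `is_balanced_clique` and the label `"v*"` for the added vertex are hypothetical. The checker verifies the completeness, balance and size conditions of Definition 3.2 for a given pair (C_L, C_R); it does not test maximality.

```python
from itertools import combinations

def reduce_to_signed(vertices, edges, new_vertex="v*"):
    """Build G' from an unsigned graph: old edges become positive,
    and a fresh vertex is joined to every old vertex by a negative edge."""
    pos = {frozenset(e) for e in edges}
    neg = {frozenset((new_vertex, u)) for u in vertices}
    return set(vertices) | {new_vertex}, pos, neg

def is_balanced_clique(c_left, c_right, pos, neg, k=1):
    """Check completeness, balance and the size constraint (not maximality)."""
    if len(c_left) < k or len(c_right) < k or set(c_left) & set(c_right):
        return False
    # Every pair on the same side must be joined by a positive edge.
    same_side = all(frozenset(pair) in pos
                    for side in (c_left, c_right)
                    for pair in combinations(side, 2))
    # Every pair across the two sides must be joined by a negative edge.
    across = all(frozenset((u, v)) in neg for u in c_left for v in c_right)
    return same_side and across
```

For a triangle {a, b, c} with a pendant vertex d attached to c, the pairs ({v*}, {a, b, c}) and ({v*}, {c, d}) pass the check in G′, whereas ({v*}, {a, b, d}) fails because a and d are not adjacent in G.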

A BASELINE ALGORITHM
We first propose a baseline algorithm to address the MBCE problem based on existing methods for maximal clique enumeration [13] and maximal biclique enumeration [57] in unsigned networks. For a signed network G = (V, E^+, E^-), we can treat it as the combination of two unsigned networks G^+ = (V, E^+) and G^- = (V, E^-). For a maximal balanced clique C = {C_L, C_R} in G, each of C_L and C_R is a clique in G^+ and the subgraph induced by the vertices of C_L and C_R in G^- is a biclique. Therefore, we can enumerate the maximal balanced cliques in G in two steps: 1) compute all the maximal cliques in G^+ with [13]; 2) for each pair of computed maximal cliques C_i and C_j in G^+, compute the maximal bicliques in the bipartite subgraph induced by the vertices of C_i and C_j in G^- with [57]. The returned maximal bicliques in G^- are the maximal balanced cliques in G. The pseudocode of the Baseline solution is shown in Algorithm 1. As the pseudocode is self-explanatory, we omit the description. Note that although all the maximal cliques in G^+ are enumerated in line 1 of Algorithm 1, Algorithm 1 does not require that the two component cliques of a maximal balanced clique be maximal in G^+. Algorithm 1 just considers all maximal cliques as candidate subgraphs for further processing in step 2.
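Step 1 of the two-step procedure can be illustrated with a plain Bron-Kerbosch recursion on G^+. This is a sketch under stated assumptions: it swaps in the basic, pivot-free recursion in place of the algorithm of [13], it assumes G^+ is given as a dictionary `adj` mapping each vertex to its set of positive neighbors, and the function name `maximal_cliques` is hypothetical. Step 2 (biclique enumeration in G^-) is not shown.

```python
def maximal_cliques(adj, k=1):
    """Yield the maximal cliques of size >= k as frozensets (basic Bron-Kerbosch)."""
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices.
        if not p and not x:
            if len(r) >= k:
                yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}      # v is done: remove it from the candidates
            x = x | {v}      # and remember it to suppress duplicates
    yield from expand(set(), set(adj), set())
```

Because the generator yields cliques one at a time, a caller could in principle stream them; Baseline as described, however, needs every pair of cliques and therefore keeps all of them in memory.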
Example 4.1. Consider G in Figure 2 and assume k = 2. Baseline first enumerates all the maximal cliques in G^+ with size not less than 2, such as … 13 , v 15 }. After that, for each pair of computed maximal cliques, it computes the maximal bicliques in the induced bipartite subgraph in G^-.
Algorithm 1: Baseline
1: enumerate the maximal cliques in G^+ = (V, E^+) with size not less than k by [13];
2: for each pair of computed maximal cliques C_i and C_j do
3: enumerate the maximal bicliques in the bipartite subgraph induced by C_i and C_j in G^- = (V, E^-) with size not less than k for both parts by [57];
4: remove the duplicate bicliques computed in line 3;

… 5 , v 6 }} which correspond to C_1 and C_2 in G, respectively. The remaining maximal balanced cliques can be enumerated similarly.
Proof. We first prove that all the maximal balanced cliques in G are found. Based on Definition 3.2, if there exists a maximal … As Baseline considers all the maximal cliques in G^+, C_L and C_R are not missed in step 1. Following Definition 3.2, C_L and C_R form a maximal biclique in G^-. In step 2, Baseline enumerates all the maximal bicliques in the subgraph of G^- induced by every pair of maximal cliques enumerated in step 1. Thus, Baseline can find all the maximal balanced cliques in G. Moreover, as a maximal balanced clique may be contained in multiple pairs of maximal cliques, Baseline removes all the duplicates in line 4. Therefore, Baseline outputs each maximal balanced clique once. The theorem is proved. □
Drawbacks of Baseline. Since Baseline does not consider the uniqueness of signed networks and processes MBCE with techniques for unsigned networks, it has two drawbacks:
• Memory consumption. Baseline has to store all the maximal cliques in G^+ in memory. The number of maximal cliques can be exponential in the number of vertices [12], which makes Baseline unable to handle large networks.

A NEW ENUMERATION FRAMEWORK
Revisiting Baseline, the root cause of the drawbacks discussed above is that it treats the signed network as a specific combination of two unsigned networks and utilizes existing techniques designed for unsigned networks. Therefore, we have to explore new techniques that consider the uniqueness of signed networks to overcome the drawbacks of Baseline and improve the efficiency of the enumeration. In this section, we present a new enumeration framework which aims to address the memory consumption problem. In the next section, we further optimize the enumeration framework to improve the efficiency.
Algorithm 2 (MBCEnum), lines 10-20:
10: if P_L = ∅ and P_R = ∅ and Q_L = ∅ and Q_R = ∅ then
11: if |C_L| ≥ k and |C_R| ≥ k then
12: output C = {C_L, C_R};
13: return
14: Flag ← !Flag;
15: if Flag then
16: for each v ∈ P_L do
17: …
19: for each v ∈ P_R do
20: …
Lemma 5.1. Given a signed network G, for a balanced clique C = …
Proof. It can be proved following Definition 3.2 directly. □
According to Lemma 5.1, suppose we maintain a balanced clique C = {C_L, C_R}. Let P_L be the set of vertices that are positive neighbors of all the vertices in C_L and negative neighbors of all the vertices in C_R, and let P_R be the set of vertices that are positive neighbors of all the vertices in C_R and negative neighbors of all the vertices in C_L. Then we can enlarge C by adding vertices from P_L and P_R into C_L and C_R, respectively. Furthermore, if we update P_L and P_R based on the new C_L and C_R accordingly and repeat the above enlargement procedure, we obtain a maximal balanced clique when no more vertices can be added into C_L or C_R.
Algorithm. Following the above idea, our algorithm for MBCE is shown in Algorithm 2. For each vertex v_i in G (line 2), we enumerate all the maximal balanced cliques containing v_i (lines 3-8). Note that v_0, v_1, ..., v_n are in the degeneracy order [48] of G. We use C_L and C_R to maintain the balanced clique, which are initialized with v_i and ∅, respectively (line 3). Similarly, we also initialize P_L and P_R as discussed above (lines 4-5). Moreover, we use Q_L and Q_R to record the vertices that have been processed, to avoid outputting duplicate maximal balanced cliques (lines 6-7). After initializing these six sets, Procedure MBCEnumUtil performs the maximal balanced clique enumeration based on the given six sets. If P_L, P_R, Q_L and Q_R are empty, which means the current balanced clique C = {C_L, C_R} cannot be enlarged and is a maximal balanced clique, MBCEnumUtil checks whether C_L and C_R satisfy the size constraint. If the size constraint is satisfied, it outputs the maximal balanced clique C (lines 11-12). Otherwise, MBCEnumUtil adds a vertex from P_L to C_L, updates the corresponding P_L, P_R, Q_L and Q_R, and recursively invokes itself to further enlarge the balanced clique (line 17). When v ∈ P_L has been processed, v is removed from P_L and added to Q_L (line 18). Similar processing steps are applied to the vertices in P_R (lines 19-21). Variable Flag (line 1) is used to control the order of adding a new vertex into C_L or C_R. With the switch operation in line 14, we guarantee that vertices are added into C_L and C_R alternately across recursion levels.
Correctness of Algorithm 2. We show the correctness of Algorithm 2 from three aspects:
(1) The balanced clique output in line 12 is maximal. Assume that a balanced clique C output in line 12 is not maximal. Then, based on the vertices maintained in P_L and P_R regarding C, at least one of P_L and P_R is not empty, which contradicts the output condition in line 10. Therefore, the balanced clique output in line 12 is maximal. A special case that needs attention is the balanced clique exploration caused by the initialization of P_L and P_R. For a vertex v_i, its positive (negative) neighbors in v_0, ..., v_{i-1} are not added into P_L (P_R). As a result, for a maximal balanced clique C containing v_i and other vertices in v_0, ..., v_{i-1}, due to the initialization of P_L and P_R, the vertices in v_0, ..., v_{i-1} are not contained in C in Algorithm 2, and P_L and P_R are empty regarding C in line 10. However, in this case, Q_L or Q_R is not empty and C still cannot be output based on the condition in line 10.
(2) Algorithm 2 outputs all the maximal balanced cliques in G. In line 2, Algorithm 2 visits each vertex v_i. Based on the recursive structure of MBCEnumUtil, all the maximal balanced cliques containing v_i are explored. Therefore, no maximal balanced clique is missed.
(3) No duplicate maximal balanced cliques are output in Algorithm 2. During the recursive enumeration procedure, when we finish the enumeration of the maximal balanced cliques containing a vertex v, we add the vertex into Q_L (line 6, line 18) or Q_R (line 7, line 21). Therefore, when we explore a maximal balanced clique C containing a vertex v_i and C has already been output when processing v_j (j < i), v_j will be in Q_L or Q_R in line 10 and C will not be output again.
Combining the above three aspects, the correctness of Algorithm 2 is proved.
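The recursion described above can be sketched compactly. The following is a simplified illustration rather than the paper's Algorithm 2: it assumes the network is given as two dictionaries `pos` and `neg` mapping every vertex to its sets of positive and negative neighbors, it processes vertices in dictionary order instead of the degeneracy order, and it always expands P_L before P_R instead of alternating with Flag. The function name `enumerate_balanced_cliques` is hypothetical. These simplifications change the shape of the search tree but not the set of cliques reported, since every candidate is still moved to its excluded set after its branch finishes.

```python
def enumerate_balanced_cliques(pos, neg, k=1):
    """Return a list of (C_L, C_R) frozenset pairs with |C_L| >= k and |C_R| >= k."""
    order = list(pos)                              # assumption: any fixed vertex order
    rank = {v: i for i, v in enumerate(order)}
    out = []

    def util(cl, cr, pl, pr, ql, qr):
        if not (pl or pr or ql or qr):             # nothing left to add: C is maximal
            if len(cl) >= k and len(cr) >= k:
                out.append((frozenset(cl), frozenset(cr)))
            return
        for v in list(pl):                         # grow C_L with v
            util(cl | {v}, cr, pl & pos[v], pr & neg[v], ql & pos[v], qr & neg[v])
            pl = pl - {v}                          # v processed: move it to Q_L
            ql = ql | {v}
        for v in list(pr):                         # grow C_R with v
            util(cl, cr | {v}, pl & neg[v], pr & pos[v], ql & neg[v], qr & pos[v])
            pr = pr - {v}                          # v processed: move it to Q_R
            qr = qr | {v}

    for v in order:
        nbrs = pos[v] | neg[v]
        later = {u for u in nbrs if rank[u] > rank[v]}
        earlier = nbrs - later
        util({v}, set(), pos[v] & later, neg[v] & later,
             pos[v] & earlier, neg[v] & earlier)
    return out
```

Each maximal balanced clique is reported exactly once, from its lowest-ranked vertex, which always ends up in C_L. Apart from the output list, the memory in use at any time is bounded by the recursion stack, in contrast to Baseline, which keeps all maximal cliques of G^+.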
Example 5.2. The enumeration procedure of MBCEnum can be illustrated as a search tree. Figure 3 shows part of the search tree when we conduct MBCE on G in Figure 2 through MBCEnum. S_1, S_2, ... represent different search states during the enumeration. At S_1, we assume that we have a balanced clique C = {C_L = {v_0, v_1}, C_R = {v_5, v_6}}, P_L = {v_2, v_3} and P_R = {v_7, v_8}. We first grow a search branch by adding v_2 from P_L into C_L. Since v_7 and v_8 are not negative neighbors of v_2, they are removed from P_R at S_2. Because P_R is empty at S_2, we keep expanding C_L by adding v_3 from P_L. At S_3, P_L, P_R, Q_L and Q_R are empty, so we obtain a maximal balanced clique {{v_0, v_1, v_2, v_3}, {v_5, v_6}} and the search branch starting from v_2 finishes. We return to S_1 and v_2 is moved to Q_L. Then, we add v_3 into C_L at S_4 and add v_7 into C_R at S_5 and obtain C_1 = {{v_0, v_1, v_3}, {v_5, v_6, v_7}}. The search continues in a similar way until all the vertices in P_L and P_R at S_1 are explored.
Based on Algorithm 2, it is clear that the memory consumption of our enumeration framework is linear in the size of the input signed network. Therefore, the drawback of large memory consumption in Baseline is avoided.

OPTIMIZATION STRATEGIES
Although Algorithm 2 addresses the memory consumption problem in MBCE, its efficiency is unsatisfactory. In this section, we present two optimization strategies, namely in-enumeration optimization and pre-enumeration optimization, to further improve the efficiency of the enumeration.

In-Enumeration Optimization
Branch Pruning. Branch pruning aims to prune the unfruitful branches in the search tree of Algorithm 2 to improve the performance.
Pivot Choosing. Consider the maximal balanced clique search procedure of Algorithm 2. Assume that we currently have C_L, C_R, P_L and P_R, and we add a vertex v from P_L to C_L in line 17. After finishing the search starting from v, we do not need to further explore the positive neighbors of v in the for loop of line 16 or the negative neighbors of v in the for loop of line 19. The reason is as follows: w.l.o.g., let v′ be a positive neighbor of v. Although we skip the maximal balanced clique search starting from v′, the maximal balanced cliques containing v′ must be explored by the search branches starting from v or from neighbors of v′. Therefore, skipping the search starting from v's neighbors does not affect the correctness of Algorithm 2.
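The effect of the pivot on the two for loops can be written as a small set computation. This is a hedged sketch with hypothetical names (`remaining_after_pivot`, the `"L"`/`"R"` side flag); it assumes `pos` and `neg` map each vertex to its sets of positive and negative neighbors and covers both the case described above (pivot taken from P_L) and the symmetric case (pivot taken from P_R).

```python
def remaining_after_pivot(v, v_side, p_left, p_right, pos, neg):
    """Candidates that still need their own branch once the branch of pivot v is done.

    For a pivot from P_L, its positive neighbours in P_L and its negative
    neighbours in P_R are skipped; the case of a pivot from P_R is symmetric.
    """
    if v_side == "L":
        return p_left - pos[v] - {v}, p_right - neg[v]
    return p_left - neg[v], p_right - pos[v] - {v}
```

For instance, with C_L = {1}, P_L = {2, 3, 7} and P_R = {4, 5, 6} in a network where 2 is a friend of 3 and an enemy of 4, 5 and 6 but has no edge to 7, only vertex 7 still needs its own branch after the branch of pivot 2 has finished.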
In this paper, to maximize the benefits of the pivot technique, we define the local degree for a vertex v ∈ P_L ∪ Q_L ( …
Candidate Selection. In the search procedure of Algorithm 2, heuristically, a search starting from a vertex with small local degree has a short and narrow search branch, which means that the search starting from this vertex finishes quickly. Moreover, once the search from the vertex finishes, the vertex is added into the excluded set and can be used to further prune other search branches. Therefore, instead of adding vertices from P_L and P_R into C_L and C_R in arbitrary order in lines 16 and 19 of Algorithm 2, we add vertices in increasing order of their local degrees.
Algorithm 3:
1: line 1-7 of Algorithm 2;
2: MBCEnumUtil*(C_L, C_R, P_L, P_R, Q_L, Q_R);
3: Procedure MBCEnumUtil*(C_L, C_R, P_L, P_R, Q_L, Q_R)
4: line 10-13 of Algorithm 2;
5: if |C_L| + |P_L| < k or |C_R| + |P_R| < k then
6: return;
Early Termination. We consider different conditions under which we can terminate the search early in Algorithm 2. For a balanced clique C = {C_L, C_R}, the maximum possible size of C_L (C_R) in the final maximal balanced clique is |C_L| + |P_L| (|C_R| + |P_R|). Based on the size constraint k, we have the following rule:
• ET Rule 1: If |C_L| + |P_L| < k or |C_R| + |P_R| < k, we can terminate the current search directly.
In Algorithm 2, we use Q_L and Q_R to store the vertices for which the maximal balanced cliques containing them have already been enumerated. Therefore, during the enumeration, if there exists a vertex v ∈ Q_L (Q_R) such that P_L (P_R) ⊆ N^+_G(v) and P_R (P_L) ⊆ N^-_G(v), then we can conclude that the maximal balanced cliques reachable from the current state have already been enumerated. Following this, we have our second rule:
• ET Rule 2: If ∃v ∈ Q_L s.t. P_L ⊆ N^+_G(v) and P_R ⊆ N^-_G(v), or ∃v ∈ Q_R s.t. P_R ⊆ N^+_G(v) and P_L ⊆ N^-_G(v), then we can terminate the current search directly.
In a given search of Algorithm 2, if all the vertices in P_L (P_R) form a clique of positive edges and every vertex in P_L (P_R) has negative edges to all the vertices in P_R (P_L), then P_L and P_R form a balanced clique. Then, based on Definition 3.2, C_L ∪ P_L and C_R ∪ P_R form a maximal balanced clique. Therefore, we have our third early termination rule:
• ET Rule 3: If ∀p_l ∈ P_L, P_L ⊆ {p_l} ∪ N^+_G(p_l) and P_R ⊆ N^-_G(p_l), and ∀p_r ∈ P_R, P_R ⊆ {p_r} ∪ N^+_G(p_r) and P_L ⊆ N^-_G(p_r), we can output C = {C_L ∪ P_L, C_R ∪ P_R} and terminate the current search directly.
Note that, in order to avoid outputting duplicate maximal balanced cliques, ET Rule 3 must be applied after ET Rule 2.
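The three early termination rules translate directly into set predicates. The sketch below is illustrative: the function names are hypothetical, `pos` and `neg` are assumed to map each vertex to its sets of positive and negative neighbors, and ET Rule 2 is written in the symmetric form suggested by the sentence that introduces it.

```python
def et_rule1(cl, cr, pl, pr, k):
    """Size bound: neither side can reach k even if every candidate is added."""
    return len(cl) + len(pl) < k or len(cr) + len(pr) < k

def et_rule2(pl, pr, ql, qr, pos, neg):
    """An excluded vertex is compatible with every remaining candidate."""
    return (any(pl <= pos[v] and pr <= neg[v] for v in ql)
            or any(pr <= pos[v] and pl <= neg[v] for v in qr))

def et_rule3(pl, pr, pos, neg):
    """The candidates themselves already form a balanced clique {P_L, P_R}."""
    left_ok = all(pl <= (pos[p] | {p}) and pr <= neg[p] for p in pl)
    right_ok = all(pr <= (pos[p] | {p}) and pl <= neg[p] for p in pr)
    return left_ok and right_ok
```

In a recursive enumerator these checks would be evaluated in the order Rule 1, Rule 2, Rule 3, so that Rule 3 never outputs a clique that Rule 2 would have recognized as already covered.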
Algorithm. The maximal balanced clique enumeration algorithm with the in-enumeration optimization strategies is shown in Algorithm 3. Since the pseudocode is self-explanatory, we omit the detailed description here.
Theorem 6.1. Given a signed network G, the time complexity of Algorithm 3 to enumerate the maximal balanced cliques in G is O(σn · 3^{σ/3}), where σ is the degeneracy number of G.
Proof. Given a graph G, the degeneracy number of G is σ¹. Let P = P_L ∪ P_R and Q = Q_L ∪ Q_R. We first prove the size constraint for P. In line 2 of Algorithm 2, we iterate over v_i in the degeneracy order of G, and vertices with a lower order than v_i are not included in P. Therefore, for P regarding v_i, we have |P| ≤ σ. Then, we analyse the time complexity of MBCEnumUtil*. In detail, ET Rule 1 can be checked in O(1) time. For ET Rule 2, we need to get the local neighbors (within P) of each vertex in Q, which costs O(|Q||P|) time. Similarly, for ET Rule 3, the time complexity of getting the local neighbors of the vertices in P is O(|P|^2). Moreover, pivot selection and candidate sorting consume O(|P| + |Q|) time and O(|P| log |P|) time, respectively, based on the above computation. So far, the time complexity is O(|P|(|P| + |Q|)). Because each invocation of MBCEnumUtil* can trigger at most |P| further recursive calls, the time complexity becomes O(|P|^2(|P| + |Q|)). Now, we formulate the time complexity function for MBCEnumUtil* with parameters |P| and |Q|,

Pre-Enumeration Optimization
In pre-enumeration optimization, we aim to remove the unpromising vertices and edges that are not contained in any maximal balanced clique, based on their structural information. We explore two optimization strategies based on the neighbors of a vertex and the common neighbors of an edge.
Vertex Reduction. To reduce the size of a signed network, we first consider the neighbors of each vertex v, i.e., N^+_G(v) and N^-_G(v), to remove the unpromising vertices. We first define:
Definition 6.2. ((l, r)-signed core) Given a signed network G = (V, E^+, E^-) and two integers l and r, an (l, r)-signed core is a maximal subgraph C of G s.t. ∀v ∈ C, d^+_C(v) ≥ l and d^-_C(v) ≥ r.
Lemma 6.3. Given a signed network G and a threshold k, a maximal balanced clique satisfying the size constraint with k is contained in a (k − 1, k)-signed core.
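To make Definition 6.2 concrete, the signed core can be obtained by repeatedly deleting vertices that violate the degree bounds. The sketch below is an illustration under stated assumptions, not the paper's pseudocode: it reads the definition as requiring at least l positive and r negative neighbors inside the core, it takes `pos` and `neg` as dictionaries mapping every vertex to its neighbor sets, and the name `signed_core` is hypothetical.

```python
from collections import deque

def signed_core(pos, neg, l, r):
    """Vertex set of the (l, r)-signed core, computed by queue-based peeling."""
    dpos = {v: len(pos[v]) for v in pos}
    dneg = {v: len(neg[v]) for v in pos}
    queue = deque(v for v in pos if dpos[v] < l or dneg[v] < r)
    doomed = set(queue)                     # vertices queued for removal or removed
    while queue:
        v = queue.popleft()
        # Removing v lowers the matching degree of each surviving neighbour.
        for nbrs, deg, bound in ((pos[v], dpos, l), (neg[v], dneg, r)):
            for u in nbrs:
                if u in doomed:
                    continue
                deg[u] -= 1
                if deg[u] < bound:          # u now violates the bound as well
                    doomed.add(u)
                    queue.append(u)
    return set(pos) - doomed
```

Calling `signed_core(pos, neg, k - 1, k)` would give the vertex set on which the enumeration for threshold k can be run. Every vertex enters the queue at most once and every edge is inspected a constant number of times.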
Proof. We prove it by contradiction. Assume there is a vertex v in a maximal balanced clique C satisfying the size constraint with k, but v is not in the (k − 1, k)-signed core. Based on Definition 3.2, the positive degree of v in C is not less than k − 1 and the negative degree of v in C is not less than k. This contradicts our assumption. Thus, the lemma holds. □

Footnote 1: Given a graph G, its degeneracy number, namely $\sigma$, is the least d such that the vertices of G can be arranged in a sequence so that each vertex is adjacent to at most d of the vertices that follow it in the sequence [48].

[Figure: Vertex Reduction and Edge Reduction]

Therefore, in order to compute the maximal balanced cliques in a given signed network G with integer k, we only need to compute the maximal balanced cliques in the corresponding (k − 1, k)-signed core of G. The remaining problem is how to efficiently compute the (k − 1, k)-signed core. We propose a linear-time algorithm to address this problem, which is shown in Algorithm 4.

Algorithm. Based on Definition 6.2, to compute the (k − 1, k)-signed core of the signed network G, we only need to identify the vertices v with $d^+_G(v) < k - 1$ or $d^-_G(v) < k$ and remove them from G. Due to the removal of such vertices, more vertices may end up with positive degree less than k − 1 or negative degree less than k; we further remove these vertices until no such vertex exists in G. Following this idea, in Algorithm 4, we first identify a vertex v with $d^+_G(v) < k - 1$ or $d^-_G(v) < k$. Since v will be removed from G, we decrease the positive degree by 1 for each positive neighbor of v (lines 2-3) and decrease the negative degree by 1 for each negative neighbor of v (lines 4-5). Then, we remove v from G (line 6). The algorithm terminates when no vertex with $d^+_G(v) < k - 1$ or $d^-_G(v) < k$ remains. It is clear that Algorithm 4 correctly computes the (k − 1, k)-signed core of G. We have the following theorem regarding its efficiency.

Theorem 6.4. Given a signed network G, the time complexity of Algorithm 4 is $O(n + m)$.

Proof. In Algorithm 4, we use a queue to store the vertices that should be removed in line 6.
Since every vertex is pushed into and popped from the queue at most once, the total processing time for this part is $O(n)$. Moreover, when a vertex is removed, we have to update the degrees of its neighbors once, so the total time cost of these updates is $O(m)$. Therefore, the time complexity of Algorithm 4 is $O(n + m)$. □

Example 6.5. Let k = 2. Figure 4 shows an example of vertex reduction by Algorithm 4 on the signed network G in Figure 2.

Edge Reduction. In this part, we explore the opportunities to remove unpromising edges with respect to MBCE by considering the common neighbors of an edge formed by different types of edges. Specifically, for a positive/negative edge (u, v), we define the edge common neighbor number:
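Returning to vertex reduction, the queue-based peeling idea behind Algorithm 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' C++ implementation: the names `build_adj` and `signed_core` and the adjacency-set representation are hypothetical, and the core is assumed to require positive degree at least l and negative degree at least r for every remaining vertex.

```python
from collections import deque


def build_adj(vertices, pos_edges, neg_edges):
    """Build positive and negative adjacency sets (hypothetical helper)."""
    pos_adj = {v: set() for v in vertices}
    neg_adj = {v: set() for v in vertices}
    for adj, edges in ((pos_adj, pos_edges), (neg_adj, neg_edges)):
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return pos_adj, neg_adj


def signed_core(pos_adj, neg_adj, l, r):
    """Return the vertex set of the (l, r)-signed core by iterative peeling.

    Both dictionaries must contain every vertex as a key.
    """
    d_pos = {v: len(pos_adj[v]) for v in pos_adj}
    d_neg = {v: len(neg_adj[v]) for v in pos_adj}
    queue = deque(v for v in pos_adj if d_pos[v] < l or d_neg[v] < r)
    queued = set(queue)   # each vertex enters the queue at most once
    removed = set()
    while queue:
        v = queue.popleft()
        removed.add(v)
        # Update the positive, then the negative, degrees of v's neighbors.
        for adj, deg, bound in ((pos_adj, d_pos, l), (neg_adj, d_neg, r)):
            for u in adj[v]:
                if u in removed:
                    continue
                deg[u] -= 1
                if deg[u] < bound and u not in queued:
                    queued.add(u)
                    queue.append(u)
    return set(pos_adj) - removed


# Toy network: {0, 1} and {2, 3} form a balanced clique; 4 and 5 are peeled.
pos_adj, neg_adj = build_adj(
    range(6),
    pos_edges=[(0, 1), (2, 3), (4, 5)],
    neg_edges=[(0, 2), (0, 3), (1, 2), (1, 3), (4, 0), (5, 0), (5, 1)],
)
print(sorted(signed_core(pos_adj, neg_adj, 1, 2)))  # [0, 1, 2, 3]
```

In the toy network, vertex 4 is removed first because its negative degree is 1, and its removal drops the positive degree of vertex 5 to 0, so vertex 5 is removed in turn. Each vertex is queued at most once and each edge is examined a constant number of times, which matches the $O(n + m)$ argument.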

PERFORMANCE STUDIES
In this section, we present our experimental results. All the experiments are performed on a machine with two Intel Xeon 2.2GHz CPUs and 64GB RAM running CentOS 7.

Algorithms. We compare three algorithms: Baseline, MBCEnum and MBCEnum*. Baseline is the baseline solution shown in Section 4. MBCEnum is our algorithm shown in Section 5. MBCEnum* is the algorithm with the in-enumeration optimization shown in Section 6.1. Note that the pre-enumeration optimization strategies can also be used in Baseline and MBCEnum; thus, we apply them to all three algorithms for fairness. All algorithms are implemented in C++ and compiled with the g++ compiler using -O3. The time cost is measured as the amount of wall-clock time elapsed during the program's execution. If an algorithm cannot finish in 12 hours, we denote the processing time as INF. We evaluate our algorithms on real and synthetic signed networks.

Real datasets. We first evaluate our algorithms on five real datasets. Slashdot and Epinions are real-world signed networks. AdjWordNet, DBLP and Douban are signed networks used in [8], [27] and [50], respectively. Slashdot and Epinions are downloaded from SNAP (http://snap.stanford.edu). Douban is obtained from the authors of [50]. AdjWordNet is downloaded from WordNet (https://wordnet.princeton.edu/). DBLP is downloaded from KONECT (http://konect.uni-koblenz.de/) and processed in the same way as in [27]. The details of each dataset are shown in Table 1.

Exp-1: Efficiency when varying k. In this experiment, we evaluate the efficiency of the three algorithms when varying k from 4 to 10. The results are shown in Figure 6.
As shown in Figure 6, Baseline consumes the most time among the three algorithms on all datasets when we vary k, and it can only handle the small datasets. For example, on Slashdot (Figure 6 (a)), MBCEnum and MBCEnum* are at least two orders of magnitude faster than Baseline. On Douban (Figure 6 (d)), Baseline cannot finish the enumeration in 12 hours. This is because Baseline does not consider the uniqueness of signed networks, and many unnecessary computations are involved in its enumeration. MBCEnum is faster than Baseline on most of the test cases, as MBCEnum takes the uniqueness of signed networks into consideration and enumerates the maximal balanced cliques on the signed network directly. MBCEnum* is the most efficient algorithm on all datasets when varying k due to the in-enumeration optimization strategies, which shows the effectiveness of these strategies. Another phenomenon shown in Figure 6 is that the running time of all algorithms decreases as k increases. This is because as k increases, the pruning power of the optimization strategies proposed in Section 6 strengthens.

Exp-2: Evaluation of the pre-enumeration optimization. In this experiment, we evaluate the effectiveness and efficiency of the pre-enumeration optimization strategies proposed in Section 6.2. We report the number of edges pruned by VertexReduction and the total number of edges pruned by VertexReduction and EdgeReduction together when varying k in Figure 7. Figure 8 shows the running time.

As shown in Figure 7, for VertexReduction, the number of pruned edges increases as k increases. This is because as k increases, more vertices are not contained in the corresponding (k − 1, k)-signed core. These vertices, together with their incident edges, are pruned. Figure 7 also reveals that EdgeReduction prunes many more edges than VertexReduction. This is because EdgeReduction adopts a stricter pruning condition.
Figure 8 shows that as k increases, the running time of VertexReduction increases, since more vertices are explored by VertexReduction. On the other hand, the combined running time of VertexReduction and EdgeReduction decreases. This is because as k increases, more vertices are pruned by VertexReduction, so EdgeReduction takes less time to conduct its pruning. Since EdgeReduction is more time-consuming than VertexReduction, the combined running time decreases.

Exp-3: Scalability testing. In this experiment, we test the scalability of MBCEnum and MBCEnum* on the two large datasets DBLP and Douban by varying the number of sampled vertices from 20% to 100%. Figure 9 shows the results.
As shown in Figure 9, when n increases, the running time of both algorithms increases as well, but MBCEnum* outperforms MBCEnum in all cases on both datasets. For example, on DBLP, when we sample 20% of the vertices, the running times of MBCEnum and MBCEnum* are 0.6 seconds and 0.5 seconds, respectively, while when we sample 80% of the vertices, their running times are 770.6 seconds and 4.0 seconds, respectively. This shows that MBCEnum* scales well in practice.

Exp-4: Case study on AdjWordNet. In this experiment, we perform a case study on the real dataset AdjWordNet. In this dataset, two synonyms are connected by a positive edge and two antonyms are connected by a negative edge. Table 2 shows some results obtained by our algorithm. As shown in Table 2, the words within $C_L$ or within $C_R$ have similar meanings, while each word in $C_L$ is an antonym of all the words in $C_R$. This case study verifies that maximal balanced clique enumeration can be applied to find synonym and antonym groups in dictionary data.

Exp-5: Efficiency on synthetic datasets. In this experiment, we evaluate our algorithms on synthetic datasets. We use the synthetic signed network generator SRN with default settings to generate the synthetic datasets [45,56]. We generate four synthetic signed networks of different sizes, SN1-4 (details in Table 3), and evaluate the efficiency of MBCEnum* and MBCEnum on SN1-4 in the same way as in Exp-1. The results are shown in Figure 10.
As shown in Figure 10, the trends on the synthetic datasets are similar to those on the real datasets. MBCEnum* outperforms MBCEnum when we vary k, especially when k is small.
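The structure observed in the case study of Exp-4 (positive edges inside each side, negative edges across the two sides) can be checked with a short Python sketch. It is illustrative only: the function name `is_balanced_clique` is hypothetical, the size constraint is assumed to mean that both sides contain at least k vertices, the vertex labels are made up rather than taken from Table 2, and maximality is not checked.

```python
from itertools import combinations


def is_balanced_clique(c_left, c_right, pos_edges, neg_edges, k=0):
    """Check the balanced-clique structure of the split (c_left, c_right).

    Assumed conditions: every pair inside one side is joined by a positive
    edge, every cross pair is joined by a negative edge, and each side has
    at least k vertices. Maximality is NOT verified.
    """
    pos = {frozenset(e) for e in pos_edges}
    neg = {frozenset(e) for e in neg_edges}
    if len(c_left) < k or len(c_right) < k:
        return False
    same_side_ok = all(
        frozenset(pair) in pos
        for side in (c_left, c_right)
        for pair in combinations(side, 2)
    )
    cross_ok = all(frozenset((u, v)) in neg for u in c_left for v in c_right)
    return same_side_ok and cross_ok


# Made-up labels: a1, a2 stand for one synonym group; b1, b2 for the other.
pos_edges = [("a1", "a2"), ("b1", "b2")]
neg_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
print(is_balanced_clique({"a1", "a2"}, {"b1", "b2"}, pos_edges, neg_edges, k=2))  # True
```

Swapping one vertex between the two sides makes the check fail, because a negative edge then lies inside a side.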

CONCLUSIONS
In this paper, we study the maximal balanced clique enumeration problem in signed networks. We propose a new enumeration algorithm tailored for signed networks. Based on the new enumeration algorithm, we explore two optimization strategies to further improve the efficiency of the enumeration algorithm. The experimental results on real and synthetic datasets demonstrate the efficiency, effectiveness and scalability of our solution.