Cohesive Subgraph Search in Big Graphs

Publication Type:
Thesis
Issue Date:
2020
Full metadata record
Graphs are widely used to model relationships in various applications, such as social science, biology, information technology, to name a few. Mining cohesive subgraphs is one of the fundamental problems in graph analytics, where the main aim is to find subgraphs with well-connected graph nodes/vertices. A variety of models have been proposed to capture the cohesiveness of subgraphs with different constraints. In this thesis, we study three cohesive subgraph models to investigate various real-life applications better. Firstly, we would like to detect the critical users whose leave will break the user engagement of the network, i.e., lead many other users to drop out. Accordingly, we propose the collapsed 𝘬-truss problem: detect 𝘣 vertices from a graph 𝘎, whose removal will lead to the smallest size 𝘬-truss, i.e., identifying some specific users to strengthen the user engagement of the network/graph. From the theoretical side, we deliver the complexity of this problem: NP-hard and in-approximate. From the practical side, we propose an efficient algorithm that can accelerate the computation by vitally reducing the number of candidates. Extensive experiments on real-life networks (graphs) demonstrate the effectiveness and efficiency of our proposed algorithm. Secondly, we study the minimum 𝘬-core search problem. Given a graph 𝘎, an integer 𝘬 and a set of query nodes 𝘘 = {𝘲}, we aim to find the smallest size of 𝘬-core subgraph containing all the query node 𝘲 ∈ 𝘘. As one of the most representative cohesive subgraph models, 𝘬-core model has recently received significant attention. It has been shown that this problem is NP-hard with a huge search space, and it is very challenging to find the optimal solution. There are several heuristic algorithms for this problem, but they rely on simple scoring functions, and there is no guarantee as to the size of the resulting subgraph compared with the optimal solution. Our empirical study also indicates that the size of their resulting subgraphs may be large in practice. In this thesis, we develop an effective and efficient progressive algorithm, namely PSA, to provide a good trade-off between the quality of the result and the search time. Novel lower and upper bound techniques for the minimum 𝘬-core search are designed. Our extensive experiments on several real-life graphs demonstrate the effectiveness and efficiency of the new techniques. Finally, we investigate the fortress-like cohesive subgraph, 𝘱-cohesion. Morris defines the 𝘱-cohesion by a connected subgraph in which every vertex has at least a fraction 𝘱 of its neighbors in the subgraph, i.e., at most a fraction (1 βˆ’ 𝘱) of its neighbors outside. We can find that a 𝘱-cohesion ensures not only inner-cohesiveness but also outer-sparseness. The textbook on networks by Easley and Kleinberg shows that 𝘱-cohesions are fortress-like cohesive subgraphs that can hamper the cascade’s entry, following the contagion model. Despite the elegant definition and promising properties, there is no existing study on 𝘱-cohesion regarding problem complexity and efficient computing algorithms to our best knowledge. In this thesis, we fill this gap by conducting a comprehensive theoretical analysis of the problem’s complexity and developing efficient computing algorithms. We focus on the minimal 𝘱-cohesion because they are elementary units of 𝘱-cohesions and the combination of multiple minimal 𝘱-cohesions is a larger 𝘱-cohesion. We demonstrate that the discovered minimal 𝘱-cohesions can be utilized to solve the MinSeed problem: finding the smallest set of initial adopters (seeds) such that all the network users are eventually influenced under the contagion model. Extensive experiments on several real-life social networks verify this model’s effectiveness and the efficiency of our algorithms.
Please use this identifier to cite or link to this item: