Efficient Substructure Mining in Large Networks

Publication Type:
Thesis
Issue Date:
2023
Full metadata record
Graph models have been widely applied to represent the relationships between objects or entities. In graph models, the objects or entities are represented by vertices, and their relationships are represented by edges. In graph analysis, a subset of vertices and edges with distinct patterns form a substructure in the graph. Cohesive subgraphs and the shortest paths are two typical types of substructures that are widely used in many real-world applications. Given the importance of these substructures, in this thesis, we study the problems of mining them on large graphs. Firstly, we study the cohesive subgraph substructures. We propose a novel cohesive subgraph model named the statistically significant cliques to fill the gap where most cohesive subgraph models do not consider the statistical significance. We propose an efficient branch-and-bound method with carefully designed pruning techniques to compute the maximal significant cliques. Secondly, we investigate the shortest path substructures. We specifically study the shortest path counting problem, which is an important closeness metric and also serves as the building block for betweenness centrality. We observe the limitations of the state-of-the-art method and the opportunities to improve. A more advanced index structure based on tree decomposition is designed for the shortest path count computation. We also provide efficient algorithms to construct such an index. Thirdly, we investigated the dynamic updating of our proposed index in response to graph updates. To achieve this, we developed a basic updating method that identifies the affected area in the index and updates the labels without requiring a complete recomputation. We then proposed enhanced algorithms to expedite these updates. We conduct extensive experiments, and the results validate the effectiveness and efficiency of our proposed methods.
Please use this identifier to cite or link to this item: