Efficient Connectivity Analysis and Path Management in Massive Graphs

Publication Type:
Thesis
Issue Date:
2023
Full metadata record
Graph analysis serves as a cornerstone in numerous real-world applications, spanning domains like social networks, knowledge graphs, fraud detection, and trajectory monitoring. In this thesis, we focus on three fundamental problems in graph analysis: reachability queries, path enumeration, and path compression, each nuanced by distinct constraints within expansive graphs. Our work advances a suite of innovative and efficient strategies to address these challenges. The first problem considers the reachability between nodes in a temporal graph, encapsulating connectivity insights. Specifically, vertices in a temporal graph are connected by edges with time stamps, and there could be multiple edges connecting two nodes. Existing approaches assume a time-respecting path, which fails to capture relationships between entities in the same group or activity. To address this, we propose the span-reachability model, which identifies relationships between entities in a given time period. We adopt a two-hop cover approach and propose an index-based method to efficiently answer span-reachability queries. In the second problem, we further explore the paths between given nodes to explore the connectivity in detail. We delve into the exploration of hop-constrained time interval $s$-$t$ path enumeration in temporal graphs. To effectively tackle this challenge, we first propose a data structure named TIPST bundle to avoid repeated visits and maintain intermediate results in a compact way. It leads to a significant reduction in space complexity for each bundle. We then propose an algorithm named TDDL-DFS leveraging the distance labels. In the search process, these labels are dynamically updated to prune the fruitless branches, which is a is a polynomial delay algorithm. The third problem deals with path compression in large graphs to handle the considerable scale of path sets as query results. The challenge comes with the numerous number of regularly generated paths in networks. We propose the Overlap-Free Frequent Subpath (OFFS) compression method that allows retrieval of any individual path while compressing the overall size. We leverage a lookup table to match frequent common subpaths to supernodes and adopt a bottom-up framework to construct the lookup table in given iterations. Each path is shortened by replacing subpaths with corresponding supernodes in the table. Several optimizations are proposed to improve the compression ratio and speed. We conduct extensive experiments on real-world datasets to demonstrate the effectiveness and efficiency of our proposed approaches. Our approaches outperform baseline methods significantly in terms of query answering time, compression ratio, and enumeration time.
Please use this identifier to cite or link to this item: