Computing Paths in Massive Graphs

Publication Type:
Thesis
Issue Date:
2023
Full metadata record
Graph is a widely used data structure for representing real-world entities and their relationships. Graph queries play an important role in the processing of graphs, and path queries are one of the most important types of query operations on graphs. A path query attempts to retrieve the path from one vertex to another, and has numerous real-world applications. Given the importance of path queries in real-world graphs, this thesis focuses on investigating efficient methods for computing path-related queries in massive graphs. First, we study how to efficiently process reachability queries on distributed graphs, which check whether there is a path from one vertex to another. Real-world massive graphs are typically distributed across multiple data centers due to their size. When performing reachability queries on these distributed graphs, reachability labeling methods ensure fast query processing through lightweight indexes. We propose a filtering-refinement framework for index construction. In addition, we design distributed labeling algorithms and batch-processing techniques to improve efficiency. Second, we study shortest path queries on complex graphs, which return the shortest path between two vertices. To handle shortest path queries, one option is to use traversal-based methods (e.g., breadth-first search); another option is to use extension-based methods. These two approaches make different trade-offs regarding query time and space cost, but it lacks a comprehensive study of their performance on real-world graphs. Moreover, extension-based approaches use additional attributes to extend the index, resulting in high space costs after extension. To address these issues, we thoroughly compare the two approaches and propose a new extension-based approach, Monotonic Landmark Labeling ($\MLL$), to reduce the required space cost while guaranteeing query time. Third, we study label-constrained shortest path queries on road networks, which request the shortest path between two vertices in labeled graphs such that all edges on the path satisfy the label constraint. Computing the shortest path between two vertices is a fundamental problem for road networks. Most existing works assume that edges in the road networks do not have labels, but in many real applications, the edges have labels, and label constraints may be placed on the edges of a valid shortest path. Therefore, we study the label-constrained shortest path queries. To process these queries efficiently, we adopt an index-based approach and propose efficient algorithms for query processing and index construction with good performance guarantees. We conduct extensive experiments for evaluation. The results validate the effectiveness and efficiency of our proposed methods.
Please use this identifier to cite or link to this item: