Measuring Graphs with Shortest Distances

Publication Type:
Issue Date:
Full metadata record
Shortest distances characterize the pair-wise relationships among nodes in a graph. Given a graph with a node set and an edge set, the shortest distance between two nodes is defined as the minimum path length between them. Computing the shortest distance between two nodes is a fundamental operation of graphs, which can be used both as a primary function and as a building block for applications. Given the important roles of shortest distances, this thesis focuses on the efficient computation of shortest-distance-based measures on graphs. Firstly, we investigate how to accelerate the index time of 2-hop labeling. 2-hop labeling approaches are widely adopted to speed up the online performance of shortest distance queries. The construction of the 2-hop labeling, however, can be exhaustive especially on big graphs. For a major category of large graphs, small-world networks, the state-of-the-art approach is Pruned Landmark Labeling (PLL). However, PLL’s strong sequential nature hinders it from being parallelized. It becomes an urgent issue on massive small-world networks whose index can hardly be constructed by a single thread within a reasonable time. Based on the dependency analysis of PLL, we propose a Parallelized Shortest-distance Labeling (PSL) scheme to exploit parallelism to shorten the index time. Secondly, we study how to reduce the index size of 2-hop labeling. While the index time can be shortened by parallelized labeling, the index size becomes the bottleneck for a massive real graph with a relatively large treewidth — 2-hop labeling can hardly be constructed due to the oversized index. We disclose the theoretical relationships between the graph treewidth and 2-hop labeling’s index size and query time. To scale up distance labeling, we propose Core-Tree (CT) Index to dramatically reduce the index size, thereby enabling CT-Index to handle massive graphs that no existing approaches can process. Thirdly, we compute and maintain the eccentricities of all nodes. Given a graph, eccentricity measures the shortest distance from each node to its farthest node. Existing eccentricity computation algorithms are not scalable enough to handle real large networks. Our solution optimizes existing eccentricity computation algorithms on their bottlenecks — one node eccentricity computation and the upper/lower bounds update — based on a line of original insights; it also provides the first algorithm on maintaining the eccentricities of a dynamic graph without recomputing the eccentricity distribution upon each edge update. Extensive empirical studies validate the efficiency of our techniques.
Please use this identifier to cite or link to this item: