I/O Cost Minimization: Reachability Queries Processing over Massive Graphs

Publisher: ACM
Publication Type: Conference Proceeding
Citation: Proceedings of the 15th International Conference on Extending Database Technology, 2012, pp. 468-479
Issue Date: 2012-01
Given a directed graph G, a reachability query (u, v) asks whether there exists a path from a node u to a node v in G. Existing studies support reachability queries using indexing techniques that require both the graph and the index to reside in main memory. However, these techniques cannot efficiently handle reachability queries on massive graphs whose graph and index cannot be held entirely in memory, because of the resulting high I/O cost. In this paper, we focus on how to minimize the I/O cost of answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement to the No-Label used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph. We also propose new methods to partition the heap-on-disk so that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend them to answer multiple reachability queries effectively. Finally, we conduct extensive performance studies on both large synthetic and large real graphs, which confirm the efficiency of our approaches.
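The abstract builds on GRAIL's No-Label: an interval label per node that can prove a target is *not* reachable, leaving a graph traversal only for the "maybe" cases. As a rough illustration of that idea only (not the paper's Yes-Label or heap-on-disk methods, which the abstract does not spell out), here is a minimal in-memory sketch in Python. It assumes a DAG with nodes numbered 0..n-1 and uses a single deterministic DFS labeling, whereas GRAIL itself uses several randomized traversals. The function names `grail_labels`, `label_contains`, and `reachable` are hypothetical.

```python
from collections import defaultdict


def grail_labels(n, edges):
    """Give every node of a DAG one GRAIL-style interval label [low, rank].

    rank[v] is v's DFS post-order number (1-based) and low[v] is the smallest
    rank among v and all of its descendants. Assumes nodes 0..n-1 and an
    acyclic edge list (cycles would have to be condensed first).
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    rank, low = [0] * n, [0] * n
    visited = [False] * n
    counter = 0
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(adj[s]))]  # iterative DFS: (node, remaining children)
        while stack:
            node, children = stack[-1]
            for c in children:
                if not visited[c]:
                    visited[c] = True
                    stack.append((c, iter(adj[c])))
                    break
            else:
                # All children are finished: assign post-order rank and low.
                stack.pop()
                counter += 1
                rank[node] = counter
                low[node] = min([counter] + [low[c] for c in adj[node]])
    return adj, low, rank


def label_contains(low, rank, u, v):
    """True if v's interval lies inside u's, a necessary condition for u ->* v."""
    return low[u] <= low[v] and rank[v] <= rank[u]


def reachable(adj, low, rank, u, v):
    if u == v:
        return True
    if not label_contains(low, rank, u, v):
        return False  # No-Label style answer: definitely unreachable
    # The labels only say "maybe": run a DFS that skips nodes ruled out by label.
    stack, seen = [u], {u}
    while stack:
        w = stack.pop()
        for c in adj[w]:
            if c == v:
                return True
            if c not in seen and label_contains(low, rank, c, v):
                seen.add(c)
                stack.append(c)
    return False


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 3), (3, 2), (4, 3)]
    adj, low, rank = grail_labels(5, edges)
    print(reachable(adj, low, rank, 0, 2))  # True
    print(reachable(adj, low, rank, 1, 3))  # False, rejected by the label alone
    print(reachable(adj, low, rank, 4, 1))  # False, label says "maybe", DFS decides
```

The query (4, 1) shows why No-Labels alone are not enough: the interval test passes, so a traversal still has to run. That residual traversal is the part the paper targets, using Yes-Labels to cut intermediate results and a partitioned heap-on-disk to keep the traversal's I/O sequential when the graph does not fit in memory.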