I/O cost minimization: Reachability queries processing over massive graphs

Publication Type: Conference Proceeding
ACM International Conference Proceeding Series, 2012, pp. 468-479
Given a directed graph G, a reachability query (u, v) asks whether there exists a path from a node u to a node v in G. Existing studies support reachability queries using indexing techniques that require both the graph and the index to reside in main memory. These techniques cannot efficiently handle reachability queries on massive graphs whose graph and index cannot be held entirely in memory, because of the high I/O cost this incurs. In this paper, we focus on minimizing the I/O cost of answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement to the No-Label used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph, and we propose new methods to partition the heap-on-disk so that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend them to answer multiple reachability queries effectively. Finally, we conduct extensive performance studies on both large synthetic and large real graphs, which confirm the efficiency of our approaches. © 2012 ACM.
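The abstract does not say how the Yes-Label is built. The following in-memory sketch shows how a "No" label and a "Yes" label can work together to prune a reachability search, under stated assumptions:

- The No-Label follows the idea behind GRAIL's labels. Each node gets a randomized post-order interval [low, rank]. If, in some traversal, v's interval is not nested inside u's, then u cannot reach v.
- The Yes-Label is *assumed* here to be a DFS-tree subtree range. If v falls inside u's subtree range, a tree path u -> v exists.

The class `LabeledDAG`, the parameter `k`, and the pruned BFS fallback are hypothetical illustrations, not the authors' method. The heap-on-disk and its sequential-I/O partitioning are not modeled. The input is assumed to be a DAG, for example after collapsing strongly connected components.

```python
import random
from collections import deque


class LabeledDAG:
    """Hypothetical sketch: GRAIL-style No-Label plus an assumed DFS-tree Yes-Label.
    Assumes the graph is a DAG; recursion depth limits it to small examples."""

    def __init__(self, n, edges, k=2, seed=0):
        self.n = n
        self.adj = [[] for _ in range(n)]
        for u, v in edges:
            self.adj[u].append(v)
        rng = random.Random(seed)
        # No-Label: k randomized post-order intervals (rank, low) per node.
        self.no_labels = [self._post_order_intervals(rng) for _ in range(k)]
        # Yes-Label (assumed form): pre-order range of each node's DFS-tree subtree.
        self.pre, self.last = self._tree_ranges()

    def _post_order_intervals(self, rng):
        rank, low = [None] * self.n, [None] * self.n
        counter = 0

        def dfs(u):
            nonlocal counter
            rank[u] = -1  # visited; in a DAG no child is ever still in progress
            children = self.adj[u][:]
            rng.shuffle(children)  # randomized child order, one per traversal
            for c in children:
                if rank[c] is None:
                    dfs(c)
            rank[u] = counter  # post-order rank
            counter += 1
            # low(u) = smallest rank among all nodes reachable from u
            low[u] = min([rank[u]] + [low[c] for c in self.adj[u]])

        roots = list(range(self.n))
        rng.shuffle(roots)
        for r in roots:
            if rank[r] is None:
                dfs(r)
        return rank, low

    def _tree_ranges(self):
        pre, last = [None] * self.n, [None] * self.n
        counter = 0

        def dfs(u):
            nonlocal counter
            pre[u] = counter
            counter += 1
            for c in self.adj[u]:
                if pre[c] is None:
                    dfs(c)  # u -> c becomes a tree edge
            last[u] = counter - 1  # largest pre-order number in u's subtree

        for r in range(self.n):
            if pre[r] is None:
                dfs(r)
        return pre, last

    def certainly_yes(self, u, v):
        # v lies in u's DFS-tree subtree, so a tree path u -> v exists.
        return self.pre[u] <= self.pre[v] <= self.last[u]

    def certainly_no(self, u, v):
        # In some traversal, v's interval is not nested in u's: u cannot reach v.
        return any(not (low[u] <= low[v] and rank[v] <= rank[u])
                   for rank, low in self.no_labels)

    def reachable(self, u, v):
        if u == v or self.certainly_yes(u, v):
            return True
        if self.certainly_no(u, v):
            return False
        # Undecided: traverse, skipping nodes the No-Label rules out.
        seen, queue = {u}, deque([u])
        while queue:
            w = queue.popleft()
            for x in self.adj[w]:
                if x == v or self.certainly_yes(x, v):
                    return True
                if x not in seen and not self.certainly_no(x, v):
                    seen.add(x)
                    queue.append(x)
        return False


g = LabeledDAG(6, [(0, 1), (1, 2), (0, 3), (3, 2), (4, 0), (2, 5)])
print(g.reachable(4, 5), g.reachable(1, 3))
```

The two labels are one-sided filters. The Yes-Label can only prove reachability, and the No-Label can only refute it. Any pair that neither label decides falls back to a traversal. In the paper's setting, that traversal is the part that runs against disk-resident data, so every node pruned by a label is one fewer intermediate result to store and read back.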