Optimal Enumeration: Efficient Top-k Tree Matching

VLDB Endowment
Publication Type:
Conference Proceeding
Proceedings of the VLDB Endowment, 2015, 8 (5), pp. 533 - 544
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail[2015 PVLDB] Optimal Enumeration - Efficient Top-k Tree Matching.pdfPublished Version284.88 kB
Adobe PDF
Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. Consider that a twigpattern matching may result in an extremely large number of matches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. We firstly present a novel and optimal enumeration paradigm based on the principle of Lawler’s procedure. We show that our enumeration algorithm runs in O(nT + log k) time in each round where nT is the number of nodes in T. Considering that the time complexity to output a match of T is O(nT ) and nT ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating top-1 match of T in our algorithm is O(mR) where mR is the number of edges in the transitive closure of a data graph G involving all relevant nodes to T. O(mR) is also optimal in the worst case without preknowledge of G. Consequently, our algorithm is optimal with the running time O(mR +k(nT +log k)) in contrast to the time complexity O(mR log k+knT (log k+dT )) of the existing technique where dT is the maximal node degree in T. Secondly, a novel priority based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of top-k graph pattern matching problem (i.e., query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.
Please use this identifier to cite or link to this item: