Optimal Enumeration: Efficient Top-k Tree Matching

Publication Type:
Conference Proceeding
2015, 8 (5), pp. 533 - 544
Issue Date:
Full metadata record
Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. Consider that a twigpattern matching may result in an extremely large number ofmatches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. We firstly present a novel and optimal enumeration paradigm based on the principle of Lawler's procedure. We show that our enumeration algorithm runs in O(n T + log k) time in each round where n T is the number of nodes in T. Considering that the time complexity to output a match of T is O(n T ) and n T = log k in practice, our enumeration technique is optimal. Moreover, the cost of generating top-1 match of T in our algorithm is O(m R ) where m R is the number of edges in the transitive closure of a data graph G involving all relevant nodes to T. O(m R ) is also optimal in the worst case without preknowledge of G. Consequently, our algorithm is optimal with the running time O(mR +k(n T +log k)) in contrast to the time complexity O(m R log k+kn T (log k+d T )) of the existing technique where d T is the maximal node degree in T. Secondly, a novel priority based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of top-k graph pattern matching problem (i.e., query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.
Please use this identifier to cite or link to this item: