Optimal Enumeration: Efficient Top-k Tree Matching
- VLDB Endowment
- Publication Type:
- Conference Proceeding
- Proceedings of the VLDB Endowment, 2015, 8 (5), pp. 533 - 544
- Issue Date:
Files in This Item:
|[2015 PVLDB] Optimal Enumeration - Efficient Top-k Tree Matching.pdf||Published Version||284.88 kB|
Copyright Clearance Process
- Recently Added
- In Progress
- Open Access
This item is open access.
The embargo period expires on 31 Jan 2016
Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. Consider that a twigpattern matching may result in an extremely large number of matches in a graph; this may not only confuse users by providing too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. We firstly present a novel and optimal enumeration paradigm based on the principle of Lawler’s procedure. We show that our enumeration algorithm runs in O(nT + log k) time in each round where nT is the number of nodes in T. Considering that the time complexity to output a match of T is O(nT ) and nT ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating top-1 match of T in our algorithm is O(mR) where mR is the number of edges in the transitive closure of a data graph G involving all relevant nodes to T. O(mR) is also optimal in the worst case without preknowledge of G. Consequently, our algorithm is optimal with the running time O(mR +k(nT +log k)) in contrast to the time complexity O(mR log k+knT (log k+dT )) of the existing technique where dT is the maximal node degree in T. Secondly, a novel priority based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of top-k graph pattern matching problem (i.e., query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve the existing techniques by orders of magnitude.
Please use this identifier to cite or link to this item: