Connected substructure similarity search

Publisher:
ACM
Publication Type:
Conference Proceeding
Citation:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, 2010, pp. 903 - 914
Issue Date:
2010-01
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2013005464OK.pdf571.36 kB
Adobe PDF
Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may be still unable to effectively remove many "low" quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their "distances" to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
Please use this identifier to cite or link to this item: