Finding top-k similar graphs in graph databases

Zhu, Y; Qin, L; Yu, JX; Cheng, H

Publication Type: Conference Proceeding
Citation: ACM International Conference Proceeding Series, 2012, pp. 456-467
Issue Date: 2012-07-10

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhu, Y	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.contributor.author	Cheng, H	en_US
dc.date.issued	2012-07-10	en_US
dc.identifier.citation	ACM International Conference Proceeding Series, 2012, pp. 456 - 467	en_US
dc.identifier.isbn	9781450307901	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28336
dc.description.abstract	Querying similar graphs in graph databases has been widely studied in graph query processing in recent years. Existing works mainly focus on subgraph similarity search and supergraph similarity search. In this paper, we study the problem of finding top-k graphs in a graph database that are most similar to a query graph. This problem has many applications, such as image retrieval and chemical compound structure search. Regarding the similarity measure, feature based and kernel based similarity measures have been used in the literature. But such measures are rough and may lose the connectivity information among substructures. In this paper, we introduce a new similarity measure based on the maximum common subgraph (MCS) of two graphs. We show that this measure can better capture the common and different structures of two graphs. Since computing the MCS of two graphs is NP-hard, we propose an algorithm to answer the top-k graph similarity query using two distance lower bounds with different computational costs, in order to reduce the number of MCS computations. We further introduce an indexing technique, which can better make use of the triangle property of similarities among graphs in the database to get tighter lower bounds. Three different indexing methods are proposed with different tradeoffs between pruning power and construction cost. We conducted extensive performance studies on large real datasets to evaluate the performance of our approaches. © 2012 ACM.	en_US
dc.relation.ispartof	ACM International Conference Proceeding Series	en_US
dc.relation.isbasedon	10.1145/2247596.2247650	en_US
dc.title	Finding top-k similar graphs in graph databases	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Berlin, Germany	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
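
The filter-and-verify idea described in the abstract (use lower bounds of different cost to avoid expensive exact distance computations) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function `top_k_similar`, its parameters, and the toy helpers `size_gap`, `even_diff`, and `sym_diff` are hypothetical names introduced here. The toy distance is the size of a set symmetric difference, standing in for an MCS-based graph distance, which the abstract does not define.

```python
import heapq
from typing import Callable, FrozenSet, List, Sequence, Tuple, TypeVar

G = TypeVar("G")
Dist = Callable[[G, G], float]


def top_k_similar(
    query: G,
    database: Sequence[G],
    k: int,
    cheap_lb: Dist,
    tight_lb: Dist,
    exact_dist: Dist,
) -> Tuple[List[Tuple[float, int]], int]:
    """Return ([(distance, index), ...] ascending, number of exact_dist calls).

    Assumes cheap_lb and tight_lb never exceed exact_dist for any pair.
    """
    if k <= 0:
        return [], 0
    # Visit candidates in ascending order of the cheap lower bound.
    candidates = sorted((cheap_lb(query, g), i) for i, g in enumerate(database))
    heap: List[Tuple[float, int]] = []  # max-heap on distance via negation
    exact_calls = 0
    for lb, i in candidates:
        full = len(heap) == k
        worst = -heap[0][0] if full else float("inf")
        if full and lb >= worst:
            break  # every remaining candidate has an even larger cheap bound
        if full and tight_lb(query, database[i]) >= worst:
            continue  # the second bound rules this candidate out
        d = exact_dist(query, database[i])  # the expensive step
        exact_calls += 1
        if not full:
            heapq.heappush(heap, (-d, i))
        elif d < worst:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap), exact_calls


# Toy stand-ins (assumptions, not the paper's measures): items are sets of ints.
def sym_diff(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    return len(a ^ b)  # plays the role of the exact distance


def size_gap(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    return abs(len(a) - len(b))  # never exceeds sym_diff


def even_diff(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    return sum(1 for x in a ^ b if x % 2 == 0)  # counts part of sym_diff only
```

Ties at the k-th distance are broken arbitrarily in this sketch. A bound derived from precomputed distances between database items (via the triangle inequality, if the distance is a metric) could be supplied in the `tight_lb` slot; that choice is an assumption here, not something the record specifies.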

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28336