Supergraph Search in Graph Databases via Hierarchical Feature-Tree

Lyu, B; Qin, L; Lin, X; Chang, L; Yu, JX

Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2019, 31 (2), pp. 385-400
Issue Date: 2019-02-01

Closed Access

Filename	Description	Size	Format
[2019 TKDE] Supergraph Search in Graph Databases via Hierarchical Feature-Tree.pdf	Published Version	2.18 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Lyu, B	en_US
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Lin, X	en_US
dc.contributor.author	Chang, L	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.date.issued	2019-02-01	en_US
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2019, 31 (2), pp. 385 - 400	en_US
dc.identifier.issn	1041-4347	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131185
dc.description.abstract	© 1989-2012 IEEE. Supergraph search is a fundamental problem in graph databases that is widely applied in many application scenarios. Given a graph database and a query-graph, supergraph search retrieves all data-graphs contained in the query-graph from the graph database. Most existing solutions for supergraph search follow the pruning-and-verification framework, which prunes false answers based on features in the pruning phase and performs subgraph isomorphism tests on the remaining graphs in the verification phase. However, they do not scale to large data-graphs and query-graphs due to three drawbacks. First, they rely on a frequent subgraph mining algorithm to select features, which is expensive and cannot generate large features. Second, they require a costly verification phase. Third, they process features in a fixed order without considering their relationships to the query-graph. In this paper, we address the three drawbacks and propose new indexing and query processing algorithms. In indexing, we select features directly from the data-graphs without expensive frequent subgraph mining. The features form a feature-tree that contains features of all sizes, and both the cost sharing and the pruning power of the features are considered. In query processing, we propose a new algorithm in which the order of processing features is query-dependent, taking into account both the cost sharing and the pruning power. We explore two optimization strategies to further improve the efficiency of the algorithm. The first strategy applies a lightweight graph compression technique and the second strategy optimizes the inclusion of answers. We further show how to maintain the index incrementally and efficiently when the graph database is updated dynamically. Moreover, we propose an approximation approach that significantly reduces the computational cost for large data-graphs and/or query-graphs while preserving high result quality. Finally, we conduct extensive performance studies on two large real datasets to demonstrate the efficiency and effectiveness of our algorithms.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP160101513
dc.relation	http://purl.org/au-research/grants/arc/DP180103096
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.relation.isbasedon	10.1109/TKDE.2018.2833124	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Supergraph Search in Graph Databases via Hierarchical Feature-Tree	en_US
dc.type	Journal Article
utslib.citation.volume	2	en_US
utslib.citation.volume	31	en_US
utslib.for	0802 Computation Theory and Mathematics	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	2	en_US
pubs.publication-status	Published	en_US
pubs.volume	31	en_US

Abstract:

© 1989-2012 IEEE. Supergraph search is a fundamental problem in graph databases that is widely applied in many application scenarios. Given a graph database and a query-graph, supergraph search retrieves all data-graphs contained in the query-graph from the graph database. Most existing solutions for supergraph search follow the pruning-and-verification framework, which prunes false answers based on features in the pruning phase and performs subgraph isomorphism tests on the remaining graphs in the verification phase. However, they do not scale to large data-graphs and query-graphs due to three drawbacks. First, they rely on a frequent subgraph mining algorithm to select features, which is expensive and cannot generate large features. Second, they require a costly verification phase. Third, they process features in a fixed order without considering their relationships to the query-graph. In this paper, we address the three drawbacks and propose new indexing and query processing algorithms. In indexing, we select features directly from the data-graphs without expensive frequent subgraph mining. The features form a feature-tree that contains features of all sizes, and both the cost sharing and the pruning power of the features are considered. In query processing, we propose a new algorithm in which the order of processing features is query-dependent, taking into account both the cost sharing and the pruning power. We explore two optimization strategies to further improve the efficiency of the algorithm. The first strategy applies a lightweight graph compression technique and the second strategy optimizes the inclusion of answers. We further show how to maintain the index incrementally and efficiently when the graph database is updated dynamically. Moreover, we propose an approximation approach that significantly reduces the computational cost for large data-graphs and/or query-graphs while preserving high result quality. Finally, we conduct extensive performance studies on two large real datasets to demonstrate the efficiency and effectiveness of our algorithms.
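
The generic pruning-and-verification framework mentioned in the abstract can be illustrated with a small sketch. This is not the paper's feature-tree algorithm: it only shows the baseline idea that if a feature is contained in a data-graph but not in the query-graph, that data-graph cannot be contained in the query-graph. The names `Graph`, `is_subgraph`, `build_index`, and `supergraph_search`, the flat feature list, and the toy graphs are all assumptions made for illustration, and the containment test is a naive backtracking search for label-preserving, non-induced subgraph isomorphism.

```python
class Graph:
    """Undirected, vertex-labelled graph (hypothetical helper for this sketch)."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)                 # vertex -> label
        self.adj = {v: set() for v in self.labels}  # vertex -> neighbours
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def is_subgraph(small, big):
    """True if `small` is subgraph-isomorphic to `big` (naive backtracking)."""
    nodes = list(small.labels)

    def extend(mapping, used):
        if len(mapping) == len(nodes):
            return True
        u = nodes[len(mapping)]
        for w in big.labels:
            if w in used or big.labels[w] != small.labels[u]:
                continue
            # Every already-mapped neighbour of u must be adjacent to w in big.
            if all(mapping[n] in big.adj[w] for n in small.adj[u] if n in mapping):
                mapping[u] = w
                used.add(w)
                if extend(mapping, used):
                    return True
                del mapping[u]
                used.discard(w)
        return False

    return extend({}, set())


def build_index(database, features):
    """For each feature, record the ids of the data-graphs that contain it."""
    return [(f, {gid for gid, g in database.items() if is_subgraph(f, g)})
            for f in features]


def supergraph_search(database, index, query):
    """Return ids of all data-graphs contained in `query`."""
    candidates = set(database)
    # Pruning phase: a feature absent from the query rules out every
    # data-graph that contains that feature.
    for feature, containing in index:
        if candidates & containing and not is_subgraph(feature, query):
            candidates -= containing
    # Verification phase: exact containment test on the survivors.
    return {gid for gid in candidates if is_subgraph(database[gid], query)}


# Toy example: the query is the path A - B - C.
query = Graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])
database = {
    "g1": Graph({1: "A", 2: "B"}, [(1, 2)]),
    "g2": Graph({1: "B", 2: "C"}, [(1, 2)]),
    "g3": Graph({1: "A", 2: "C"}, [(1, 2)]),
    "g4": Graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),
}
features = [Graph({1: "A", 2: "C"}, [(1, 2)])]  # a single A - C edge feature
index = build_index(database, features)
answers = supergraph_search(database, index, query)
print(sorted(answers))
```

In this toy run the A - C edge feature is not contained in the query, so g3 and g4 are discarded with a single feature test and only g1 and g2 reach verification. The sketch processes features in a fixed order, which is exactly the behaviour the abstract identifies as a drawback of existing solutions.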

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131185