Computing structural statistics by keywords in databases

Qin, L; Yu, JX; Chang, L

Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2012, 24 (10), pp. 1731 - 1746
Issue Date: 2012-08-30

Closed Access

	Filename	Description	Size	Format
	2013005180OK.pdf		2.24 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Qin, L https://orcid.org/0000-0001-6068-5062	en_US
dc.contributor.author	Yu, JX https://orcid.org/0000-0002-9738-827X	en_US
dc.contributor.author	Chang, L	en_US
dc.date.issued	2012-08-30	en_US
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2012, 24 (10), pp. 1731 - 1746	en_US
dc.identifier.issn	1041-4347	en_US
dc.identifier.uri	http://hdl.handle.net/10453/28324
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.relation.isbasedon	10.1109/TKDE.2012.78	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Computing structural statistics by keywords in databases	en_US
dc.type	Journal Article
utslib.citation.volume	10	en_US
utslib.citation.volume	24	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	10	en_US
pubs.publication-status	Published	en_US
pubs.volume	24	en_US

Abstract:

Keyword search in relational databases (RDBs) has been extensively studied in recent years. Existing studies focus on finding all or the top-k interconnected tuple-structures that contain the keywords. In practice, the number of such interconnected tuple-structures for a keyword query can be large, which makes it very difficult for users to obtain any valuable information beyond individual interconnected tuple-structures. It is also challenging to provide a mechanism similar to group-&-aggregate for those interconnected tuple-structures. In this paper, we study the computation of structural statistics keyword queries by extending the group-&-aggregate framework. We consider an RDB as a large directed graph in which nodes represent tuples and edges represent the links among tuples. Instead of using tuples as the members of a group, we consider rooted subgraphs. Such a rooted subgraph represents an interconnected tuple-structure in which some of the tuples contain keywords. The dimensions of the rooted subgraphs are determined by dimensional keywords in a data-driven fashion. Two rooted subgraphs are placed in the same group if they are isomorphic with respect to the dimensions, that is, the dimensional keywords. The scores of the rooted subgraphs are computed by a user-given score function if the rooted subgraphs contain some of the general keywords. Here, the general keywords are used to compute scores rather than to determine dimensions. The aggregates are computed for every group using an SQL aggregate function over the computed scores. We give our motivation using a real data set. We propose new approaches to compute structural statistics keyword queries, perform extensive performance studies using two large real data sets and a large synthetic data set, and confirm the effectiveness and efficiency of our approach. © 2012 IEEE.
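
As a rough illustration of the group-and-aggregate idea summarized in the abstract, the sketch below groups toy rooted subgraphs by a canonical form of their dimension-labeled structure and then aggregates the per-group scores. It is not the authors' algorithm. The `Node` representation, the `canonical_key` helper, the toy score function, and the sample data are all assumptions made for illustration, and the canonical form handles only rooted trees, not general rooted subgraphs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A tuple in the database graph (hypothetical representation)."""
    dim: Optional[str] = None   # dimensional keyword matched by this tuple, if any
    general_hits: int = 0       # number of general keywords matched (toy measure)
    children: List["Node"] = field(default_factory=list)


def canonical_key(node: Node) -> Tuple:
    """Canonical form of a rooted tree labeled by dimensional keywords.

    Sorting the children's keys makes the result independent of child order,
    so two trees share a key exactly when they are isomorphic as
    dimension-labeled rooted trees.
    """
    child_keys = sorted(canonical_key(c) for c in node.children)
    return (node.dim or "", tuple(child_keys))


def score(node: Node) -> float:
    """Toy stand-in for a user-given score: total general-keyword hits."""
    return node.general_hits + sum(score(c) for c in node.children)


def group_and_aggregate(
    roots: List[Node],
    score_fn: Callable[[Node], float],
    agg: Callable[[List[float]], float],
) -> Dict[Tuple, float]:
    """Group rooted trees by canonical key, then aggregate scores per group."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for root in roots:
        groups[canonical_key(root)].append(score_fn(root))
    return {key: agg(scores) for key, scores in groups.items()}


# Toy data: t1 and t2 have the same dimensional structure (child order differs).
t1 = Node(children=[Node(dim="year", general_hits=1), Node(dim="venue")])
t2 = Node(children=[Node(dim="venue", general_hits=2), Node(dim="year")])
t3 = Node(children=[Node(dim="year", general_hits=4)])

result = group_and_aggregate([t1, t2, t3], score, sum)
for key, value in result.items():
    print(key, value)
```

Passing `max` or `len` instead of `sum` plays the role of SQL's `MAX` or `COUNT` in this sketch. Only the dimensional keywords enter the grouping key, while the general keywords affect only the score, which mirrors the separation of roles described in the abstract.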

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/28324