Discovering associations in very large databases by approximating

Zhang, S; Zhang, C

Publication Type: Journal Article
Citation: Acta Cybernetica, 2003, 16 (1), pp. 155 - 177
Issue Date: 2003-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, S	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2003-01-01	en_US
dc.identifier.citation	Acta Cybernetica, 2003, 16 (1), pp. 155 - 177	en_US
dc.identifier.issn	0324-721X	en_US
dc.identifier.uri	http://hdl.handle.net/10453/5789
dc.relation	http://purl.org/au-research/grants/arc/DP0343109
dc.relation.ispartof	Acta Cybernetica	en_US
dc.subject.classification	Computation Theory & Mathematics	en_US
dc.title	Discovering associations in very large databases by approximating	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	16	en_US
utslib.for	0802 Computation Theory and Mathematics	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	16	en_US

Abstract:

Mining association rules has posed a great challenge to the research community. Despite efforts to design fast and efficient mining algorithms, it remains a time-consuming process for very large databases. In this paper, we adopt a slightly different approach to this problem, one that mines approximate association rules quickly. By considering the database as a set of records that are randomly appended, we can apply the central limit theorem to estimate the size of a random subset of the database, and discover both positive and negative association rules by generating all possible useful itemsets from the random subset. However, because of approximation errors, some valid rules may be missed, while some invalid rules may be generated. To deal with this problem, we adopt a two-phase approach. First, we discover all promising approximate rules from a random sample of the database. Second, these approximate results are used as heuristic information in an efficient algorithm that requires only one pass over the database to validate rules whose support and confidence are close to the desired support and confidence values. We evaluated the proposed technique, and our experimental results demonstrate that the approach is efficient and promising.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/5789