Summarizing probabilistic frequent patterns: A fast approach

Liu, C; Chen, L; Zhang, C

Summarizing probabilistic frequent patterns: A fast approach

Liu, C Chen, L

Zhang, C

Permalink

Publication Type:: Conference Proceeding
Citation:: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, Part F128815 pp. 527 - 535
Issue Date:: 2013-08-11

Closed Access

	Filename	Description	Size
	2012003109OK.pdf		2.02 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, C	en_US
dc.contributor.author	Chen, L https://orcid.org/0000-0002-6468-5729	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2013-08-11	en_US
dc.identifier.citation	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, Part F128815 pp. 527 - 535	en_US
dc.identifier.isbn	9781450321747	en_US
dc.identifier.uri	http://hdl.handle.net/10453/27529
dc.description.abstract	Copyright © 2013 ACM. Mining probabilistic frequent patterns from uncertain data has received a great deal of attention in recent years due to the wide applications. However, probabilistic frequent pattern mining suffers from the problem that an exponential number of result patterns are generated, which seriously hinders further evaluation and analysis. In this paper, we focus on the problem of mining probabilistic representative frequent patterns (P-RFP), which is the minimal set of patterns with adequately high probability to represent all frequent patterns. Observing the bottleneck in checking whether a pattern can probabilistically represent another, which involves the computation of a joint probability of the supports of two patterns, we introduce a novel approximation of the joint probability with both theoretical and empirical proofs. Based on the approximation, we propose an Approximate P-RFP Mining (APM) algorithm, which effectively and efficiently compresses the set of probabilistic frequent patterns. To our knowledge, this is the first attempt to analyze the relationship between two probabilistic frequent patterns through an approximate approach. Our experiments on both synthetic and real-world datasets demonstrate that the APM algorithm accelerates P-RFP mining dramatically, orders of magnitudes faster than an exact solution. Moreover, the error rate of APM is guaranteed to be very small when the database contains hundreds transactions, which further affirms APM is a practical solution for summarizing probabilistic frequent patterns.	en_US
dc.relation	http://purl.org/au-research/grants/arc/LP120100566
dc.relation.ispartof	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining	en_US
dc.relation.isbasedon	10.1145/2487575.2487618	en_US
dc.title	Summarizing probabilistic frequent patterns: A fast approach	en_US
dc.type	Conference Proceeding
utslib.citation.volume	Part F128815	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Chicago, Illinois USA	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	Part F128815	en_US

Abstract:

Copyright © 2013 ACM. Mining probabilistic frequent patterns from uncertain data has received a great deal of attention in recent years due to the wide applications. However, probabilistic frequent pattern mining suffers from the problem that an exponential number of result patterns are generated, which seriously hinders further evaluation and analysis. In this paper, we focus on the problem of mining probabilistic representative frequent patterns (P-RFP), which is the minimal set of patterns with adequately high probability to represent all frequent patterns. Observing the bottleneck in checking whether a pattern can probabilistically represent another, which involves the computation of a joint probability of the supports of two patterns, we introduce a novel approximation of the joint probability with both theoretical and empirical proofs. Based on the approximation, we propose an Approximate P-RFP Mining (APM) algorithm, which effectively and efficiently compresses the set of probabilistic frequent patterns. To our knowledge, this is the first attempt to analyze the relationship between two probabilistic frequent patterns through an approximate approach. Our experiments on both synthetic and real-world datasets demonstrate that the APM algorithm accelerates P-RFP mining dramatically, orders of magnitudes faster than an exact solution. Moreover, the error rate of APM is guaranteed to be very small when the database contains hundreds transactions, which further affirms APM is a practical solution for summarizing probabilistic frequent patterns.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/27529