USpan: An efficient algorithm for mining high utility sequential patterns

Yin, J; Zheng, Z; Cao, L

USpan: An efficient algorithm for mining high utility sequential patterns

Yin, J Zheng, Z Cao, L

Permalink

Publication Type:: Conference Proceeding
Citation:: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 660 - 668
Issue Date:: 2012-09-14

Closed Access

	Filename	Description	Size
	2012001213OK.pdf	Published version	1.47 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yin, J	en_US
dc.contributor.author	Zheng, Z	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.date.issued	2012-09-14	en_US
dc.identifier.citation	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 660 - 668	en_US
dc.identifier.isbn	9781450314626	en_US
dc.identifier.uri	http://hdl.handle.net/10453/30869
dc.description.abstract	Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behavior analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. In frequent pattern mining, a recent effort has been to incorporate utility into the pattern selection framework, so that high utility (frequent or infrequent) patterns are mined which address typical business concerns such as dollar value associated with each pattern. In this paper, we incorporate utility into sequential pattern mining, and a generic framework for high utility sequence mining is defined. An efficient algorithm, USpan, is presented to mine for high utility sequential patterns. In USpan, we introduce the lexicographic quantitative sequence tree to extract the complete set of high utility sequences and design concatenation mechanisms for calculating the utility of a node and its children with two effective pruning strategies. Substantial experiments on both synthetic and real datasets show that USpan efficiently identifies high utility sequences from large scale data with very low minimum utility. © 2012 ACM.	en_US
dc.relation.ispartof	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining	en_US
dc.relation.isbasedon	10.1145/2339530.2339636	en_US
dc.title	USpan: An efficient algorithm for mining high utility sequential patterns	en_US
dc.type	Conference Proceeding
utslib.for	080109 Pattern Recognition and Data Mining	en_US
dc.location.activity	beijing
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behavior analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. In frequent pattern mining, a recent effort has been to incorporate utility into the pattern selection framework, so that high utility (frequent or infrequent) patterns are mined which address typical business concerns such as dollar value associated with each pattern. In this paper, we incorporate utility into sequential pattern mining, and a generic framework for high utility sequence mining is defined. An efficient algorithm, USpan, is presented to mine for high utility sequential patterns. In USpan, we introduce the lexicographic quantitative sequence tree to extract the complete set of high utility sequences and design concatenation mechanisms for calculating the utility of a node and its children with two effective pruning strategies. Substantial experiments on both synthetic and real datasets show that USpan efficiently identifies high utility sequences from large scale data with very low minimum utility. © 2012 ACM.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/30869