Mining frequent serial episodes over uncertain sequence data

Wan, L; Chen, L; Zhang, C

Mining frequent serial episodes over uncertain sequence data

Wan, L Chen, L

Zhang, C

Permalink

Publication Type:: Conference Proceeding
Citation:: ACM International Conference Proceeding Series, 2013, pp. 215 - 226
Issue Date:: 2013-05-02

Closed Access

	Filename	Description	Size
	2012001912OK.pdf		1.43 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wan, L	en_US
dc.contributor.author	Chen, L https://orcid.org/0000-0002-6468-5729	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2013-05-02	en_US
dc.identifier.citation	ACM International Conference Proceeding Series, 2013, pp. 215 - 226	en_US
dc.identifier.isbn	9781450315975	en_US
dc.identifier.uri	http://hdl.handle.net/10453/26312
dc.description.abstract	Data uncertainty has posed many unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which applies to many real applications including sensor readings as well as customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the accurate frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques. We conduct extensive experiments to evaluate the performance of the developed data mining algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitudes faster than exact approaches, for P-FSE mining, the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although it has been recognized that the normal distribution based approximation approach is fairly accurate when the data set is large enough, for P-FSE mining, the binomial distribution based approximation achieves higher accuracy when the the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in terms of the runtime, and achieves very high accuracy. © 2013 ACM.	en_US
dc.relation.ispartof	ACM International Conference Proceeding Series	en_US
dc.relation.isbasedon	10.1145/2452376.2452403	en_US
dc.title	Mining frequent serial episodes over uncertain sequence data	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Genoa, Italy	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Data uncertainty has posed many unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which applies to many real applications including sensor readings as well as customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the accurate frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques. We conduct extensive experiments to evaluate the performance of the developed data mining algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitudes faster than exact approaches, for P-FSE mining, the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although it has been recognized that the normal distribution based approximation approach is fairly accurate when the data set is large enough, for P-FSE mining, the binomial distribution based approximation achieves higher accuracy when the the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in terms of the runtime, and achieves very high accuracy. © 2013 ACM.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/26312