E-NSP: Efficient negative sequential pattern mining

Cao, L; Dong, X; Zheng, Z

Publication Type: Journal Article
Citation: Artificial Intelligence, 2016, 235, pp. 156-182
Issue Date: 2016-06-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.contributor.author	Dong, X	en_US
dc.contributor.author	Zheng, Z	en_US
dc.date.issued	2016-06-01	en_US
dc.identifier.citation	Artificial Intelligence, 2016, 235 pp. 156 - 182	en_US
dc.identifier.issn	0004-3702	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121699
dc.relation	http://purl.org/au-research/grants/arc/DP130102691
dc.relation.ispartof	Artificial Intelligence	en_US
dc.relation.isbasedon	10.1016/j.artint.2016.03.001	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	E-NSP: Efficient negative sequential pattern mining	en_US
dc.type	Journal Article
utslib.citation.volume	235	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0802 Computation Theory and Mathematics	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	open_access	*
pubs.publication-status	Published	en_US
pubs.volume	235	en_US

Abstract:

© 2016 The Authors. Published by Elsevier B.V.

As an important tool for behavior informatics, negative sequential patterns (NSP) (such as missing medical treatments) are critical and sometimes much more informative than positive sequential patterns (PSP) (e.g. using a medical service) in many intelligent systems and applications such as intelligent transport systems, healthcare and risk management, as they often involve non-occurring but interesting behaviors. However, discovering NSP is much more difficult than identifying PSP due to the significant problem complexity caused by non-occurring elements, high computational cost and huge search space in calculating negative sequential candidates (NSC). So far, the problem has not been formalized well, and very few approaches have been proposed to mine for specific types of NSP, which rely on database re-scans after identifying PSP in order to calculate the NSC supports. This has been shown to be very inefficient or even impractical, since the NSC search space is usually huge. This paper proposes a very innovative and efficient theoretical framework: Set theory-based NSP mining (ST-NSP), and a corresponding algorithm, e-NSP, to efficiently identify NSP by involving only the identified PSP, without re-scanning the database. Accordingly, negative containment is first defined to determine whether a data sequence contains a negative sequence based on set theory. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The NSC supports are then calculated based only on the corresponding PSP. This not only avoids the need for additional database scans, but also enables the use of existing PSP mining algorithms to mine for NSP. Finally, a simple but efficient strategy is proposed to generate NSC.
Theoretical analyses show that e-NSP performs particularly well on datasets with a small number of elements in a sequence, a large number of itemsets and low minimum support. e-NSP is compared with two currently available NSP mining algorithms via intensive experiments on three synthetic and six real-life datasets from aspects including data characteristics, computational costs and scalability. e-NSP is tens to thousands of times faster than baseline approaches, and offers a sound and effective approach for efficient mining of NSP in large scale datasets by directly using existing PSP mining algorithms.
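
Illustrative sketch (not part of the original record): the abstract says that NSC supports are calculated from the corresponding PSP only, but it does not give the formula. The Python below is a minimal sketch under one assumed reading of negative containment: a data sequence contains a negative sequence if it contains the positive elements in order and does not contain any pattern formed by adding back a single negated element as a positive one. Under that assumption, the support is the support of the positive-only pattern minus the number of sequences matching at least one of those "one negative restored" patterns. The names `contains`, `sid_index` and `negative_support` are hypothetical, and `sid_index` is a brute-force stand-in for the output of a real PSP miner that records which sequences contain each pattern. Constraints on candidates, such as which elements may be negated together, are not enforced here.

```python
def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (elements are item sets)."""
    pos = 0
    for element in pattern:
        # Advance to the next data element that includes this pattern element.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def sid_index(database, patterns):
    """Stand-in for a PSP miner: map each positive pattern to the set of
    sequence ids that contain it (assumed to be recorded during PSP mining)."""
    return {
        p: {sid for sid, seq in database.items() if contains(seq, p)}
        for p in patterns
    }


def negative_support(psp_sids, elements, negated):
    """Support of a negative candidate using only the PSP id sets.

    `elements` lists all elements in order; `negated` holds the indices of
    the elements that are negative in the candidate (assumed definition).
    """
    keep = [i for i in range(len(elements)) if i not in negated]
    positive_only = tuple(elements[i] for i in keep)
    excluded = set()
    for j in negated:
        # Restore exactly one negated element as positive and look it up.
        restored = tuple(elements[i] for i in sorted(keep + [j]))
        excluded |= psp_sids[restored]
    return len(psp_sids[positive_only]) - len(excluded)


# Toy database: five sequences of single-item elements.
a, b, c, d = (frozenset(x) for x in "abcd")
database = {1: [a, b, c], 2: [a, c], 3: [a, d, c], 4: [b, c], 5: [a, b]}

# Candidate <a, not-b, c>: needs only the id sets of <a, c> and <a, b, c>.
psp_sids = sid_index(database, [(a, c), (a, b, c)])
support = negative_support(psp_sids, [a, b, c], negated={1})
print(support)
```

In this toy database, sequences 1, 2 and 3 contain the pattern a then c, and only sequence 1 contains a, b, c, so the sketch reports a support of 2 for the candidate without looking at the database again once the id sets exist.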

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121699