Negative sequential pattern mining

Zheng, Z

Publication Type: Thesis
Issue Date: 2012

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zheng, Z
dc.date.accessioned	2013-04-30T06:15:49Z
dc.date.available	2013-04-30T06:15:49Z
dc.date.issued	2012
dc.identifier.uri	http://hdl.handle.net/10453/21884
dc.description	University of Technology, Sydney. Faculty of Engineering and Information Technology.	en_US
dc.format	Thesis (PhD)	en_US
dc.language.iso	en	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/21884/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	au.edu.uts.lib/ppc
dc.title	Negative sequential pattern mining	en_US
dc.type	Thesis
utslib.copyright.status	open_access

Abstract:

Sequential pattern mining is an important way to discover patterns in sequence data. It yields useful insights from bioinformatics data, web logs, customer transaction data, and similar sources. Unlike traditional positive sequential pattern (PSP) mining, negative sequential pattern (NSP) mining takes negative itemsets into account in addition to positive ones, which makes it more informative in applications where non-occurring itemsets need to be considered. This thesis reports our earlier and latest research outcomes in this area. The contributions of the thesis are as follows.

• A comprehensive literature review of negative frequent pattern mining.
• A general framework for NSP mining, which can be used to describe the big picture of both PSP and NSP mining problems.
• Three innovative algorithms for mining NSP efficiently.
• Extensive experiments with the three algorithms on synthetic and real-world datasets, showing that the proposed methods can find NSP efficiently.
• A case study of a real-life application: customer claims analysis in the health insurance industry.

The three NSP mining algorithms proposed in this thesis are as follows.

(1) The first algorithm, Neg-GSP (Zheng, Zhao, Zuo & Cao 2009), is based on the PSP mining algorithm GSP (Srikant & Agrawal 1996). Neg-GSP addresses the negative pattern problem by introducing new methods for joining and generating candidates, which borrow ideas from GSP. An effective pruning method is also proposed to reduce the number of candidates.

(2) The second algorithm, GA-NSP (Zheng, Zhao, Zuo & Cao 2010), is based on a genetic algorithm. It finds NSP using novel crossover and mutation operations that are efficient at passing good genes on to subsequent generations. An effective dynamic fitness function and a pruning method are also provided to improve performance.

(3) The third algorithm, e-NSP (Dong, Zheng, Cao, Zhao, Zhang, Li, Wei & Ou 2011), is based on set theory. It mines NSP using only the PSP that have already been identified, so it requires no additional database scans. This allows existing PSP mining algorithms to be used for mining NSP and offers a new strategy for efficient NSP mining.

Extensive experiments show that all three algorithms can find NSP efficiently and perform well compared with other existing NSP mining algorithms, such as PNSP (Hsueh, Lin & Chen 2008). In terms of problem statements, Neg-GSP and GA-NSP share the same definitions, whereas e-NSP uses stronger constraints because it requires clear boundaries in order to apply set theory. In terms of performance, GA-NSP slightly outperforms Neg-GSP in execution time, but it may miss some patterns from the complete result set because of the limitations of genetic algorithms. e-NSP is the most efficient and effective of the three, since it does not need to scan the dataset to calculate the support of NSP. Although the stronger constraints make the search space of e-NSP much smaller than it is under the normal definitions, e-NSP remains practical in real-life applications.

Finally, NSP mining case studies from the health insurance industry are introduced. Using real-life customer claims datasets, we apply the proposed NSP mining methods to find PSP and NSP that address two business issues: over-service analysis for ancillary services, and fraudulent claim detection. Both case studies demonstrate the benefits gained from mining NSP.
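The set-theory idea behind e-NSP, deriving NSP support from PSP supports that are already known, can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a heavily simplified setting: every element is a single item, the pattern has exactly one negated item, and a data sequence supports <a, NOT b, c> when it contains <a, c> but does not contain <a, b, c>. The function names and the toy database are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(data_seq: Iterable[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `data_seq` in order (gaps allowed)."""
    it = iter(data_seq)
    # `item in it` advances the iterator, so items must be matched in order.
    return all(item in it for item in pattern)


def support(db: Sequence[str], pattern: Sequence[str]) -> int:
    """Number of data sequences containing `pattern` (needs a database scan)."""
    return sum(contains(seq, pattern) for seq in db)


def nsp_support(psp_support: Dict[Pattern, int],
                before: Pattern, negated: str, after: Pattern) -> int:
    """Support of <before, NOT negated, after> using stored PSP supports only.

    Every sequence containing the all-positive pattern also contains the
    positive part, so |A minus B| = |A| - |B| applies. Simplified assumption:
    exactly one negated, single-item element.
    """
    positive_part = before + after
    all_positive = before + (negated,) + after
    return psp_support[positive_part] - psp_support[all_positive]


# Hypothetical toy database: each string is one data sequence of single items.
toy_db = ["abc", "ac", "acb", "bac", "cab"]

# Step 1: PSP supports, as an existing PSP miner would have produced them.
psp_support = {p: support(toy_db, p) for p in [("a", "c"), ("a", "b", "c")]}

# Step 2: NSP support by subtraction, without touching toy_db again.
derived = nsp_support(psp_support, ("a",), "b", ("c",))

# Cross-check against a direct scan of the database.
brute_force = sum(
    contains(s, "ac") and not contains(s, "abc") for s in toy_db
)
print(derived, brute_force)
```

In this sketch the subtraction gives the same count as the direct scan. The handling of several negative elements and of itemset elements is more involved and is not covered here.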

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/21884