Outlier detection in large high-dimensional data and its application in stock market surveillance

Luo, C

Publication Type: Thesis
Issue Date: 2011

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Luo, C
dc.date.accessioned	2015-02-11T02:55:09Z
dc.date.available	2015-02-11T02:55:09Z
dc.date.issued	2011
dc.identifier.uri	http://hdl.handle.net/10453/33300
dc.description	University of Technology, Sydney. Faculty of Engineering and Information Technology.	en_US
dc.format	Thesis (PhD)	en_US
dc.language.iso	en	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/33300/8/02whole.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	au.edu.uts.lib/ppc
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.title	Outlier detection in large high-dimensional data and its application in stock market surveillance	en_US
dc.type	Thesis
utslib.copyright.status	open_access

Abstract:

Outlier detection techniques play an important role in stock market surveillance, which involves the analysis of large volumes of high-dimensional trading data. However, outlier detection in large high-dimensional data is very challenging and is not well addressed by existing techniques. Firstly, it is difficult to select useful and relevant features from high-dimensional data. Secondly, large high-dimensional data require more efficient algorithms. To address these issues, this thesis presents two outlier detection models and one subspace clustering model.

Firstly, an outlier mining model is proposed to detect outliers from multiple complex stock market datasets. To improve the efficiency of outlier detection, a financial model is used to select the features from which the multiple datasets are constructed. This model improves the precision of outlier mining on individual measurements. Experiments on real-world stock market data show that the proposed model is effective and outperforms traditional techniques.

Secondly, to find relevant features automatically, an agent-based algorithm is proposed to discover subspace clusters in high-dimensional data. Each data object is represented by an agent, and the agents move from one local environment to another to find optimal clusters in subspaces. Heuristic rules and objective functions are defined to guide the movements of the agents, so that similar agents (data objects) gather in the same group. The experimental results show that the proposed agent-based subspace clustering algorithm performs better than existing subspace clustering methods on both the F1 measure and entropy. The algorithm is scalable with respect to the size and dimensionality of the data. Furthermore, an application of the technique to stock market surveillance demonstrates its effectiveness in real-world applications.

Finally, a reference-based outlier detection model built on agent-based subspace clustering is proposed. First, agent-based subspace clustering is used to generate clusters in subspaces. The cluster centers, together with their corresponding subspaces, are then used as references, and a reference-based model is employed to find outliers in the relevant subspaces. The experimental results on real-world datasets show that the proposed model identifies outliers in subspaces effectively and efficiently.

In summary, this thesis investigates outlier detection techniques for high-dimensional data and their application in stock market surveillance. The proposed models are novel and effective, and they have shown their potential in real business applications.
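
The reference-based step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual model: the scoring rule (distance to the nearest reference, measured only in that reference's subspace and normalized by the number of dimensions), the function names, and the sample references and points are all assumptions made for illustration. In the thesis the references come from agent-based subspace clustering; here they are simply written by hand.

```python
import math

def subspace_distance(point, center, dims):
    """Euclidean distance restricted to `dims`, normalized by subspace size.

    `center` holds one coordinate per entry of `dims` (an assumed layout).
    """
    squared = sum((point[d] - c) ** 2 for d, c in zip(dims, center))
    return math.sqrt(squared / len(dims))

def reference_outlier_scores(points, references):
    """Score each point by its distance to the nearest reference.

    Each reference is a (center, dims) pair: a cluster center and the
    subspace (tuple of dimension indices) in which that cluster lives.
    A larger score means the point is far from every reference.
    """
    return [
        min(subspace_distance(p, center, dims) for center, dims in references)
        for p in points
    ]

# Hypothetical references, standing in for the output of subspace clustering.
references = [
    ((0.0, 0.0), (0, 1)),  # cluster near the origin in dimensions 0 and 1
    ((5.0,), (2,)),        # cluster near 5 in dimension 2
]

# Hypothetical data objects with three features each.
points = [
    (0.1, -0.1, 9.0),  # close to the first reference in its subspace
    (8.0, 8.0, 5.0),   # matches the second reference in its subspace
    (8.0, 8.0, 9.0),   # far from both references
]

scores = reference_outlier_scores(points, references)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

With these sample values the third point receives the largest score, because it is far from both references in their respective subspaces, while the first two points each lie close to one reference. Normalizing by the subspace size is one simple way to keep scores comparable across subspaces of different dimensionality; other choices are equally plausible.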

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/33300