Selective value coupling learning for detecting outliers in high-dimensional categorical data

Pang, G; Xu, H; Cao, L; Zhao, W

Publication Type: Conference Proceeding
Citation: International Conference on Information and Knowledge Management, Proceedings, 2017, Part F131841 pp. 807 - 816
Issue Date: 2017-11-06

Closed Access

	Filename	Description	Size	Format
	Pang-CIKM17.pdf	Published version	941.53 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Pang, G https://orcid.org/0000-0002-9877-2716	en_US
dc.contributor.author	Xu, H	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.contributor.author	Zhao, W	en_US
dc.date.issued	2017-11-06	en_US
dc.identifier.citation	International Conference on Information and Knowledge Management, Proceedings, 2017, Part F131841 pp. 807 - 816	en_US
dc.identifier.isbn	9781450349185	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127476
dc.relation	http://purl.org/au-research/grants/arc/DP130102691
dc.relation.ispartof	International Conference on Information and Knowledge Management, Proceedings	en_US
dc.relation.isbasedon	10.1145/3132847.3132994	en_US
dc.title	Selective value coupling learning for detecting outliers in high-dimensional categorical data	en_US
dc.type	Conference Proceeding
utslib.citation.volume	Part F131841	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	Part F131841	en_US

Abstract:

© 2017 Association for Computing Machinery. This paper introduces a novel framework, SelectVC, and its instance POP for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of the subsequent outlier scoring. As a result, they are significantly challenged by the overwhelming number of irrelevant features in high-dimensional data, owing to the noise these features introduce and the huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings, which it obtains by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full-space or subspace-based outlier detectors, as well as their combinations with three feature selection methods, on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) achieves good scalability, stable performance w.r.t. k, and a fast convergence rate.
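
The abstract describes the approach only at a high level, so the following Python sketch is a loose illustration of the general idea (alternating between scoring values and selecting the top-k outlying values), not the POP algorithm from the paper. Everything in it is an assumption made for illustration: the function name `select_and_score`, the frequency-based initial outlierness, the conditional-probability coupling, the damping factor, and the row-scoring rule.

```python
from collections import Counter
from itertools import combinations


def select_and_score(rows, k, damping=0.5, n_iter=20):
    """Illustrative only: alternate top-k value selection and value scoring.

    rows: list of equal-length tuples of categorical values.
    Returns (row_scores, selected_values).
    """
    n = len(rows)
    # A "value" is a (feature index, category) pair.
    freq = Counter((j, v) for row in rows for j, v in enumerate(row))
    # Co-occurrence counts between values of different features.
    co = Counter()
    for row in rows:
        for a, b in combinations(list(enumerate(row)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1

    values = sorted(freq)
    # Assumed initial outlierness: rarer values score higher.
    init = {v: 1.0 - freq[v] / n for v in values}
    score = dict(init)

    selected = None
    for _ in range(n_iter):
        # Selection step: keep the k values with the highest current score.
        top = set(sorted(values, key=lambda v: (-score[v], v))[:k])
        if top == selected:
            break  # the selected set is stable
        selected = top
        # Scoring step: propagate outlierness only from the selected values
        # (the "partial" propagation), weighted by the assumed coupling P(u | v).
        new_score = {}
        for v in values:
            others = [u for u in selected if u != v]
            if others:
                coupled = sum(co[(v, u)] / freq[v] * score[u] for u in others)
                coupled /= len(others)
            else:
                coupled = 0.0
            new_score[v] = (1 - damping) * init[v] + damping * coupled
        score = new_score

    # Assumed row score: sum of the scores of the selected values in the row.
    row_scores = [
        sum(score[(j, v)] for j, v in enumerate(row) if (j, v) in selected)
        for row in rows
    ]
    return row_scores, selected


if __name__ == "__main__":
    data = [("a", "x", "p")] * 9 + [("b", "y", "p")]
    scores, chosen = select_and_score(data, k=2)
    print(chosen)
    print(scores)
```

In this toy data set the last row carries two rare values that always co-occur, so both are selected and that row receives the highest score, while rows containing no selected value score zero. Restricting propagation to the selected values is what keeps the work proportional to k rather than to the full number of value pairs.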

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127476