K-farthest-neighbors-based concept boundary determination for support vector data description

Xiao, Y; Liu, B; Cao, L

K-farthest-neighbors-based concept boundary determination for support vector data description

Xiao, Y Liu, B Cao, L

Permalink

Publication Type:: Conference Proceeding
Citation:: International Conference on Information and Knowledge Management, Proceedings, 2010, pp. 1701 - 1704
Issue Date:: 2010-12-01

Closed Access

	Filename	Description	Size
	2010001626OK.pdf		739.76 kB		View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xiao, Y	en_US
dc.contributor.author	Liu, B	en_US
dc.contributor.author	Cao, L https://orcid.org/0000-0003-1562-9429	en_US
dc.date.issued	2010-12-01	en_US
dc.identifier.citation	International Conference on Information and Knowledge Management, Proceedings, 2010, pp. 1701 - 1704	en_US
dc.identifier.isbn	9781450300995	en_US
dc.identifier.uri	http://hdl.handle.net/10453/16125
dc.description.abstract	Support vector data description (SVDD) is very useful for one-class classification. However, it incurs high time complexity in handling large scale data. In this paper, we propose a novel and efficient method, named K-Farthest-Neighbors-based Concept Boundary Detection (KFN-CBD for short), to improve the SVDD learning efficiency on large datasets. This work is motivated by the observation that SVDD classifier is determined by support vectors (SVs), and removing the non-support vectors (non-SVs) will not change the classifier but will reduce computational costs. Our approach consists of two steps. In the first step, we propose the K-farthest-neighbors method to identify the samples around the hyper-sphere surface, which are more likely to be SVs. At the same time, a new tree search strategy of M-tree is presented to speed up the K-farthest neighbor query. In the second step, the non-SVs are eliminated from the training set, and only the identified boundary samples are used to train the SVDD classifier. By removing the non-SVs, the training time of SVDD can be substantially reduced. Extensive experiments have shown that KFN-CBD achieves around 6 times speedup compared to the standard SVDD, and obtains the comparable classification quality as the entire dataset used. © 2010 ACM.	en_US
dc.relation.ispartof	International Conference on Information and Knowledge Management, Proceedings	en_US
dc.relation.isbasedon	10.1145/1871437.1871708	en_US
dc.title	K-farthest-neighbors-based concept boundary determination for support vector data description	en_US
dc.type	Conference Proceeding
utslib.for	080201 Analysis of Algorithms and Complexity	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
dc.location.activity	Toronto, Ontario, Canada,	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Support vector data description (SVDD) is very useful for one-class classification. However, it incurs high time complexity in handling large scale data. In this paper, we propose a novel and efficient method, named K-Farthest-Neighbors-based Concept Boundary Detection (KFN-CBD for short), to improve the SVDD learning efficiency on large datasets. This work is motivated by the observation that SVDD classifier is determined by support vectors (SVs), and removing the non-support vectors (non-SVs) will not change the classifier but will reduce computational costs. Our approach consists of two steps. In the first step, we propose the K-farthest-neighbors method to identify the samples around the hyper-sphere surface, which are more likely to be SVs. At the same time, a new tree search strategy of M-tree is presented to speed up the K-farthest neighbor query. In the second step, the non-SVs are eliminated from the training set, and only the identified boundary samples are used to train the SVDD classifier. By removing the non-SVs, the training time of SVDD can be substantially reduced. Extensive experiments have shown that KFN-CBD achieves around 6 times speedup compared to the standard SVDD, and obtains the comparable classification quality as the entire dataset used. © 2010 ACM.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/16125