K-Farthest-Neighbors-Based Concept Boundary Determination for Support Vector Data Description

Publication Type: Conference Proceeding
Proceedings of the 19th ACM International Conference on Information and Knowledge Management & Co-Located Workshops, 2010, pp. 1701 - 1704
Support vector data description (SVDD) is very useful for one-class classification. However, it incurs high time complexity when handling large-scale data. In this paper, we propose a novel and efficient method, named K-Farthest-Neighbors-based Concept Boundary Detection (KFN-CBD for short), to improve SVDD learning efficiency on large datasets. This work is motivated by the observation that the SVDD classifier is determined by its support vectors (SVs), so removing the non-support vectors (non-SVs) does not change the classifier but reduces computational cost. Our approach consists of two steps. In the first step, we propose the K-farthest-neighbors method to identify the samples around the hyper-sphere surface, which are more likely to be SVs. At the same time, a new tree search strategy for the M-tree is presented to speed up the K-farthest-neighbor query. In the second step, the non-SVs are eliminated from the training set, and only the identified boundary samples are used to train the SVDD classifier. By removing the non-SVs, the training time of SVDD can be substantially reduced. Extensive experiments show that KFN-CBD achieves around a 6-fold speedup over standard SVDD while obtaining classification quality comparable to that of training on the entire dataset.