K-farthest-neighbors-based concept boundary determination for support vector data description

Publication Type:
Conference Proceeding
International Conference on Information and Knowledge Management, Proceedings, 2010, pp. 1701 - 1704
Support vector data description (SVDD) is very useful for one-class classification. However, it incurs high time complexity when handling large-scale data. In this paper, we propose a novel and efficient method, named K-Farthest-Neighbors-based Concept Boundary Detection (KFN-CBD for short), to improve SVDD learning efficiency on large datasets. This work is motivated by the observation that the SVDD classifier is determined by its support vectors (SVs), so removing the non-support vectors (non-SVs) does not change the classifier but does reduce computational cost. Our approach consists of two steps. In the first step, we propose the K-farthest-neighbors method to identify the samples near the hyper-sphere surface, which are more likely to be SVs. A new M-tree search strategy is also presented to speed up the K-farthest-neighbor query. In the second step, the non-SVs are eliminated from the training set, and only the identified boundary samples are used to train the SVDD classifier, which substantially reduces SVDD training time. Extensive experiments show that KFN-CBD achieves around a 6-fold speedup over standard SVDD while obtaining classification quality comparable to that of training on the entire dataset. © 2010 ACM.
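The first step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring rule, so the sketch assumes that a sample's boundary score is the mean distance to its K farthest neighbors (samples near the surface of a roughly spherical dataset tend to have more distant farthest neighbors than samples near the center). The function names kfn_scores and select_boundary and the keep_fraction parameter are hypothetical, and a brute-force scan stands in for the M-tree search mentioned in the abstract.

```python
import heapq
import math


def kfn_scores(points, k):
    """Mean Euclidean distance from each point to its k farthest neighbors.

    Assumption (not from the paper): a larger score suggests the point lies
    closer to the data boundary and is a more likely support vector.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = []
    for i, p in enumerate(points):
        # Brute-force O(n) scan per point; the paper uses an M-tree instead.
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        farthest = heapq.nlargest(k, dists)
        scores.append(sum(farthest) / len(farthest))
    return scores


def select_boundary(points, k=5, keep_fraction=0.3):
    """Indices of the points with the highest K-farthest-neighbor scores.

    keep_fraction is a hypothetical knob for how much of the training set
    is retained as candidate boundary samples.
    """
    scores = kfn_scores(points, k)
    n_keep = max(1, round(keep_fraction * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

The retained indices would then form the reduced training set passed to an SVDD solver in the second step. How many samples to keep, and how the paper turns K-farthest-neighbor distances into a selection rule, are left open here because the abstract does not specify them.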