SRS: Solving c-approximate nearest neighbor queries in high dimensional Euclidean space with a tiny index

Publication Type:
Conference Proceeding
Proceedings of the VLDB Endowment, 2014, 8 (1), pp. 1 - 12
Issue Date:
Full metadata record
Nearest neighbor searches in high-dimensional space have many important applications in domains such as data mining, and multimedia databases. The problem is challenging due to the phenomenon called "curse of dimensionality". An alternative solution is to consider algorithms that returns a c-approximate nearest neighbor (c-ANN) with guaranteed probabilities. Locality Sensitive Hashing (LSH) is among the most widely adopted method, and it achieves high effciency both in theory and practice. However, it is known to require an extremely high amount of space for indexing, hence limiting its scalability. In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees requiring only a single tiny index. Our methods are highly exible and support a variety of functionalities, such as nd-ing the exact nearest neighbor with any given probability. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to 1 billion high-dimensional points on a single commodity PC. © 2014 VLDB Endowment.
Please use this identifier to cite or link to this item: