I/O efficient approximate nearest neighbour search based on learned functions

Li, M; Zhang, Y; Sun, Y; Wang, W; Tsang, IW; Lin, X

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings - International Conference on Data Engineering, 2020, 2020-April, pp. 289-300
Issue Date: 2020-04-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, M https://orcid.org/0000-0003-3565-1180
dc.contributor.author	Zhang, Y
dc.contributor.author	Sun, Y
dc.contributor.author	Wang, W
dc.contributor.author	Tsang, IW
dc.contributor.author	Lin, X
dc.date	2020-04-20
dc.date.accessioned	2020-07-27T15:27:23Z
dc.date.available	2020-07-27T15:27:23Z
dc.date.issued	2020-04-01
dc.identifier.citation	Proceedings - International Conference on Data Engineering, 2020, 2020-April, pp. 289-300
dc.identifier.isbn	9781728129037
dc.identifier.issn	1084-4627
dc.identifier.uri	http://hdl.handle.net/10453/141874
dc.description.abstract	© 2020 IEEE. Approximate nearest neighbour search (ANNS) in high-dimensional space is a fundamental problem in many applications, such as multimedia databases, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective for this problem, yet few of them are designed for external storage scenarios, and hence they are not optimized for I/O efficiency during query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing I/O efficiency, especially sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by applying learned hashing (i.e., mapping) functions to each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with the learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation	http://purl.org/au-research/grants/arc/FT170100128
dc.relation	http://purl.org/au-research/grants/arc/DP180100106
dc.relation	http://purl.org/au-research/grants/arc/DP180103096
dc.relation.ispartof	Proceedings - International Conference on Data Engineering
dc.relation.ispartof	2020 IEEE 36th International Conference on Data Engineering (ICDE)
dc.relation.isbasedon	10.1109/ICDE48307.2020.00032
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.title	I/O efficient approximate nearest neighbour search based on learned functions
dc.type	Conference Proceeding
utslib.citation.volume	2020-April
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
dc.date.updated	2020-07-27T15:27:16Z
pubs.finish-date	2020-04-24
pubs.publication-status	Published
pubs.start-date	2020-04-20
pubs.volume	2020-April

Abstract:

© 2020 IEEE. Approximate nearest neighbour search (ANNS) in high-dimensional space is a fundamental problem in many applications, such as multimedia databases, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective for this problem, yet few of them are designed for external storage scenarios, and hence they are not optimized for I/O efficiency during query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing I/O efficiency, especially sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by applying learned hashing (i.e., mapping) functions to each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with the learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
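
The index layout described in the abstract (several lists of point IDs, each ordered by the value of a mapping function) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not the authors' implementation: the mapping functions are random linear projections standing in for the learned functions, and `build_index`, `query`, and the `window` parameter are hypothetical names. The lists live in memory; on external storage, each window scan would correspond to one contiguous (sequential) read.

```python
import bisect
import math
import random


def project(w, x):
    """Apply a linear mapping function: the dot product of w and x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def build_index(points, projections):
    """For each mapping function, store (value, point_id) pairs sorted by value."""
    return [
        sorted((project(w, p), pid) for pid, p in enumerate(points))
        for w in projections
    ]


def query(points, projections, index, q, k=1, window=8):
    """Scan a contiguous window around q's position in each list, then rerank."""
    candidates = set()
    for w, entries in zip(projections, index):
        # Locate where q's mapped value would fall in this sorted list.
        pos = bisect.bisect_left(entries, (project(w, q), -1))
        lo = max(0, pos - window)
        hi = min(len(entries), pos + window)
        # One contiguous slice per list, i.e. one sequential read on disk.
        candidates.update(pid for _, pid in entries[lo:hi])
    # Rerank the candidates by exact Euclidean distance to the query.
    ranked = sorted(candidates, key=lambda pid: math.dist(points[pid], q))
    return ranked[:k]


# Assumption: random Gaussian data and random projections, for illustration only.
rng = random.Random(0)
dim, n, num_lists = 16, 1000, 4
points = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
projections = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_lists)]

index = build_index(points, projections)
result = query(points, projections, index, points[42], k=1)
print(result)
```

Random projections do not preserve neighbourhood order as well as functions trained for that purpose, so this sketch would typically need a larger `window` (more I/O) to reach the same accuracy; reducing that cost is the motivation the abstract gives for learning the functions from the data.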

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/141874