A Cost-Effective LSH Filter for Fast Pairwise Mining

IEEE Computer Society
Publication Type:
Conference Proceeding
ICDM 2009, The Ninth IEEE International Conference on Data Mining, 2009, pp. 1088 - 1093
Issue Date:
Full metadata record
Files in This Item:
Filename Description Size
Thumbnail2009001191OK.pdf882.3 kB
Adobe PDF
The pairwise mining problem is to discover pairwise objects having measures greater than the user-specified minimum threshold from a collection of objects. It is essential in a large variety of database and data-mining applications. Of late, there has been increasing interest in applying a Locality-Sensitive Hashing (LSH) scheme for pairwise mining. LSH-type methods have shown themselves to be simply implementable and capable of achieving significant performance gain in running time over most exact methods. However, the present LSH-type methods still suffer from some bottlenecks, such as the curse of threshold. In this paper, we proposed a novel LSHbased method, namely Cost-effective LSH filter (Ce-LSH for short), for pairwise mining. Compared with previous LSH-type methods, it uses a lower fixed number of LSH functions and is thus more cost-effective. Substantial experiments evidence that our method gives significant improvement in running time over existing LSH-type methods and some recently reported method based on upper-bound. Experimental results also indicate that it scales well even for a relatively low minimum threshold and for a fairly small miss ratio.
Please use this identifier to cite or link to this item: