A Review for Weighted MinHash Algorithms (Extended abstract)

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023, 2023-April, pp. 2553-2573
Issue Date:
2023-07-26
Data similarity computation is a fundamental research topic which underpins many high-level applications based on similarity measures. However, exact similarity computation has become daunting in large-scale real-world scenarios. Currently, MinHash is a popular technique for efficiently estimating the Jaccard similarity of binary sets, and weighted MinHash is furthermore used to estimate the generalized Jaccard similarity of weighted sets. This review focuses on categorizing and discussing the existing work on weighted MinHash algorithms. We have also developed a Python toolbox for the algorithms and released it on our GitHub.
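To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' toolbox. It computes the exact Jaccard similarity of binary sets, the exact generalized Jaccard similarity of weighted sets (sum of element-wise minimum weights over sum of element-wise maximum weights), and a basic MinHash estimate. All function names are hypothetical. Two choices are assumptions of this sketch: seeded BLAKE2b hashing stands in for random permutations, and weighted sets are handled by the simple integer-weight expansion, which is only one of the approaches such a review would cover and requires non-negative integer weights.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two binary sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generalized_jaccard(s, t):
    """Exact generalized Jaccard similarity of two {element: weight} dicts."""
    keys = set(s) | set(t)
    num = sum(min(s.get(k, 0), t.get(k, 0)) for k in keys)
    den = sum(max(s.get(k, 0), t.get(k, 0)) for k in keys)
    return num / den if den else 1.0


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); assumed stand-in for a random permutation."""
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """One minimum hash value per seed; items must be non-empty."""
    items = list(items)
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that collide; estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def expand_integer_weights(weighted):
    """Map {element: non-negative integer weight} to a binary set of (element, i) pairs."""
    return {(k, i) for k, w in weighted.items() for i in range(w)}


if __name__ == "__main__":
    s = {"a": 2, "b": 1}
    t = {"a": 1, "c": 1}
    exact = generalized_jaccard(s, t)
    sig_s = minhash_signature(expand_integer_weights(s))
    sig_t = minhash_signature(expand_integer_weights(t))
    print("exact:", exact, "estimate:", estimate_similarity(sig_s, sig_t))
```

For integer weights, the Jaccard similarity of the expanded sets equals the generalized Jaccard similarity of the original weighted sets, so plain MinHash on the expansion estimates it. The cost of the expansion grows with the weights and it does not cover real-valued weights directly, which is the kind of limitation that motivates dedicated weighted MinHash algorithms.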