Visual phraselet: Refining spatial constraints for large scale image search

Publication Type:
Journal Article
IEEE Signal Processing Letters, 2013, 20(4), pp. 391-394
Abstract:
The Bag-of-Words (BoW) model lacks spatial constraints among visual words. State-of-the-art methods encode spatial information via visual phrases; however, these methods in turn discard the spatial context among the visual phrases themselves. To address this problem, this letter introduces a novel visual concept, the Visual Phraselet, as a similarity measurement between images. A visual phraselet is a spatially consistent group of visual phrases. In a simple yet effective manner, the visual phraselet filters out false visual phrase matches and is much more discriminative than both visual words and visual phrases. To boost the discovery of visual phraselets, we apply a soft quantization scheme. Our method is evaluated through extensive experiments on three benchmark datasets (Oxford 5K, Paris 6K, and Flickr 1M). We report significant improvements, as large as 54.6% over the baseline approach, validating the concept of the visual phraselet. © 1994-2012 IEEE.
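The core idea in the abstract, grouping visual phrase matches that agree spatially and discarding isolated ones, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: it assumes each visual phrase match is represented by a pair of center coordinates (query image, database image), and it approximates "spatial consistency" by requiring that a match's query-to-database displacement be shared by several other matches. The function name, tolerance, and support threshold are all hypothetical.

```python
def filter_phrase_matches(matches, tol=10.0, min_support=2):
    """Keep only spatially consistent visual phrase matches.

    matches: list of ((qx, qy), (dx, dy)) pairs, where (qx, qy) is the
        phrase center in the query image and (dx, dy) the center of the
        matched phrase in the database image.
    tol: pixel tolerance for considering two displacements consistent.
    min_support: minimum number of other matches that must share a
        similar displacement for a match to survive.
    """
    kept = []
    for i, ((qx, qy), (dx, dy)) in enumerate(matches):
        disp = (dx - qx, dy - qy)  # translation implied by this match
        support = 0
        for j, ((qx2, qy2), (dx2, dy2)) in enumerate(matches):
            if i == j:
                continue
            other = (dx2 - qx2, dy2 - qy2)
            # Matches belonging to the same "phraselet-like" group imply
            # roughly the same translation between the two images.
            if abs(disp[0] - other[0]) <= tol and abs(disp[1] - other[1]) <= tol:
                support += 1
        if support >= min_support:
            kept.append(matches[i])
    return kept


# Three matches agree on a (+5, +5) shift; the fourth is a false match.
matches = [((0, 0), (5, 5)), ((10, 0), (15, 5)),
           ((0, 10), (5, 15)), ((50, 50), (0, 0))]
consistent = filter_phrase_matches(matches)  # the lone outlier is removed
```

In this toy setting the three mutually consistent matches support one another and survive, while the isolated false match is filtered out, mirroring how the letter describes phraselets removing false visual phrase matches.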