Few-Shot Common-Object Reasoning Using Common-Centric Localization Network.

Publisher:
IEEE - Institute of Electrical and Electronics Engineers Inc.
Publication Type:
Journal Article
Citation:
IEEE Transactions on Image Processing, vol. 30, pp. 4253-4262, 2021
Issue Date:
2021
In the few-shot common-localization task, the goal is to localize the common object in a query image of unseen categories, given a few support images without bounding box annotations at each episode. The task involves reasoning about the common object across the given images and predicting the spatial locations of objects with different shapes, sizes, and orientations. In this work, we propose a common-centric localization (CCL) network for few-shot common-localization. The motivation behind our common-centric localization network is to learn common object features by dynamic feature relation reasoning via a graph convolutional network with conditional feature aggregation. First, we propose a local common object region generation pipeline to reduce background noise caused by feature misalignment. By replacing the query with each image in the support set in turn, we obtain more accurate object spatial locations for every support image. Second, we introduce a graph convolutional network with dynamic feature transformation to enforce common object reasoning. To enhance discriminability during feature matching and enable better generalization to unseen scenarios, we leverage a conditional feature encoding function that adaptively alters visual features according to the input query. Third, we introduce a common-centric relation structure to model the correlation between the common features and the query image feature. The generated common features guide the query image feature toward a more common-object-related representation. We evaluate our common-centric localization network on four datasets, i.e., CL-VOC-07, CL-VOC-12, CL-COCO, and CL-VID, and obtain significant improvements over the state of the art. Our quantitative results confirm the effectiveness of our network.
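The abstract combines two ingredients: graph-convolutional aggregation over support-region features, and a conditioning step that weights those features by their relevance to the query. The sketch below illustrates that combination in a minimal numpy form; it is an assumption-laden illustration, not the paper's actual CCL architecture, and the edge-construction function (`conditional_edges`) and all dimensions are hypothetical.

```python
import numpy as np

def gcn_layer(features, adjacency, weights):
    """One standard graph-convolution step: add self-loops, symmetrically
    normalize the adjacency, aggregate neighbor features, apply a linear
    map, then a ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm_adj @ features @ weights, 0.0)

def conditional_edges(support_feats, query_feat):
    """Hypothetical conditional aggregation: edges between support-region
    features are cosine similarities, each gated by the endpoints'
    similarity to the query feature (illustrative, not the paper's
    formulation)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    n = len(support_feats)
    gate = np.array([max(cos(f, query_feat), 0.0) for f in support_feats])
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = max(cos(support_feats[i], support_feats[j]), 0.0)
                adj[i, j] = sim * gate[i] * gate[j]
    return adj

rng = np.random.default_rng(0)
support = rng.normal(size=(5, 16))   # 5 support-region features (toy sizes)
query = rng.normal(size=16)          # query image feature
w = rng.normal(size=(16, 16)) * 0.1  # layer weights
adj = conditional_edges(support, query)
common = gcn_layer(support, adj, w)  # refined "common object" features
print(common.shape)
```

In this toy version, regions dissimilar to the query receive near-zero gates, so message passing concentrates on query-relevant regions; the paper instead learns the conditioning with its conditional feature encoding function.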