Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
- Publisher:
- Institute of Electrical and Electronics Engineers (IEEE)
- Publication Type:
- Conference Proceeding
- Citation:
- Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 8709-8720
- Issue Date:
- 2022-01-01
In Progress
Filename | Description | Size
---|---|---
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning.pdf | Accepted version | 7.7 MB
This item is being processed and is not currently available.
Our target is to learn visual correspondence from unlabeled videos. We develop Liir, a locality-aware inter- and intra-video reconstruction method that fills in three missing pieces of the self-supervised correspondence learning puzzle: instance discrimination, location awareness, and spatial compactness. First, whereas most existing efforts focus on intra-video self-supervision only, we exploit cross-video affinities as extra negative samples within a unified inter- and intra-video reconstruction scheme. This enables instance-discriminative representation learning by contrasting the desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching and design a position-shifting strategy to remove the side effect of position encoding during inter-video affinity computation, making Liir location-sensitive. Third, to make full use of the spatial continuity of video data, we impose a compactness-based constraint on correspondence matching, yielding sparser and more reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks covering objects, semantic parts, and keypoints.
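The core idea of the inter- and intra-video reconstruction scheme can be sketched in a few lines. The NumPy snippet below is a minimal illustration, not the authors' implementation: pixel features from other videos are appended as extra softmax negatives, but only intra-video pixels contribute to the reconstruction, so any affinity mass assigned to cross-video pixels is penalized when the reconstruction error is minimized. All function and variable names here are hypothetical, and the temperature value is an assumed placeholder.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def inter_intra_reconstruct(query, key_intra, key_inter, temperature=0.07):
    """Reconstruct query-frame pixel features from an intra-video key frame,
    with inter-video pixels acting as extra negatives in the softmax.

    query:     (N, C) pixel features of the query frame
    key_intra: (M, C) pixel features of a key frame from the same video
    key_inter: (P, C) pixel features sampled from other videos (negatives)
    """
    # Affinity of each query pixel to both intra- and inter-video key pixels.
    keys = np.concatenate([key_intra, key_inter], axis=0)   # (M + P, C)
    logits = query @ keys.T / temperature                   # (N, M + P)
    affinity = softmax(logits, axis=-1)                     # each row sums to 1
    # Only intra-video pixels are used for reconstruction; mass placed on
    # inter-video pixels is "wasted", discouraging cross-video matches.
    recon = affinity[:, : key_intra.shape[0]] @ key_intra   # (N, C)
    return recon, affinity
```

Minimizing the error between `recon` and the query features then trains the representation to put high affinity on the correct intra-video pixels, which is the contrastive effect the abstract describes.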