Mining Inter-Video Proposal Relations for Video Object Detection
- Publisher:
- Springer Nature
- Publication Type:
- Conference Proceeding
- Citation:
- Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2020, 12366 LNCS, pp. 431-446
- Issue Date:
- 2020-01-01
Closed Access
Filename | Description | Size
---|---|---
Mining_Inter_Video_Proposal_Relations_for_Video_Object_Detection.pdf | Submitted version | 2.19 MB
This item is closed access and not available.
Recent studies have shown that aggregating context information from proposals in different frames can clearly enhance the performance of video object detection. However, these approaches mainly exploit the intra-proposal relation within a single video, while ignoring the inter-proposal relation among different videos, which can provide important discriminative cues for recognizing confusing objects. To address this limitation, we propose a novel Inter-Video Proposal Relation module. Based on a concise multi-level triplet selection scheme, this module learns effective object representations by modeling relations among hard proposals from different videos. Moreover, we design a Hierarchical Video Relation Network (HVR-Net) that integrates intra-video and inter-video proposal relations in a hierarchical fashion. This design progressively exploits both intra-video and inter-video contexts to boost video object detection. We evaluate our method on the large-scale video object detection benchmark, i.e., ImageNet VID, where HVR-Net achieves state-of-the-art results. Code and models are available at https://github.com/youthHan/HVRNet.
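The abstract describes a two-stage idea: relate proposals within a video, then relate them across videos, and train with hard triplets of confusing proposals. Below is a minimal, hypothetical sketch of that idea in PyTorch; it is not the authors' HVR-Net implementation, and the names (`ProposalRelation`, `HierarchicalRelation`, `hard_triplet_loss`) and dimensions are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProposalRelation(nn.Module):
    """Single-head attention: each query proposal aggregates context
    from a set of context proposals (illustrative, not the paper's exact module)."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query_feats, context_feats):
        # query_feats: (Nq, D), context_feats: (Nc, D)
        attn = torch.softmax(self.q(query_feats) @ self.k(context_feats).t() * self.scale, dim=-1)
        return query_feats + attn @ self.v(context_feats)  # residual aggregation


class HierarchicalRelation(nn.Module):
    """Hedged sketch of the hierarchy: intra-video relation first, then
    inter-video relation over proposals pooled from several sampled videos."""
    def __init__(self, dim=256):
        super().__init__()
        self.intra = ProposalRelation(dim)
        self.inter = ProposalRelation(dim)

    def forward(self, per_video_proposals):
        # per_video_proposals: list of (Ni, D) tensors, one per sampled video
        intra = [self.intra(p, p) for p in per_video_proposals]   # within-video context
        bank = torch.cat(intra, dim=0)                            # cross-video proposal bank
        return [self.inter(p, bank) for p in intra]               # across-video context


def hard_triplet_loss(anchor, positive, negatives, margin=0.3):
    """Illustrative triplet objective on proposal embeddings: push the hardest
    (closest) negative proposal from a confusing video away from the anchor."""
    d_pos = F.pairwise_distance(anchor, positive)                 # (N,)
    d_neg = torch.cdist(anchor, negatives).min(dim=1).values      # hardest negative per anchor
    return F.relu(d_pos - d_neg + margin).mean()


if __name__ == "__main__":
    videos = [torch.randn(8, 256) for _ in range(3)]              # toy proposals from 3 videos
    refined = HierarchicalRelation()(videos)
    print([r.shape for r in refined])                             # three (8, 256) tensors
```

In this sketch the inter-video step simply attends over a concatenated bank of intra-refined proposals; the paper's multi-level triplet selection for choosing which videos and proposals enter that bank is summarized here only by the hard-negative mining in the loss.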