Mining Inter-Video Proposal Relations for Video Object Detection

Publisher:
Springer Nature
Publication Type:
Conference Proceeding
Citation:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2020, 12366 LNCS, pp. 431-446
Issue Date:
2020-01-01
File:
Mining_Inter_Video_Proposal_Relations_for_Video_Object_Detection.pdf (Submitted version, Adobe PDF, 2.19 MB)
Recent studies have shown that aggregating contextual information from proposals in different frames can clearly enhance the performance of video object detection. However, these approaches mainly exploit proposal relations within a single video, while ignoring proposal relations among different videos, which can provide important discriminative cues for recognizing confusing objects. To address this limitation, we propose a novel Inter-Video Proposal Relation module. Based on a concise multi-level triplet-selection scheme, this module learns effective object representations by modeling relations among hard proposals from different videos. Moreover, we design a Hierarchical Video Relation Network (HVR-Net) that integrates intra-video and inter-video proposal relations in a hierarchical fashion. This design progressively exploits both intra- and inter-video contexts to boost video object detection. We evaluate our method on the large-scale video object detection benchmark ImageNet VID, where HVR-Net achieves state-of-the-art results. Code and models are available at https://github.com/youthHan/HVRNet.
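The record does not spell out how the triplet selection works; as a rough illustration only, the core idea of mining hard proposals across videos can be sketched with standard hard-triplet selection over proposal embeddings. Everything below (function names, the use of plain NumPy vectors for pooled proposal features, the margin value) is an assumption for illustration, not the paper's actual implementation.

```python
import numpy as np

def select_hard_triplet(anchor, anchor_label, proposals, labels):
    """Illustrative hard-triplet selection (not the paper's exact scheme).

    Given an anchor proposal embedding from one video and candidate proposal
    embeddings from other videos, pick the hardest positive (the farthest
    same-class proposal) and the hardest negative (the closest
    different-class proposal) by Euclidean distance.
    """
    dists = np.linalg.norm(proposals - anchor, axis=1)
    pos_mask = labels == anchor_label
    neg_mask = ~pos_mask
    hard_pos = proposals[pos_mask][np.argmax(dists[pos_mask])]
    hard_neg = proposals[neg_mask][np.argmin(dists[neg_mask])]
    return hard_pos, hard_neg

def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss: pull the positive closer than the
    negative by at least `margin` (margin value is an arbitrary choice)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy usage: one anchor, three candidate proposals from other videos.
anchor = np.array([0.0, 0.0])
proposals = np.array([[1.0, 0.0], [0.0, 4.0], [2.0, 0.0]])
labels = np.array([0, 0, 1])  # first two share the anchor's class

hard_pos, hard_neg = select_hard_triplet(anchor, 0, proposals, labels)
loss = triplet_margin_loss(anchor, hard_pos, hard_neg)
```

In this toy setup the hardest positive is the same-class proposal at distance 4 and the hardest negative is the other-class proposal at distance 2, so the loss is positive and training would push the embeddings apart accordingly.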