Hierarchical Temporal Modeling with Mutual Distance Matching for Video Based Person Re-Identification

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Circuits and Systems for Video Technology, 2021, vol. 31, no. 2, pp. 503-511
Issue Date:
2021-02-01
Compared with image-based person re-identification (re-ID), video-based person re-ID can exploit richer appearance and temporal cues, and has therefore received widespread attention in recent years. However, pose changes, occlusion, misalignment, and multiple temporal granularities in video sequences introduce inter-sequence and intra-sequence variations that inevitably make feature learning and matching in videos more difficult. Under these circumstances, an effective discriminative representation learning mechanism, together with a suitable matching scheme, is needed to handle these variations in video-based person re-ID. To this end, this paper introduces a multi-granularity temporal convolution network and a mutual distance matching measurement, aiming to alleviate the intra-sequence variation and the inter-sequence variation, respectively. Specifically, in the feature learning stage, we model different temporal granularities by hierarchically stacking temporal convolution blocks with different dilation factors. In the feature matching stage, we propose a clip-level probe-gallery mutual distance measurement and retain the most convincing clip pairs via top-k selection. Our proposed method achieves state-of-the-art results on three video-based person re-ID benchmarks; moreover, extensive ablation studies demonstrate the conciseness and effectiveness of our method in video re-ID tasks.
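To make the feature learning stage concrete, the following is a minimal PyTorch sketch of hierarchically stacked temporal convolution blocks with different dilation factors, as described in the abstract. The channel sizes, kernel size, residual structure, and the dilation schedule (1, 2, 4) are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch: hierarchically stacked temporal convolutions with growing
# dilation, so deeper blocks cover coarser temporal granularities.
# Block internals (residual + BN + ReLU) and hyperparameters are assumptions.
import torch
import torch.nn as nn

class TemporalConvBlock(nn.Module):
    """1-D temporal convolution over frame-level features with a given dilation."""
    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2  # preserve temporal length
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=padding, dilation=dilation)
        self.bn = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (batch, channels, time); the residual path keeps fine-grained
        # cues while the dilated conv adds coarser temporal context.
        return self.relu(x + self.bn(self.conv(x)))

class HierarchicalTemporalNet(nn.Module):
    """Stack blocks with increasing dilation to model multiple granularities."""
    def __init__(self, channels: int = 2048, dilations=(1, 2, 4)):
        super().__init__()
        self.blocks = nn.Sequential(
            *[TemporalConvBlock(channels, d) for d in dilations])

    def forward(self, frame_feats):
        # frame_feats: (batch, time, channels) frame-level CNN features
        x = frame_feats.transpose(1, 2)   # -> (batch, channels, time)
        x = self.blocks(x)
        return x.mean(dim=2)              # temporal average pooling -> clip feature

feats = torch.randn(8, 16, 2048)          # 8 clips of 16 frames each
clip_embedding = HierarchicalTemporalNet()(feats)
print(clip_embedding.shape)               # torch.Size([8, 2048])
```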
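For the feature matching stage, the sketch below illustrates one plausible reading of the clip-level probe-gallery mutual distance with top-k selection. The abstract does not define "mutual" precisely; here it is assumed to combine both matching directions (probe-to-gallery and gallery-to-probe) and to keep only the k smallest, i.e. most convincing, clip-level distances in each direction. The value of k and the Euclidean metric are also assumptions.

```python
# Minimal sketch: sequence-level distance from clip-level features via a
# bidirectional (assumed "mutual") nearest-clip distance with top-k selection.
import torch

def mutual_distance(probe_clips: torch.Tensor,
                    gallery_clips: torch.Tensor,
                    k: int = 3) -> torch.Tensor:
    """probe_clips: (m, d) clip features; gallery_clips: (n, d) clip features."""
    dist = torch.cdist(probe_clips, gallery_clips)  # (m, n) pairwise Euclidean
    # Nearest gallery clip for each probe clip, and vice versa.
    p2g = dist.min(dim=1).values                    # (m,)
    g2p = dist.min(dim=0).values                    # (n,)
    # Keep only the k most convincing (smallest) clip pairs per direction.
    top_p2g = p2g.topk(min(k, p2g.numel()), largest=False).values
    top_g2p = g2p.topk(min(k, g2p.numel()), largest=False).values
    return 0.5 * (top_p2g.mean() + top_g2p.mean())

probe = torch.randn(6, 2048)    # 6 probe clips
gallery = torch.randn(8, 2048)  # 8 gallery clips
print(mutual_distance(probe, gallery).item())
```

Selecting the top-k smallest pairwise distances, rather than averaging over all clip pairs, lets the measurement ignore clips degraded by occlusion or misalignment, which is the inter-sequence variation the abstract targets.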