SFGN: Representing the sequence with one super frame for video person re-identification

Pan, X; Luo, H; Jiang, W; Zhang, J; Gu, J; Li, P


Publisher: Elsevier
Publication Type: Journal Article
Citation: Knowledge-Based Systems, 2022, 249, pp. 108884
Issue Date: 2022-08-05

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S095070512200421X-main.pdf	Published version	1.43 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Pan, X
dc.contributor.author	Luo, H
dc.contributor.author	Jiang, W
dc.contributor.author	Zhang, J
dc.contributor.author	Gu, J
dc.contributor.author	Li, P https://orcid.org/0000-0003-1809-2137
dc.date.accessioned	2023-07-09T20:34:36Z
dc.date.available	2023-07-09T20:34:36Z
dc.date.issued	2022-08-05
dc.identifier.citation	Knowledge-Based Systems, 2022, 249, pp. 108884
dc.identifier.issn	0950-7051
dc.identifier.uri	http://hdl.handle.net/10453/171373
dc.description.abstract	Video-based person re-identification (V-Re-ID) is more robust than image-based person re-identification (I-Re-ID) because of the additional temporal information. However, the high storage overhead of video sequences largely limits the application of V-Re-ID. To reduce the storage overhead, we propose to represent each video sequence with only one frame. However, directly picking one frame from each sequence reduces performance dramatically. Thus, we propose a brand-new framework called the super frame generation network (SFGN), which can encode the spatial–temporal information of a video sequence into a single generated frame, called a “super frame” to distinguish it from a directly picked “key frame”. To obtain super frames with high visual quality and strong representation ability, we carefully design the specific-frame-feature fused skip-connection generator (SFSG). SFSG acts as a feature encoder, and the co-trained image model can be seen as the corresponding feature decoder. To reduce the information loss in this encoding–decoding process, we further propose the feature recovery loss (FRL). To the best of our knowledge, we are the first to identify and address this issue. Extensive experiments on Mars, iLIDS-VID, and PRID2011 show that the proposed SFGN can generate super frames with high visual quality and strong representation ability. For the code, please visit the project website:.
dc.language	en
dc.publisher	Elsevier
dc.relation.ispartof	Knowledge-Based Systems
dc.relation.isbasedon	10.1016/j.knosys.2022.108884
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences, 15 Commerce, Management, Tourism and Services, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	4602 Artificial intelligence
dc.subject.classification	4605 Data management and data science
dc.subject.classification	4611 Machine learning
dc.title	SFGN: Representing the sequence with one super frame for video person re-identification
dc.type	Journal Article
utslib.citation.volume	249
utslib.for	08 Information and Computing Sciences
utslib.for	15 Commerce, Management, Tourism and Services
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2023-07-09T20:34:34Z
pubs.publication-status	Published
pubs.volume	249
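The abstract recorded in the metadata describes SFSG as a feature encoder (sequence to one super frame) and the co-trained image model as the matching decoder, with the feature recovery loss (FRL) reducing what is lost in between. This record does not include the paper's formulation, so the following is only a toy sketch of that encode/decode view under loudly stated assumptions. The weighted-average "generator", the linear-plus-ReLU "image model", the use of averaged per-frame features as the reference, the mean-squared-error form of the loss, and every function name here are hypothetical stand-ins. They are not the authors' SFSG, image model, or FRL.

```python
"""Toy illustration of encoding a frame sequence into one frame, then decoding it.

All components are hypothetical stand-ins, not the SFGN paper's method:
- frames are flat lists of floats,
- the "generator" is a normalized weighted average over frames,
- the "image model" is a fixed linear projection followed by ReLU,
- the recovery loss is an MSE against averaged per-frame features.
"""
from typing import List, Sequence

Frame = List[float]    # one flattened frame (toy representation)
Feature = List[float]  # one feature vector


def generate_super_frame(frames: Sequence[Frame], weights: Sequence[float]) -> Frame:
    """Hypothetical encoder: fuse T frames into one frame by weighted averaging."""
    if not frames or len(frames) != len(weights):
        raise ValueError("need at least one frame and exactly one weight per frame")
    total = sum(weights)
    return [
        sum(w * frame[i] for w, frame in zip(weights, frames)) / total
        for i in range(len(frames[0]))
    ]


def image_model(frame: Frame, proj: Sequence[Sequence[float]]) -> Feature:
    """Hypothetical decoder: linear projection followed by ReLU."""
    return [max(0.0, sum(p * x for p, x in zip(row, frame))) for row in proj]


def sequence_feature(frames: Sequence[Frame], proj: Sequence[Sequence[float]]) -> Feature:
    """Assumed reference: average of the per-frame features (temporal average pooling)."""
    feats = [image_model(f, proj) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


def feature_recovery_loss(
    super_frame: Frame, frames: Sequence[Frame], proj: Sequence[Sequence[float]]
) -> float:
    """Assumed loss form: MSE between the decoded super-frame feature and the reference."""
    decoded = image_model(super_frame, proj)
    target = sequence_feature(frames, proj)
    return sum((a - b) ** 2 for a, b in zip(decoded, target)) / len(target)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T = 3 toy frames
    proj = [[1.0, -1.0], [0.5, 0.5]]               # toy decoder weights
    sf = generate_super_frame(frames, [1.0, 1.0, 1.0])
    print(sf, feature_recovery_loss(sf, frames, proj))
```

In this toy setting, the ReLU makes pixel-space averaging differ from feature-space averaging, so the loss is nonzero (1/18 for the example frames). With a purely linear decoder it would be zero. That gap is a simple analogue of the encoding–decoding information loss the abstract says FRL is meant to reduce. Storing one super frame in place of T frames of the same size keeps roughly 1/T of the pixel data.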

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171373