A Stacking Ensemble Approach for Supervised Video Summarization

An, Y; Zhao, S; Zhang, G

Publisher: ACM
Publication Type: Conference Proceeding
Citation: VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing, 2022, pp. 122-127
Issue Date: 2022-11-25

Closed Access

Filename	Description	Size	Format
3577164.3577183.pdf	Published version	1.7 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	An, Y
dc.contributor.author	Zhao, S
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-4521-542X
dc.date	2022-11-25
dc.date.accessioned	2023-07-23T01:50:10Z
dc.date.available	2023-07-23T01:50:10Z
dc.date.issued	2022-11-25
dc.identifier.citation	VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing, 2022, pp. 122-127
dc.identifier.isbn	9781450397810
dc.identifier.uri	http://hdl.handle.net/10453/171605
dc.description.abstract	Existing video summarization methods are classified as either shot-level or frame-level, and the two kinds are generally used in isolation. This paper investigates the underlying complementarity between frame-level and shot-level methods and proposes a stacking ensemble approach for supervised video summarization. First, we build a stacking model that simultaneously predicts the key frame probabilities and the temporal interest segments. The two components are then combined via soft decision fusion to obtain the final score of each frame in the video. A joint loss function is proposed for model training. Ablation experiments show that the proposed method outperforms both of the corresponding individual methods. Furthermore, extensive experiments on two benchmark datasets show its superior performance compared with state-of-the-art methods.
dc.language	en
dc.publisher	ACM
dc.relation.ispartof	VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing
dc.relation.ispartof	International Conference on Video, Signal and Image Processing
dc.relation.ispartofseries	ACM International Conference Proceeding Series
dc.relation.isbasedon	10.1145/3577164.3577183
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	A Stacking Ensemble Approach for Supervised Video Summarization
dc.type	Conference Proceeding
utslib.location.activity	Shanghai, China
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2023-07-23T01:50:08Z
pubs.finish-date	2022-11-27
pubs.place-of-publication	USA
pubs.publication-status	Published
pubs.start-date	2022-11-25
dc.location	USA

Abstract:

Existing video summarization methods are classified as either shot-level or frame-level, and the two kinds are generally used in isolation. This paper investigates the underlying complementarity between frame-level and shot-level methods and proposes a stacking ensemble approach for supervised video summarization. First, we build a stacking model that simultaneously predicts the key frame probabilities and the temporal interest segments. The two components are then combined via soft decision fusion to obtain the final score of each frame in the video. A joint loss function is proposed for model training. Ablation experiments show that the proposed method outperforms both of the corresponding individual methods. Furthermore, extensive experiments on two benchmark datasets show its superior performance compared with state-of-the-art methods.
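
Illustrative sketch (not taken from the paper): the abstract says that frame-level key frame probabilities and shot-level temporal interest segments are combined via soft decision fusion, but it does not give the fusion rule. The Python below shows one generic way such a fusion could look, assuming a simple weighted average. The function names `expand_segments` and `soft_fusion`, the `(start, end, score)` segment format, and the weight `alpha` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_frame, end_frame_exclusive, score).
Segment = Tuple[int, int, float]


def expand_segments(num_frames: int, segments: List[Segment]) -> List[float]:
    """Spread each segment's score over the frames it covers.

    Frames outside every segment get 0.0. Where segments overlap,
    the highest score is kept (an assumption made for this sketch).
    """
    scores = [0.0] * num_frames
    for start, end, score in segments:
        for t in range(max(start, 0), min(end, num_frames)):
            scores[t] = max(scores[t], score)
    return scores


def soft_fusion(frame_probs: List[float],
                segments: List[Segment],
                alpha: float = 0.5) -> List[float]:
    """Weighted average of frame-level and segment-level scores.

    alpha is an assumed mixing weight: 1.0 uses only the frame-level
    probabilities, 0.0 uses only the segment-level scores.
    """
    seg_scores = expand_segments(len(frame_probs), segments)
    return [alpha * p + (1.0 - alpha) * s
            for p, s in zip(frame_probs, seg_scores)]


if __name__ == "__main__":
    frame_probs = [0.2, 0.8, 0.4, 0.1]   # made-up frame-level outputs
    segments = [(1, 3, 0.6)]             # one made-up interest segment
    print(soft_fusion(frame_probs, segments, alpha=0.5))
```

With equal weighting, a frame scores highly only when both predictors agree, which is one plausible reading of the complementarity the abstract describes. The actual fusion rule and joint loss are defined in the published paper.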

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171605