Perceptual Attributes Optimization for Multivideo Summarization

Nie, L; Hong, R; Zhang, L; Xia, Y; Tao, D; Sebe, N

Publication Type: Journal Article
Citation: IEEE Transactions on Cybernetics, 2016, 46 (12), pp. 2991 - 3003
Issue Date: 2016-12-01

Closed Access

Filename	Description	Size	Format
07346444.pdf	Published Version	2.99 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Nie, L	en_US
dc.contributor.author	Hong, R	en_US
dc.contributor.author	Zhang, L	en_US
dc.contributor.author	Xia, Y	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.contributor.author	Sebe, N	en_US
dc.date.issued	2016-12-01	en_US
dc.identifier.citation	IEEE Transactions on Cybernetics, 2016, 46 (12), pp. 2991 - 3003	en_US
dc.identifier.issn	2168-2267	en_US
dc.identifier.uri	http://hdl.handle.net/10453/122682
dc.description.abstract	© 2016 IEEE. Nowadays, many consumer videos are captured by portable devices such as the iPhone. Unlike constrained videos produced by professionals, e.g., those for broadcast, multiple handheld videos of the same scene are challenging to summarize. This is because: 1) these videos vary dramatically in semantics and style, making it difficult to extract representative key frames; 2) handheld videos exhibit different degrees of shakiness, which existing summarization techniques cannot alleviate adaptively; and 3) it is difficult to develop a quality model that evaluates a video summary, owing to the subjectivity of video quality assessment. To solve these problems, we propose perceptual multiattribute optimization, which jointly refines multiple perceptual attributes (i.e., video aesthetics, coherence, and stability) in a multivideo summarization process. In particular, a weakly supervised learning framework is designed to discover the semantically important regions in each frame. Then, a few key frames are selected based on their contributions to covering the multivideo semantics. Thereafter, a probabilistic model is proposed to dynamically fit the key frames into an aesthetically pleasing video summary, whose frames are stabilized adaptively. Experiments on consumer videos of scenes throughout the world demonstrate the descriptiveness, aesthetics, coherence, and stability of the generated summary.	en_US
dc.relation.ispartof	IEEE Transactions on Cybernetics	en_US
dc.relation.isbasedon	10.1109/TCYB.2015.2493558	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Perceptual Attributes Optimization for Multivideo Summarization	en_US
dc.type	Journal Article
utslib.citation.volume	12	en_US
utslib.citation.volume	46	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0102 Applied Mathematics	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	12	en_US
pubs.publication-status	Published	en_US
pubs.volume	46	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/122682