Perceptual Attributes Optimization for Multivideo Summarization

Publication Type:
Journal Article
IEEE Transactions on Cybernetics, 2016, 46 (12), pp. 2991 - 3003
Issue Date:
Filename Description Size
07346444.pdfPublished Version2.99 MB
Adobe PDF
Full metadata record
© 2016 IEEE. Nowadays, many consumer videos are captured by portable devices such as iPhone. Different from constrained videos that are produced by professionals, e.g., those for broadcast, summarizing multiple handheld videos from a same scenery is a challenging task. This is because: 1) these videos have dramatic semantic and style variances, making it difficult to extract the representative key frames; 2) the handheld videos are with different degrees of shakiness, but existing summarization techniques cannot alleviate this problem adaptively; and 3) it is difficult to develop a quality model that evaluates a video summary, due to the subjectiveness of video quality assessment. To solve these problems, we propose perceptual multiattribute optimization which jointly refines multiple perceptual attributes (i.e., video aesthetics, coherence, and stability) in a multivideo summarization process. In particular, a weakly supervised learning framework is designed to discover the semantically important regions in each frame. Then, a few key frames are selected based on their contributions to cover the multivideo semantics. Thereafter, a probabilistic model is proposed to dynamically fit the key frames into an aesthetically pleasing video summary, wherein its frames are stabilized adaptively. Experiments on consumer videos taken from sceneries throughout the world demonstrate the descriptiveness, aesthetics, coherence, and stability of the generated summary.
Please use this identifier to cite or link to this item: