Semantic Pooling for Complex Event Analysis in Untrimmed Videos

Publication Type:
Journal Article
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (8), pp. 1617-1632
Abstract:
Pooling plays an important role in generating a discriminative video representation. In this paper, we propose a new semantic pooling approach for challenging event analysis tasks (e.g., event detection, recognition, and recounting) in long untrimmed Internet videos, especially when only a few shots/segments are relevant to the event of interest while many other shots are irrelevant or even misleading. Commonly adopted pooling strategies aggregate the shots indiscriminately in one way or another, resulting in a great loss of information. Instead, in this work we first define a novel notion of semantic saliency that assesses the relevance of each shot to the event of interest. We then prioritize the shots according to their saliency scores, since shots that are semantically more salient are expected to contribute more to the final event analysis. Next, we propose a new isotonic regularizer that exploits the constructed semantic ordering information. The resulting nearly-isotonic support vector machine classifier exhibits higher discriminative power in event analysis tasks. Computationally, we develop an efficient implementation using the proximal gradient algorithm, and we derive new closed-form proximal steps. We conduct extensive experiments on three real-world video datasets and achieve promising improvements.
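As a loose illustration of the ordering idea described in the abstract (not the authors' actual formulation; the function name and penalty form here are assumptions), a nearly-isotonic regularizer on classifier weights sorted by decreasing semantic saliency can be sketched as a penalty on monotonicity violations: it is zero when the weight sequence is non-increasing, and charges each increase linearly.

```python
import numpy as np

def nearly_isotonic_penalty(w, lam=1.0):
    # Assumed illustrative penalty: lam * sum_i max(w[i+1] - w[i], 0),
    # applied to weights w ordered by decreasing shot saliency.
    # It vanishes iff w is non-increasing (salient shots weighted most).
    diffs = np.diff(w)
    return lam * np.clip(diffs, 0.0, None).sum()

# Weights consistent with the saliency ordering incur no penalty.
w_ok = np.array([0.9, 0.7, 0.4, 0.1])
print(nearly_isotonic_penalty(w_ok))   # 0.0

# Violations (weight increasing along the ordering) are penalized.
w_bad = np.array([0.2, 0.8, 0.4, 0.5])
print(nearly_isotonic_penalty(w_bad))  # 0.6 + 0.1 = 0.7
```

Unlike a hard isotonic constraint, this soft penalty lets the learned weights deviate from the saliency ordering when the data supports it, which is the sense in which the classifier is "nearly" isotonic.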