Semantic Pooling for Complex Event Analysis in Untrimmed Videos
- Publication Type:
- Journal Article
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (8), pp. 1617 - 1632
- Issue Date:
Copyright Clearance Process
- Recently Added
- In Progress
- Closed Access
This item is closed access and not available.
© 1979-2012 IEEE. Pooling plays an important role in generating a discriminative video representation. In this paper, we propose a new semantic pooling approach for challenging event analysis tasks (e.g., event detection, recognition, and recounting) in long untrimmed Internet videos, especially when only a few shots/segments are relevant to the event of interest while many other shots are irrelevant or even misleading. The commonly adopted pooling strategies aggregate the shots indifferently in one way or another, resulting in a great loss of information. Instead, in this work we first define a novel notion of semantic saliency that assesses the relevance of each shot with the event of interest. We then prioritize the shots according to their saliency scores since shots that are semantically more salient are expected to contribute more to the final event analysis. Next, we propose a new isotonic regularizer that is able to exploit the constructed semantic ordering information. The resulting nearly-isotonic support vector machine classifier exhibits higher discriminative power in event analysis tasks. Computationally, we develop an efficient implementation using the proximal gradient algorithm, and we prove new and closed-form proximal steps. We conduct extensive experiments on three real-world video datasets and achieve promising improvements.
Please use this identifier to cite or link to this item: