They are not equally reliable: Semantic event search using differentiated concept classifiers

Publication Type:
Conference Proceeding
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, 2016-January, pp. 1884-1893
Files in This Item:
Filename: cvpr16a.pdf
Description: Accepted Manuscript version (Adobe PDF)
Size: 2.74 MB
Complex event detection on unconstrained Internet videos has seen much progress in recent years. However, state-of-the-art performance degrades dramatically when only a few positive training exemplars are available. Since label acquisition is costly, laborious, and time-consuming, there is a real need to consider the much more challenging semantic event search problem, where no example video is given. In this paper, we present a state-of-the-art event search system that requires no example videos. Relying on the key observation that events (e.g. dog show) are usually compositions of multiple mid-level concepts (e.g. "dog," "theater," and "dog jumping"), we first train a skip-gram model to measure the relevance of each concept to the event of interest. The relevant concept classifiers then cast votes on the test videos, but their reliability, which is hard to assess without labeled training videos, has been largely unaddressed. We propose to combine the concept classifiers based on a principled estimate of their accuracy on the unlabeled test videos. A novel warping technique is proposed to improve the performance, and an efficient, highly scalable algorithm is provided to quickly solve the resulting optimization problem. We conduct extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets, and achieve state-of-the-art performance.
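The first two steps described in the abstract (scoring concept relevance, then letting the relevant concept classifiers vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings and classifier scores below are made-up toy values, the function names `concept_relevance` and `rank_videos` are hypothetical, and cosine similarity stands in for whatever relevance measure the skip-gram model provides. The reliability estimation and warping steps are not specified in the abstract and are deliberately left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_relevance(event_vec, concept_vecs):
    """Relevance of each concept to the event (assumption: clipped cosine)."""
    return {name: max(cosine(event_vec, vec), 0.0)
            for name, vec in concept_vecs.items()}


def rank_videos(concept_scores, relevance, top_k=2):
    """Rank videos by a relevance-weighted average of concept classifier scores.

    Only the top_k most relevant concepts vote. Classifier reliability is
    ignored here, which is exactly the gap the paper says it addresses.
    """
    selected = sorted(relevance, key=relevance.get, reverse=True)[:top_k]
    total = sum(relevance[c] for c in selected)
    if total == 0:
        return []
    fused = {
        video: sum(relevance[c] * scores.get(c, 0.0) for c in selected) / total
        for video, scores in concept_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


# Toy word embeddings (made up for illustration, not from a trained model).
event_vec = [1.0, 1.0, 0.0]  # stands in for the query "dog show"
concept_vecs = {
    "dog": [1.0, 0.8, 0.0],
    "theater": [0.6, 1.0, 0.1],
    "cooking": [0.0, 0.1, 1.0],
}

# Toy per-video concept classifier outputs in [0, 1] (also made up).
concept_scores = {
    "video_a": {"dog": 0.9, "theater": 0.7, "cooking": 0.1},
    "video_b": {"dog": 0.2, "theater": 0.1, "cooking": 0.95},
}

relevance = concept_relevance(event_vec, concept_vecs)
ranking = rank_videos(concept_scores, relevance)
print(ranking)
```

In this toy setup the "dog" and "theater" concepts are selected and "cooking" is dropped, so the video with strong dog and theater responses ranks first. Weighting votes by relevance alone treats every selected classifier as equally trustworthy; the paper's contribution is to replace that assumption with an accuracy estimate computed on the unlabeled test videos.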