We are not equally negative: Fine-grained labeling for multimedia event detection

Publication Type:
Conference Proceeding
MM 2013 - Proceedings of the 2013 ACM Multimedia Conference, 2013, pp. 293 - 302
Issue Date:
Multimedia event detection (MED) is an effective technique for video indexing and retrieval. Current classifier training for MED treats all negative videos equally. However, many negative videos resemble the positive videos to different degrees. Intuitively, assigning fine-grained labels to the negative videos should let us capture more informative cues from them, thus benefiting classifier learning. To this end, we apply a statistical method to both the positive and negative examples to obtain the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they are treated differently and exploited more effectively. The resulting fine-grained labels may not be accurate enough to characterize the negative videos. Hence, we propose to jointly optimize the fine-grained labels with knowledge from both the visual features and the attribute representations, so that the two sources reinforce each other. Our model yields two kinds of classifiers, one from the attributes and one from the features, both of which incorporate the informative cues from the fine-grained labels. The outputs of both classifiers on the testing videos are fused for detection. Extensive experiments on the challenging TRECVID MED 2012 development set have validated the efficacy of our proposed approach. Copyright © 2013 ACM.
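The pipeline outlined in the abstract (decisive attributes, fine-grained labels for negatives, score fusion) can be illustrated with a minimal sketch. This is not the paper's method. The abstract does not name the statistical test, the labeling rule, or the fusion scheme, so the mean-gap ranking, the clipped-ratio label in [-1, 0], and the weighted average below are all assumptions, and every function name is hypothetical. The joint optimization step is not covered.

```python
from statistics import mean


def decisive_attributes(pos, neg, top_k=2):
    """Rank attributes by the gap between mean positive and mean negative scores.

    ASSUMPTION: a mean-gap ranking stands in for the unspecified statistical method.
    """
    n_attr = len(pos[0])
    gaps = []
    for j in range(n_attr):
        gap = mean(v[j] for v in pos) - mean(v[j] for v in neg)
        gaps.append((gap, j))
    gaps.sort(reverse=True)  # largest gap first
    return [j for _, j in gaps[:top_k]]


def fine_grained_labels(pos, neg, attrs):
    """Give each negative video a label in [-1, 0].

    ASSUMPTION: a label closer to 0 means the negative looks more like the
    positives on the decisive attributes; -1 means it shares none of them.
    """
    pos_mean = [mean(v[j] for v in pos) for j in attrs]
    labels = []
    for v in neg:
        # Per-attribute ratio to the positive mean, clipped to [0, 1].
        ratios = [
            min(max(v[j] / m, 0.0), 1.0) if m > 0 else 0.0
            for j, m in zip(attrs, pos_mean)
        ]
        labels.append(-1.0 + mean(ratios))
    return labels


def late_fusion(attr_scores, feat_scores, weight=0.5):
    """Fuse attribute-based and feature-based classifier scores.

    ASSUMPTION: a simple weighted average; the abstract only says the
    outputs are fused.
    """
    return [weight * a + (1.0 - weight) * f for a, f in zip(attr_scores, feat_scores)]


# Toy attribute scores: rows are videos, columns are attributes.
pos = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.2]]
neg = [[0.1, 0.1, 0.9], [0.8, 0.7, 0.3]]

attrs = decisive_attributes(pos, neg)
labels = fine_grained_labels(pos, neg, attrs)
print(attrs, labels)
```

In this toy data the second negative video scores high on the same attributes as the positives, so it receives a label nearer to 0 than the first, which is the kind of distinction among negatives that the abstract argues for.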