Violent video detection based on MoSIFT feature and sparse coding

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
Issue Date: 2014-05-04
To detect violence in a video, a common approach is to apply local spatio-temporal descriptors to the query video and then summarize the low-level descriptions into a high-level feature using the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough, and the BoW model assigns each feature vector to only one visual word, inevitably introducing quantization error. To address these limitations, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptors. To obtain a highly discriminative video feature, the selected MoSIFT descriptors are further processed with a sparse coding scheme. Encouraging experimental results are obtained on two challenging datasets that contain both crowded and non-crowded scenes.
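The sketch below illustrates the general shape of the pipeline described in the abstract (KDE-based pruning of local descriptors followed by sparse coding and pooling into a video-level feature). It is only a minimal illustration, not the paper's implementation: the MoSIFT descriptors are replaced by random placeholders, and the PCA projection, pruning quantile, dictionary size, and sparsity level are assumed values chosen for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA, MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)

# Placeholder for the MoSIFT descriptors of one video: N descriptors x 256 dims
# (appearance + motion parts); a real system would compute these from the frames.
mosift = rng.normal(size=(500, 256))

# --- KDE-based feature selection -----------------------------------------
# Project to a low dimension so the density estimate is well conditioned,
# score each descriptor by its estimated density, and keep the densest ones
# (low-density descriptors are treated as noise).
proj = PCA(n_components=8).fit_transform(mosift)
density = gaussian_kde(proj.T)(proj.T)
keep = density >= np.quantile(density, 0.3)   # drop the 30% lowest-density points
selected = mosift[keep]

# --- Sparse coding instead of hard BoW assignment -------------------------
# Learn an over-complete dictionary from the descriptors, then encode each
# selected descriptor as a sparse combination of dictionary atoms.
dico = MiniBatchDictionaryLearning(n_components=512, alpha=1.0, random_state=0)
D = dico.fit(selected).components_
codes = sparse_encode(selected, D, algorithm="omp", n_nonzero_coefs=5)

# Max-pool the absolute sparse codes over the video to obtain a fixed-length
# video-level feature, which would then feed a classifier (e.g. a linear SVM)
# to decide violent vs. non-violent.
video_feature = np.abs(codes).max(axis=0)
print(video_feature.shape)   # (512,)
```

In practice the dictionary would be learned once from descriptors pooled over the training videos rather than per video, and the pooled sparse-code feature would be passed to a supervised classifier trained on violent and non-violent examples.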