Violence detection based on spatio-temporal feature and Fisher vector

Publication Type:
Conference Proceeding
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 11256 LNCS pp. 180 - 190
Issue Date:
© Springer Nature Switzerland AG 2018. A novel framework based on local spatio-temporal features and a Bag-of-Words (BoW) model is proposed for violence detection. The framework uses Dense Trajectories (DT) and the MPEG flow video descriptor (MF) as feature descriptors and employs the Fisher Vector (FV) for feature coding. DT and MF are descriptive and robust because each combines several feature descriptors, which together describe trajectory shape, appearance, motion and motion boundaries. FV transforms low-level features into high-level features. It preserves much of the information in the descriptors because it records not only their assignments to codebook entries but also first- and second-order statistics to represent each video. Several techniques, namely PCA, K-means++ and the choice of codebook size, are used to improve the final video classification performance. The proposed method is analysed with respect to accuracy, speed and application scenarios. Experimental results show that the proposed approach outperforms state-of-the-art approaches for violence detection in both crowd and non-crowd scenes.
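As an illustration of the first- and second-order statistics mentioned in the abstract, the following is a minimal sketch of Fisher Vector encoding for one video, not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture codebook whose parameters (`weights`, `means`, `variances`) have already been fitted elsewhere, and it applies power and L2 normalization as in the commonly used "improved" Fisher Vector; the abstract does not state whether the paper does so. The function name `fisher_vector` and all argument names are hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode local descriptors X (T, D) as a Fisher Vector of length 2*K*D.

    weights: (K,) mixture weights; means, variances: (K, D) diagonal GMM
    parameters. All of these are assumed to be fitted beforehand.
    """
    T, _ = X.shape
    sigma = np.sqrt(variances)
    # Whitened differences between every descriptor and every component: (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # Soft assignment (posterior) of each descriptor to each component: (T, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_post = -0.5 * (np.sum(diff ** 2, axis=2) + log_norm[None, :])
    log_post += np.log(weights)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # First-order statistics (gradient with respect to the means)
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    # Second-order statistics (gradient with respect to the standard deviations)
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Assumed post-processing: power normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

With K mixture components and D-dimensional descriptors the resulting vector has 2*K*D entries, so the codebook size and any dimensionality reduction applied to the descriptors (such as PCA) directly determine the length of the video representation.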