Multi-frame feature-fusion-based model for violence detection

Publisher:
Springer Science and Business Media LLC
Publication Type:
Journal Article
Citation:
The Visual Computer, 2020, pp. 1-17
Issue Date:
2020-01-01
File: Draft Journal v3_SH_20190608.pdf (Submitted version, 1.2 MB, Adobe PDF)
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature. Detecting human behavior is essential for public safety and monitoring. Human-operated surveillance systems, however, demand continuous attention and observation, which is difficult to sustain. Automatic detection of violent human behavior is therefore critical for uninterrupted video surveillance. In this paper, we propose a novel method to detect fights or violent actions by learning both spatial and temporal features from equally spaced sequential frames of a video. Multi-level features for two sequential frames, extracted from the top and bottom layers of a convolutional neural network, are combined with the proposed feature fusion method to incorporate motion information. We also propose a wide-dense residual block to learn these combined spatial features from the two input frames. The learned features are then concatenated and fed to long short-term memory units to capture temporal dependencies. The feature fusion method and the additional wide-dense residual blocks enable the network to learn combined features from the input frames effectively and yield better accuracy. Experiments on four publicly available datasets (HockeyFight, Movies, ViolentFlow, and BEHAVE) show that the proposed model outperforms state-of-the-art methods.
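The pipeline the abstract describes (multi-level CNN features for two sequential frames, fused so that frame-to-frame change carries motion information, then passed to a temporal model) can be illustrated with a minimal pure-Python sketch. The feature extractor and the subtraction-based fusion below are hypothetical stand-ins, not the paper's actual operations:

```python
# Hypothetical sketch of multi-frame feature fusion; the feature
# summaries and the subtraction-based fusion are assumptions for
# illustration, not the operations used in the paper.

def extract_multilevel_features(frame):
    """Stand-in for CNN features from a bottom (low-level) and a
    top (high-level) layer; here simple numeric summaries of a 2-D frame."""
    low = [sum(row) / len(row) for row in frame]   # per-row means (low-level)
    high = sum(low) / len(low)                     # global mean (high-level)
    return low, high

def fuse(frame_a, frame_b):
    """Combine features of two equally spaced frames so that motion
    (the change between frames) is represented alongside appearance."""
    low_a, high_a = extract_multilevel_features(frame_a)
    low_b, high_b = extract_multilevel_features(frame_b)
    motion = [b - a for a, b in zip(low_a, low_b)]       # low-level change
    return low_a + low_b + motion + [high_b - high_a]    # fused vector

# Two toy 2x2 frames: the second is uniformly brighter than the first.
frame1 = [[0.0, 0.0], [0.0, 0.0]]
frame2 = [[1.0, 1.0], [1.0, 1.0]]
fused = fuse(frame1, frame2)
print(len(fused))  # 7 entries: 2 low_a + 2 low_b + 2 motion + 1 high diff
```

In the full model, a vector like `fused` for each frame pair would be the input to the LSTM units that capture temporal dependencies across the sequence.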