MoWLD: a robust motion image descriptor for violence detection

Zhang, T; Jia, W; Yang, B; Yang, J; He, X; Zheng, Z

Publication Type: Journal Article
Citation: Multimedia Tools and Applications, 2017, 76 (1), pp. 1419-1438
Issue Date: 2017-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, T	en_US
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338	en_US
dc.contributor.author	Yang, B	en_US
dc.contributor.author	Yang, J	en_US
dc.contributor.author	He, X https://orcid.org/0000-0001-8962-540X	en_US
dc.contributor.author	Zheng, Z	en_US
dc.date.issued	2017-01-01	en_US
dc.identifier.citation	Multimedia Tools and Applications, 2017, 76 (1), pp. 1419 - 1438	en_US
dc.identifier.issn	1380-7501	en_US
dc.identifier.uri	http://hdl.handle.net/10453/41300
dc.description.abstract	© 2015, Springer Science+Business Media New York. Automatic violence detection from video is a hot topic for many video surveillance applications. However, there has been little success in designing an algorithm that can detect violence in surveillance videos with high performance. Existing methods typically apply the Bag-of-Words (BoW) model to local spatiotemporal descriptors. However, traditional spatiotemporal features are not discriminative enough, and the BoW model coarsely assigns each feature vector to only one visual word and therefore ignores the spatial relationships among the features. To tackle these problems, in this paper we propose a novel Motion Weber Local Descriptor (MoWLD) in the spirit of the well-known WLD and make it a powerful and robust descriptor for motion images. We extend the WLD spatial descriptions by adding a temporal component to the appearance descriptor, which implicitly captures local motion information as well as low-level image appearance information. To eliminate redundant and irrelevant features, non-parametric Kernel Density Estimation (KDE) is applied to the MoWLD descriptor. To obtain more discriminative features, we adopt a sparse coding and max pooling scheme to further process the selected MoWLDs. Experimental results on three benchmark datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.	en_US
dc.relation.ispartof	Multimedia Tools and Applications	en_US
dc.relation.isbasedon	10.1007/s11042-015-3133-0	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.classification	Software Engineering	en_US
dc.title	MoWLD: a robust motion image descriptor for violence detection	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	76	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CRIN - Realtime Information Networks
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	open_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	76	en_US

Abstract:

© 2015, Springer Science+Business Media New York. Automatic violence detection from video is a hot topic for many video surveillance applications. However, there has been little success in designing an algorithm that can detect violence in surveillance videos with high performance. Existing methods typically apply the Bag-of-Words (BoW) model to local spatiotemporal descriptors. However, traditional spatiotemporal features are not discriminative enough, and the BoW model coarsely assigns each feature vector to only one visual word and therefore ignores the spatial relationships among the features. To tackle these problems, in this paper we propose a novel Motion Weber Local Descriptor (MoWLD) in the spirit of the well-known WLD and make it a powerful and robust descriptor for motion images. We extend the WLD spatial descriptions by adding a temporal component to the appearance descriptor, which implicitly captures local motion information as well as low-level image appearance information. To eliminate redundant and irrelevant features, non-parametric Kernel Density Estimation (KDE) is applied to the MoWLD descriptor. To obtain more discriminative features, we adopt a sparse coding and max pooling scheme to further process the selected MoWLDs. Experimental results on three benchmark datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
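
The abstract does not give formulas, so the following is only a minimal sketch of the ideas it names, not the authors' implementation. The spatial part uses the standard WLD differential excitation (the arctangent of the summed 8-neighbour intensity differences divided by the centre intensity). The temporal part, the function names, and the `eps` parameter are assumptions introduced here for illustration: the temporal term simply replaces the 8 spatial neighbours with the same pixel in the previous and next frames. The pooling helper shows max pooling over a set of code vectors, such as those a sparse coder might produce.

```python
import numpy as np


def spatial_excitation(frame, eps=1e-6):
    """Standard WLD differential excitation for every pixel of one frame.

    Computes arctan(sum_i (x_i - x_c) / x_c) over the 8 neighbours x_i
    of each centre pixel x_c. Borders use edge replication (an assumption).
    """
    f = frame.astype(np.float64)
    padded = np.pad(f, 1, mode="edge")
    h, w = f.shape
    total = np.zeros_like(f)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            if dy == 1 and dx == 1:
                continue  # skip the centre pixel itself
            total += padded[dy:dy + h, dx:dx + w] - f
    # eps avoids division by zero for black pixels (hypothetical choice)
    return np.arctan(total / (f + eps))


def temporal_excitation(prev_frame, frame, next_frame, eps=1e-6):
    """Hypothetical temporal analogue of the differential excitation.

    Uses the same pixel in the previous and next frames as the two
    'temporal neighbours'. This is an illustrative assumption, not the
    definition used in the paper.
    """
    f = frame.astype(np.float64)
    total = (prev_frame.astype(np.float64) - f) + (next_frame.astype(np.float64) - f)
    return np.arctan(total / (f + eps))


def max_pool(codes):
    """Max pooling: keep, per dimension, the largest absolute response
    across all code vectors (one code vector per row)."""
    return np.max(np.abs(codes), axis=0)
```

In this sketch a static scene yields zero temporal excitation everywhere, while a brightness change between frames produces a non-zero response scaled by the local intensity, which is the Weber-law behaviour the descriptor is named after.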

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/41300