Violent video detection based on MoSIFT feature and sparse coding

Xu, L; Gong, C; Yang, J; Wu, Q; Yao, L

Publication Type: Conference Proceeding
Citation: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2014, pp. 3538-3542
Issue Date: 2014-01-01

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, L	en_US
dc.contributor.author	Gong, C	en_US
dc.contributor.author	Yang, J	en_US
dc.contributor.author	Wu, Q https://orcid.org/0000-0001-5641-2483	en_US
dc.contributor.author	Yao, L	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2014, pp. 3538 - 3542	en_US
dc.identifier.isbn	9781479928927	en_US
dc.identifier.issn	1520-6149	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121610
dc.description.abstract	To detect violence in a video, a common approach is to apply a local spatio-temporal descriptor to the query video and then summarize the low-level description into a high-level feature using the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough. Moreover, the BoW model coarsely assigns each feature vector to only one visual word, which inevitably causes quantization error. To tackle these constraints, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptor. To obtain a highly discriminative video feature, this paper adopts a sparse coding scheme to further process the selected MoSIFT features. Encouraging experimental results are obtained on two challenging datasets that cover both crowded and non-crowded scenes. © 2014 IEEE.	en_US
dc.relation.ispartof	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	en_US
dc.relation.isbasedon	10.1109/ICASSP.2014.6854259	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Violent video detection based on MoSIFT feature and sparse coding	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US

Abstract:

To detect violence in a video, a common approach is to apply a local spatio-temporal descriptor to the query video and then summarize the low-level description into a high-level feature using the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough. Moreover, the BoW model coarsely assigns each feature vector to only one visual word, which inevitably causes quantization error. To tackle these constraints, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptor. To obtain a highly discriminative video feature, this paper adopts a sparse coding scheme to further process the selected MoSIFT features. Encouraging experimental results are obtained on two challenging datasets that cover both crowded and non-crowded scenes. © 2014 IEEE.
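
The quantization-error argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the record gives no details of their solver or dictionary. Everything below is an assumption chosen for illustration: random toy vectors stand in for MoSIFT descriptors, a random unit-norm dictionary stands in for a learned one, sparse codes are computed with a generic ISTA lasso solver, and the names `hard_assign`, `sparse_code`, `mean_sq_error`, `make_toy_data`, `lam`, and `n_iter` are hypothetical. The max-pooling step at the end is likewise an assumed way to form a video-level feature.

```python
import numpy as np


def hard_assign(X, D):
    """BoW-style quantization: one-hot code for the nearest atom of each descriptor."""
    # Squared Euclidean distance between every descriptor (row of X)
    # and every dictionary atom (row of D).
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((X.shape[0], D.shape[0]))
    codes[np.arange(X.shape[0]), d2.argmin(axis=1)] = 1.0
    return codes


def sparse_code(X, D, lam=0.05, n_iter=500):
    """Generic ISTA solver for min_A 0.5 * ||X - A D||_F^2 + lam * ||A||_1."""
    # Step size is 1 / L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - step * ((A @ D - X) @ D.T)  # gradient step on the quadratic term
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam * step, 0.0)  # soft threshold
    return A


def mean_sq_error(X, codes, D):
    """Mean squared reconstruction error of descriptors from their codes."""
    return float(((X - codes @ D) ** 2).sum(axis=1).mean())


def make_toy_data(n=60, dim=8, n_atoms=16, seed=0):
    """Toy stand-ins (assumption): Gaussian descriptors, random unit-norm atoms."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    D = rng.normal(size=(n_atoms, dim))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    return X, D


X, D = make_toy_data()
hard = hard_assign(X, D)
soft = sparse_code(X, D)
print("hard-assignment error:", mean_sq_error(X, hard, D))
print("sparse-coding error:  ", mean_sq_error(X, soft, D))

# Video-level features: a BoW histogram of word counts versus
# max pooling over sparse-code magnitudes (pooling choice is an assumption).
bow_histogram = hard.sum(axis=0)
pooled = np.abs(soft).max(axis=0)
```

On toy data like this, letting each descriptor be represented by several weighted atoms should reconstruct it more closely than a single nearest visual word, which is the effect the abstract attributes to sparse coding.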

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121610