Robust spatial-temporal deep model for multimedia event detection

Publisher:
Elsevier BV
Publication Type:
Journal Article
Citation:
Neurocomputing, 2016, 213, pp. 48-53
Issue Date:
2016-11-12
Filename:
1-s2.0-S0925231216307275-main.pdf (Published version, Adobe PDF, 1.06 MB)
Abstract:
The task of multimedia event detection (MED) aims to train a set of models that can automatically detect the most event-relevant videos in large datasets. In this paper, we build a robust spatial-temporal deep neural network for large-scale video event detection. In our setting, each video is treated under a multiple-instance assumption, where its visual segments carry both spatial and temporal properties of events. To exploit these properties, we implement the MED system with a two-stage training procedure: unsupervised recurrent video reconstruction followed by supervised fine-tuning. Extensive experiments on the challenging TRECVID MED14 dataset indicate that, by considering both spatial and temporal information, detection performance can be boosted further compared with state-of-the-art MED models.
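The two-stage procedure described in the abstract can be sketched in miniature. This is an illustrative assumption, not the paper's actual architecture: a plain tied-weight autoencoder stands in for the recurrent reconstruction stage, and a logistic scorer with max-pooling over segments stands in for the supervised fine-tuning stage under the multiple-instance assumption. All data, dimensions, and names below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each "video" is a bag of segment feature vectors,
# and the event label applies to the whole video (multiple-instance
# assumption). Features and labels are synthetic.
n_videos, n_segments, dim, hidden = 20, 5, 16, 8
X = rng.normal(size=(n_videos, n_segments, dim))
y = (X.mean(axis=(1, 2)) > 0).astype(float)

# Stage 1: unsupervised reconstruction. A one-layer tied-weight
# autoencoder stands in for the recurrent reconstruction step.
W = rng.normal(scale=0.1, size=(dim, hidden))
flat = X.reshape(-1, dim)
lr = 0.01
losses = []
for _ in range(200):
    H = np.tanh(flat @ W)        # encode segments
    R = H @ W.T                  # decode with tied weights
    err = R - flat
    losses.append(float((err ** 2).mean()))
    # gradient of reconstruction error w.r.t. W (decoder + encoder paths)
    dpre = (err @ W) * (1 - H ** 2)
    gW = (flat.T @ dpre + err.T @ H) / flat.shape[0]
    W -= lr * gW

# Stage 2: supervised fine-tuning. A logistic scorer is trained on the
# encoded segments; a video's score is the max over its segments
# (standard MIL pooling, with the argmax held fixed per step).
w, b = np.zeros(hidden), 0.0
H = np.tanh(X @ W)               # (videos, segments, hidden)
for _ in range(300):
    seg_scores = H @ w + b
    idx = seg_scores.argmax(axis=1)           # most event-like segment
    h_max = H[np.arange(n_videos), idx]
    p = 1 / (1 + np.exp(-(h_max @ w + b)))    # video-level probability
    g = p - y
    w -= 0.1 * (h_max.T @ g) / n_videos
    b -= 0.1 * g.mean()
```

After stage 1 the reconstruction loss should have decreased, and stage 2 yields a per-video event probability `p`; in the paper the two stages are instead realized by a recurrent network over video segments.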