Robust spatial-temporal deep model for multimedia event detection
- Publisher: Elsevier BV
- Publication Type: Journal Article
- Citation: Neurocomputing, 2016, 213, pp. 48-53
- Issue Date: 2016-11-12
Closed Access
Filename | Description | Size
---|---|---
1-s2.0-S0925231216307275-main.pdf | Published version | 1.06 MB
This item is closed access and not available.
The task of multimedia event detection (MED) is to train a set of models that automatically retrieve the most event-relevant videos from large datasets. In this paper, we build a robust spatial-temporal deep neural network for large-scale video event detection. In our setting, each video is treated under a multiple-instance assumption, where its visual segments carry both the spatial and the temporal properties of events. To exploit these properties, we implement the MED system with a two-step training phase: unsupervised recurrent video reconstruction followed by supervised fine-tuning. Extensive experiments on the challenging TRECVID MED14 dataset indicate that jointly modeling spatial and temporal information further boosts detection performance over state-of-the-art MED models.
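The two-step phase the abstract describes could be sketched, very loosely, as follows. This is a toy illustration under stated assumptions, not the paper's method: the paper uses a recurrent network over video segments, whereas the sketch below substitutes a linear autoencoder for the unsupervised reconstruction step and a max-pooled logistic scorer for the supervised, multiple-instance fine-tuning step. All dimensions, learning rates, and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 20 videos, each split into 5 segments of 16-dim features.
n_videos, n_segments, d, h = 20, 5, 16, 8
X = rng.normal(size=(n_videos, n_segments, d))
y = rng.integers(0, 2, size=n_videos)        # event / non-event video labels

# --- Step 1: unsupervised reconstruction pretraining (linear stand-in for
# the paper's recurrent reconstruction) ---
W = rng.normal(scale=0.1, size=(d, h))       # encoder weights
V = rng.normal(scale=0.1, size=(h, d))       # decoder weights
segs = X.reshape(-1, d)                      # train on all segments, unlabeled
lr = 0.01
for _ in range(200):
    Z = segs @ W                             # encode each segment
    err = Z @ V - segs                       # reconstruction error
    # gradient descent on mean squared reconstruction error
    V -= lr * Z.T @ err / len(segs)
    W -= lr * segs.T @ (err @ V.T) / len(segs)

# --- Step 2: supervised fine-tuning with a multiple-instance max-pool:
# a video is scored by its most event-like segment ---
w, b = np.zeros(h), 0.0
for _ in range(300):
    Z = X @ W                                # (videos, segments, h)
    scores = Z @ w + b                       # per-segment scores
    pooled = scores.max(axis=1)              # video score = best segment
    p = 1 / (1 + np.exp(-pooled))
    g = p - y                                # logistic-loss gradient
    idx = scores.argmax(axis=1)              # segment that set each video's score
    Zmax = Z[np.arange(n_videos), idx]
    w -= 0.1 * Zmax.T @ g / n_videos
    b -= 0.1 * g.mean()

train_acc = ((p > 0.5) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

The max over segment scores is one common way to realize the multiple-instance assumption: a positive video needs only one event-relevant segment, so the gradient flows through the highest-scoring segment of each video.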