Few-shot activity recognition with cross-modal memory network
- Publisher: ELSEVIER SCI LTD
- Publication Type: Journal Article
- Citation: Pattern Recognition, 2020, 108
- Issue Date: 2020-12-01
Filename | Description | Size
---|---|---
1-s2.0-S0031320320301515-main.pdf | Published version | 1.71 MB
This item is closed access and not available.
- Abstract:

Deep learning based action recognition methods require large amounts of labelled training data. However, labelling large-scale video data is time-consuming and tedious. In this paper, we consider the more challenging few-shot action recognition problem, where only a few training samples are available. To address this problem, memory networks have been designed to use an external memory to retain the experience learned during training and apply it to few-shot prediction at test time. However, existing memory-based methods only update the visual information in the memory while keeping the label embeddings fixed, so they cannot adapt well to novel activities during testing. To alleviate this issue, we propose a novel end-to-end cross-modal memory network for few-shot activity recognition. Specifically, the proposed memory architecture stores dynamic visual and textual semantics for high-level attributes related to human activities, and the learned memory provides effective multi-modal information for recognizing new activities at test time. Extensive experimental results on two video datasets, HMDB51 and UCF101, show that our method achieves significant improvements over previous methods.
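The abstract describes a memory whose visual and textual slot contents are both updated, in contrast to prior memory networks that keep label embeddings fixed. As a rough illustration of that idea, the following is a minimal, hypothetical PyTorch sketch of a cross-modal key-value memory; the slot count, embedding dimensions, addressing scheme, and write rule are all assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of a cross-modal key-value memory (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMemory(nn.Module):
    def __init__(self, num_slots=64, vis_dim=512, txt_dim=300):
        super().__init__()
        # Each slot pairs a visual embedding with a textual embedding for
        # one high-level activity-related attribute.
        self.vis_mem = nn.Parameter(torch.randn(num_slots, vis_dim) * 0.01)
        self.txt_mem = nn.Parameter(torch.randn(num_slots, txt_dim) * 0.01)

    def read(self, vis_query):
        # Address slots by cosine similarity to the visual query, then return
        # attention-weighted readouts from BOTH modalities.
        attn = F.softmax(
            F.cosine_similarity(vis_query.unsqueeze(1),
                                self.vis_mem.unsqueeze(0), dim=-1),
            dim=-1)                                    # (B, num_slots)
        vis_read = attn @ self.vis_mem                 # (B, vis_dim)
        txt_read = attn @ self.txt_mem                 # (B, txt_dim)
        return torch.cat([vis_read, txt_read], dim=-1), attn

    @torch.no_grad()
    def write(self, vis_query, txt_query, attn, lr=0.5):
        # Move the most-attended slot toward the new sample in both modalities,
        # so the textual semantics stay dynamic rather than fixed.
        idx = attn.argmax(dim=-1)
        self.vis_mem.data[idx] = (1 - lr) * self.vis_mem.data[idx] + lr * vis_query
        self.txt_mem.data[idx] = (1 - lr) * self.txt_mem.data[idx] + lr * txt_query

# Usage: read multi-modal context for a batch of clip features, then update
# the memory with the clip features and their label/attribute embeddings.
mem = CrossModalMemory()
vis = torch.randn(8, 512)   # assumed video-backbone features
txt = torch.randn(8, 300)   # assumed label word embeddings
context, attn = mem.read(vis)
mem.write(vis, txt, attn)
print(context.shape)        # torch.Size([8, 812])
```

The concatenated visual and textual readout is one plausible way such a memory could supply multi-modal context to a few-shot classifier head; the paper may combine the modalities differently.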