UTS CETC D2DCRC submission at the TRECVID 2018 video to text description task

Publication Type: Conference Proceeding
Citation: 2018 TREC Video Retrieval Evaluation, TRECVID 2018, 2020
Issue Date: 2020-01-01
In this paper, we report our methods for the video-to-text description task of TRECVID 2018 [1]. The task consists of two subtasks, i.e., description generation and matching & ranking. In the description generation subtask, because no standard training data was provided, we focused principally on improving the generalization ability of our model. Instead of exploring complex models, we investigated the widely used LSTM-based sequence-to-sequence model [10] and some of its variants, which are simple yet robust. We also examined several training strategies to further improve the model's generalization ability. In the matching and ranking subtask, we designed a two-branch deep model [6] that embeds visual content and semantic content respectively, projecting information from the different modalities into a common embedding space. We further evaluated several metric learning losses, such as the triplet loss and its variants, in our experiments.
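
To make the description-generation approach concrete, below is a minimal sketch of an LSTM-based sequence-to-sequence captioner in PyTorch, in the spirit of the encoder-decoder model the abstract cites [10]. The class name, feature dimensions, vocabulary size, and single-layer design are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal LSTM sequence-to-sequence captioning sketch (assumed details, for illustration).
import torch
import torch.nn as nn


class Seq2SeqCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, vocab_size=10000, hidden=512, embed=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)   # reads per-frame visual features
        self.decoder = nn.LSTM(embed, hidden, batch_first=True)      # generates caption tokens
        self.word_embed = nn.Embedding(vocab_size, embed)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T_frames, feat_dim); captions: (B, T_words) token ids
        _, state = self.encoder(frame_feats)        # summarize the video into the LSTM state
        emb = self.word_embed(captions)             # teacher forcing at training time
        dec_out, _ = self.decoder(emb, state)       # decode conditioned on the video state
        return self.out(dec_out)                    # (B, T_words, vocab_size) logits


# Toy usage: 4 videos, 20 frames of 2048-d features, 12-word captions.
feats = torch.randn(4, 20, 2048)
caps = torch.randint(0, 10000, (4, 12))
print(Seq2SeqCaptioner()(feats, caps).shape)  # torch.Size([4, 12, 10000])
```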
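
For the matching-and-ranking subtask, the following sketch shows one common way to realize a two-branch embedding network with a triplet-style ranking loss. The branch sizes, the margin value, the cosine-similarity scoring, and the in-batch negative mining are assumptions for illustration; the abstract only states that a two-branch model [6] projects both modalities into a common space and that triplet losses and variants were examined.

```python
# Two-branch embedding with an in-batch triplet ranking loss (assumed details, for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchEmbedding(nn.Module):
    def __init__(self, vid_dim=2048, txt_dim=300, embed_dim=512):
        super().__init__()
        self.vid_branch = nn.Sequential(nn.Linear(vid_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))
        self.txt_branch = nn.Sequential(nn.Linear(txt_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))

    def forward(self, vid_feats, txt_feats):
        # L2-normalize so dot products act as cosine similarities in the common space.
        v = F.normalize(self.vid_branch(vid_feats), dim=-1)
        t = F.normalize(self.txt_branch(txt_feats), dim=-1)
        return v, t


def triplet_ranking_loss(v, t, margin=0.2):
    # Hinge loss over all in-batch negatives, in both retrieval directions.
    sims = v @ t.t()                                  # (B, B) video-to-text similarity matrix
    pos = sims.diag().unsqueeze(1)                    # similarities of matched pairs
    cost_t = (margin + sims - pos).clamp(min=0)       # wrong captions for each video
    cost_v = (margin + sims - pos.t()).clamp(min=0)   # wrong videos for each caption
    mask = torch.eye(sims.size(0), dtype=torch.bool)
    return cost_t.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()


# Toy usage: a batch of 8 matched video/sentence feature pairs.
model = TwoBranchEmbedding()
v, t = model(torch.randn(8, 2048), torch.randn(8, 300))
print(triplet_ranking_loss(v, t).item())
```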