UTS CETC D2DCRC submission at the TRECVID 2018 video to text description task

Publication Type: Conference Proceeding
Citation: 2018 TREC Video Retrieval Evaluation, TRECVID 2018, 2020
Issue Date: 2020-01-01
In this paper, we report our methods for the video-to-text description task of TRECVID 2018 [1]. The task consists of two subtasks, i.e., description generation and matching & ranking. In the description generation subtask, because no standard training data was provided, we focused principally on improving the generalization ability of our model. Instead of exploring complex models, we investigated the widely used LSTM-based sequence-to-sequence model [10] and some of its variants, which are simple yet robust. We also examined several training strategies to further improve the model's generalization ability. In the matching and ranking subtask, we designed a two-branch deep model [6] that embeds visual content and semantic content respectively, projecting information from the different modalities into a common embedding space. We further evaluated several metric learning losses, such as the triplet loss and its variants, in our experiments.
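
To make the description-generation approach concrete, below is a minimal sketch of an LSTM-based sequence-to-sequence captioner in PyTorch, in the spirit of the encoder-decoder model the abstract cites [10]. The class name, feature dimensions, vocabulary size, and single-layer design are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal LSTM sequence-to-sequence captioning sketch (assumed details, for illustration).
import torch
import torch.nn as nn


class Seq2SeqCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, vocab_size=10000, hidden=512, embed=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)   # reads per-frame visual features
        self.decoder = nn.LSTM(embed, hidden, batch_first=True)      # generates caption tokens
        self.word_embed = nn.Embedding(vocab_size, embed)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T_frames, feat_dim); captions: (B, T_words) token ids
        _, state = self.encoder(frame_feats)        # summarize the video into the LSTM state
        emb = self.word_embed(captions)             # teacher forcing at training time
        dec_out, _ = self.decoder(emb, state)       # decode conditioned on the video state
        return self.out(dec_out)                    # (B, T_words, vocab_size) logits


# Toy usage: 4 videos, 20 frames of 2048-d features, 12-word captions.
feats = torch.randn(4, 20, 2048)
caps = torch.randint(0, 10000, (4, 12))
print(Seq2SeqCaptioner()(feats, caps).shape)  # torch.Size([4, 12, 10000])
```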
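
For the matching-and-ranking subtask, the following sketch shows one common way to realize a two-branch embedding network with a triplet-style ranking loss. The branch sizes, the margin value, the cosine-similarity scoring, and the in-batch negative mining are assumptions for illustration; the abstract only states that a two-branch model [6] projects both modalities into a common space and that triplet losses and variants were examined.

```python
# Two-branch embedding with an in-batch triplet ranking loss (assumed details, for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoBranchEmbedding(nn.Module):
    def __init__(self, vid_dim=2048, txt_dim=300, embed_dim=512):
        super().__init__()
        self.vid_branch = nn.Sequential(nn.Linear(vid_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))
        self.txt_branch = nn.Sequential(nn.Linear(txt_dim, embed_dim), nn.ReLU(),
                                        nn.Linear(embed_dim, embed_dim))

    def forward(self, vid_feats, txt_feats):
        # L2-normalize so dot products act as cosine similarities in the common space.
        v = F.normalize(self.vid_branch(vid_feats), dim=-1)
        t = F.normalize(self.txt_branch(txt_feats), dim=-1)
        return v, t


def triplet_ranking_loss(v, t, margin=0.2):
    # Hinge loss over all in-batch negatives, in both retrieval directions.
    sims = v @ t.t()                                  # (B, B) video-to-text similarity matrix
    pos = sims.diag().unsqueeze(1)                    # similarities of matched pairs
    cost_t = (margin + sims - pos).clamp(min=0)       # wrong captions for each video
    cost_v = (margin + sims - pos.t()).clamp(min=0)   # wrong videos for each caption
    mask = torch.eye(sims.size(0), dtype=torch.bool)
    return cost_t.masked_fill(mask, 0).mean() + cost_v.masked_fill(mask, 0).mean()


# Toy usage: a batch of 8 matched video/sentence feature pairs.
model = TwoBranchEmbedding()
v, t = model(torch.randn(8, 2048), torch.randn(8, 300))
print(triplet_ranking_loss(v, t).item())
```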