Movie fill in the blank with adaptive temporal attention and description update

Publication Type:
Conference Proceeding
International Conference on Information and Knowledge Management, Proceedings, 2017, Part F131841 pp. 1039 - 1048
Issue Date:
Files in This Item:
Filename: p1039-chen.pdf
Description: Published version
Size: 3.34 MB
Format: Adobe PDF
© 2017 ACM. Recently, a new video understanding task called Movie-Fill-in-the-Blank (MovieFIB) has attracted considerable research attention. Given a movie clip and a description with one word blanked out, MovieFIB aims to predict the missing word automatically. Because of its strength in processing sequential data, Long Short-Term Memory (LSTM) has been a key component of existing MovieFIB methods, where it generates representations of videos and descriptions. However, most of these methods fail to emphasize the salient parts of videos. To address this problem, we propose a novel LSTM network called LSTM with Linguistic gate (LSTMwL), which exploits adaptive temporal attention for MovieFIB. Specifically, we first use an LSTM to produce video features, which are then used to update the text representation. Finally, we feed the updated text representation into two LSTMwL layers running in opposite directions to infer the blank word. Experimental results demonstrate that our approach outperforms state-of-the-art models for MovieFIB.
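
The abstract does not say how the adaptive temporal attention or the linguistic gate is computed, so the following is only a rough illustration of the general idea of temporal attention: weighting video frames by their relevance to the text before pooling them. It uses plain dot-product attention with a softmax over time, which is an assumption made for illustration and not the LSTMwL mechanism itself. The function name `temporal_attention` and the toy vectors are hypothetical.

```python
import math


def temporal_attention(frame_feats, query):
    """Softmax-weighted pooling of per-frame features, scored against a query.

    frame_feats: list of T frame vectors, each a list of D floats.
    query: list of D floats standing in for the text representation
           (an assumption; the paper's actual conditioning is not given here).
    Returns (weights, pooled), where weights sum to 1 over the T frames.
    """
    # Dot-product relevance of each frame to the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_feats]

    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted average of the frame features.
    dim = len(query)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_feats))
        for d in range(dim)
    ]
    return weights, pooled


# Toy input: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_query = [2.0, 0.0]
weights, pooled = temporal_attention(frames, text_query)
```

In this toy input the first frame aligns best with the query, so it receives the largest weight and dominates the pooled feature. A model of the kind the abstract describes would presumably learn such weights jointly with the LSTM layers rather than use a fixed dot product.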