Movie fill in the blank with adaptive temporal attention and description update

Chen, J; Shao, J; Shen, F; He, C; Gao, L; Shen, HT

Publication Type: Conference Proceeding
Citation: International Conference on Information and Knowledge Management, Proceedings, 2017, Part F131841, pp. 1039-1048
Issue Date: 2017-11-06

Closed Access

Filename	Description	Size	Format
p1039-chen.pdf	Published version	3.34 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, J	en_US
dc.contributor.author	Shao, J	en_US
dc.contributor.author	Shen, F	en_US
dc.contributor.author	He, C	en_US
dc.contributor.author	Gao, L	en_US
dc.contributor.author	Shen, HT	en_US
dc.date.issued	2017-11-06	en_US
dc.identifier.citation	International Conference on Information and Knowledge Management, Proceedings, 2017, Part F131841 pp. 1039 - 1048	en_US
dc.identifier.isbn	9781450349185	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127554
dc.description.abstract	© 2017 ACM. Recently, a new type of video understanding task called Movie-Fill-in-the-Blank (MovieFIB) has attracted considerable research attention. Given a movie clip paired with a description containing one blank word as input, MovieFIB aims to automatically predict the blank word. Because of its advantage in processing sequential data, Long Short-Term Memory (LSTM) has been used as a key component in existing MovieFIB methods to generate representations of videos and descriptions. However, most of these methods fail to emphasize the salient parts of videos. To address this problem, in this paper we propose a novel LSTM network called LSTM with Linguistic gate (LSTMwL), which exploits adaptive temporal attention for MovieFIB. Specifically, we first use LSTM to produce video features, which are then used to update the text representation. Finally, we feed the updated text into two LSTMwL layers running in opposite directions to infer the blank word. Experimental results demonstrate that our approach outperforms state-of-the-art models for MovieFIB.	en_US
dc.relation.ispartof	International Conference on Information and Knowledge Management, Proceedings	en_US
dc.relation.isbasedon	10.1145/3132847.3132922	en_US
dc.title	Movie fill in the blank with adaptive temporal attention and description update	en_US
dc.type	Conference Proceeding
utslib.citation.volume	Part F131841	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Software
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	Part F131841	en_US

Abstract:

© 2017 ACM. Recently, a new type of video understanding task called Movie-Fill-in-the-Blank (MovieFIB) has attracted considerable research attention. Given a movie clip paired with a description containing one blank word as input, MovieFIB aims to automatically predict the blank word. Because of its advantage in processing sequential data, Long Short-Term Memory (LSTM) has been used as a key component in existing MovieFIB methods to generate representations of videos and descriptions. However, most of these methods fail to emphasize the salient parts of videos. To address this problem, in this paper we propose a novel LSTM network called LSTM with Linguistic gate (LSTMwL), which exploits adaptive temporal attention for MovieFIB. Specifically, we first use LSTM to produce video features, which are then used to update the text representation. Finally, we feed the updated text into two LSTMwL layers running in opposite directions to infer the blank word. Experimental results demonstrate that our approach outperforms state-of-the-art models for MovieFIB.
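
Illustrative sketch (not taken from the paper): the abstract names adaptive temporal attention and a linguistic gate but gives no equations, so the following Python snippet only shows the general idea under stated assumptions. It weights per-frame video features by their similarity to a text state, then mixes the attended visual context into that state through a scalar gate. The function names, the dot-product scoring, and the gate form are all hypothetical and should not be read as the authors' LSTMwL formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def temporal_attention(frame_feats, query):
    """Score each frame against the query and return (weights, context).

    Assumption: dot-product scoring; the paper may use a different score.
    """
    weights = softmax([dot(f, query) for f in frame_feats])
    dim = len(frame_feats[0])
    context = [sum(w * f[i] for w, f in zip(weights, frame_feats))
               for i in range(dim)]
    return weights, context

def gated_update(text_state, context, gate_vec):
    """Hypothetical gate: a scalar g in (0, 1), computed from the text
    state, decides how much visual context is mixed into that state."""
    g = sigmoid(dot(gate_vec, text_state))
    updated = [(1.0 - g) * t + g * c for t, c in zip(text_state, context)]
    return g, updated

# Toy data: three frames with 2-dimensional features (made up for illustration).
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_state = [2.0, 0.0]

weights, context = temporal_attention(frames, text_state)
g, updated = gated_update(text_state, context, gate_vec=[0.0, 0.0])
```

In this toy setup the first frame is most similar to the text state and therefore receives the largest attention weight, and a zero gate vector yields g = 0.5, an even blend of text state and visual context. A learned gate would instead vary per word, which is one plausible reading of "adaptive" in the abstract.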

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127554