ContextualNet: Exploiting contextual information using LSTMs to improve image-based localization

Publication Type:
Conference Proceeding
Proceedings - IEEE International Conference on Robotics and Automation, 2018, pp. 5890 - 5896
© 2018 IEEE. Convolutional Neural Networks (CNNs) have been used successfully for localization from a single monocular image [1]. Most work to date has focused either on reducing the dimensionality of the data for better parameter learning during training or on developing variations of CNN models to improve pose estimation. Many of the best-performing approaches consider only the content of a single image, ignoring the context provided by historical images. In this paper, we propose a combined CNN-LSTM model capable of incorporating contextual information from historical images to better estimate the current pose. In experiments on a dataset collected in an indoor office space, the proposed system improved localization to 0.8 m and 2.5° at the third quartile of the cumulative distribution, compared with 1.5 m and 3.0° achieved by PoseNet [1]. Furthermore, we demonstrate how the temporal information exploited by the CNN-LSTM model assists in localizing the robot in situations where the image content lacks sufficient features.

Keywords: Localization, Long Short-Term Memory (LSTM), Computer Vision, Deep Neural Network, Pose Estimation.
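The abstract's core idea — feed per-frame CNN feature vectors through an LSTM and regress the current 6-DoF pose (translation plus unit quaternion, as in PoseNet) from the final hidden state — can be sketched as below. This is a minimal illustration, not the paper's implementation: the feature and hidden dimensions are arbitrary, the weights are random placeholders rather than trained parameters, and the names `LSTMCell` and `estimate_pose` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell in numpy. Weights are random
    placeholders standing in for trained parameters."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked gate weights: input, forget, candidate, output.
        self.W = rng.uniform(-scale, scale,
                             (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        """One time step: consume feature vector x, update state (h, c)."""
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def estimate_pose(cnn_features, cell, W_pose):
    """Run the LSTM over a sequence of per-frame CNN feature vectors
    (oldest first) and regress a 7-D pose from the last hidden state:
    3-D translation followed by a unit quaternion."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in cnn_features:           # historical frames provide context
        h, c = cell.step(x, h, c)
    pose = W_pose @ h
    pose[3:] /= np.linalg.norm(pose[3:])  # normalize quaternion part
    return pose

# Usage: four frames of hypothetical 16-D CNN features, 8-D hidden state.
rng = np.random.default_rng(1)
cell = LSTMCell(input_dim=16, hidden_dim=8)
W_pose = rng.standard_normal((7, 8))
features = [rng.standard_normal(16) for _ in range(4)]
pose = estimate_pose(features, cell, W_pose)
```

The point of the recurrence is that `pose` depends on all four frames, not just the latest one, which is how temporal context can compensate for a current frame with few distinctive features.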