ContextualNet: Exploiting contextual information using LSTMs to improve image-based localization

Publication Type:
Conference Proceeding
Proceedings - IEEE International Conference on Robotics and Automation, 2018, pp. 5890 - 5896
© 2018 IEEE. Convolutional Neural Networks (CNNs) have been used successfully for localization from a single monocular image [1]. Most work to date has focused either on reducing the dimensionality of the data for better parameter learning during training or on developing variations of CNN models to improve pose estimation. Many of the best-performing approaches consider only the content of a single image, ignoring the context provided by historical images. In this paper, we propose a combined CNN-LSTM model capable of incorporating contextual information from historical images to better estimate the current pose. In experiments on a dataset collected in an indoor office space, the proposed system improved localization to 0.8 m and 2.5° at the third quartile of the cumulative distribution, compared with 1.5 m and 3.0° achieved by PoseNet [1]. Furthermore, we demonstrate how the temporal information exploited by the CNN-LSTM model assists in localizing the robot in situations where the image content lacks sufficient features.

Keywords: Localization, Long Short-Term Memory (LSTM), Computer Vision, Deep Neural Network, Pose Estimation.
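The abstract's core idea — feed per-frame CNN feature vectors through an LSTM and regress the current 6-DoF pose (translation plus unit quaternion, as in PoseNet) from the final hidden state — can be sketched as below. This is a minimal illustration, not the paper's implementation: the feature and hidden dimensions are arbitrary, the weights are random placeholders rather than trained parameters, and the names `LSTMCell` and `estimate_pose` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell in numpy. Weights are random
    placeholders standing in for trained parameters."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked gate weights: input, forget, candidate, output.
        self.W = rng.uniform(-scale, scale,
                             (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        """One time step: consume feature vector x, update state (h, c)."""
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def estimate_pose(cnn_features, cell, W_pose):
    """Run the LSTM over a sequence of per-frame CNN feature vectors
    (oldest first) and regress a 7-D pose from the last hidden state:
    3-D translation followed by a unit quaternion."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in cnn_features:           # historical frames provide context
        h, c = cell.step(x, h, c)
    pose = W_pose @ h
    pose[3:] /= np.linalg.norm(pose[3:])  # normalize quaternion part
    return pose

# Usage: four frames of hypothetical 16-D CNN features, 8-D hidden state.
rng = np.random.default_rng(1)
cell = LSTMCell(input_dim=16, hidden_dim=8)
W_pose = rng.standard_normal((7, 8))
features = [rng.standard_normal(16) for _ in range(4)]
pose = estimate_pose(features, cell, W_pose)
```

The point of the recurrence is that `pose` depends on all four frames, not just the latest one, which is how temporal context can compensate for a current frame with few distinctive features.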