DeepText: Detecting text from the wild with multi-ASPP-assembled DeepLab

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, 00, pp. 208-213
Issue Date:
2020
© 2019 IEEE. In this paper, we address scene text detection by direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], to this task. To handle text of arbitrary orientation and size and to improve recall on small text, we propose extracting multi-scale features by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers into DeepLab after feature maps of different resolutions. We then set multiple auxiliary IoU losses at the decoding stage and add auxiliary connections from the intermediate encoding layers to the decoder, which assists network training and enhances the discriminative ability of the lower encoding layers. Experiments on the benchmark scene text dataset ICDAR2015 show that the proposed network, named DeepText, outperforms state-of-the-art approaches.
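The idea behind an ASPP layer, as named in the abstract, can be illustrated with a small one-dimensional sketch: the same input is filtered in parallel at several dilation (atrous) rates, and the per-rate responses are stacked at each position. This is only an illustration under stated assumptions, not the DeepText implementation: the function names `atrous_conv1d`, `aspp1d`, and `receptive_field` are hypothetical, the rates `(1, 2, 4)` are arbitrary, and a single fixed kernel is reused across branches, whereas a real ASPP layer operates on 2-D feature maps with separately learned weights per branch.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Zero-padded, same-length 1-D filtering with kernel taps spaced `rate` apart."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # dilation: neighbouring taps are `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal are treated as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp1d(signal: Sequence[float], kernel: Sequence[float],
           rates: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Apply the kernel at each rate and stack the responses per position.

    Simplification (assumption): one shared kernel for all branches.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    # Stack along a "channel" axis: one value per rate at every position.
    return [list(values) for values in zip(*branches)]


def receptive_field(kernel_size: int, rate: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for position, features in enumerate(aspp1d(impulse, [1, 1, 1])):
        print(position, features)
```

With a 3-tap kernel, the branches at rates 1, 2, and 4 span 3, 5, and 9 input samples respectively, which is how a larger rate lets a branch cover wider context without adding parameters.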