A handwritten Chinese text recognizer applying multi-level multimodal fusion network

Publication Type:
Conference Proceeding
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, 00, pp. 1464-1469
Issue Date:
Filename Description Size
08978158.pdf Published Version 1.37 MB
Adobe PDF
© 2019 IEEE. Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages: training a text recognition network on the basis of visual information, then incorporating language constraints through various language models. As a result, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM so that both the visual information and the linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset are comparable with those of state-of-the-art approaches.
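The abstract does not spell out how the fusion is computed, so the following is only a minimal illustrative sketch of the general idea: one decoding step attends over visual feature vectors, then mixes the attended visual context with a linguistic embedding of the previously predicted character. The dot-product attention, the element-wise sigmoid gate, and all names (`attend`, `gated_fusion`, `prev_char_embedding`, the toy weights) are assumptions made for illustration, not the authors' architecture, and the multi-level aspect is not modelled.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, visual_feats):
    # Dot-product attention (assumed): weight each visual feature
    # vector by its similarity to the decoder query.
    weights = softmax([dot(query, f) for f in visual_feats])
    dim = len(visual_feats[0])
    context = [sum(w * f[i] for w, f in zip(weights, visual_feats))
               for i in range(dim)]
    return context, weights

def gated_fusion(visual, linguistic, gate_w, gate_b):
    # Hypothetical element-wise gate: g = sigmoid(w_v*v + w_l*l + b),
    # fused = g*v + (1 - g)*l, a convex mix of the two modalities.
    fused = []
    for v, l, (wv, wl), b in zip(visual, linguistic, gate_w, gate_b):
        g = 1.0 / (1.0 + math.exp(-(wv * v + wl * l + b)))
        fused.append(g * v + (1.0 - g) * l)
    return fused

# Toy inputs (made up): three visual feature vectors, a stand-in for
# the decoder hidden state, and an embedding of the previous character.
visual_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
prev_char_embedding = [0.2, 0.8]

context, weights = attend(query, visual_feats)
fused = gated_fusion(context, prev_char_embedding,
                     gate_w=[(1.0, 1.0), (1.0, 1.0)],
                     gate_b=[0.0, 0.0])
print(weights, fused)
```

In a trained recognizer the gate weights would be learned and the fused vector would feed the LSTM's next prediction; here they are fixed constants so the data flow is easy to follow.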
Please use this identifier to cite or link to this item: