Detecting named entities in unstructured Bengali manuscript images

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, 00, pp. 196-201
Issue Date:
2020
© 2019 IEEE. In this paper, we detect named entities directly from unstructured handwritten document images, without any intermediate text or character recognition. In this setting, we receive no assistance from natural language processing, which makes detecting named entities more challenging. We work on Bengali script, which brings additional hurdles due to its unique script characteristics. We propose a new deep neural network-based architecture to extract latent features from a text image. The resulting embedding is fed to a BLSTM (Bidirectional Long Short-Term Memory) layer, followed by an attention mechanism adapted for named entity detection. We experiment on two publicly available offline handwriting repositories containing 420 Bengali handwritten pages in total. Our system attains 95.43% balanced accuracy on overall named entity detection.
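The abstract reports balanced accuracy rather than plain accuracy. As a rough illustration only, the sketch below assumes the standard definition (the mean of per-class recall), which the abstract itself does not spell out. The function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Assumption: this is the standard definition of balanced accuracy;
    the abstract does not state which variant the authors used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = named entity, 0 = anything else.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.2f}, plain accuracy: {plain:.2f}")
```

In this toy case the plain accuracy is 0.90 while the balanced accuracy is 0.75, because half of the rare positive class is missed. This is why the metric is a common choice when named entities are far less frequent than other words.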