Detecting named entities in unstructured Bengali manuscript images

Adak, C; Chaudhuri, BB; Lin, CT; Blumenstein, M


Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, 00, pp. 196-201
Issue Date: 2020

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Adak, C https://orcid.org/0000-0002-9085-2770
dc.contributor.author	Chaudhuri, BB
dc.contributor.author	Lin, CT
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744
dc.date	2019-09-20
dc.date.accessioned	2021-03-15T23:59:31Z
dc.date.available	2021-03-15T23:59:31Z
dc.date.issued	2020
dc.identifier.citation	Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2020, 00, pp. 196-201
dc.identifier.isbn	9781728128610
dc.identifier.issn	1520-5363
dc.identifier.uri	http://hdl.handle.net/10453/147198
dc.description.abstract	© 2019 IEEE. In this paper, we undertake a task to find named entities directly from unstructured handwritten document images without any intermediate text/character recognition. Here, we do not receive any assistance from natural language processing. Therefore, it becomes more challenging to detect the named entities. We work on Bengali script which brings some additional hurdles due to its own unique script characteristics. Here, we propose a new deep neural network-based architecture to extract the latent features from a text image. The embedding is then fed to a BLSTM (Bidirectional Long Short-Term Memory) layer. After that, the attention mechanism is adapted to an approach for named entity detection. We perform experimentation on two publicly-available offline handwriting repositories containing 420 Bengali handwritten pages in total. The experimental outcome of our system is quite impressive as it attains 95.43% balanced accuracy on overall named entity detection.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
dc.relation.ispartof	International Conference on Document Analysis and Recognition
dc.relation.isbasedon	10.1109/ICDAR.2019.00040
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Detecting named entities in unstructured bengali manuscript images
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Sydney, Australia
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2021-03-15T23:59:30Z
pubs.finish-date	2019-09-25
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2019-09-20
pubs.volume	00
dc.location	Piscataway, USA

Abstract:

© 2019 IEEE. In this paper, we find named entities directly in unstructured handwritten document images, without any intermediate text or character recognition. We also receive no assistance from natural language processing, which makes detecting the named entities more challenging. We work on Bengali script, whose unique characteristics bring additional hurdles. We propose a new deep neural network-based architecture that extracts latent features from a text image. The resulting embedding is fed to a BLSTM (Bidirectional Long Short-Term Memory) layer, after which an attention mechanism is adapted for named entity detection. We experiment on two publicly available offline handwriting repositories containing 420 Bengali handwritten pages in total. Our system attains 95.43% balanced accuracy on overall named entity detection.
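
The abstract reports its result as balanced accuracy. The record does not say exactly how the authors computed it, so the following is only a minimal sketch of the common definition (the mean of per-class recall). The function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from the paper. The metric suits this task because named entities are usually a small minority of the words on a page, so plain accuracy can look high even when many entities are missed.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true.

    This is the common textbook definition; it is an assumption that
    the paper uses the same one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical word-level labels: 1 = named entity, 0 = any other word.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case one of the two named entities is missed. Plain accuracy is still 0.9, while balanced accuracy drops to 0.75 because recall on the entity class is only 0.5.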

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/147198