A Handwritten Chinese Text Recognizer Applying Multi-Level Multimodal Fusion Network

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2020, pp. 1464-1469
Issue Date:
2020
Filename: 08978158.pdf
Description: Published Version
Size: 1.37 MB
Format: Adobe PDF
Abstract:
© 2019 IEEE. Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages: first training a text recognition network based on visual information, then incorporating language constraints via various language models. As a result, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM, so that both visual information and linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset demonstrate performance comparable to state-of-the-art approaches.