A new context-based method for restoring occluded text in natural scene images

Mittal, A; Shivakumara, P; Pal, U; Lu, T; Blumenstein, M; Lopresti, D

Publisher:: Springer
Publication Type:: Conference Proceeding
Citation:: Document Analysis Systems, 2020, 12116 LNCS, pp. 466-480
Issue Date:: 2020-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Mittal, A
dc.contributor.author	Shivakumara, P
dc.contributor.author	Pal, U
dc.contributor.author	Lu, T
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744
dc.contributor.author	Lopresti, D
dc.date	2020-07-26
dc.date.accessioned	2021-05-24T07:10:14Z
dc.date.available	2021-05-24T07:10:14Z
dc.date.issued	2020-01-01
dc.identifier.citation	Document Analysis Systems, 2020, 12116 LNCS, pp. 466-480
dc.identifier.isbn	9783030570576
dc.identifier.issn	0302-9743
dc.identifier.issn	1611-3349
dc.identifier.uri	http://hdl.handle.net/10453/149158
dc.description.abstract	Text recognition from natural scene images is an active research area because of its important real-world applications, including multimedia search and retrieval, and scene understanding through computer vision. Portions of text in images are often missed because they are occluded by objects in the background. This paper therefore presents a method for restoring occluded text to improve text recognition performance. The proposed method uses the Google Vision API to obtain labels for input images. We use PixelLink-E2E methods to detect text and obtain recognition results. From these results, the proposed method generates candidate words based on distance measures, using lexicons created through natural scene text recognition. We extract the semantic similarity between the labels and the recognition results, which yields a Global Context Score (GCS). Next, we use the Natural Language Processing (NLP) model BERT to extract semantics between candidate words, which yields a Local Context Score (LCS). The global and local context scores are then fused to rank each candidate word, and the highest-ranked word is taken as the correction for the occluded text in the image. Experimental results on a dataset assembled from standard natural scene datasets and our own resources show that our approach significantly improves text recognition performance.
dc.language	en
dc.publisher	Springer
dc.relation.ispartof	Document Analysis Systems
dc.relation.ispartof	International Workshop on Document Analysis Systems
dc.relation.ispartofseries	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.relation.isbasedon	10.1007/978-3-030-57058-3_33
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A new context-based method for restoring occluded text in natural scene images
dc.type	Conference Proceeding
utslib.citation.volume	12116 LNCS
utslib.location.activity	Wuhan, China
utslib.for	0801 Artificial Intelligence and Image Processing
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2021-05-24T07:10:13Z
pubs.finish-date	2020-07-29
pubs.place-of-publication	Switzerland
pubs.publication-status	Published
pubs.start-date	2020-07-26
pubs.volume	12116 LNCS
dc.location	Switzerland
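The abstract describes a candidate-ranking pipeline: generate lexicon candidates for a partially occluded word, score each against the image labels (GCS) and with a language model (LCS), fuse the two scores, and pick the top word. The Python sketch below illustrates only that ranking logic, under loudly stated assumptions. Levenshtein distance stands in for the unspecified "distance measures". The label-similarity model and the BERT-based local score are replaced by hypothetical toy lookup tables (`TOY_SIMILARITY`, `TOY_LOCAL_SCORE`). GCS and LCS are fused by a weighted sum with a hypothetical weight `alpha`. The paper's actual distance measure, similarity model, and fusion rule may differ.

```python
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def generate_candidates(partial: str, lexicon: List[str], max_dist: int) -> List[str]:
    """Assumption: keep lexicon words within max_dist edits of the occluded recognition result."""
    p = partial.lower()
    return [w for w in lexicon if levenshtein(p, w.lower()) <= max_dist]


def global_context_score(word: str, labels: List[str],
                         similarity: Callable[[str, str], float]) -> float:
    """GCS stand-in: best similarity between a candidate word and any image label."""
    return max((similarity(word, label) for label in labels), default=0.0)


def rank_candidates(candidates: List[str], labels: List[str],
                    similarity: Callable[[str, str], float],
                    local_score: Callable[[str], float],
                    alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Fuse GCS and LCS with a hypothetical weighted sum; highest fused score first."""
    scored = []
    for w in candidates:
        gcs = global_context_score(w, labels, similarity)
        lcs = local_score(w)  # stand-in for a BERT-derived local context score
        scored.append((w, alpha * gcs + (1 - alpha) * lcs))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy scores, for illustration only (not from the paper).
TOY_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("coffee", "cup"): 0.7, ("coffee", "drink"): 0.8, ("coffee", "cafe"): 0.9,
    ("coffin", "cup"): 0.1, ("cuff", "cup"): 0.1,
}
TOY_LOCAL_SCORE: Dict[str, float] = {"coffee": 0.6, "coffin": 0.2, "cuff": 0.3}


def toy_similarity(word: str, label: str) -> float:
    return TOY_SIMILARITY.get((word, label), 0.0)


def toy_local_score(word: str) -> float:
    return TOY_LOCAL_SCORE.get(word, 0.0)


if __name__ == "__main__":
    # "coff" plays the role of a recognition result whose last letters are occluded.
    cands = generate_candidates("coff", ["coffee", "coffin", "cuff", "office"], max_dist=2)
    ranked = rank_candidates(cands, ["cup", "drink", "cafe"], toy_similarity, toy_local_score)
    print(ranked[0][0])  # coffee
```

With these toy numbers, "office" is filtered out by the distance threshold, and "coffee" wins because it scores highest on both the label similarity and the local score. In a real system the two lookup functions would be replaced by an actual semantic-similarity model and a masked language model, and `alpha` and `max_dist` would need tuning.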

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/149158