Benchmarked multi-script Thai scene text dataset and its multi-class detection solution

Suwanwiwat, H; Das, A; Saqib, M; Pal, U

Publisher: Springer Science and Business Media LLC
Publication Type: Journal Article
Citation: Multimedia Tools and Applications, 2021, 80, (8), pp. 11843-11863
Issue Date: 2021-03-01

Closed Access

Filename	Description	Size	Format
Suwanwiwat2021_Article_BenchmarkedMulti-scriptThaiSce.pdf	Published version	7.5 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Suwanwiwat, H
dc.contributor.author	Das, A
dc.contributor.author	Saqib, M https://orcid.org/0000-0003-4374-0888
dc.contributor.author	Pal, U
dc.date.accessioned	2022-03-20T19:51:23Z
dc.date.available	2022-03-20T19:51:23Z
dc.date.issued	2021-03-01
dc.identifier.citation	Multimedia Tools and Applications, 2021, 80, (8), pp. 11843-11863
dc.identifier.issn	1380-7501
dc.identifier.issn	1573-7721
dc.identifier.uri	http://hdl.handle.net/10453/155384
dc.description.abstract	Detecting text in scene images is a prevalent research topic. Text detection is considered challenging and non-interoperable because a scene image may contain multiple scripts. Each of these scripts can have different properties; it is therefore crucial to study scene text detection with respect to geographical location, since the scripts present vary by region. As no work on large-scale multi-script Thai scene text detection is found in the literature, this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic. These scripts can generally be seen around Thailand. Thai script contains more consonants and vowels than the Roman/English script and also has its own numerals. Furthermore, the placement of letters, intonation marks, and vowels differs from that of English or Chinese-like script. Hence, detecting and recognising Thai text can be considered challenging. This study proposes a multi-script dataset that includes the aforementioned scripts and numerals, along with a benchmark employing the Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The proposed dataset contains scene images recorded in Thailand. It consists of 600 images, together with their manual detection annotations. This study also proposes a detection technique that formulates the multi-script scene text detection problem as a multi-class detection problem, which was found to work more effectively than legacy approaches. Experimental results from applying the proposed technique to the dataset achieved encouraging precision and recall rates when compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
dc.language	en
dc.publisher	Springer Science and Business Media LLC
dc.relation.ispartof	Multimedia Tools and Applications
dc.relation.isbasedon	10.1007/s11042-020-10143-w
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0803 Computer Software, 0805 Distributed Computing, 0806 Information Systems
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	Software Engineering
dc.title	Benchmarked multi-script Thai scene text dataset and its multi-class detection solution
dc.type	Journal Article
utslib.citation.volume	80
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0803 Computer Software
utslib.for	0805 Distributed Computing
utslib.for	0806 Information Systems
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2022-03-20T19:51:19Z
pubs.issue	8
pubs.publication-status	Published
pubs.volume	80
utslib.citation.issue	8

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/155384