Arbitrarily-oriented multi-lingual text detection in video

Khare, V; Shivakumara, P; Paramesran, R; Blumenstein, M

Publication Type: Journal Article
Citation: Multimedia Tools and Applications, 2017, 76 (15), pp. 16625 - 16655
Issue Date: 2017-08-01

Filename	Description	Size	Format
Arbitrarily-oriented multi-lingual text detection in video.pdf	Published Version	5.1 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Khare, V	en_US
dc.contributor.author	Shivakumara, P	en_US
dc.contributor.author	Paramesran, R	en_US
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744	en_US
dc.date.issued	2017-08-01	en_US
dc.identifier.citation	Multimedia Tools and Applications, 2017, 76 (15), pp. 16625 - 16655	en_US
dc.identifier.issn	1380-7501	en_US
dc.identifier.uri	http://hdl.handle.net/10453/114755
dc.description.abstract	© 2016, Springer Science+Business Media New York. Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role for developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for determining automatic windows to extract moments for tackling multi-font and multi-sized text in video based on stroke width information. The temporal information is explored to find deviations between moving and non-moving pixels in successive frames iteratively, which results in static clusters containing caption text and dynamic clusters containing scene text, as well as background pixels. The gradient directions of pixels in static and dynamic clusters are analyzed to identify the potential text candidates. Furthermore, boundary growing is proposed that expands the boundary of potential text candidates until it finds neighbor components based on the nearest neighbor criterion. This process outputs text lines appearing in the video. Experimental results on standard video data, namely, ICDAR 2013, ICDAR 2015, YVT videos and on our own English and Multi-lingual videos demonstrate that the proposed method outperforms the state-of-the-art methods.	en_US
dc.relation.ispartof	Multimedia Tools and Applications	en_US
dc.relation.isbasedon	10.1007/s11042-016-3941-x	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.classification	Software Engineering	en_US
dc.title	Arbitrarily-oriented multi-lingual text detection in video	en_US
dc.type	Journal Article
utslib.citation.volume	15	en_US
utslib.citation.volume	76	en_US
utslib.for	0803 Computer Software	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
utslib.copyright.status	closed_access
pubs.issue	15	en_US
pubs.publication-status	Published	en_US
pubs.volume	76	en_US

Abstract:

© 2016, Springer Science+Business Media New York. Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining windows from stroke width information, so that the extracted moments can handle multi-font and multi-sized text in video. Temporal information is explored iteratively to find deviations between moving and non-moving pixels in successive frames, which results in static clusters containing caption text and dynamic clusters containing scene text, as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are then analyzed to identify potential text candidates. Furthermore, we propose boundary growing, which expands the boundary of each potential text candidate until it finds neighbor components based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video data, namely ICDAR 2013, ICDAR 2015, and YVT videos, and on our own English and multi-lingual videos demonstrate that the proposed method outperforms the state-of-the-art methods.
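
The temporal step in the abstract, separating non-moving pixels (likely caption text) from moving ones (scene text and background), can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes an iterative clustering whose details are not given in this record, so the sketch substitutes a single fixed threshold on the mean absolute frame-to-frame difference. The function names, the `threshold` value, and the toy frames are all hypothetical.

```python
def temporal_deviation(frames):
    """Mean absolute difference between successive frames, per pixel.

    `frames` is a list of equally sized grayscale frames, each a list of
    rows of intensities (an assumed representation for illustration).
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    height, width = len(frames[0]), len(frames[0][0])
    deviation = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                deviation[y][x] += abs(curr[y][x] - prev[y][x])
    pairs = len(frames) - 1
    return [[value / pairs for value in row] for row in deviation]


def split_static_dynamic(frames, threshold=5.0):
    """Split pixel coordinates into static and dynamic sets.

    A fixed threshold stands in for the iterative clustering the abstract
    mentions; the value 5.0 is an arbitrary assumption.
    """
    deviation = temporal_deviation(frames)
    static, dynamic = set(), set()
    for y, row in enumerate(deviation):
        for x, value in enumerate(row):
            (static if value <= threshold else dynamic).add((y, x))
    return static, dynamic


# Toy 2x2 video: only the bottom-right pixel changes between frames.
frames = [
    [[10, 200], [10, 50]],
    [[10, 200], [10, 90]],
    [[10, 200], [10, 20]],
]
static, dynamic = split_static_dynamic(frames)
```

In this toy input the three unchanging pixels land in `static` and the one changing pixel lands in `dynamic`. In the pipeline the abstract outlines, gradient-direction analysis would then be applied within each cluster to pick out text candidates.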

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/114755