A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

Nandanwar, L; Shivakumara, P; Manna, S; Pal, U; Lu, T; Blumenstein, M

Publisher: Springer
Publication Type: Conference Proceeding
Citation: Pattern Recognition and Artificial Intelligence, 2020, 12068 LNCS, pp. 80-92
Issue Date: 2020-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Nandanwar, L
dc.contributor.author	Shivakumara, P
dc.contributor.author	Manna, S
dc.contributor.author	Pal, U
dc.contributor.author	Lu, T
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744
dc.date	2020-10-19
dc.date.accessioned	2021-04-09T06:25:48Z
dc.date.available	2021-04-09T06:25:48Z
dc.date.issued	2020-01-01
dc.identifier.citation	Pattern Recognition and Artificial Intelligence, 2020, 12068 LNCS, pp. 80-92
dc.identifier.isbn	9783030598297
dc.identifier.issn	0302-9743
dc.identifier.issn	1611-3349
dc.identifier.uri	http://hdl.handle.net/10453/147960
dc.description.abstract	Achieving a better recognition rate for text in action video images is challenging because the images contain multiple types of text on unpredictable backgrounds. We propose a new method for classifying captions (edited text) and scene texts (text that is part of the image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the good clarity and visibility of captions, which separates them from scene texts. The variances of the coefficients at corresponding pixels of the DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are then used to generate a fused image. Furthermore, the proposed method estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene texts. Lastly, the proposed method is evaluated on images of the five above-mentioned action image classes to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms the existing methods in terms of classification. Recognition experiments before and after classification show that the recognition rate improves significantly after classification.
dc.language	en
dc.publisher	Springer
dc.relation.ispartof	Pattern Recognition and Artificial Intelligence
dc.relation.ispartof	International Conference on Pattern Recognition and Artificial Intelligence
dc.relation.ispartofseries	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.relation.isbasedon	10.1007/978-3-030-59830-3_7
dc.rights	info:eu-repo/semantics/openAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images
dc.type	Conference Proceeding
utslib.citation.volume	12068 LNCS
utslib.location.activity	Zhongshan, China
utslib.for	0801 Artificial Intelligence and Image Processing
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2021-04-09T06:25:47Z
pubs.finish-date	2020-10-23
pubs.place-of-publication	Switzerland
pubs.publication-status	Published
pubs.start-date	2020-10-19
pubs.volume	12068 LNCS
dc.location	Switzerland

Abstract:

Achieving a better recognition rate for text in action video images is challenging because the images contain multiple types of text on unpredictable backgrounds. We propose a new method for classifying captions (edited text) and scene texts (text that is part of the image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the good clarity and visibility of captions, which separates them from scene texts. The variances of the coefficients at corresponding pixels of the DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are then used to generate a fused image. Furthermore, the proposed method estimates sparsity in the Canny edge image of each fused image to derive rules for classifying caption and scene texts. Lastly, the proposed method is evaluated on images of the five above-mentioned action image classes to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms the existing methods in terms of classification. Recognition experiments before and after classification show that the recognition rate improves significantly after classification.
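
The pipeline outlined in the abstract (variance-derived weights, a fused DCT/Fourier map, then sparsity of an edge image) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so several choices here are assumptions: the weights are taken as each transform's share of the local 3x3 variance, the Fourier coefficients are used as magnitudes scaled by sqrt(height * width), no inverse transform is applied, and a thresholded gradient stands in for the Canny edge detector. All function names are hypothetical.

```python
import cmath
import math


def dct_1d(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def dft_1d(v):
    """Plain O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n))
            for k in range(n)]


def transform_2d(img, f):
    """Apply a 1-D transform along every row, then along every column."""
    rows = [f(list(r)) for r in img]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def local_variance(img, r=1):
    """Variance of each (2r+1)x(2r+1) neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - r), min(h, y + r + 1))
                   for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(win) / len(win)
            out[y][x] = sum((p - mean) ** 2 for p in win) / len(win)
    return out


def fuse(dct_c, fft_c, eps=1e-12):
    """Variance-weighted sum of two coefficient maps (assumed weighting rule)."""
    vd, vf = local_variance(dct_c), local_variance(fft_c)
    h, w = len(dct_c), len(dct_c[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = vd[y][x] + vf[y][x] + eps  # eps avoids division by zero
            fused[y][x] = (vd[y][x] * dct_c[y][x] + vf[y][x] * fft_c[y][x]) / total
    return fused


def edge_sparsity(img, thresh):
    """Fraction of non-edge positions in a thresholded-gradient edge map
    (a crude stand-in for the Canny edge image named in the abstract)."""
    h, w = len(img), len(img[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            if math.hypot(gx, gy) > thresh:
                edges += 1
    return 1.0 - edges / ((h - 1) * (w - 1))


def fused_sparsity(img, thresh=1.0):
    """DCT and Fourier maps -> variance-weighted fusion -> edge sparsity."""
    h, w = len(img), len(img[0])
    dct_c = transform_2d(img, dct_1d)
    norm = math.sqrt(h * w)  # puts DFT magnitudes on the orthonormal DCT scale
    fft_c = [[abs(c) / norm for c in row] for row in transform_2d(img, dft_1d)]
    return edge_sparsity(fuse(dct_c, fft_c), thresh)


if __name__ == "__main__":
    patch = [[0, 0, 9, 9], [0, 0, 9, 9], [9, 9, 0, 0], [9, 9, 0, 0]]
    print(fused_sparsity(patch))
```

The sketch stops at the sparsity score. The rules that map such a score to "caption" or "scene text" are derived in the paper and are not reproduced in the abstract, so no decision threshold is suggested here.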

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/147960