A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

Publication Type:
Conference Proceeding
Pattern Recognition and Artificial Intelligence, 2020, 12068 LNCS, pp. 80-92
Achieving a better recognition rate for text in action video images is challenging because of multiple text types and unpredictable backgrounds. We propose a new method for classifying caption text (which is edited text) and scene text (which is part of the image) in video images of the Yoga, Concert, Teleshopping, Craft, and Recipe classes. The proposed method introduces a new fusion criterion based on DCT and Fourier coefficients to extract features that represent the good clarity and visibility of captions, which separates them from scene text. The variances of the coefficients at corresponding pixels of the DCT and Fourier images are computed to derive the respective weights. The weights and coefficients are then used to generate a fused image. The proposed method then estimates the sparsity of the Canny edge image of each fused image to derive rules for classifying caption and scene text. Lastly, the proposed method is evaluated on images of the five above-mentioned action image classes to validate the derived rules. Comparative studies with state-of-the-art methods on standard databases show that the proposed method outperforms the existing methods in terms of classification. Recognition experiments before and after classification show that the recognition rate improves significantly after classification.
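
The pipeline described in the abstract (DCT and Fourier coefficients, variance-derived weights, a fused image, edge sparsity) can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions that the abstract does not specify: the weights are taken from the global variance of each log-magnitude coefficient map rather than from per-pixel statistics, the fusion is a plain weighted sum, and a thresholded gradient magnitude stands in for the Canny edge detector so that the example depends only on NumPy. The names `dct2`, `fuse_dct_fft`, `edge_sparsity`, and `rel_threshold` are hypothetical.

```python
import numpy as np


def dct2(image):
    """Orthonormal 2-D DCT-II computed from an explicit basis matrix."""
    def basis(n):
        k = np.arange(n)[:, None]   # frequency index
        i = np.arange(n)[None, :]   # sample index
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
        return m

    rows, cols = image.shape
    return basis(rows) @ image @ basis(cols).T


def fuse_dct_fft(image):
    """Fuse DCT and FFT coefficient maps with variance-derived weights.

    Assumption: one scalar weight per transform, proportional to the
    variance of its log-magnitude coefficient map.
    """
    image = np.asarray(image, dtype=float)
    dct_map = np.log1p(np.abs(dct2(image)))
    fft_map = np.log1p(np.abs(np.fft.fft2(image)))

    var_dct, var_fft = dct_map.var(), fft_map.var()
    total = var_dct + var_fft
    if total == 0:
        w_dct = w_fft = 0.5         # degenerate case: no variation at all
    else:
        w_dct, w_fft = var_dct / total, var_fft / total

    fused = w_dct * dct_map + w_fft * fft_map
    return fused, (w_dct, w_fft)


def edge_sparsity(image, rel_threshold=0.2):
    """Fraction of non-edge pixels in a simple gradient-based edge map.

    Assumption: a thresholded gradient magnitude replaces Canny here.
    """
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        return 1.0                  # no edges at all
    edges = magnitude > rel_threshold * peak
    return 1.0 - edges.mean()
```

A classification rule in the spirit of the abstract would compare `edge_sparsity(fused)` against a threshold learned from labelled caption and scene text images. The abstract does not state the actual rules, so none are reproduced here.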