A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification

Publisher:
Springer
Publication Type:
Conference Proceeding
Citation:
Document Analysis and Recognition – ICDAR 2021 Workshops, 2021, 12917 LNCS, pp. 158-173
Issue Date:
2021-01-01
Struck-out handwritten words in document images degrade the performance of methods for several important applications, such as handwriting recognition, writer identification, gender identification, fraudulent document identification, document age estimation, writer age estimation, analysis of normal/abnormal behavior of a person, and descriptive answer evaluation. This work proposes a new method that combines connected component analysis for text component detection with deep learning for classifying words as struck-out or non-struck-out. For text component detection, the proposed method uses stroke width to detect the edges of text in images and then applies smoothing operations to remove noise. Morphological operations are then performed on the smoothed images, and the resulting connected components are labeled as text by fixing bounding boxes around them. Inspired by the success of deep learning models, we explore DenseNet for classifying struck-out and non-struck-out handwritten components, taking the detected text components as input. Experimental results on our dataset demonstrate that the proposed method outperforms existing methods in terms of classification rate.
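The text component detection stage described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the stroke-width edge detection, the smoothing step, and the DenseNet classifier are not reproduced. It only shows, on a toy binary image, how a morphological operation followed by connected component labeling yields bounding boxes. The function names `dilate` and `component_boxes`, the 3x3 dilation, and the 8-connectivity are all assumptions made for illustration.

```python
from collections import deque


def dilate(grid):
    """3x3 binary dilation (assumed structuring element): each foreground
    pixel turns on every pixel in its 3x3 neighbourhood."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if grid[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            out[rr][cc] = 1
    return out


def component_boxes(grid):
    """Return inclusive bounding boxes (top, left, bottom, right) of the
    8-connected foreground components, in raster-scan order."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from the first unseen foreground pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy binary "page": two thin strokes close together, one blob further away.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

raw_boxes = component_boxes(page)             # strokes are separate components
merged_boxes = component_boxes(dilate(page))  # dilation joins nearby strokes
print(raw_boxes)
print(merged_boxes)
```

On this toy input, the two nearby strokes are separate components before dilation and fall into a single bounding box afterwards, which is the kind of grouping a morphological step can provide before each boxed component is cropped and passed to a classifier.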