Semi-supervised feature learning for improving writer identification

Chen, S; Wang, Y; Lin, CT; Ding, W; Cao, Z

Publication Type: Journal Article
Citation: Information Sciences, 2019, 482, pp. 156-170
Issue Date: 2019-05-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, S	en_US
dc.contributor.author	Wang, Y	en_US
dc.contributor.author	Lin, CT https://orcid.org/0000-0001-8371-8197	en_US
dc.contributor.author	Ding, W	en_US
dc.contributor.author	Cao, Z https://orcid.org/0000-0003-3656-0328	en_US
dc.date.accessioned	2020-04-17T01:04:58Z
dc.date.available	2020-04-17T01:04:58Z
dc.date.issued	2019-05-01	en_US
dc.identifier.citation	Information Sciences, 2019, 482 pp. 156 - 170	en_US
dc.identifier.issn	0020-0255	en_US
dc.identifier.uri	http://hdl.handle.net/10453/140060
dc.description.abstract	© 2019 Elsevier Inc. Data augmentation is typically used by supervised feature learning approaches for offline writer identification, but such approaches require a mass of additional training data and potentially lead to overfitting errors. In this study, a semi-supervised feature learning pipeline is proposed to improve the performance of writer identification by training with extra unlabeled data and the original labeled data simultaneously. Specifically, we propose a weighted label smoothing regularization (WLSR) method for data augmentation, which assigns a weighted uniform label distribution to the extra unlabeled data. The WLSR method regularizes the convolutional neural network (CNN) baseline to allow more discriminative features to be learned to represent the properties of different writing styles. The experimental results on well-known benchmark datasets (ICDAR2013 and CVL) showed that our proposed semi-supervised feature learning approach significantly improves the baseline measurement and performs competitively with existing writer identification approaches. Our findings provide new insights into offline writer identification.	en_US
dc.relation.ispartof	Information Sciences	en_US
dc.relation.isbasedon	10.1016/j.ins.2019.01.024	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Semi-supervised feature learning for improving writer identification	en_US
dc.type	Journal Article
utslib.citation.volume	482	en_US
utslib.for	01 Mathematical Sciences	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US
pubs.volume	482	en_US

Abstract:

© 2019 Elsevier Inc. Data augmentation is typically used by supervised feature learning approaches for offline writer identification, but such approaches require a mass of additional training data and potentially lead to overfitting errors. In this study, a semi-supervised feature learning pipeline is proposed to improve the performance of writer identification by training with extra unlabeled data and the original labeled data simultaneously. Specifically, we propose a weighted label smoothing regularization (WLSR) method for data augmentation, which assigns a weighted uniform label distribution to the extra unlabeled data. The WLSR method regularizes the convolutional neural network (CNN) baseline to allow more discriminative features to be learned to represent the properties of different writing styles. The experimental results on well-known benchmark datasets (ICDAR2013 and CVL) showed that our proposed semi-supervised feature learning approach significantly improves the baseline measurement and performs competitively with existing writer identification approaches. Our findings provide new insights into offline writer identification.
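
The idea of giving unlabeled samples a weighted uniform label distribution can be illustrated with a small per-sample loss. This is only a sketch under stated assumptions, not the formulation from the article, which this record does not reproduce: the function names, the `weight` parameter, and the choice to scale the uniform cross-entropy term by a single scalar are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_loss(logits, label=None, weight=1.0):
    """Cross-entropy for one sample over K writer classes.

    label  -- int class index for a labeled sample, None for an unlabeled one.
    weight -- hypothetical scalar applied to the uniform-target term
              (an assumption; the article's exact weighting is not given here).
    """
    log_p = log_softmax(logits)
    k = len(logits)
    if label is not None:
        # Labeled sample: ordinary one-hot cross-entropy.
        return -log_p[label]
    # Unlabeled sample: target is uniform (1/K per class), scaled by weight.
    return -weight * sum(log_p) / k


# Labeled sample with flat logits over 4 writers.
labeled = smoothed_loss([0.0, 0.0, 0.0, 0.0], label=2)
# Unlabeled sample: a confident prediction is penalized more than a flat one.
flat = smoothed_loss([0.0, 0.0, 0.0, 0.0], weight=0.5)
peaked = smoothed_loss([10.0, 0.0, 0.0, 0.0], weight=0.5)
```

In this sketch the unlabeled term discourages the network from assigning an unlabeled sample confidently to any single writer, which is one common way a uniform target acts as a regularizer.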

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/140060