PersoNER: Persian Named-Entity Recognition

Publisher: COLING
Publication Type: Conference Proceeding
Citation: Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), 2016
Issue Date: 2016-12-11
Files in This Item:
Filename | Description | Size | Format
proceedings Coling 2016 - toc.pdf | Published version | 600.86 kB | Adobe PDF
cameraready Coling 2016.pdf | Accepted Manuscript version | 507.31 kB | Adobe PDF
reviews Coling 2016.docx | Accepted Manuscript version | 20.49 kB | Microsoft Word XML
Named-Entity Recognition (NER) remains a challenging task for languages with limited digital resources. The main difficulties arise from the scarcity of annotated corpora, which makes it hard to train an effective NER pipeline. To bridge this gap, in this paper we target Persian, a language spoken by more than a hundred million people worldwide. We first present and release ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. We then introduce PersoNER, an NER pipeline for Persian that combines word embeddings with a sequential max-margin classifier. The experimental results show that the proposed approach achieves encouraging MUC7 and CoNLL scores while outperforming two alternatives based on a conditional random field (CRF) and a recurrent neural network (RNN).
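To make the idea of "word embeddings plus a sequential max-margin classifier" more concrete, the sketch below shows one common way such a model can be built: a linear-chain tagger whose per-token scores are linear in pre-computed word-embedding vectors, decoded with Viterbi and trained by subgradient descent on the structured hinge loss (using Hamming-cost loss-augmented decoding). This is a minimal illustration under stated assumptions, not the PersoNER implementation: the function names `viterbi` and `train`, the hyperparameters, and the assumption that embeddings are already looked up into a matrix `X` are all hypothetical.

```python
import numpy as np

def viterbi(emit, trans, gold=None, cost=1.0):
    """Return the highest-scoring label sequence for a linear-chain model.

    emit:  (n, K) per-token label scores.
    trans: (K, K) transition scores; trans[i, j] scores label i followed by j.
    If gold is given, a Hamming cost is added to every non-gold label
    (loss-augmented decoding, as used in max-margin training).
    """
    n, K = emit.shape
    scores = emit.copy()
    if gold is not None:
        scores += cost
        scores[np.arange(n), gold] -= cost  # no cost for the gold label
    delta = scores[0].copy()                # best path score ending in each label
    back = np.zeros((n, K), dtype=int)      # backpointers
    for t in range(1, n):
        cand = delta[:, None] + trans       # cand[i, j]: best path ending in i, then j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(delta.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def train(sentences, K, dim, epochs=20, lr=0.1, lam=1e-3):
    """Subgradient descent on the structured hinge loss (hypothetical settings).

    sentences: list of (X, y), X an (n, dim) array of word embeddings
    (assumed pre-computed), y a list of n label ids.
    """
    W = np.zeros((K, dim))  # emission weights: label score = W[label] . embedding
    T = np.zeros((K, K))    # transition weights
    for _ in range(epochs):
        for X, y in sentences:
            y_hat = viterbi(X @ W.T, T, gold=y)
            W *= 1 - lr * lam  # L2 regularization step
            T *= 1 - lr * lam
            if y_hat == list(y):
                continue       # margin satisfied: hinge subgradient is zero
            for t in range(len(y)):
                W[y[t]] += lr * X[t]       # push gold labels up
                W[y_hat[t]] -= lr * X[t]   # push violating labels down
            for t in range(1, len(y)):
                T[y[t - 1], y[t]] += lr
                T[y_hat[t - 1], y_hat[t]] -= lr
    return W, T
```

As a usage example, with toy two-dimensional "embeddings" where `[1, 0]` marks an outside token (label 0) and `[0, 1]` a person token (label 1), training on a single sentence and then calling `viterbi(X @ W.T, T)` without `gold` returns the predicted label ids. The loss-augmented step is what distinguishes this from a plain structured perceptron: an update is made until the gold sequence outscores every other sequence by a margin that grows with its Hamming distance from the gold labels.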