BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: The First Entity-Annotated Persian Dataset

Publication Type:
Conference Proceeding
LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 4427 - 4431
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
Named-entity recognition (NER) can still be regarded as work in progress for a number of Asian languages because annotated corpora are scarce. For this reason, in this paper we publicly release an entity-annotated Persian dataset and present a well-performing approach to Persian NER based on a deep learning architecture. In addition to the entity-annotated dataset, we release a number of word embeddings (including GloVe, skip-gram, CBOW and Hellinger PCA) trained on a sizable collection of Persian text. Combining the deep learning architecture (a BiLSTM-CRF) with the pre-trained word embeddings has allowed us to achieve a CoNLL F1 score of 77.45%, a result that is more than 12 percentage points higher than the best previous result and interesting in absolute terms.
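For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a CoNLL-style F1 score is commonly computed: an entity counts as correct only if both its type and its exact boundaries match the gold annotation. It assumes BIO-encoded tags; the function names (`extract_spans`, `conll_f1`) and the tag labels (`PER`, `LOC`) are hypothetical and chosen for illustration, and this is not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-encoded tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        # "B-" always opens a span; a stray "I-" is treated as an opening too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def conll_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 over parallel lists of gold and predicted sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)  # exact type and boundary match
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only (not taken from the released dataset).
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = conll_f1(gold, pred)
```

In this toy example the prediction recovers one of the two gold entities and proposes no spurious ones, so precision is 1, recall is 0.5, and the F1 score is 2/3.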