BiLSTM-CRF for Persian named-entity recognition ArmanPersoNERCorpus: The first entity-annotated Persian dataset
- Publication Type: Conference Proceeding
- LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 4427 - 4431
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
Named-entity recognition (NER) can still be regarded as work in progress for a number of Asian languages due to the scarcity of annotated corpora. For this reason, in this paper we publicly release an entity-annotated Persian dataset and present an effective approach to Persian NER based on a deep learning architecture. In addition to the entity-annotated dataset, we release a number of word embeddings (including GloVe, skip-gram, CBOW and Hellinger PCA) trained on a sizable collation of Persian text. Combining the deep learning architecture (a BiLSTM-CRF) with the pre-trained word embeddings has allowed us to achieve a 77.45% CoNLL F1 score, a result that is more than 12 percentage points higher than the best previous result and noteworthy in absolute terms.
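For readers unfamiliar with the metric, the CoNLL F1 score quoted above is computed over whole entities rather than individual tokens: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The following is a minimal Python sketch of that idea, not the evaluation script used in the paper. It assumes BIO-style tags, and the tag names (`B-PER`, `B-LOC`), the helper names `extract_spans` and `conll_f1`, and the toy sentences are all hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical helper type
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, label = tag.partition("-")
        # An open span ends at "O", at a new "B-", or when the type changes.
        if etype is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # A span starts at "B-", or at a stray "I-" when no span is open.
        if prefix in ("B", "I") and etype is None:
            start, etype = i, label
    return spans


def conll_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    n_gold = n_pred = n_correct = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        n_correct += len(gold_spans & pred_spans)  # exact boundary and type match
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the gold annotation has two entities, the prediction finds one.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
precision, recall, f1 = conll_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the toy example the single predicted entity is correct, so precision is 1.0, but only one of the two gold entities is found, so recall is 0.5 and the F1 is about 0.67. A token-level score would rate the same prediction more generously, which is why entity-level figures such as the 77.45% above are not comparable to per-token accuracy.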