BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: The First Entity-Annotated Persian Dataset
- Publication Type: Conference Proceeding
- Citation: LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 4427-4431
- Issue Date: 2019-01-01
Closed Access
Filename | Description | Size
---|---|---
e2e07662a055d77997b617394f02d66f7eb7.pdf | Published version | 353.4 kB
This item is closed access and not available.
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
Named-entity recognition (NER) can still be regarded as work in progress for a number of Asian languages because annotated corpora are scarce. For this reason, in this paper we publicly release an entity-annotated Persian dataset and present a well-performing approach to Persian NER based on a deep learning architecture. In addition to the entity-annotated dataset, we release a number of word embeddings (including GloVe, skip-gram, CBOW and Hellinger PCA) trained on a sizable collection of Persian text. Combining the deep learning architecture (a BiLSTM-CRF) with the pre-trained word embeddings has allowed us to achieve a CoNLL F1 score of 77.45%, a result that is more than 12 percentage points higher than the best previous result and interesting in absolute terms.
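A CoNLL-style F1 score such as the one quoted above is conventionally computed at the entity level: a predicted entity counts as correct only when both its boundaries and its type match the gold annotation exactly. The following is a minimal sketch of that computation, assuming BIO-tagged sequences. The function names and the toy tags (`PER`, `LOC`, `ORG`) are illustrative assumptions; they are not taken from the paper or from its evaluation script.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical helper type
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags (e.g. B-PER, I-PER, O)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and label is not None and kind == label:
            continue  # the current span keeps going
        if label is not None:
            spans.add((label, start, i))  # close the span that just ended
            start, label = None, None
        if prefix in ("B", "I"):  # assumption: a stray I- tag opens a new span
            start, label = i, kind
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)  # exact match on type and boundaries
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one of two gold entities is predicted with the wrong type.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
precision, recall, f1 = span_f1(gold, pred)
```

In the toy example, one of the two predicted entities matches a gold entity exactly, so precision, recall and F1 are all 0.5. Note that a prediction with the right boundaries but the wrong type earns no partial credit under this scheme.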
Please use this identifier to cite or link to this item: