Persian named entity recognition with structural prediction methods
- Publisher:
- Walter de Gruyter GmbH
- Publication Type:
- Chapter
- Citation:
- Persian Computational Linguistics and NLP, 2023, 2, pp. 149-184
- Issue Date:
- 2023-05-22
Closed Access
This item is closed access and not available.
Named-entity recognition (NER) remains challenging for many languages because annotated corpora are scarce, which makes it difficult to train an effective NER pipeline. The ArmanPersoNERCorpus, the first manually annotated Persian NER corpus, which we released in 2018, has enabled supervised prediction of named entities in Persian. Treating NER as a sequence labelling task, in this chapter we compare the performance of a range of sequential classifiers on this corpus: a conventional CRF, the SVM-HMM, the Jordan recurrent neural network, the BiLSTM-CRF (a state-of-the-art deep learning architecture for NER), and the Flair neural language model. We also assess each classifier with various word embeddings as pre-trained feature vectors, including Hellinger PCA, CBOW, skip-gram, GloVe and fastText. The combination of the Flair model and fastText word embeddings achieved the highest CoNLL-F1 score, 78.59%, outperforming the previous state of the art on Persian NER by 1.14 percentage points.
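The CoNLL-F1 score mentioned above is an entity-level metric: a predicted entity counts as correct only if both its boundaries and its type exactly match a gold entity. The sketch below illustrates that computation under stated assumptions. It assumes IOB2-style tags (`B-type`, `I-type`, `O`); the tag names `pers`, `loc` and `org` and the helper names `extract_spans` and `conll_f1` are hypothetical and are not taken from the chapter or from the official conlleval script.

```python
def extract_spans(tags):
    """Return the set of (type, start, end) entity spans in one sentence (end exclusive).

    Assumes IOB2-style tags such as "B-pers", "I-pers", "O" (illustrative names).
    As in conlleval, an I- tag that does not continue a span of the same type starts a new span.
    """
    spans = set()
    start, ent_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" sentinel closes the last span
        prefix, _, label = tag.partition("-")
        # An open span ends at "O", at any B- tag, or at an I- tag of a different type.
        if start is not None and (prefix != "I" or label != ent_type):
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        # A new span starts at a B- tag or at an I- tag that continues nothing.
        if prefix in ("B", "I") and start is None:
            start, ent_type = i, label
    return spans


def conll_f1(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 with exact span-and-type matching."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans = extract_spans(gold)
        pred_spans = extract_spans(pred)
        correct += len(gold_spans & pred_spans)  # exact matches only
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two hypothetical sentences: the prediction truncates the first person entity.
gold = [["B-pers", "I-pers", "O", "B-loc", "O"], ["B-org", "I-org", "I-org"]]
pred = [["B-pers", "O", "O", "B-loc", "O"], ["B-org", "I-org", "I-org"]]
p, r, f = conll_f1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.6667 R=0.6667 F1=0.6667
```

The partially recovered person entity earns no credit, which is why this metric is stricter than token-level accuracy. A score reported as 78.59% corresponds to an F1 of 0.7859 on this scale.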