Persian named entity recognition with structural prediction methods

Publisher: Walter de Gruyter GmbH
Publication Type: Chapter
Citation: Persian Computational Linguistics and NLP, 2023, 2, pp. 149-184
Issue Date: 2023-05-22
Named-entity recognition (NER) remains a challenging task for many languages because annotated corpora are scarce, which makes it difficult to train an effective NER pipeline. The ArmanPersoNERCorpus, the first manually annotated Persian NER corpus, which we released in 2018, has enabled supervised prediction of named entities in Persian. Treating NER as a sequential labelling task, in this chapter we compare the performance of a range of sequential classifiers on this corpus. The classifiers include a conventional CRF, the SVM-HMM, the Jordan recurrent neural network, the BiLSTM-CRF (a state-of-the-art deep learning architecture for the NER task), and the Flair neural language model. In addition, each classifier has been evaluated with various word embeddings as pre-trained feature vectors, including Hellinger PCA, CBOW, skip-gram, GloVe and fastText. The combination of the Flair model and the fastText word embeddings achieved the highest CoNLL-F1 score, 78.59%, which outperforms the previous state of the art on Persian NER by 1.14 percentage points.
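The CoNLL-F1 score reported above is an entity-level metric: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation exactly. The following is a minimal sketch of that computation over BIO-tagged sequences, not the evaluation script used in the chapter. The function names `extract_spans` and `conll_f1` and the `PER`/`LOC` labels are illustrative assumptions, and the handling of a stray `I-` tag (treated as the start of a new entity) is one common lenient convention.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical helper type
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        # The open span continues only on an "I-" tag of the same type.
        continues = tag.startswith("I-") and tag[2:] == etype
        if etype is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # "B-" always opens a span; a stray "I-" opens one too (assumption).
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            start, etype = i, tag[2:]
    return spans


def conll_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of sentences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        correct += len(gold_spans & pred_spans)  # exact boundary and type match
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up labels: one of two gold entities is recovered.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = conll_f1(gold, pred)  # precision 1.0, recall 0.5
```

Because the match is exact, a prediction that recovers only part of a multi-token name earns no credit, which is one reason entity-level F1 is typically lower than token-level accuracy on the same output.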