Persian named entity recognition with structural prediction methods

Poostchi, H; Piccardi, M

Publisher: Walter de Gruyter GmbH
Publication Type: Chapter
Citation: Persian Computational Linguistics and NLP, 2023, 2, pp. 149-184
Issue Date: 2023-05-22

Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Poostchi, H
dc.contributor.author	Piccardi, M https://orcid.org/0000-0001-9250-6604
dc.date.accessioned	2023-11-14T23:16:57Z
dc.date.available	2023-11-14T23:16:57Z
dc.date.issued	2023-05-22
dc.identifier.citation	Persian Computational Linguistics and NLP, 2023, 2, pp. 149-184
dc.identifier.isbn	9783110616545
dc.identifier.uri	http://hdl.handle.net/10453/173415
dc.format.extent	9
dc.language	en
dc.publisher	Walter de Gruyter GmbH
dc.relation.ispartof	Persian Computational Linguistics and NLP
dc.relation.isbasedon	10.1515/9783110619225-006
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Persian named entity recognition with structural prediction methods
dc.type	Chapter
utslib.citation.volume	2
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2023-11-14T23:16:55Z
pubs.place-of-publication	Berlin
pubs.publication-status	Published
pubs.volume	2
dc.location	Berlin

Abstract:

Named-entity recognition (NER) is still a challenging task for a number of languages due to the scarcity of annotated corpora, which makes it difficult to train an effective NER pipeline. The ArmanPersoNERCorpus, the first manually annotated Persian NER corpus, which we released in 2018, has enabled supervised prediction of named entities in the Persian language. Treating NER as a sequence-labelling task, in this chapter we compare the performance of a range of sequential classifiers on this corpus. The classifiers include a conventional CRF, the SVM-HMM, the Jordan recurrent neural network, the BiLSTM-CRF (a state-of-the-art deep learning architecture for the NER task), and the Flair neural language model. We also assess each classifier with various pre-trained word embeddings as feature vectors, including Hellinger PCA, CBOW, skip-gram, GloVe and fastText. The combination of the Flair model and fastText word embeddings achieved the highest CoNLL-F1 score, 78.59%, outperforming the previous state of the art on Persian NER by 1.14 percentage points.
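The CoNLL-F1 score quoted in the abstract is an entity-level measure: a predicted entity counts as correct only if its type and both of its boundaries exactly match a gold entity. As an illustration only, the sketch below computes such a score from tag sequences, assuming IOB2-style `B-`/`I-`/`O` tags. The tag names (`PER`, `LOC`, `ORG`) and the functions `extract_spans` and `conll_f1` are hypothetical; they are not taken from the chapter or from any library. This is a simplified stand-in for the official `conlleval` script, not the authors' evaluation code.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from one IOB2-tagged sentence.

    An I-X tag that does not continue an X entity opens a new entity,
    approximating the lenient behaviour of the conlleval script.
    """
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))  # close the open entity
            etype = None
        if prefix in ("B", "I") and not continues:
            etype, start = label, i  # open a new entity
    return spans


def conll_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1, in percent."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        if len(g_tags) != len(p_tags):
            raise ValueError("gold and predicted sentences differ in length")
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)  # exact match on type and both boundaries
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"], ["B-ORG", "O"]]  # one type error
    p, r, f = conll_f1(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=66.67 R=66.67 F1=66.67
```

True positives, gold counts and predicted counts are pooled over all sentences before precision, recall and F1 are taken, so the result is a micro-average, which is how `conlleval` reports its overall figure. A single wrong entity type is enough to lose both a true positive and a correct prediction, which is why entity-level scores tend to be lower than token-level accuracy.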

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/173415