BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: The First Entity-Annotated Persian Dataset

Poostchi, H; Borzeshi, EZ; Piccardi, M

Publication Type: Conference Proceeding
Citation: LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 4427 - 4431
Issue Date: 2019-01-01

Closed Access

	Filename	Description	Size	Format
	e2e07662a055d77997b617394f02d66f7eb7.pdf	Published version	353.4 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Poostchi, H https://orcid.org/0000-0002-8725-2619	en_US
dc.contributor.author	Borzeshi, EZ	en_US
dc.contributor.author	Piccardi, M https://orcid.org/0000-0001-9250-6604	en_US
dc.date.available	2017-12-13	en_US
dc.date.issued	2019-01-01	en_US
dc.identifier.citation	LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 4427 - 4431	en_US
dc.identifier.isbn	9791095546009	en_US
dc.identifier.uri	http://hdl.handle.net/10453/128468
dc.description.abstract	© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved. Named-entity recognition (NER) can still be regarded as work in progress for a number of Asian languages due to the scarcity of annotated corpora. For this reason, with this paper we publicly release an entity-annotated Persian dataset and we present a well-performing approach for Persian NER based on a deep learning architecture. In addition to the entity-annotated dataset, we release a number of word embeddings (including GloVe, skip-gram, CBOW and Hellinger PCA) trained on a sizable collation of Persian text. The combination of the deep learning architecture (a BiLSTM-CRF) and the pre-trained word embeddings has allowed us to achieve a 77.45% CoNLL F1 score, a result that is more than 12 percentage points higher than the best previous result and interesting in absolute terms.	en_US
dc.relation.ispartof	LREC 2018 - 11th International Conference on Language Resources and Evaluation	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	BILSTM-CRF for Persian named-entity recognition armanpersonercorpus: The first entity-annotated Persian dataset	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
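
The "CoNLL F1 score" quoted in the abstract is the entity-level measure used by the CoNLL shared-task evaluation script: a predicted entity counts as correct only if both its span and its type exactly match a gold entity, and precision, recall and F1 are micro-averaged over all entities. The sketch below illustrates that computation for IOB-tagged sentences. The function names and the example tags (PER, LOC, ORG) are hypothetical, and this is not the authors' evaluation code.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect entity spans from one sentence tagged in IOB format (e.g. B-PER, I-PER, O)."""
    chunks: Set[Chunk] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends on O, on any B- tag, or on an I- tag of a different type.
        if etype is not None and (prefix != "I" or label != etype):
            chunks.add((etype, start, i))
            start, etype = None, None
        # A chunk starts on B-, or on an I- tag with no open chunk (as conlleval does).
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, label
    return chunks


def conll_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over aligned sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_chunks, pred_chunks = extract_chunks(g), extract_chunks(p)
        tp += len(gold_chunks & pred_chunks)  # exact span and type match only
        n_pred += len(pred_chunks)
        n_gold += len(gold_chunks)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(conll_f1(gold, pred))
```

With gold tags `B-PER I-PER O B-LOC` and predicted tags `B-PER I-PER O B-ORG`, one of the two entities matches exactly, so precision, recall and F1 are all 0.5. A span with the right boundaries but the wrong type earns no credit, which is what makes this score stricter than token-level accuracy.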

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/128468