The Use of Stemming in the Arabic Text and Its Impact on the Accuracy of Classification

Atwan, J; Wedyan, M; Bsoul, Q; Hammadeen, A; Alturki, R


Publisher: Hindawi
Publication Type: Journal Article
Citation: Scientific Programming, vol. 2021, 2021, pp. 1-9
Issue Date: 2021-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Atwan, J
dc.contributor.author	Wedyan, M https://orcid.org/0000-0001-6731-7246
dc.contributor.author	Bsoul, Q
dc.contributor.author	Hammadeen, A
dc.contributor.author	Alturki, R
dc.date.accessioned	2022-05-17T04:02:09Z
dc.date.available	2022-05-17T04:02:09Z
dc.date.issued	2021-01-01
dc.identifier.citation	Scientific Programming, 2021, 2021, pp. 1-9
dc.identifier.issn	1058-9244
dc.identifier.issn	1875-919X
dc.identifier.uri	http://hdl.handle.net/10453/157456
dc.language	en
dc.publisher	Hindawi
dc.relation.ispartof	Scientific Programming
dc.relation.isbasedon	10.1155/2021/1367210
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Distributed Computing
dc.title	The Use of Stemming in the Arabic Text and Its Impact on the Accuracy of Classification
dc.type	Journal Article
utslib.citation.volume	2021
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2022-05-17T04:02:07Z
pubs.publication-status	Published
pubs.volume	2021

Abstract:

The continuing growth in the volume of digital documents and other Arabic-language data available online has increased the need for classification methods that can handle the complex nature of such data. Arabic text classification plays an important role in many modern applications and intersects with other fields, ranging from search engines to the Internet of Things. However, existing approaches remain largely insufficient for classifying large volumes of Arabic documents with high accuracy: although some work has addressed Arabic text classification, most research has focused on English text. Methods proposed for English are not suitable for Arabic because the morphology of the two languages differs substantially, and this morphology makes the preprocessing of Arabic text a particularly challenging task. In this study, three commonly used classification algorithms, namely K-nearest neighbor (KNN), Naïve Bayes, and decision tree, were implemented for Arabic text in order to assess their effectiveness with and without a light stemmer in the preprocessing phase. In the experiment, a dataset of 800 files in four categories from the Agence France-Presse (AFP) Arabic Newswire 2001 corpus was classified using the three classifiers. The results showed that the decision tree combined with the light stemmer achieved the highest classification accuracy, at 93%.
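The abstract does not give the light stemmer's affix rules or the classifier settings. To illustrate where a light stemmer sits in such a pipeline, the following Python sketch pairs a hypothetical light stemmer with a minimal multinomial Naïve Bayes classifier (one of the three algorithms evaluated). The prefix and suffix lists are an assumption, loosely modeled on the Light10 stemmer of Larkey et al., not the lists used in this study; the names `light_stem`, `tokenize`, and `NaiveBayes` are hypothetical, and the training sentences are invented toy examples, not AFP data.

```python
import math
import re
from collections import Counter, defaultdict

# ASSUMPTION: hypothetical affix lists, loosely modeled on the Light10
# stemmer (Larkey et al.); the study does not publish its own lists.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word: str) -> str:
    """Strip affixes from one Arabic token without reducing it to a root."""
    # Drop the conjunction "و" (and) if at least 3 letters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Drop the first matching article-like prefix if at least 2 letters remain.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # Repeatedly drop suffixes while at least 2 letters would remain.
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= 2:
                word = word[: -len(s)]
                changed = True
                break
    return word


def tokenize(text: str, stem: bool) -> list[str]:
    # Keep runs of Arabic letters (U+0621..U+064A); optionally light-stem them.
    tokens = re.findall(r"[\u0621-\u064A]+", text)
    return [light_stem(t) for t in tokens] if stem else tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, stem: bool):
        self.stem = stem

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc, self.stem)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(doc, self.stem) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                score += math.log(
                    (self.word_counts[label][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data (not from the AFP corpus).
docs = [
    "اللاعب سجل هدفا في المباراة",
    "الفريق فاز بالمباراة",
    "البنك رفع سعر الفائدة",
    "الأسواق المالية تراجعت",
    "ارتفع سعر النفط",
]
labels = ["sports", "sports", "economy", "economy", "economy"]

with_stem = NaiveBayes(stem=True).fit(docs, labels)
without_stem = NaiveBayes(stem=False).fit(docs, labels)
```

On this toy data, the unstemmed model has never seen the bare form مباراة ("match"), so it falls back on the class prior and predicts economy, the majority class; the stemmed model maps مباراة, المباراة, and بالمباراة to the same stem and predicts sports. Light stemming strips affixes without extracting roots, so it can also overstem: this sketch reduces وزير ("minister") to زير because it treats the leading و as the conjunction.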

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/157456