Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques

Raj, C; Agarwal, A; Bharathy, G; Narayan, B; Prasad, M

Publisher: MDPI AG
Publication Type: Journal Article
Citation: Electronics (Switzerland), 2021, 10, (22), pp. 2810-2810
Issue Date: 2021-11-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Raj, C https://orcid.org/0000-0003-0083-6812
dc.contributor.author	Agarwal, A
dc.contributor.author	Bharathy, G https://orcid.org/0000-0001-8384-9509
dc.contributor.author	Narayan, B https://orcid.org/0000-0001-8852-5589
dc.contributor.author	Prasad, M
dc.date.accessioned	2021-11-27T23:34:05Z
dc.date.available	2021-11-27T23:34:05Z
dc.date.issued	2021-11-01
dc.identifier.citation	Electronics (Switzerland), 2021, 10, (22), pp. 2810-2810
dc.identifier.issn	2079-9292
dc.identifier.uri	http://hdl.handle.net/10453/151889
dc.description.abstract	The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user‐generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real‐world cyberbullying datasets. In addition, this paper also examines the effect of feature extraction and word‐embedding‐techniques‐based natural language processing on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency‐Inverse Document Frequency (TF‐IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi‐GRU and Bi‐LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state‐of‐the‐art approaches for cyberbullying detection, with accuracy and F1‐scores as high as ~95% and ~98%, respectively.
dc.language	en
dc.publisher	MDPI AG
dc.relation.ispartof	Electronics (Switzerland)
dc.relation.isbasedon	10.3390/electronics10222810
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0906 Electrical and Electronic Engineering
dc.title	Cyberbullying detection: Hybrid models based on machine learning and natural language processing techniques
dc.type	Journal Article
utslib.citation.volume	10
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Arts and Social Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/e-Press
pubs.organisational-group	/University of Technology Sydney/Faculty of Arts and Social Sciences/FASS Faculty Administration
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2021-11-27T23:34:00Z
pubs.issue	22
pubs.publication-status	Published
pubs.volume	10
utslib.citation.issue	22

Abstract:

The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user‐generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real‐world cyberbullying datasets. In addition, this paper also examines the effect of feature extraction and word‐embedding‐techniques‐based natural language processing on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency‐Inverse Document Frequency (TF‐IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi‐GRU and Bi‐LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state‐of‐the‐art approaches for cyberbullying detection, with accuracy and F1‐scores as high as ~95% and ~98%, respectively.
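
The abstract reports that TF‐IDF features pair well with traditional classifiers, with Logistic Regression performing best among them. The following is a minimal, illustrative sketch of that pairing in plain Python, not the authors' implementation. The toy sentences, labels, function names, smoothed IDF formula, learning rate, and epoch count are all assumptions made for this example; the paper's datasets and tuned parameters are not reproduced here.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would clean text further.
    return text.lower().split()

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Map one tokenized document to an L2-normalized TF-IDF vector."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:  # unseen terms are ignored
            vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression with plain batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(text, vocab, idf, w, b):
    """Return the estimated probability that a text is bullying (label 1)."""
    x = transform(tokenize(text), vocab, idf)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy examples (label 1 = bullying, 0 = benign); not from the paper.
texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "shut up you idiot",
    "have a great day friend",
    "thanks for your help",
    "see you at lunch tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

docs = [tokenize(t) for t in texts]
vocab, idf = fit_tfidf(docs)
X = [transform(d, vocab, idf) for d in docs]
w, b = train_logreg(X, labels)
```

Calling `predict_proba("you stupid loser", vocab, idf, w, b)` on this toy model gives a probability above 0.5, while a benign phrase such as "thanks friend" scores below 0.5. Six sentences are far too few to say anything about real performance; the sketch only shows how the feature extraction and the classifier fit together.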

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/151889