A novel approach for phishing URLs detection using lexical based machine learning in a real-time environment

Gupta, BB; Yadav, K; Razzak, I; Psannis, K; Castiglione, A; Chang, X

Publisher:: ELSEVIER
Publication Type:: Journal Article
Citation:: Computer Communications, 2021, 175, pp. 47-57
Issue Date:: 2021-07-01

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S0140366421001675-main.pdf	Published version	1.96 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Gupta, BB
dc.contributor.author	Yadav, K
dc.contributor.author	Razzak, I
dc.contributor.author	Psannis, K
dc.contributor.author	Castiglione, A
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807
dc.date.accessioned	2022-02-24T04:42:05Z
dc.date.available	2022-02-24T04:42:05Z
dc.date.issued	2021-07-01
dc.identifier.citation	Computer Communications, 2021, 175, pp. 47-57
dc.identifier.issn	0140-3664
dc.identifier.issn	1873-703X
dc.identifier.uri	http://hdl.handle.net/10453/154833
dc.description.abstract	In recent times, the number of devices connected to the internet has increased massively. These devices include, but are not limited to, smartphones, IoT devices, and cloud networks. Compared with other possible cyber-attacks, hackers now increasingly target these devices with phishing attacks, since such attacks exploit human vulnerabilities rather than system vulnerabilities. In a phishing attack, an online user is deceived by a seemingly trusted entity into giving away personal data, e.g., login credentials or credit card details. Once this private information is leaked to the hackers, it becomes the source of other, more sophisticated attacks. Many researchers have recently proposed machine learning-based approaches to detect phishing attacks; however, they have relied on a large number of features to build reliable phishing detection techniques. A large number of features requires substantial processing power to detect phishing, which makes such techniques unsuitable for resource-constrained devices. To address this issue, we have developed a phishing detection approach that needs only nine lexical features to detect phishing attacks effectively. We used the ISCXURL-2016 dataset for our experiments, in which 11,964 instances of legitimate and phishing URLs are used. We tested our approach with different machine learning classifiers and obtained the highest accuracy, 99.57%, with the Random Forest algorithm.
dc.language	English
dc.publisher	ELSEVIER
dc.relation.ispartof	Computer Communications
dc.relation.isbasedon	10.1016/j.comcom.2021.04.023
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0805 Distributed Computing, 0906 Electrical and Electronic Engineering, 1005 Communications Technologies
dc.subject.classification	Networking & Telecommunications
dc.title	A novel approach for phishing URLs detection using lexical based machine learning in a real-time environment
dc.type	Journal Article
utslib.citation.volume	175
utslib.for	0805 Distributed Computing
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	1005 Communications Technologies
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2022-02-24T04:42:03Z
pubs.publication-status	Published
pubs.volume	175
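
The abstract states that only nine lexical features are needed but does not name them. As a minimal sketch, assuming a hypothetical feature set of the kind commonly used in lexical URL classifiers (length counts, character counts, an IP-address host check, and a hand-picked list of suspicious tokens), the Python function below turns a URL string into a fixed set of numeric features using only the standard library. The function name `lexical_features`, the feature names, and the `SUSPICIOUS_TOKENS` list are illustrative assumptions, not the authors' features.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical token list: illustrative only, not taken from the paper.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")


def _host_is_ip(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Compute a hypothetical set of nine lexical features from the URL string alone.

    No network access is needed (no DNS, WHOIS, or page fetch), which is
    what keeps lexical features cheap on resource-constrained devices.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(_host_is_ip(host)),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    for u in ("https://www.example.com/", "http://192.168.0.1/secure-login.php"):
        print(u, lexical_features(u))
```

Because Python dictionaries preserve insertion order, `list(lexical_features(url).values())` yields a vector with a fixed column order. Such vectors could be stacked into a matrix and passed to a classifier such as Random Forest, the algorithm for which the abstract reports its highest accuracy.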

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154833