Statistical Features-Based Real-Time Detection of Drifted Twitter Spam

Chen, C; Wang, Y; Zhang, J; Xiang, Y; Zhou, W; Min, G

Publisher: Institute of Electrical and Electronics Engineers
Publication Type: Journal Article
Citation: IEEE Transactions on Information Forensics and Security, 2017, 12, (4), pp. 914-925
Issue Date: 2017-04-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, C
dc.contributor.author	Wang, Y
dc.contributor.author	Zhang, J
dc.contributor.author	Xiang, Y
dc.contributor.author	Zhou, W https://orcid.org/0000-0002-1680-2521
dc.contributor.author	Min, G
dc.date.accessioned	2022-08-09T04:51:50Z
dc.date.available	2022-08-09T04:51:50Z
dc.date.issued	2017-04-01
dc.identifier.citation	IEEE Transactions on Information Forensics and Security, 2017, 12, (4), pp. 914-925
dc.identifier.issn	1556-6013
dc.identifier.issn	1556-6021
dc.identifier.uri	http://hdl.handle.net/10453/159802
dc.description.abstract	Twitter spam has become a critical problem nowadays. Recent works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. In our labeled tweets data set, however, we observe that the statistical properties of spam tweets vary over time, and thus, the performance of existing machine learning-based classifiers decreases. This issue is referred to as "Twitter Spam Drift". In order to tackle this problem, we first carry out a deep analysis on the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover "changed" spam tweets from unlabeled tweets and incorporate them into classifier's training process. A number of experiments are performed to evaluate the proposed scheme. The results show that our proposed Lfun scheme can significantly improve the spam detection accuracy in real-world scenarios.
dc.language	English
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation	http://purl.org/au-research/grants/arc/LP120200266
dc.relation.ispartof	IEEE Transactions on Information Forensics and Security
dc.relation.isbasedon	10.1109/TIFS.2016.2621888
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Strategic, Defence & Security Studies
dc.title	Statistical Features-Based Real-Time Detection of Drifted Twitter Spam
dc.type	Journal Article
utslib.citation.volume	12
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2022-08-09T04:51:49Z
pubs.issue	4
pubs.publication-status	Published
pubs.volume	12
utslib.citation.issue	4

Abstract:

Twitter spam has become a critical problem. Recent work focuses on applying machine learning techniques to Twitter spam detection, making use of the statistical features of tweets. In our labeled tweet data set, however, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing machine learning-based classifiers decreases. This issue is referred to as "Twitter Spam Drift". To tackle this problem, we first carry out a deep analysis of the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover "changed" spam tweets from unlabeled tweets and incorporate them into the classifier's training process. A number of experiments are performed to evaluate the proposed scheme. The results show that the proposed Lfun scheme can significantly improve spam detection accuracy in real-world scenarios.
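
The abstract describes the scheme only at a high level: find "changed" spam among unlabeled tweets and feed it back into training. The sketch below illustrates that general idea with a plain self-training loop. It is not the Lfun scheme itself. The nearest-centroid classifier, the two example features (URLs per tweet, hashtags per tweet), and all function names are assumptions made for illustration and do not come from the paper.

```python
from statistics import fmean


def centroid(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(spam, ham):
    """Hypothetical stand-in classifier: one centroid per class."""
    return centroid(spam), centroid(ham)


def predict(model, x):
    """Label x by whichever class centroid is closer."""
    spam_c, ham_c = model
    return "spam" if sq_dist(x, spam_c) < sq_dist(x, ham_c) else "ham"


def retrain_with_detected_spam(spam, ham, unlabeled):
    """Add unlabeled tweets that the current model flags as spam to the
    spam training set, then retrain on the enlarged set."""
    model = train(spam, ham)
    detected = [x for x in unlabeled if predict(model, x) == "spam"]
    return train(spam + detected, ham), detected


# Illustrative values only: [urls_per_tweet, hashtags_per_tweet].
old_spam = [[3.0, 5.0], [4.0, 6.0]]
ham = [[0.0, 1.0], [1.0, 0.0]]
unlabeled = [[5.0, 8.0], [6.0, 9.0], [0.0, 0.0]]

new_model, detected = retrain_with_detected_spam(old_spam, ham, unlabeled)
print("detected spam:", detected)
print("updated spam centroid:", new_model[0])
```

In this toy run the spam centroid moves toward the newer, more extreme spam samples after retraining, which is the kind of adaptation to drift the abstract refers to. A real system would use the statistical features and the classifier evaluated in the paper rather than these placeholders.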

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/159802