Two-layer Sampling Active Learning Algorithm for Social Spammer Detection

Tan, K; Gao, M; Li, WT; Tian, RL; Wen, JH; Xiong, QY

Publication Type:: Journal Article
Citation:: Zidonghua Xuebao/Acta Automatica Sinica, 2017, 43 (3), pp. 448 - 461
Issue Date:: 2017-03-01

Closed Access

	Filename	Description	Size
	基于双层采样主动学习的社交网络虚假用户检测方法_谭侃 (1).pdf (Chinese title: Social network fake user detection method based on two-layer sampling active learning)	Published Version	1.08 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Tan, K	en_US
dc.contributor.author	Gao, M	en_US
dc.contributor.author	Li, WT https://orcid.org/0000-0003-4941-8814	en_US
dc.contributor.author	Tian, RL	en_US
dc.contributor.author	Wen, JH	en_US
dc.contributor.author	Xiong, QY	en_US
dc.date.issued	2017-03-01	en_US
dc.identifier.citation	Zidonghua Xuebao/Acta Automatica Sinica, 2017, 43 (3), pp. 448 - 461	en_US
dc.identifier.issn	0254-4156	en_US
dc.identifier.uri	http://hdl.handle.net/10453/124764
dc.description.abstract	Copyright © 2017 Acta Automatica Sinica. All rights reserved. With the rapid development of social networks, more and more people join them to make friends and share their views. However, because they are open, social networks constantly suffer from fake accounts. Fake accounts, also called spammers, spread spam to achieve their own purposes, damaging the security and reliability of social networks. Existing detection methods extract behavioural, textual and relationship features of users and then apply machine learning algorithms to identify social spammers. However, these algorithms often suffer from insufficient labeled training data. To address this problem, we propose an efficient algorithm, called two-layer sampling active learning, that constructs an accurate classifier from a minimal number of labeled samples. We present three criteria (uncertainty, representativeness and diversity) to quantify the value of unlabeled samples, and combine sorting and clustering to actively select the samples with maximum uncertainty, representativeness and diversity. Experimental results on Twitter, Apontador, and YouTube datasets demonstrate the efficiency of our approach and show that it achieves better precision and recall than other active learning methods.	en_US
dc.relation.ispartof	Zidonghua Xuebao/Acta Automatica Sinica	en_US
dc.relation.isbasedon	10.16383/j.aas.2017.c160308	en_US
dc.subject.classification	Industrial Engineering & Automation	en_US
dc.title	Two-layer Sampling Active Learning Algorithm for Social Spammer Detection	en_US
dc.type	Journal Article
utslib.citation.issue	3	en_US
utslib.citation.volume	43	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0102 Applied Mathematics	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	43	en_US
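
The abstract names three sample-value criteria (uncertainty, representativeness, diversity) and a sort-then-cluster selection, but it does not give the formulas, and the full text is closed access. The sketch below is a minimal, hypothetical reading of that idea under stated assumptions: uncertainty is measured as `1 - |2p - 1|` for a binary spammer probability `p`; the first layer sorts the unlabeled samples by uncertainty and keeps the `pool_size` most uncertain ones; the second layer runs plain k-means on that pool with one cluster per query slot (diversity) and takes the member nearest each centroid (representativeness). The functions `uncertainty`, `kmeans` and `two_layer_select` and all parameters are illustrative names, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: the paper's exact uncertainty, representativeness and
# diversity measures are not given in the abstract; these are assumptions.
Point = Sequence[float]


def uncertainty(p_spam: float) -> float:
    """Binary uncertainty: 1.0 when P(spammer) = 0.5, 0.0 when it is 0 or 1."""
    return 1.0 - abs(2.0 * p_spam - 1.0)


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with deterministic farthest-first seeding.

    Returns (cluster label per point, centroid per cluster).
    """
    # Seed with the first point, then repeatedly add the point farthest
    # from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda x: min(math.dist(x, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def two_layer_select(features: List[Point], p_spam: List[float],
                     pool_size: int, batch_size: int) -> List[int]:
    """Return up to batch_size unlabeled indices to send to the annotator."""
    if batch_size <= 0 or not features:
        return []
    # Layer 1 (sorting): keep the pool_size most uncertain samples.
    ranked = sorted(range(len(features)), key=lambda i: uncertainty(p_spam[i]), reverse=True)
    pool = ranked[:pool_size]
    if not pool:
        return []
    # Layer 2 (clustering): one cluster per query slot, for diversity.
    k = min(batch_size, len(pool))
    labels, centroids = kmeans([features[i] for i in pool], k)
    chosen = []
    for j in range(k):
        members = [idx for idx, lab in zip(pool, labels) if lab == j]
        if members:
            # Representativeness: the member nearest its cluster centroid.
            chosen.append(min(members, key=lambda i: math.dist(features[i], centroids[j])))
    return sorted(chosen)
```

For example, with six uncertain samples forming two tight groups plus two confidently classified samples, `two_layer_select(feats, probs, pool_size=6, batch_size=2)` returns the centre of each group and ignores the confident ones. Clustering only the short-listed pool, rather than the whole unlabeled set, keeps the k-means step cheap; the weighting and measures used in the paper itself may differ.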

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/124764