Two-layer Sampling Active Learning Algorithm for Social Spammer Detection

Publication Type: Journal Article
Zidonghua Xuebao/Acta Automatica Sinica, 2017, 43 (3), pp. 448 - 461
Copyright © 2017 Acta Automatica Sinica. All rights reserved.

With the rapid development of social networks, more and more people join them to make friends and share their views. However, because of their openness, social networks are persistently plagued by fake accounts. These fake accounts, also called spammers, spread spam to serve their own ends, which undermines the security and reliability of social networks. Existing detection methods extract behaviour, text, and relationship features of users and then apply machine learning algorithms to identify social spammers. However, machine learning algorithms often suffer from a shortage of labeled training data. To address this problem, we propose an efficient algorithm, called two-layer sampling active learning, that constructs an accurate classifier from a minimal number of labeled samples. We present three criteria (uncertainty, representativeness, and diversity) to quantify the value of unlabeled samples, and combine sorting with clustering to actively select the samples with maximum uncertainty, representativeness, and diversity. Experimental results on Twitter, Apontador, and YouTube datasets demonstrate the efficiency of our approach, which achieves better precision and recall than other active learning methods.
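The abstract does not spell out the selection procedure, so the following is only a minimal sketch of how a two-layer sampling step could look, not the authors' implementation. It assumes binary entropy of the classifier's predicted spam probability as the uncertainty score, a small k-means as the clustering step, the sample nearest each cluster centroid as the "representative" one, and one pick per cluster as the source of diversity. All names (`entropy`, `kmeans`, `select_queries`, `pool_size`, `batch_size`) are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a predicted spam probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-first initialisation.

    Returns (centers, groups), where groups[j] lists the indices of the
    points assigned to cluster j and centers[j] is the mean of that group.
    Assumes 1 <= k <= len(points) and iters >= 1.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(list(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    dim = len(points[0])
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(i)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            [sum(points[i][d] for i in g) / len(g) for d in range(dim)]
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups


def select_queries(probs, features, pool_size, batch_size):
    """Pick up to batch_size unlabeled samples to send for labeling.

    Layer 1 (sorting): keep the pool_size most uncertain samples.
    Layer 2 (clustering): cluster that pool and take, from each cluster,
    the sample closest to the centroid.
    """
    ranked = sorted(range(len(probs)),
                    key=lambda i: entropy(probs[i]), reverse=True)
    pool = ranked[:pool_size]
    k = min(batch_size, len(pool))
    centers, groups = kmeans([features[i] for i in pool], k)
    chosen = []
    for center, group in zip(centers, groups):
        if group:
            best = min(group, key=lambda j: sq_dist(features[pool[j]], center))
            chosen.append(pool[best])
    return chosen
```

In an active learning loop, the returned indices would be labeled by an annotator, added to the training set, and the classifier retrained before the next round. Sorting first keeps the clustering step cheap, since only the most uncertain candidates are clustered.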