Learning Distilled Graph for Large-Scale Social Network Data Clustering

Publisher: IEEE COMPUTER SOC
Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2020, 32, (7), pp. 1393-1404
Issue Date: 2020-07-01
© 1989-2012 IEEE. Spectral analysis is critical in social network analysis. Graph construction is a vital step of spectral analysis, and many existing works build the graph from content data only. Unfortunately, content data often consists of noisy, sparse, and redundant features, which makes the resulting graph unstable and unreliable. In practice, social network data also contains link information, which provides additional information for graph construction. Some previous works utilize the link data, but it is often incomplete, which makes the resulting graph incomplete as well. To address these issues, we propose a novel Distilled Graph Clustering (DGC) method, which pursues a distilled graph based on both the content data and the link data. The proposed algorithm alternates between two steps: in the feature selection step, it finds the most representative feature subset w.r.t. an intermediate graph initialized with the link data; in the graph distillation step, it updates and refines the graph based on only the selected features. The final graph, referred to as the distilled graph, is then used for spectral clustering on large-scale social network data. Extensive experiments demonstrate the superiority of the proposed method.
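The alternating scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the objective function or the update rules, so this is not the DGC algorithm itself: the Laplacian score stands in for the feature selection step, and a Gaussian-weighted k-nearest-neighbour graph stands in for the graph distillation step. All function names, parameters, and the toy data below are hypothetical and chosen only for illustration.

```python
import math

def laplacian_scores(X, W):
    """Laplacian score of each feature w.r.t. graph W (lower = smoother on the graph).
    Assumption: a stand-in criterion, not the objective used in the paper."""
    n, m = len(X), len(X[0])
    deg = [sum(row) for row in W]
    total = sum(deg)
    scores = []
    for f in range(m):
        col = [X[i][f] for i in range(n)]
        mean = sum(v * d for v, d in zip(col, deg)) / total  # degree-weighted mean
        c = [v - mean for v in col]
        # f^T L f, written as a sum of squared differences over weighted edges
        smooth = 0.5 * sum(W[i][j] * (c[i] - c[j]) ** 2
                           for i in range(n) for j in range(n))
        var = sum(d * v * v for d, v in zip(deg, c))  # f^T D f
        scores.append(smooth / var if var > 0 else float("inf"))
    return scores

def select_features(X, W, k):
    """Feature selection step: keep the k features with the lowest score."""
    scores = laplacian_scores(X, W)
    return sorted(sorted(range(len(scores)), key=lambda f: scores[f])[:k])

def knn_graph(X, feats, n_neighbors):
    """Graph refinement step: rebuild the graph from the selected features only.
    Assumption: a symmetrized Gaussian kNN graph as a simple stand-in."""
    n = len(X)
    def dist2(i, j):
        return sum((X[i][f] - X[j][f]) ** 2 for f in feats)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist2(i, j))
        for j in nearest[:n_neighbors]:
            W[i][j] = W[j][i] = math.exp(-dist2(i, j))
    return W

def distill_graph(X, links, k_features, n_neighbors, n_iter=5):
    """Alternate the two steps, starting from the (possibly incomplete) link graph."""
    W = [row[:] for row in links]
    feats = []
    for _ in range(n_iter):
        feats = select_features(X, W, k_features)
        W = knn_graph(X, feats, n_neighbors)
    return W, feats

# Toy data: two communities {0, 1, 2} and {3, 4, 5}; feature 1 is noise.
X = [[0.0, 0.9, 0.1], [0.1, 0.1, 0.0], [0.0, 0.5, 0.1],
     [1.0, 0.8, 0.9], [0.9, 0.2, 1.0], [1.0, 0.6, 1.0]]
# Incomplete link data: the edges 0-2 and 3-5 are missing.
links = [[0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 1, 0]]
W, feats = distill_graph(X, links, k_features=2, n_neighbors=2)
print(feats)  # [0, 2]
```

On this toy input the noisy feature is dropped and the refined graph recovers the two missing within-community edges without adding any edge between the communities. In a full pipeline the resulting matrix `W` would be passed to a spectral clustering routine as a precomputed affinity; the pure-Python loops here are only suitable for tiny examples, not for large-scale data.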