Augmented network embedding in attributed graphs

Publication Type:
Issue Date:
Full metadata record
With the widespread use of information technologies, information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life, such as the structure of societies, information diffusion, and communication patterns. In reality, however, the large scale of information networks often makes network analytic tasks computationally expensive or intractable. Network embedding has been recently proposed as a new learning paradigm to embed network nodes into a low-dimensional vector space. This facilitates the original network to be easily handled in the new vector space for further analysis. Existing research on network embedding mainly focuses on capturing the structure relatedness in the embedding space, while ignores the important information carried by the widely existing node attributes and labels, which limited the network embedding performance significantly. In this thesis, we dealt with the research problem of augmented network embedding in attributed graphs that aims to learn informative node vector-format representations by augmenting network topology structure with node content attributes and node labels if available. We summarized four research challenges in augmented network embedding: (1) 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺 caused by the discrepancy between network structure and node attributes/labels; (2) 𝘥𝘢𝘵𝘢 𝘴𝘱𝘢𝘳𝘴𝘪𝘵𝘺 in network structure and node attributes/labels; (3) 𝘴𝘤𝘢𝘭𝘢𝘣𝘪𝘭𝘪𝘵𝘺 for handling large-scale networks; (4) 𝘵𝘢𝘴𝘬 𝘰𝘳𝘪𝘦𝘯𝘵𝘢𝘵𝘪𝘰𝘯 for directly benefiting specific network analytic tasks. To overcome the above challenges, we proposed a series of augmented network embedding algorithms in this thesis. To handle the 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺 challenge, we proposed the HSCA algorithm that effectively encodes the similarity measured by homophily, structural context and node content attributes into a unified node representation through the regularized inductive matrix factorization. The attri2vec algorithm was then proposed to address the 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺 and 𝘥𝘢𝘵𝘢 𝘴𝘱𝘢𝘳𝘴𝘪𝘵𝘺 challenges, in which node representations are learned by discovering an attribute subspace that better respects network structure. For handling large-scale incomplete networks, we proposed the SINE algorithm that learns node representations by simultaneously modeling node-neighbor and node-attribute relations through a three-layer neural network, with an efficient Stochastic Gradient Descent based online learning strategy. The above three augmented network embedding algorithms only augment network structure with node content attributes, with the purpose to obtain more informative network representations. They are unsupervised, task-general and incapable of directly benefiting specific tasks. To seamlessly integrate network embedding with network analytic tasks, we proposed two task-orientated network embedding algorithms. For collective classification on sparsely labeled networks, we proposed the discriminative attributed network embedding algorithm DMF that integrates network embedding with an empirical loss minimization for classifying node labels, with the purpose of simultaneously exerting the discriminative power of node labels and informativeness of node representations. For searching similar nodes efficiently on large-scale networks, BinaryNE was proposed to learn binary node representations from network structure and node content attributes so that node similarity search can be efficiently done through the fast bitwise Hamming distance calculation performed on the learned binary node representations. To verify the effectiveness of the proposed algorithms, extensive experiments were carried out on nine real-world attributed networks, showing the advantage of the proposed algorithms over state-of-the-art baselines.
Please use this identifier to cite or link to this item: