Augmented network embedding in attributed graphs

Publication Type: Thesis
Issue Date: 2019
With the widespread use of information technologies, information networks have become increasingly popular for capturing complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life, such as the structure of societies, information diffusion, and communication patterns. In reality, however, the large scale of information networks often makes network analytic tasks computationally expensive or intractable. Network embedding has recently been proposed as a new learning paradigm that embeds network nodes into a low-dimensional vector space, so that the original network can be easily handled in the new vector space for further analysis. Existing research on network embedding mainly focuses on capturing structural relatedness in the embedding space, while ignoring the important information carried by the widely available node attributes and labels, which significantly limits network embedding performance.

In this thesis, we addressed the research problem of augmented network embedding in attributed graphs, which aims to learn informative vector-format node representations by augmenting the network topology structure with node content attributes and, if available, node labels. We summarized four research challenges in augmented network embedding: (1) 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺, caused by the discrepancy between network structure and node attributes/labels; (2) 𝘥𝘢𝘵𝘢 𝘴𝘱𝘢𝘳𝘴𝘪𝘵𝘺 in network structure and node attributes/labels; (3) 𝘴𝘤𝘢𝘭𝘢𝘣𝘪𝘭𝘪𝘵𝘺 for handling large-scale networks; and (4) 𝘵𝘢𝘴𝘬 𝘰𝘳𝘪𝘦𝘯𝘵𝘢𝘵𝘪𝘰𝘯 for directly benefiting specific network analytic tasks.

To overcome these challenges, we proposed a series of augmented network embedding algorithms. To handle the 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺 challenge, we proposed the HSCA algorithm, which effectively encodes the similarity measured by homophily, structural context, and node content attributes into a unified node representation through regularized inductive matrix factorization. The attri2vec algorithm was then proposed to address the 𝘩𝘦𝘵𝘦𝘳𝘰𝘨𝘦𝘯𝘦𝘪𝘵𝘺 and 𝘥𝘢𝘵𝘢 𝘴𝘱𝘢𝘳𝘴𝘪𝘵𝘺 challenges; it learns node representations by discovering an attribute subspace that better respects the network structure. For handling large-scale incomplete networks, we proposed the SINE algorithm, which learns node representations by simultaneously modeling node-neighbor and node-attribute relations through a three-layer neural network, trained with an efficient stochastic gradient descent based online learning strategy.

The above three algorithms augment the network structure with node content attributes only, with the aim of obtaining more informative network representations; they are unsupervised and task-general, and thus cannot directly benefit specific tasks. To seamlessly integrate network embedding with network analytic tasks, we proposed two task-oriented network embedding algorithms. For collective classification on sparsely labeled networks, we proposed the discriminative attributed network embedding algorithm DMF, which integrates network embedding with empirical loss minimization for classifying node labels, so as to simultaneously exploit the discriminative power of node labels and the informativeness of node representations.
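To give a concrete flavour of how network structure and node attributes can be modelled jointly in the unsupervised setting described above (closest in spirit to SINE), the minimal NumPy sketch below trains a single shared node embedding with negative-sampling SGD to predict both sampled neighbours and observed attributes. All function names, hyper-parameters, and toy data here are illustrative assumptions, not the implementations evaluated in the thesis.

```python
# Illustrative sketch only (not the thesis code): a shared node embedding is trained
# with SGD and negative sampling to predict both neighbours (structure) and observed
# attributes (content), in the spirit of joint structure/attribute modelling.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_embeddings(neighbors, attributes, n_nodes, n_attrs,
                     dim=32, lr=0.025, neg=5, epochs=5):
    """neighbors: dict node -> list of neighbour ids
       attributes: dict node -> list of attribute ids (bag of features)."""
    Z = (rng.random((n_nodes, dim)) - 0.5) / dim   # node embeddings (what we keep)
    C = np.zeros((n_nodes, dim))                   # output parameters for neighbours
    A = np.zeros((n_attrs, dim))                   # output parameters for attributes

    def sgd_step(v, out_mat, pos, n_cands):
        # one positive target plus `neg` negatives sampled uniformly (for simplicity)
        targets = [(pos, 1.0)] + [(int(rng.integers(n_cands)), 0.0) for _ in range(neg)]
        grad_z = np.zeros(dim)
        for t, label in targets:
            score = sigmoid(Z[v] @ out_mat[t])
            g = lr * (label - score)               # gradient of logistic loss
            grad_z += g * out_mat[t]
            out_mat[t] += g * Z[v]
        Z[v] += grad_z

    for _ in range(epochs):
        for v in range(n_nodes):
            for u in neighbors.get(v, []):         # structure: predict neighbours
                sgd_step(v, C, u, n_nodes)
            for a in attributes.get(v, []):        # content: predict attributes
                sgd_step(v, A, a, n_attrs)
    return Z

# toy usage: a 4-node path graph with 3 binary attributes
Z = train_embeddings({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
                     {0: [0], 1: [0, 1], 2: [1], 3: [2]},
                     n_nodes=4, n_attrs=3)
print(Z.shape)  # (4, 32)
```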
For searching similar nodes efficiently on large-scale networks, the BinaryNE algorithm was proposed to learn binary node representations from network structure and node content attributes, so that node similarity search can be carried out efficiently through fast bitwise Hamming distance calculations on the learned binary codes. To verify the effectiveness of the proposed algorithms, extensive experiments were carried out on nine real-world attributed networks, demonstrating their advantage over state-of-the-art baselines.
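As a concrete illustration of why binary codes make similarity search cheap, the short sketch below (illustrative helper names and random toy codes, not the BinaryNE implementation) computes Hamming distances as a bitwise XOR followed by a population count, so ranking candidates needs only inexpensive bit operations.

```python
# Illustrative sketch only: Hamming-distance search over binary node codes
# packed into uint8 arrays, using XOR + popcount.
import numpy as np

def hamming_search(query_code, all_codes, top_k=5):
    """query_code: (n_bytes,) uint8; all_codes: (n_nodes, n_bytes) uint8."""
    xor = np.bitwise_xor(all_codes, query_code)       # differing bits, byte by byte
    dists = np.unpackbits(xor, axis=1).sum(axis=1)    # popcount = Hamming distance
    order = np.argsort(dists)                         # nearest codes first
    return order[:top_k], dists[order[:top_k]]

# toy usage: 1000 nodes with 128-bit (16-byte) random codes
rng = np.random.default_rng(0)
codes = rng.integers(0, 256, size=(1000, 16), dtype=np.uint8)
ids, dists = hamming_search(codes[42], codes)
print(ids[0], dists[0])  # expect node 42 itself, at distance 0
```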