Learning and representing attributed graphs

Hu, Ruiqi

Publication Type: Thesis
Issue Date: 2019

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hu, Ruiqi
dc.date.accessioned	2019-06-24T02:25:18Z
dc.date.available	2019-06-24T02:25:18Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/10453/134117
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/134117/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Learning and representing attributed graphs	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Information graphs are ubiquitous in many areas, such as medicine, social media and academic search engines, and each node in such a graph comes with various attributes. For example, in an academic citation graph we can treat each paper as a node, and the author(s) and title of each paper can be extracted as the attributes of that node. Moreover, papers, authors and venues can be taken as different sources of nodes in one information graph. Doing so yields a heterogeneous information graph with more than one source of nodes, attributes and links. To support applications such as identifying protein residues and social media marketing, graph representation learning for homogeneous information graphs has been widely researched and employed. This line of research aims to embed and represent homogeneous nodes as low-dimensional, unified vectors while preserving the contextual information between nodes, so that classical machine learning methods can be applied directly. However, existing graph embedding algorithms face five major challenges: (1) graph representation learning and node classification are separated into two steps, which may produce sub-optimal results because the node representation may not fit the classification model well; (2) most existing methods are shallow and can capture only linear and simple relationships in the data; (3) they ignore the data distribution of the latent codes from the graphs, which often results in inferior embeddings on real-world graph data; (4) they cannot handle heterogeneous and multi-relational information graphs, which are the major form in which graph data exists in the real world; and (5) they cannot effectively discover functional groups or explain the roles of the detected groups.
To address these challenges, the main research objective of this thesis is to study how to embed the nodes of a graph more effectively into a compact space for the tasks most relevant to real-world applications. The objective is studied from four coherently linked perspectives: (1) how to unify the traditional two-step embedding workflow into one smooth embedding procedure, avoiding inconsistency between the embedding architecture and the classifier; (2) how to learn a universal embedding for all sources of nodes in a graph, so that a single embedding can represent the entire heterogeneous information graph; (3) how to smoothly regularize the embedding with a certain distribution during learning to obtain a more robust embedding; and (4) how to automatically generate a human-understandable explanation of each cluster of nodes in the graph and apply the algorithm in the real business world. Specifically, this thesis tackles the challenges by: studying a graph ladder network that unifies representation learning and classifier learning in one framework; developing a universal graph representation that represents different types of nodes in a heterogeneous information graph in a continuous, common vector space; introducing a generative adversarial scheme into the graph domain, in which the topological structure and node content of a graph are encoded into a compact representation and a decoder is trained to reconstruct the graph structure under adversarial training; and carrying out co-clustering on an enterprise information graph for functional group discovery and understanding. Each piece of work is validated on a related task: graph classification, graph clustering, graph visualization and link prediction, respectively.
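
As a rough illustration of the conventional two-step workflow the abstract refers to (embed the nodes first, then hand the vectors to a classical classifier), the sketch below uses a Laplacian spectral embedding followed by a nearest-centroid classifier. Both techniques, the toy graph and all function names are assumptions chosen for illustration; they are not the methods developed in the thesis, and node attributes are ignored for brevity.

```python
import numpy as np


def spectral_embedding(adj, k):
    """Step 1 (illustrative): embed each node with k Laplacian eigenvectors."""
    lap = np.diag(adj.sum(axis=1)) - adj      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(lap)             # eigenvalues in ascending order
    return vecs[:, 1:k + 1]                   # skip the trivial constant vector


def nearest_centroid(emb, labeled):
    """Step 2 (illustrative): classify every node by its closest class centroid."""
    classes = sorted(set(labeled.values()))
    centroids = np.stack([
        emb[[i for i, c in labeled.items() if c == cls]].mean(axis=0)
        for cls in classes
    ])
    dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
    return np.array(classes)[dists.argmin(axis=1)]


# Toy graph (hypothetical): two 4-node cliques joined by the single edge 0-4.
adj = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[0, 4] = adj[4, 0] = 1.0

labeled = {1: 0, 5: 1}                        # one labeled node per clique
emb = spectral_embedding(adj, k=1)            # low-dimensional node vectors
pred = nearest_centroid(emb, labeled)         # classifier fitted separately
print(pred.tolist())
```

On this toy graph, nodes 0 to 3 should land in one class and nodes 4 to 7 in the other. Because the embedding is computed without any knowledge of the classifier, nothing guarantees that the two steps fit each other on harder graphs, which is the inconsistency the first challenge in the abstract points at.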

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/134117