Learning and representing attributed graphs

Publication Type:
Thesis
Issue Date:
2019
Information graphs are ubiquitous in areas such as medicine, social media and academic engines, and each node in such a graph comes with various attributes. For example, in an academic citation graph, each paper can be treated as a node, and its author(s) and title can be extracted as the attributes of that node. Moreover, papers, authors and venues can be taken as different sources of nodes in one information graph, which yields a heterogeneous information graph with more than one source of nodes, attributes and links. To support applications such as identifying protein residues and social media marketing, graph representation of homogeneous information graphs has been widely researched and employed. This line of research aims to embed and represent homogeneous nodes with low-dimensional, unified vectors while preserving the contextual information between nodes, so that classical machine learning methods can be applied directly. However, existing graph embedding algorithms face five major challenges: (1) graph representation learning and node classification are separated into two steps, which may lead to sub-optimal results because the node representation may not fit the classification model well; (2) most existing methods are shallow and can only capture linear and simple relationships in the data; (3) they ignore the data distribution of the latent codes from the graphs, which often results in inferior embeddings on real-world graph data; (4) they cannot handle heterogeneous and multi-relational information graphs, which are the major form in which graph data exists in the real world; and (5) they cannot effectively discover functional groups or explain the roles of the detected groups.
To address these challenges, the main research objective of the thesis is to study how to embed the nodes of a graph into a compact space more effectively for the tasks most closely related to real-world applications. This objective is studied from four coherently linked perspectives: (1) how to unify the traditional two-step embedding workflow into one smooth embedding procedure, to avoid inconsistency between the embedding architecture and the classifier; (2) how to learn a universal embedding for all sources of nodes in a graph, so that a single embedding can represent the entire heterogeneous information graph; (3) how to smoothly regularize the embedding with a certain distribution during learning to obtain a more robust embedding; and (4) how to automatically generate a human-understandable explanation of each cluster of nodes in the graph and apply the algorithm in the real business world. Specifically, this thesis tackles the challenges by: studying a graph ladder network that unifies representation learning and classifier model learning in one framework; developing a universal graph representation that represents different types of nodes in a heterogeneous information graph in a continuous, common vector space; introducing a generative adversarial scheme into the graph domain, which encodes the topological structure and node content of a graph into a compact representation on which a decoder is trained to reconstruct the graph structure under adversarial training; and carrying out co-clustering on an enterprise information graph for functional group discovery and understanding. All works in this thesis are validated on related tasks such as graph classification, graph clustering, graph visualization and link prediction.