Homophily, Structure, and Content Augmented Network Representation Learning
- Publication Type:
- Conference Proceeding
- Proceedings of the IEEE 16th International Conference on Data Mining, 2016, pp. 609 - 618
- Issue Date:
Advances in social networking and communication technologies have witnessed an increasing number of applications where data is not only characterized by rich content information, but also connected with complex relationships representing social roles and dependencies between individuals. To enable knowledge discovery from such networked data, network representation learning (NRL) aims to learn vector representations for network nodes, such that off-the-shelf machine learning algorithms can be directly applied. To date, existing NRL methods either primarily focus on network structure or simply combine node content and topology for learning. We argue that in information networks, information is mainly originated from three sources: (1) homophily, (2) topology structure, and (3) node content. Homophily states social phenomenon where individuals sharing similar attributes (content) tend to be directly connected through local relational ties, while topology structure emphasizes more on global connections. To ensure effective network representation learning, we propose to augment three information sources into one learning objective function, so that the interplay roles between three parties are enforced by requiring the learned network representations (1) being consistent with node content and topology structure, and also (2) following the social homophily constraints in the learned space. Experiments on multi-class node classification demonstrate that the representations learned by the proposed method consistently outperform state-of-the-art NRL methods, especially for very sparsely labeled networks.
Please use this identifier to cite or link to this item: