Informative pseudo-labeling for graph neural networks with few labels

Publisher:
Springer Nature
Publication Type:
Journal Article
Citation:
Data Mining and Knowledge Discovery, 2023, 37, (1), pp. 228-254
Issue Date:
2023-01-01
Full metadata record
Graph neural networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs. Nevertheless, the challenge of how to effectively learn GNNs with very few labels is still under-explored. As one of the prevalent semi-supervised methods, pseudo-labeling has been proposed to explicitly address the label scarcity problem. It is the process of augmenting the training set with pseudo-labeled unlabeled nodes to retrain a model in a self-training cycle. However, the existing pseudo-labeling approaches often suffer from two major drawbacks. First, these methods conservatively expand the label set by selecting only high-confidence unlabeled nodes without assessing their informativeness. Second, these methods incorporate pseudo-labels to the same loss function with genuine labels, ignoring their distinct contributions to the classification task. In this paper, we propose a novel informative pseudo-labeling framework (InfoGNN) to facilitate learning of GNNs with very few labels. Our key idea is to pseudo-label the most informative nodes that can maximally represent the local neighborhoods via mutual information maximization. To mitigate the potential label noise and class-imbalance problem arising from pseudo-labeling, we also carefully devise a generalized cross entropy with a class-balanced regularization to incorporate pseudo-labels into model retraining. Extensive experiments on six real-world graph datasets validate that our proposed approach significantly outperforms state-of-the-art baselines and competitive self-supervised methods on graphs.
Please use this identifier to cite or link to this item: