Unsupervised Domain Adaptive Graph Convolutional Networks

Graph convolutional networks (GCNs) have achieved impressive success in many graph-related analytics tasks. However, most GCNs only work in a single domain (graph) and are incapable of transferring knowledge from/to other domains (graphs), due to the challenges in both graph representation learning and domain adaptation over graph structures. In this paper, we present a novel approach, unsupervised domain adaptive graph convolutional networks (UDA-GCN), for domain adaptation on graphs. To enable effective graph representation learning, we first develop a dual graph convolutional network component, which jointly exploits local and global consistency for feature aggregation. An attention mechanism is further used to produce a unified representation for each node in different graphs. To facilitate knowledge transfer between graphs, we propose a domain adaptive learning module that optimizes three loss functions as a whole, namely the source classifier loss, the domain classifier loss, and the target classifier loss, so that our model can differentiate class labels in the source domain, samples from different domains, and class labels in the target domain, respectively. Experimental results on real-world datasets for the node classification task validate the performance of our method compared to state-of-the-art graph neural network algorithms.


INTRODUCTION
Node classification is an important yet challenging task in various network applications, including social networks [2], protein-protein interaction networks [11], and citation networks [18]. Many research efforts have been made in the past decade to develop reliable and efficient methods for node classification tasks [44]. However, most existing methods mainly focus on graph representations for nodes from a single graph, and they have largely overlooked the generalisation of the classification model to a completely new graph. As a result, when a new graph is collected, even if it is very similar to an existing graph, we have to relabel the nodes in the graph and rebuild a classifier model for the node classification task. The ineffectiveness of existing learning frameworks for graph data calls for transferable models that enable knowledge to be adapted from a source graph to a target graph.
Domain adaptation, which aims to support transfer learning from a source domain with sufficient label information to a target domain with plenty of unlabeled data by minimizing their domain discrepancy, has already attracted a lot of interest in the fields of Computer Vision [20, 21] and Natural Language Processing [16, 7]. However, applying domain adaptation to network analysis, such as classifying nodes across networks, has not been sufficiently investigated. Given a source network with fully labeled nodes and a target network without any labeled data, the objective of unsupervised graph domain adaptation is to take advantage of the rich labeled information from the source network to help build an accurate node classifier for the target network. An example of unsupervised graph domain adaptation is illustrated in Figure 1.
Currently, most research on domain adaptation concentrates on the CV and NLP fields and cannot be directly applied to node classification problems. The reasons are twofold. First, these methods are usually designed for CV and NLP tasks, where samples (e.g., images and sequences) are independent and identically distributed, so there is little requirement for permutation invariance in the model. However, network-structured data, where nodes are connected by edges representing their relations, require permutation-invariant models because of the phenomenon known as graph isomorphism. Therefore, existing methods cannot model network structural information, which is the core of node classification. Second, most existing domain adaptation models learn discriminative representations in a supervised manner, in which the value of the loss function is only associated with each single sample's absolute position in the feature space. Network embedding for node classification, alternatively, usually aims to learn multi-purpose representations in an unsupervised manner by preserving the relative positions of all node pairs, resulting in increased difficulty in optimization.
Recently, there have been some attempts to apply domain adaptation ideas to graph-structured data. The CDNE [31] algorithm learns transferable node embeddings for cross-network learning tasks by minimizing the maximum mean discrepancy (MMD) loss. However, it cannot jointly model network structures and node attributes, which limits its modeling capacity. To utilize the network structure for cross-network node classification, the AdaGCN algorithm [6] uses graph convolutional networks as a feature extractor to learn node representations, and utilizes an adversarial learning strategy to learn domain-invariant node representations. Although it seems reasonable to exploit GCNs and adversarial learning jointly to enhance the performance of cross-domain node classification for graph-structured data, these existing methods still cannot effectively deal with the three levels of challenges below.
(1) At the data structure level, many existing methods, graph convolutional networks (GCNs) [18] in particular, only consider the direct (locally consistent) neighbour nodes for knowledge embedding; the global consistency information has not been well investigated yet. In practice, the global consistency relationship is vitally important. For instance, in a real social network, each individual is a member of several communities and can be influenced by neighborhoods at different distances, ranging from local consistency relationships (e.g., families, friends) to global consistency relationships (e.g., society, nation states). Thus, the global consistency relationship should also be exploited to obtain a comprehensive representation of each node for graph learning.
(2) At the representation learning level, most existing graph learning methods learn node representations based on the local consistency relationship. However, as mentioned above, the global consistency relationship cannot be neglected. Thus, in our scenario, how to combine the local and global relationships to capture a comprehensive representation of each node is vitally important. Ideally, this should be done within an end-to-end learning framework. To address the above limitations, we propose Unsupervised Domain Adaptive Graph Convolutional Networks (UDA-GCN) for cross-domain node classification, which models the local and global consistency relationships of each graph and combines source information, domain information, and target information into a unified deep model. Our approach consists of three key components: (1) at the data structure level, the local and global consistency relationships of each graph are utilized to assist in the training of the node embedding module; (2) at the representation learning level, an inter-graph attention mechanism is proposed to combine the local and global relationships into a comprehensive node representation for each domain.
(3) At the domain adaptive learning level, we advocate a domain adaptive learning approach that exploits the source information, domain information, and target information jointly, so that domain-invariant and semantic representations can be effectively learned to reduce the domain discrepancy for cross-domain node classification. Empirical results on three public real-world datasets demonstrate that UDA-GCN outperforms the state-of-the-art cross-domain node classification methods. Our contributions can be summarized as follows:
• We present a novel unsupervised graph domain adaptation problem, and propose an effective graph convolutional network algorithm to solve it.
• We propose a novel method to integrate local and global consistency with an attention mechanism to learn effective node embeddings across networks.
• We design a new way to exploit source information and target information with different loss functions, so that domain-invariant and semantic representations can be effectively learned to reduce the domain discrepancy for cross-domain node classification.
• We evaluate our method on real-world datasets, and the results demonstrate that the proposed model outperforms the baseline methods.

RELATED WORK
Our work is closely related to graph neural networks and cross domain classification. We briefly review these works in this section.

Graph Neural Networks
Network node representation generally aims to map nodes with higher proximities in a network closer to each other in a low-dimensional latent space, based either on network topology structure alone or together with side information. Among topology-only embedding methods, most existing works focus on preserving network structures and properties in embedding vectors [28] [33] [14]. LINE [33] and SDNE [39] seek to preserve the first-order and second-order proximities between nodes based on their first-order and second-order neighbors. DeepWalk [28] employs a random walk sampling strategy to generate the neighborhood of each node. Deep learning approaches [4, 30] have further been employed to learn more similar feature representations for nodes that can more easily reach each other within K steps. Aside from topology-only methods, many approaches incorporate side information such as node features [25] [43] [46]. Recently, graph neural networks, which apply deep learning architectures to graph-structured data, have attracted considerable attention [40, 42, 47, 48]. Many solutions have been proposed to generalize well-established neural network models that work on regular grid structures to graphs with arbitrary structures [24, 38, 41]. GCN [18] is a deep convolutional learning paradigm for graph-structured data which integrates local node features and graph topology structure in convolutional layers. GAT [37] improves GCN by leveraging an attention mechanism to aggregate features from the neighbors of a node with discrimination. However, most existing methods mainly focus on learning representations for nodes from a single network. As a result, when transferring models across networks to handle the same task, they may suffer from embedding space drift [8] and embedding distribution discrepancy [35].
Moreover, most of these methods can only utilize the direct neighbourhood information (the local consistency relationship), while the high-order proximities that capture the global consistency information are often neglected [3].
Graph domain adaptation vs. inductive learning. It is worth noting that several recent graph neural networks learn inductive representations for node classification. GraphSAGE [15], for example, presents different aggregation methods for feature extraction and can be applied to learn embeddings for nodes that are not seen during training. Unlike inductive learning methods, which only use the training set to train the model (the training data and testing data are separate), the domain adaptation approach feeds the training data and testing data together into a network.
Different from the previous approaches, we focus on domain adaptation to implement the node classification task across two networks. Furthermore, we employ a dual graph convolutional network to capture the local and global consistency relationships of each graph for node representation learning.

Cross-Domain Classification
Domain adaptation is a subtopic of transfer learning, which aims to learn machine learning models that transfer between different but relevant domains sharing the same label space [23]. Many approaches have been proposed for cross-domain classification, which can be roughly categorized into four groups: (1) Instance re-weighting approaches aim to identify the training samples in the source domain that are most relevant to the target domain via instance re-weighting and importance sampling; the re-weighted source instances are then used to train a target domain model [16]. (2) Co-training methods bridge the gap between the source domain and the target domain by gradually adding target features and the most reliable examples of the current algorithm to the training set [5]. (3) Kernel methods explore multiple kernels to induce an optimal learning space and learn a kernel function and a robust classifier by minimizing the distribution mismatch between the labeled and unlabeled samples from the source and target domains [9]. (4) Feature representation based methods are designed to map different domains into a common shared space and bring their feature distributions as close as possible [7, 20, 32, 50]. Among them, deep feature representation based methods have attracted a lot of attention in recent years due to their effectiveness. They can be categorized into three branches, i.e., discrepancy-based methods [20] [36], reconstruction-based methods [49] [17], and adversarial-based methods [13] [35] [27]. For cross-domain learning, many methods use an adversarial objective to reduce domain discrepancy [12, 20]. Among these, the domain adversarial neural network (DANN) [13] learns domain-invariant features via a minimax game between the domain classifier and the feature extractor, using a gradient reversal layer to back-propagate the gradients computed from the domain classifier.
Recently, domain adaptation has been utilized for graph-structured data [6, 31, 45]. CDNE [31] learns transferable node embeddings for cross-network learning tasks by minimizing the maximum mean discrepancy (MMD) loss. However, it cannot jointly model network structures and node attributes, which might limit its modeling capacity. To utilize the network structure for cross-network node classification, some studies [6, 45] use graph convolutional networks as feature extractors to learn node representations, and employ an adversarial learning strategy to learn domain-invariant node representations, obtaining promising performance. However, the above methods only use GCNs that consider the direct (locally consistent) neighbour nodes for knowledge embedding, and neglect the global consistency information of the network for cross-domain node classification.
In this paper, we propose an end-to-end model, Unsupervised Domain Adaptive Graph Convolutional Networks (UDA-GCN), for cross-domain node classification, which jointly models the local and global consistency relations of each graph, domain information, source domain information, and target domain information in a unified learning framework.

PROBLEM DEFINITION AND OVERALL FRAMEWORK
This section defines the problem to be addressed and introduces notations used throughout the paper as summarized in Table 1.
Then we present the overall framework for the problem.

Problem Statement
Node Classification on Graphs: In this paper, we focus on node classification on graphs. A graph is represented as G = (V, E, X), where V = {v_i}_(i=1)^N is the vertex set representing the nodes in a graph, and e_(i,j) = (v_i, v_j) ∈ E is an edge indicating the relationship between two nodes. The topological structure of a graph G can be represented by an adjacency matrix A, where A_(i,j) = 1 if e_(i,j) ∈ E and A_(i,j) = 0 otherwise, and X is the node feature matrix.
Source Domain Graph: Let G^s = (V^s, E^s, X^s, Y^s) be a fully labeled source network with a set of labeled nodes V^s and a set of edges E^s, where X^s is the node feature matrix and Y^s ∈ R^(n_s × C) is the label matrix, n_s is the number of nodes in G^s, and C is the number of node categories.
Target Domain Graph: Similarly, the target network is represented as G^t = (V^t, E^t, X^t), which is a completely unlabeled target network with a set of unlabeled nodes V^t and a set of edges E^t.
Unsupervised Domain Adaptive Node Classification: Given an unlabeled target network G^t and a fully labeled source network G^s, cross-domain node classification aims to build a classifier f to accurately classify the nodes in the target network with the assistance of the fully labeled source network. However, this is a challenging task due to the lack of labels for G^t.

Overall Framework
In order to leverage cross-domain graphs to learn a classifier for node classification, we propose Unsupervised Domain Adaptive Graph Convolutional Networks (UDA-GCN) to reduce the distribution gap and induce a low-dimensional feature representation shared across domains. Our framework, as shown in Figure 2, mainly consists of the following three components:
• Node Representation Learning. In order to learn a better representation of each node, we employ a dual graph convolutional network to capture the local and global consistency relationships of each graph.
• Inter-Graph Attention. We develop an inter-graph attention approach to automatically determine the weights (att_s, att_t) of the source and target graph representations from the local and global GCN layers, respectively.
• Domain Adaptive Learning for Cross-Domain Node Classification. To enable cross-domain classification, we advocate a domain adaptive learning approach to train three classifiers. The first one is the source classifier, which aims to minimize the classification loss on the source domain data. The second one is the domain classifier, where a domain adversarial loss is utilized to enforce the differentiation between the source and target domains. The third one is the target classifier; as there are no labels in the target domain, an entropy loss is placed on the target classifier in order to obtain better semantic information about the target domain. By doing so, the domain adaptive learning can maximally utilize the domain information and target domain information to learn domain-invariant and semantic representations effectively, reducing the domain discrepancy for cross-domain node classification.

METHODOLOGY
This section presents our unsupervised domain adaptive graph convolutional networks for cross-domain node classification.

Node Embedding Module
In order to encode the semantic information of each node (i.e., to capture the local and global information of the graph), the node representation learning procedure consists of two graph neural networks. For local consistency, we introduce a convolutional method based on the graph adjacency matrix A. For global consistency, we propose another convolutional method based on random walks. We feed both the source graph and the target graph into our node embedding module.

Local Consistency Network (Conv_A). By directly utilizing the GCN method proposed in [18], we formulate Conv_A as a feed-forward neural network. Given the input feature matrix X and adjacency matrix A, the output Z^(i) of the i-th hidden layer of the network is defined as:

Z^(i) = σ(D̃^(−1/2) Ã D̃^(−1/2) Z^(i−1) W^(i)),

where Ã = A + I_n is the adjacency matrix with self-loops (I_n ∈ R^(n×n) is the identity matrix), D̃_(i,i) = Σ_j Ã_(i,j) is the corresponding degree matrix, and D̃^(−1/2) Ã D̃^(−1/2) is the normalized adjacency matrix. Z^(i−1) is the output of the (i−1)-th layer with Z^(0) = X, W^(i) are the trainable parameters of the i-th layer, and σ(·) denotes the activation function.
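As an illustration, the propagation rule above can be sketched in a few lines of numpy; the toy graph, feature matrix, and all-ones weight values below are hypothetical, and a real implementation would learn W inside a deep learning framework:

```python
import numpy as np

def gcn_layer(X, A, W, activation=np.tanh):
    """One Conv_A step: sigma(D^{-1/2} (A + I) D^{-1/2} X W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D^{-1/2} as a vector
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # normalize
    return activation(A_hat @ X @ W)

# toy 3-node path graph with 2-dimensional features (hypothetical values)
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.array([[1., 0.], [0., 1.], [0., 0.]])
W = np.ones((2, 2))   # stand-in for learned weights
Z = gcn_layer(X, A, W)
```

Each output row mixes a node's own features with those of its neighbors, weighted by the symmetric normalization.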

Global Consistency Network (Conv_P). In addition to Conv_A, which is defined by the adjacency matrix A, we introduce a PPMI-based convolution method to encode the global information, which is denoted as a matrix P ∈ R^(N×N).
Before obtaining the matrix P, we first calculate a frequency matrix F using random walks. Random walks have been used as a similarity measure for a variety of problems in recommendation [29], graph classification [1], and semi-supervised learning [44]. Here, we use random walks to calculate the semantic similarities between nodes. We then show how to calculate P from F, which lifts the frequency information in F to semantic similarities. Finally, we define the P-based graph convolution function Conv_P.
Frequency matrix F: The Markov chain describing the sequence of nodes visited by a random walker is called a random walk. If the random walker is on node x_i at time t, we define the state as s(t) = x_i. The transition probability of jumping from the current node x_i to one of its neighbors x_j is denoted as p(s(t+1) = x_j | s(t) = x_i). In our problem setting, given the adjacency matrix A, we assign:

p(s(t+1) = x_j | s(t) = x_i) = A_(i,j) / Σ_k A_(i,k).

Positive point-wise mutual information matrix (PPMI) P: After calculating the frequency matrix F, the i-th row of F is the row vector F_(i,:) and the j-th column of F is the column vector F_(:,j). F_(i,:) corresponds to a node x_i and F_(:,j) corresponds to a context c_j. The contexts are defined as all nodes in X. The value of an entry F_(i,j) is the number of times that x_i occurs in context c_j. Based on F, we calculate the PPMI matrix P ∈ R^(N×N) as:

P_(i,j) = max( log( p_(i,j) / (p_(i,*) · p_(*,j)) ), 0 ),

where p_(i,j) = F_(i,j) / Σ_(i,j) F_(i,j), p_(i,*) = Σ_j F_(i,j) / Σ_(i,j) F_(i,j), and p_(*,j) = Σ_i F_(i,j) / Σ_(i,j) F_(i,j).

Since our node embedding module consists of two networks, in addition to Conv_A, which is based on the similarity defined by the adjacency matrix A, the second network Conv_P is derived from the similarity defined by the PPMI matrix P. This neural network is given by:

Z^(i) = σ(D^(−1/2) P D^(−1/2) Z^(i−1) W^(i)),

where P is the PPMI matrix and D_(i,i) = Σ_j P_(i,j) is used for normalization. Applying diffusion based on such a node-context matrix P ensures global consistency. Additionally, since Conv_P uses the same neural network structure as Conv_A, the two can be combined very concisely. The source and target graphs are fed into the parameter-shared node embedding module to learn representations.
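To make the construction concrete, the random-walk transition probabilities and the PPMI computation can be sketched as follows. This is a numpy sketch: in the paper F is estimated by sampling random walks, whereas here the helper names and the zeroing of log(0) entries are our own assumptions:

```python
import numpy as np

def transition_matrix(A):
    """Random-walk transition probabilities p(s(t+1)=x_j | s(t)=x_i)."""
    return A / A.sum(axis=1, keepdims=True)

def ppmi_matrix(F):
    """Positive PMI: P[i,j] = max(log(p_ij / (p_i* * p_*j)), 0)."""
    total = F.sum()
    p_ij = F / total                             # joint probability estimate
    p_i = F.sum(axis=1, keepdims=True) / total   # row marginal p_{i,*}
    p_j = F.sum(axis=0, keepdims=True) / total   # column marginal p_{*,j}
    with np.errstate(divide="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[np.isneginf(pmi)] = 0.0                  # entries where F[i,j] == 0
    return np.maximum(pmi, 0.0)
```

The clipping at zero keeps only positively associated node/context pairs, which is what makes P a (dense) semantic-similarity matrix.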

Inter-Graph Attention
After applying the node embedding module to the source and target graphs, we obtain four embeddings: Z_A^s, Z_P^s for the source graph and Z_A^t, Z_P^t for the target graph. We need to aggregate the embeddings from each graph to produce a unified representation. For each domain, since the embeddings from the local and global consistency networks contribute differently to the learned representation, we propose an Inter-Graph Attention scheme to capture the significance of each embedding in each domain.

Specifically, we use the original inputs X^s and X^t as the keys of the attention mechanism. We then perform attention on the outputs of each domain (Z_A^s, Z_P^s for the source domain and Z_A^t, Z_P^t for the target domain), and two attention coefficients att_s and att_t are computed by an attention function f for each domain, where k denotes whether the output is from the source domain s or the target domain t, and J is a shared weight matrix that maps the input X^k to the same dimension as the outputs Z_A^k and Z_P^k. We further normalize the weight att_k with a softmax layer.
After applying the attention, we obtain the final outputs Z^s and Z^t as the attention-weighted combinations of the local and global embeddings of each domain.
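Since the text does not spell out the attention function f, the sketch below assumes a row-wise dot-product score between the projected key X @ J and each embedding, followed by a softmax over the two views; the function and variable names are illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_graph_attention(X, Z_A, Z_P, J):
    """Node-wise attention over the local (Z_A) and global (Z_P) embeddings.

    X @ J serves as the key so the input matches the embedding dimension;
    the dot-product scoring is an assumption, not necessarily the paper's f.
    """
    key = X @ J
    score_A = (key * Z_A).sum(axis=1)           # affinity with the local view
    score_P = (key * Z_P).sum(axis=1)           # affinity with the global view
    att = softmax(np.stack([score_A, score_P], axis=1))
    Z = att[:, :1] * Z_A + att[:, 1:] * Z_P     # attention-weighted combination
    return Z, att
```

Each node receives its own pair of weights, so the local/global trade-off is learned per node rather than globally.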

Domain Adaptive Learning for Cross-Domain Node Classification
To learn knowledge transfer across different domains to assist the node classification task, our proposed model consists of an adversarial module, a source classifier, and a target classifier working together to learn both class-discriminative and domain-invariant node representations, thus enabling the classification of nodes in the target network. The overall objective is as follows:

L = L_S + γ_1 L_DA + γ_2 L_T,   (14)

where γ_1, γ_2 are the balance parameters, and L_S, L_DA, and L_T represent the source classifier loss, the domain classifier loss, and the target classifier loss, respectively. The details are introduced as follows.

Source Classifier Loss.
The source classifier loss L_S(f_s(Z^s), Y^s) minimizes the cross-entropy loss over the labeled data in the source domain:

L_S = − Σ_(i=1)^(n_s) y_i log ŷ_i,   (15)

where y_i denotes the ground-truth label of the i-th node in the source domain, and ŷ_i is the classification prediction for the i-th labeled source node v_i^s.
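The cross-entropy above can be sketched as follows (a hypothetical numpy helper operating on softmax outputs; the small eps guarding the logarithm is our addition):

```python
import numpy as np

def source_classifier_loss(probs, labels, eps=1e-12):
    """L_S: mean cross-entropy over labeled source nodes.

    probs: (n_s, C) predicted class distributions; labels: (n_s,) class ids.
    """
    picked = probs[np.arange(len(labels)), labels]  # probability of true class
    return -np.mean(np.log(picked + eps))
```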

Domain Classifier Loss.
The domain classifier loss L_DA(Z^s, Z^t) enforces that the node representations produced by the node feature extraction process for the source domain network G^s and the target domain network G^t are similar. To achieve this, we learn a domain classifier f_d(Q_λ(Z^s, Z^t); θ_D), parameterized by θ_D, with an adversarial training scheme; the classifier tries to discriminate whether a node comes from G^s or G^t. On the one hand, we would like the source classifier f_s to classify each node into the correct class by minimizing Eq. (15). On the other hand, we would like the node representations from different domains to be similar, so that the domain classifier cannot differentiate whether a node comes from G^t or G^s. In our paper, we use the Gradient Reversal Layer (GRL) [13] for adversarial training. Mathematically, we define the GRL as Q_λ(x) = x with a reversed gradient ∂Q_λ(x)/∂x = −λI. Learning with a GRL is adversarial in the following way: on one side, the reversed gradient pushes the feature extractor to maximize the domain classifier loss; on the other side, θ_D is optimized by minimizing the cross-entropy domain classifier loss:

L_DA = − Σ_i [ m_i log m̂_i + (1 − m_i) log(1 − m̂_i) ],   (16)

where m_i ∈ {0, 1} denotes the ground-truth domain label and m̂_i denotes the domain prediction for the i-th node from the source or target domain.
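The GRL's behavior, identity on the forward pass and a negated, scaled gradient on the backward pass, can be illustrated framework-free; in PyTorch this would typically be a custom autograd.Function, and the class below is a hypothetical sketch:

```python
import numpy as np

class GradientReversal:
    """Q_lambda(x) = x forward; gradient multiplied by -lambda backward."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        return x                        # identity: representations pass through

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed gradient reaches the extractor
```

Because the reversed gradient flows back into the feature extractor, minimizing the domain loss in θ_D simultaneously drives the extractor toward domain-confusing representations.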

Target Classifier Loss.
In the target domain, an entropy loss is placed on the target classifier. Unlike the source classifier, we do not use cross-entropy as the label loss, because we have no class information for the unsupervised learning in the target domain. In order to utilize the data in the target domain, we employ an entropy loss for the target classifier f_t:

L_T = − Σ_(i=1)^(n_t) ŷ_i log ŷ_i,   (17)

where ŷ_i is the classification prediction for the i-th node v_i^t in the target domain.
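The entropy loss can be sketched as follows (a numpy sketch; the eps term that guards log(0) is our addition). Low entropy means the target classifier makes confident predictions on the unlabeled target nodes:

```python
import numpy as np

def target_entropy_loss(probs, eps=1e-12):
    """L_T: mean entropy of predicted class distributions on target nodes."""
    return -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
```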
L_S(Z^s, Y^s), L_DA(Z^s, Z^t), and L_T(Z^t) are jointly optimized via our objective function in Eq. (14), and all parameters are updated using the standard backpropagation algorithm.

Algorithm Description
Our algorithm is illustrated in Algorithm 1. Given a source graph G^s = (V^s, E^s, X^s, Y^s) and a target graph G^t = (V^t, E^t, X^t), our goal is to obtain the node representations Z^s and Z^t of the source and target graphs, respectively. First, we employ a dual graph convolutional network to capture the local and global consistency relationships of each graph (Steps 2-8); here, the original inputs are X^s and X^t, and the outputs are Z_A^s, Z_P^s for the source domain and Z_A^t, Z_P^t for the target domain. Then we apply the inter-graph attention scheme to the outputs of each domain and obtain the final node representations Z^s and Z^t (Step 9). Finally, by employing the source classifier, domain classifier, and target classifier, we can maximally utilize the domain and label information to learn domain-invariant and semantic representations for cross-domain node classification.

Algorithm 1 (UDA-GCN training for cross-domain node classification; recoverable steps): [Z^s, Z^t] ← learn output embeddings for the source and target domains using Eqs. (12) and (13); f_s ← learn the source classifier from Z^s and Y^s using Eq. (15); f_d ← learn the domain classifier from Z^s and Z^t using Eq. (16); f_t ← learn the target classifier from Z^t using Eq. (17); back-propagate the loss gradient from Z^s, Z^t and Y^s using Eq. (14); update weights; stop when the early-stopping condition is satisfied.

Time Complexity Analysis
Given a graph with n nodes and m edges, if the adjacency matrix is sparse, the time complexity of the graph convolution operation of GCN is O(m). In our model, we also use the point-wise mutual information (PPMI) matrix, in addition to the adjacency matrix, for propagation. Because the PPMI matrix is not guaranteed to be sparse, its complexity is that of a dense matrix: O(n^2). Since we employ a dual GCN consisting of the sparse adjacency matrix and a dense PPMI matrix, the overall time complexity of the dual GCN module is O(m + n^2).

EXPERIMENTS
In this section, we first describe the benchmark datasets, baselines, and experimental settings, and then report the algorithm performance.

Benchmark Datasets
We conduct experiments on three real-world networks. We constructed graphs based on datasets provided by ArnetMiner [34]. The details of the experimental datasets are displayed in Table 2. DBLPv8, ACMv9 and Citationv2 are three paper citation networks from different original sources (DBLP, ACM and Microsoft Academic Graph, respectively), and for each dataset, we extracted the papers published in a different period, i.e., DBLPv8 (after year 2010), ACMv9 (between years 2000 and 2010), and Citationv2 (before year 2008). In our experiments, we treat them as undirected networks, where each edge represents a citation relation between two papers. We classify papers into some of the following six categories according to their research topics: "Database", "Data mining", "Artificial intelligence", "Computer vision", "Information Security" and "High Performance Computing". We evaluate our proposed model by conducting multi-label classification on these three network domains through six transfer learning tasks: C→D, A→D, D→C, A→C, D→A, and C→A, where D, A, and C denote DBLPv8, ACMv9 and Citationv2, respectively.

Baselines
In order to make a fair comparison and demonstrate the effectiveness of our proposed model, we employ the following methods as baselines. We compare our approach with both state-of-the-art single-domain node classification models and cross-domain models with the necessary domain adaptation. State-of-the-art single-domain node classification models:
• DeepWalk [28]: A classic single-network embedding method which employs a random walk sampling strategy to generate the neighborhood of each node and extends the Skip-Gram model to learn low-dimensional node representations.
• LINE [33]: LINE preserves both first-order and second-order proximities for an undirected network by modeling node co-occurrence probability and node conditional probability.
• GraphSAGE [15]: An inductive representation learning framework that generates node embeddings by sampling and aggregating features from each node's local neighborhood.
• DNNs: A multi-layer perceptron (MLP) which only uses the node features.
• GCN [18]: GCN is a deep convolutional network for graph-structured data, which integrates network topology, node features and observed labels into an end-to-end learning framework.
Cross-domain node classification models with adaptation:
• DGRL [13]: The feature generator is a 2-layer perceptron used to obtain the representation of each node. A gradient reversal layer (GRL) is added for domain classification.
• AdaGCN [6]: The feature generator is a GCN architecture, and a gradient reversal layer (GRL) is added to train a domain classifier.

Experimental Settings
All deep learning algorithms are implemented in PyTorch [26] and trained with the Adam optimizer. We follow the evaluation protocol in unsupervised domain adaptation [6, 19]: we evaluate all approaches through a grid search on the hyperparameter space and report the best results of each approach. We use all labeled source samples and all unlabeled target samples. For all cross-domain node classification tasks, we use the same set of parameter configurations unless otherwise specified. For each deep approach, we use a fixed learning rate of 1e-4. For GCN, AdaGCN and UDA-GCN, the GCNs of both the source and target networks contain two hidden layers (L = 2) with structure 128-16. For DeepWalk and LINE, node representations are first learned, and then a one-vs-rest logistic regression classifier is trained with the labeled nodes of the source domain.
Here, the dimension of the node representations is set to 128 for all of them for a fair comparison. For GraphSAGE, we also adapt it to the inductive setting and train it in the source domain; we utilize the PyTorch version implemented by the geometric deep learning extension library [10]. DNNs and DGRL have parameter settings similar to GCN and AdaGCN, respectively. The adaptation rate λ follows the schedule λ = min(2/(1 + exp(−10p)) − 1, 0.1), where p increases from 0 to 1 during training, as in [13]. The balance parameters γ_1, γ_2 are set to 1 and 0.8, respectively. The dropout rate for each GCN layer is set to 0.3. Table 3 lists the accuracy of different methods on cross-domain node classification tasks. From the results, we have the following observations:
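For reference, the adaptation-rate schedule above is a one-liner (reproducing the formula from this section; `p` is the training progress in [0, 1]):

```python
import math

def adaptation_rate(p, cap=0.1):
    """lambda = min(2 / (1 + exp(-10 p)) - 1, cap), as used for the GRL."""
    return min(2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0, cap)
```

The schedule ramps λ up smoothly from 0 early in training and then clips it at the cap, so the adversarial signal is weak until the feature extractor has stabilized.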

Cross-Domain Classification Results
(1) DeepWalk and LINE obtain the worst performance among the baselines since they only utilize the network structure information rather than node features. The DNNs also perform worse than the other methods, because traditional DNNs only consider node features and do not capture the graph structure information needed for better node representations. (2) Graph-based methods (GCN and GraphSAGE) perform better than the traditional two-step network embedding methods (DeepWalk and LINE), which shows that end-to-end graph convolutional neural networks encoding both local graph structure and node features have competitive advantages over traditional models in cross-domain node classification. The ablation variants of UDA-GCN are summarized in Table 5, where the symbol × indicates that the algorithm exploits the corresponding information. The ablation study results are shown in Table 4.

5.5.1 Effects of the global GCN layer module.
We compare UDA-GCN with UDA-GCN¬p to investigate the effectiveness of the novel dual GCN approach employed in our paper. From the results, we find that UDA-GCN performs better than UDA-GCN¬p, which confirms the superiority of the dual GCN, which combines the local and global relationships to capture a comprehensive representation of each node.
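A minimal NumPy sketch of the dual-propagation idea: the local branch propagates over the normalized adjacency matrix, the global branch over a normalized PPMI matrix built from co-occurrence frequencies. The function names, the weight matrices, and the toy PPMI construction are our own simplifications, not the paper's implementation:

```python
import numpy as np

def sym_normalize(M):
    # symmetric normalization D^{-1/2} M D^{-1/2}
    d = M.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5
    return M * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def ppmi(F):
    # positive pointwise mutual information of a co-occurrence matrix F
    total = F.sum()
    p_ij = F / total
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def dual_gcn_layer(A, F, X, W_local, W_global):
    # one dual-GCN layer: the local branch uses the adjacency with
    # self-loops, the global branch uses the PPMI of co-occurrences F
    A_hat = sym_normalize(A + np.eye(A.shape[0]))
    P_hat = sym_normalize(ppmi(F))
    h_local = np.maximum(A_hat @ X @ W_local, 0.0)   # ReLU
    h_global = np.maximum(P_hat @ X @ W_global, 0.0)
    return h_local, h_global
```

The two branches share the input features X but aggregate over different neighborhood structures, which is what lets the model see both local edges and global co-occurrence patterns.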

5.5.2 Impact of domain-adversarial loss.
To verify the effectiveness of the domain-adversarial loss, we compare the UDA-GCN model with UDA-GCN¬d. From Table 4, we can easily observe that UDA-GCN performs significantly better than UDA-GCN¬d. This confirms that the domain-adversarial loss helps learn superior representations for nodes from different domains.
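Conceptually, a domain-adversarial loss is trained through a gradient reversal layer: the identity in the forward pass, a sign-flipped gradient in the backward pass, so the feature extractor learns to confuse the domain classifier. A framework-agnostic toy sketch (class and method names are ours; this is not the paper's implementation):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; negates (and scales by lam) the
    incoming gradient in the backward pass, so the feature extractor
    is pushed to make source and target features indistinguishable
    while the domain classifier itself is trained normally."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        # pass features through unchanged
        return x

    def backward(self, grad_output):
        # reverse and scale the gradient flowing to the feature extractor
        return -self.lam * grad_output
```

In PyTorch this is typically realized as a custom `torch.autograd.Function` whose `backward` returns the negated gradient.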

5.5.3 Impact of the target classifier loss.
To show the superiority of the target classifier, we design a variant model UDA-GCN¬t. The only difference between UDA-GCN¬t and UDA-GCN is that UDA-GCN¬t does not use the information of the target domain, which is the core information in cross-domain learning. The results in Table 4 show that the performance of the node classification task on both datasets improves when the target information is used, indicating the effectiveness of the target classifier loss.
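The target classifier loss is an entropy loss over the predictions on unlabeled target nodes (see the conclusions): minimizing the entropy pushes the classifier toward confident, well-separated decisions on the target domain. A minimal sketch (the function name is ours):

```python
import numpy as np

def target_entropy_loss(probs, eps=1e-12):
    """Mean Shannon entropy of the predicted class distributions for
    unlabeled target nodes; probs has shape (num_nodes, num_classes)
    with rows summing to 1. Lower entropy = more confident predictions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())
```

Confident one-hot predictions give a loss near 0, while uniform predictions give the maximum entropy log(num_classes).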

Parameter Analysis
5.6.1 Attention Mechanism. By employing the inter-graph attention method to combine the local and global relationships into a comprehensive node representation, UDA-GCN obtains better results than AdaGCN. Fig. 3 shows the importance vectors of two nodes (node 0 and node 1, which were randomly selected in this experiment). Note that the importance vector is node-wise, that is, each type of embedding plays a different role for different nodes. For node 0, the global embedding is much more important than the local embedding, while for node 1, the model pays more attention to the local embedding. Regarding the motivation of the attention mechanism, we find that attention-based integration performs better than simple averaging-based or concatenation-based integration: the attention mechanism captures the importance of the different types of embeddings and learns the optimal integration weights of the two representations.

5.6.2 Impact of feature dimensions of node output embeddings Z_s and Z_t. We set the number of feature dimensions of the source output embedding Z_s to be the same as that of the target output embedding Z_t. UDA-GCN uses 2-layer GCNs with structure 128−16, so the feature dimension d of the node output embeddings is 16. We vary d from 4 to 128 and report the results of the six cross-domain node classification tasks in Figure 4. When d increases from 4 to 128, the accuracy on the target domain improves on these tasks. Furthermore, only slight differences can be observed for different d, and increasing d from 16 to 128 does not necessarily improve performance. The results show that with sufficient feature dimensions (d ≥ 16), UDA-GCN is stable with respect to the number of feature dimensions.
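The node-wise attention integration discussed in 5.6.1 can be sketched in NumPy as follows; the scoring vector w and the function names are our own simplification of the inter-graph attention, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_combine(h_local, h_global, w):
    # score each embedding type per node, softmax over the two types,
    # and return the weighted sum plus the node-wise importance vector
    scores = np.stack([h_local @ w, h_global @ w], axis=1)  # (n, 2)
    alpha = softmax(scores, axis=1)                         # (n, 2)
    combined = alpha[:, :1] * h_local + alpha[:, 1:] * h_global
    return combined, alpha
```

Because the weights alpha are computed per node, one node can lean on its global embedding while another leans on its local one, which matches the behavior observed in Fig. 3.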

5.6.3 Visualization.
An important application of network representation is to create meaningful visualizations that lay out a network in a two-dimensional space. We visualize the learned embeddings for the target domain dataset. For simplicity, we only visualize the learned embeddings for the DBLPv8→ACMv9 and DBLPv8→Citationv2 tasks to validate the effectiveness of our proposed model. For each approach, we map the learned embedding vectors to a 2-D space with the t-distributed Stochastic Neighbor Embedding (t-SNE) [22] method. t-SNE projects high-dimensional objects into a 2-dimensional or 3-dimensional space, where similar objects are modeled by nearby points and dissimilar objects by distant points with high probability; the visualization thus preserves the similarity between the learned embeddings. Figs. 5 and 6 compare the visualization results of the different approaches. We observe that the visualization using DNNs is not very meaningful: many nodes belonging to the same class are not clustered together, and the clusters overlap. GraphSAGE and GCN perform better than DNNs, and their t-SNE results show more meaningful clusters; however, the boundaries of most clusters are still hard to find. For the proposed UDA-GCN method, the clusters are much clearer and obvious boundaries can be found between them, which shows that UDA-GCN generates a more meaningful layout of the network than the other approaches.
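The 2-D projection step can be reproduced with scikit-learn's t-SNE; the sketch below uses an illustrative perplexity and seed, not necessarily the paper's settings:

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(embeddings, seed=0):
    # map high-dimensional node embeddings to 2-D points for plotting;
    # perplexity must stay below the number of samples
    tsne = TSNE(n_components=2, perplexity=5, init="random", random_state=seed)
    return tsne.fit_transform(embeddings)
```

The returned (n, 2) array can then be scattered with any plotting library, coloring points by class label to inspect cluster separation.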

CONCLUSIONS
In this paper, we studied the problem of unsupervised graph domain adaptation. We argued that most existing graph neural networks only learn models in a single graph and fail to consider knowledge transfer across graphs. We presented a novel unsupervised domain adaptive graph convolutional network (UDA-GCN) to enable knowledge adaptation between graphs. By employing dual graph convolutional networks to exploit both local and global relations of the graphs, we are able to learn better representations for nodes in both the source and target graphs. The inter-graph attention mechanism presented here further generates a unified embedding for the downstream node classification task. By using a cross-entropy loss for source domain classification, a domain adversarial loss for domain discrimination, and an entropy loss for absorbing target domain information, we are able to reduce the domain discrepancy and enable efficient domain adaptation. Experimental results on three real-world graph datasets show that our algorithm outperforms existing methods for cross-domain network node classification.