A Cross-Domain Recommender System With Kernel-Induced Knowledge Transfer for Overlapping Entities

The aim of recommender systems is to automatically identify user preferences within collected data, then use those preferences to make recommendations that help with decisions. However, recommender systems suffer from the data sparsity problem, which is particularly prevalent in newly launched systems that have not yet had enough time to amass sufficient data. As a solution, cross-domain recommender systems transfer knowledge from a source domain with relatively rich data to assist recommendations in the target domain. These systems usually assume that the entities either fully overlap or do not overlap at all. In practice, it is more common for the entities in the two domains to partially overlap. Moreover, overlapping entities may have different expressions in each domain. Neglecting these two issues reduces the prediction accuracy of cross-domain recommender systems in the target domain. To fully exploit partially overlapping entities and improve the accuracy of predictions, this paper presents a cross-domain recommender system based on kernel-induced knowledge transfer, called KerKT. Domain adaptation is used to adjust the feature spaces of overlapping entities, while diffusion kernel completion is used to correlate the nonoverlapping entities between the two domains. With this approach, knowledge is effectively transferred through the overlapping entities, thus alleviating data sparsity issues. Experiments conducted on four data sets, each with three sparsity ratios, show that KerKT has 1.13%–20% better prediction accuracy compared with six benchmarks. In addition, the results indicate that transferring knowledge from the source domain to the target domain is both possible and beneficial, even with small overlaps.


I. INTRODUCTION
Recommender systems have developed quickly with the explosion of Web 2.0 technologies [1] and are now in wide use. The aim of recommender systems is to provide users with items, such as products or services, that match their preferences. Generally, recommendation techniques are roughly divided into three categories based on the underlying data used to make the recommendation: collaborative filtering-based [2], content-based [3], and knowledge-based recommendation [4]. Collaborative filtering generates recommendations for one user from the historical records of other users with similar behavior [5]. This approach has advantages when historical data for users and items are available, such as ratings or browsing data. Originally based on simple memory-based methods, collaborative filtering has evolved into model-based methods that commonly involve machine learning techniques, such as matrix factorization [6], probabilistic models [7], and deep neural networks [8]–[10]. Matrix factorization is, perhaps, the most widely used and has been incorporated into many commercial recommender systems [11]. However, observing the interactions between users and items has limitations in practice, and among them, data sparsity is a common problem [12], a problem that is particularly severe and challenging in newly launched recommender systems. Data sparsity greatly impairs a recommender system's ability to produce accurate recommendation results, which leads to a poor experience for users [13].
To overcome data sparsity issues, some recommender systems based on collaborative filtering are beginning to incorporate transfer learning. Transfer learning extracts shared knowledge from a domain with comparatively denser data [14] and uses that knowledge to improve recommendations in the target domain. In newly launched recommender systems, this technique can significantly improve the performance [15]. Systems that use transfer learning techniques are known as cross-domain recommender systems. These systems are specifically designed to provide recommendations in the target
domain using information extracted from the source domain. However, the most crucial concern in cross-domain recommender systems is how to extract common knowledge that can be shared between the two domains. The methods for knowledge extraction and transfer differ depending on whether and how the entities in each domain overlap. Existing cross-domain recommender systems usually assume that either none of the entities are common to both domains, or that all of them are, with a full one-to-one mapping. Nonoverlapping methods tend to extract shared knowledge based on collective group-level user behavior. Although many of these methods have been designed to suit specific situations, they cannot integrate knowledge from an overlapping entity once new information becomes available. In fully overlapping methods, the original source and target rating matrices are collectively factorized; then, the entities' features are extracted. Constraints on each entity ensure that these features are exactly the same in the source and target domains so they can act as a bridge for knowledge transfer. However, practical situations rarely satisfy the "all" or "none" overlap assumption; rather, they fall somewhere in between, as shown in Fig. 1. In fully overlapping methods, information about the overlapping entities is used to establish constraints between the two domains. These constraints usually relate to the entities' features. However, even for the same user, there may be small differences in item rating patterns between two different domains; this is called domain divergence. If the entity feature constraints are not handled delicately, knowledge transfer will suffer, reducing the accuracy of the predictions.
Hence, cross-domain recommendation systems present the following challenges.
1) Feature Inconsistency Caused by Data Sparsity: Typically, there are no explicit features, only extracted latent features. Moreover, the observed sparse ratings do not fully represent a user's preferences, so the features extracted for the same user in two different domains will be inconsistent. Thus, constructing an appropriate feature space is very challenging.

2) Feature Inconsistency Caused by Domain Heterogeneity:
Extracted latent features from an overlapping entity can be aligned through domain adaptation techniques. But extracted latent features from nonoverlapping entities lack a direct correlation, and their features are heterogeneous.

3) Partially Overlapping Entities: The number of overlapping entities can account for a very small part of the total number of entities in the target domain. Whether a small number of overlapping entities is effective in transferring knowledge, and what the shared constraints should be, remain unsolved problems.

In this paper, we propose a cross-domain recommender system with kernel-induced knowledge transfer (KerKT) as a knowledge transfer method to improve recommendation performance with partially overlapping entities. We first factorize the rating matrices separately to construct the user and item feature matrices. To avoid divergence in the feature space caused by data sparsity, we propose a domain adaptation method to adjust the feature spaces through the overlapping entities. Then, we use a diffusion kernel to construct a full and complete entity similarity matrix, so the similarity measures can be used in heterogeneous settings. Finally, we use a more flexible constraint to jointly factorize the source and target rating matrices. The main contributions of this paper are as follows.
1) A domain adaptation method that aligns the feature spaces of overlapping entities. The method matches the features of each overlapping entity acquired from two different domains. The overlapping entities are projected onto the same subspace to ensure consistency between the representations. Feature divergence caused by data sparsity is eliminated.

2) A kernel-induced completion method for computing entity similarities in heterogeneous situations. Feature divergence caused by domain heterogeneity is eliminated, and the connection between the domains is reinforced. This means the similarities between entities can be determined through a modest amount of overlapping entity data.

3) A new matrix factorization method with constraints that integrates the intradomain and interdomain entity correlations acquired from overlapping entities. The two rating matrices are collectively factorized, sharing interdomain knowledge while retaining their own domain-specific characteristics. The constraints are more flexible than previous methods and ensure that more useful knowledge is transferred to the target domain.

4) An adaptive knowledge transfer method, called KerKT, that addresses partially overlapping entities, the most common scenario in practice. Extensive experiments were conducted on four real-world data sets, each with three different sparsity ratios. The results show that KerKT alleviates the impact on recommendation caused by data sparsity and transfers knowledge even when there are only a few overlapping entities.

The remainder of this paper is organized as follows. A review of work related to cross-domain recommender systems is provided in Section II. Section III introduces the preliminaries and formally defines the problem to be solved. In Section IV, we present the KerKT method. Section V contains the empirical experiments. We evaluated four tasks on four real-world data sets with three data sparsity ratios and three different levels of overlapping entities.
The results show that our method performs better in prediction accuracy than six existing nontransfer and cross-domain methods. Finally, the conclusion and directions for future study are provided in Section VI.

II. RELATED WORK
In this section, we review the related work on both kernel-based and cross-domain recommender systems.

A. Kernel-Based Recommender Systems
In recommender systems, nonlinear interactions between users and items are modeled using kernels. Integrating kernels into a matrix factorization framework, which is a linear combination of the inner product of a user factor matrix and an item factor matrix, provides a more general and more flexible method for updating online models [16]. Lawrence and Urtasun [17] applied a Mercer kernel in a nonlinear Gaussian process model. Ghazanfar et al. [18] incorporated metadata, such as genres and descriptions, into the matrix factorization framework, along with kernels, to solve cold-start problems. Coifman and Lafon [19] used diffusion maps to find representations of data with geometric meanings. Diffusion kernels are a special class of exponential kernels based on the heat equation, which aim to measure the similarities between vertices or nodes when applied to graphs [20]. However, graph-based diffusion kernel completion is seldom used when dealing with recommendation issues.

B. Cross-Domain Recommender Systems
Cross-domain recommender systems can be divided into three categories.
1) Cross-Domain Recommender Systems With Side Information: In this category, it is assumed that some side information about the entities is available. Collective matrix factorization (CMF) [21] is designed for scenarios where a user-item rating matrix and an item-attribute matrix for the same group of items are available. The two matrices are collectively factorized by sharing the item parameters since the items are the same. The Tagicofi method [22] uses the user-item rating matrix and the user-tag matrix for the same group of users. User similarities extracted from shared tags are used to assist matrix factorization of the original rating matrix. On this basis, TagCDCF [23] extends the Tagicofi method to two-domain scenarios, each domain containing those two matrices, by integrating the intradomain and interdomain correlations with matrix factorization simultaneously. In addition to using user-generated tags, hybrid random walk [24] bridges cross-domain knowledge through social information.
2) Cross-Domain Recommender Systems With Nonoverlapping Entities: This category covers the methods that handle two domains with nonoverlapping entities and transfer knowledge at the group level. Users and items are clustered into groups, and knowledge is shared through group-level rating patterns [25]. For example, codebook transfer (CBT) [26] clusters users and items into groups and extracts group-level knowledge as a "codebook." A probabilistic model, called the rating matrix generated model (RMGM) [27], was subsequently extended from CBT, relaxing the hard membership in groups to soft membership. Neither of these methods ensures that the shared information between two groups in different domains is consistent, so the effectiveness of the knowledge transfer is not guaranteed. Consistent information transfer [28] relies on a domain adaptation technique to extract consistent knowledge from the source domain. This method is superior, especially when the data statistics between the source and target domains are divergent.
3) Cross-Domain Recommender Systems With Fully Overlapping Entities: These systems assume that the source and target domains share some common entities. These overlapping entities are used as a bridge, with constraints, to transfer knowledge. Transfer by collective factorization (TCF) [29] was developed to use implicit data in the source domain to help predict the explicit feedback in the target domain, such as ratings. However, the assumptions in TCF are very strict: users and items must have a one-to-one mapping across the domains. So, while this method is able to deal with heterogeneous data, its strict assumptions limit the scope for these types of applications in practice. Cross-domain triadic factorization (CDTF) [30] is a user-item-domain tensor that integrates both explicit and implicit feedback. It assumes that users are fully overlapping and that the user factor matrix is the same, thus bridging the domains. Cluster-based matrix factorization (CBMF) [31] tries to extend CDTF to partially overlapping entities, but the core of the CBMF method is the same as for nonoverlapping entities, which transfers knowledge based on groups rather than using the overlapping entities as a bridge.
Since entity correspondence is not always fully available, some strategies have been developed to match users or items across two domains. Li and Lin [32] used latent space matching to identify unknown user/item mappings. Sometimes, identifying these mappings is time-consuming; hence, Zhao et al. [33] developed an active-learning framework to identify the most valuable correspondences between entities. The process of identifying entity correspondence is not included in this paper. In summary, methods developed specifically for partially overlapping entities are rare. In this paper, we introduce KerKT to fill the gap in the literature between fully overlapping and nonoverlapping scenarios.

III. PRELIMINARIES AND PROBLEM FORMULATION
In this section, a matrix factorization view of the recommender system in one domain is given to clearly describe the problem setting. The problem under study in this paper is then formulated.

A. Recommendation Task Based on Matrix Factorization in One Domain
Suppose there are M users and N items in one domain; the relationship between users and items is given as X ∈ R^{M×N} (a bold letter represents a matrix). If a user's preferences are represented as ratings, then X is a rating matrix with X_ij ∈ {1, 2, 3, 4, 5, ?} ("?" denotes a missing value). By minimizing its Euclidean distance to the original rating matrix X [34], X is approximated by

X̂ = U V^T

where U ∈ R^{M×K} is the user feature matrix and V ∈ R^{N×K} is the item feature matrix, which are two low-rank matrices for users and items, respectively. The i-th user and the j-th item are represented by the i-th and j-th rows of the two matrices, U_{i*} and V_{j*}. After matrix factorization, the users and items are mapped to a latent factor feature space of a lower dimensionality K. The recommendation task is to predict the missing values in the rating matrix based on historical records of the users' preferences. Since the rating matrix X is usually extremely sparse, the low-rank approximation in matrix factorization easily overfits. Regularization is usually applied to the low-rank feature matrices to avoid this problem. In general, the optimization problem is

min_{U,V} L(X, f(U, V)) + λ R(U, V)

where L is the loss function between the predicted ratings f(U, V) and the original ratings X, R(U, V) is the regularization term, and λ ≥ 0 is the regularization tradeoff parameter. Similar to probabilistic matrix factorization (PMF), the objective function measuring the loss with regularization terms and a Frobenius norm is [35]

J = (1/2) ||I ∘ (X − U V^T)||_F^2 + (λ/2) (||U||_F^2 + ||V||_F^2)

where I is the rating indicator matrix, I_ij = 1 indicates that the rating is observed, and I_ij = 0 otherwise. ∘ denotes the Hadamard product of the matrices.
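As a concrete illustration, the PMF-style objective above can be minimized with plain gradient descent. The following is a minimal NumPy sketch; the toy ratings, K, λ, and learning rate are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy rating matrix: 0 stands in for the "?" missing entries.
X = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
I_mask = (X > 0).astype(float)      # rating indicator matrix I
M, N, K = X.shape[0], X.shape[1], 2
U = 0.1 * rng.standard_normal((M, K))
V = 0.1 * rng.standard_normal((N, K))
lam, lr = 0.05, 0.01                # regularization tradeoff and learning rate

for _ in range(3000):
    E = I_mask * (X - U @ V.T)      # residual on observed ratings only
    U += lr * (E @ V - lam * U)     # negative gradient of J w.r.t. U
    V += lr * (E.T @ U - lam * V)   # negative gradient of J w.r.t. V

X_hat = U @ V.T                     # predictions, including the "?" entries
```

After training, `X_hat` fills in the missing entries, which is exactly the recommendation task described above.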

B. Problem Definition
The problem in this paper is based on the assumption that ratings in the target domain are very sparse. This raises the question of how to use relatively dense data in the source domain to assist a recommendation task in the target domain with overlapping entities. In practice, corresponding entities are not usually easy to identify. Typically, there are many unique entities between different data sets or platforms and only a few common entities. Thus, in this problem setting, the entities partially overlap. Only a small proportion of the entities in the target rating matrix X_t have observed correspondences in the source rating matrix X_s. Even though the entities represent the same user and/or item, the rating a user has given or the rating an item has received can be different in each domain. The overlapping entity indicator matrix is represented by W^{(s,t)}, where W^{(s,t)}_{ij} = 1 indicates that the i-th entity in the source domain is the same as the j-th entity in the target domain, and W^{(s,t)}_{ij} = 0 otherwise. Without loss of generality, we require the rating rows of overlapping users to be at the top, and the corresponding users are in the same rows in both matrices. This is achieved by permuting the rows of the original rating matrices. Thus, the form of the entity indicator matrix W^{(s,t)} is

W^{(s,t)} = [ I_o  0 ; 0  0 ]

where I_o is an identity matrix of the same dimension as the number of overlapping entities. The problem of partially overlapping entities in cross-domain recommender systems is formally defined in the following. Given a source rating matrix X_s ∈ R^{M_s×N_s} and a target rating matrix X_t ∈ R^{M_t×N_t}, a cross-domain recommender system based on partially overlapping entities is to assist with the recommendation task X̂_t = U_t V_t^T through an auxiliary source rating matrix X_s and an overlapping entity indicator matrix W^{(s,t)}.
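For concreteness, the indicator matrix with the overlapping users permuted to the top can be built as in this small sketch (the matrix sizes are hypothetical):

```python
import numpy as np

def overlap_indicator(m_s, m_t, n_overlap):
    """Build W^(s,t): an identity block I_o for the overlapping users
    (permuted to the top rows of both rating matrices), zeros elsewhere."""
    W = np.zeros((m_s, m_t))
    W[:n_overlap, :n_overlap] = np.eye(n_overlap)
    return W

W_st = overlap_indicator(5, 4, 2)   # 5 source users, 4 target users, 2 shared
```

Each 1 on the diagonal of the I_o block marks one observed source-target user correspondence.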

IV. CROSS-DOMAIN RECOMMENDER SYSTEM BY KERNEL-INDUCED KNOWLEDGE TRANSFER
This section introduces our KerKT method. The overlapping entities in each domain may be either users or items. For the purposes of this presentation, we have assumed the users overlap. Overlapping items are handled in the same way and have, therefore, been omitted from this paper. This section begins with an overview of the entire method; then, each of the five steps is explained in detail, followed by a small-scale example for greater clarity.

A. KerKT Method Overview
To enable knowledge sharing between the source and target domains with overlapping users, constraints on the user feature matrices are added to the collective matrix factorization of the source and target rating matrices. Previous research assumes "identical" factor matrices for overlapping entities, but this assumption is too limiting to satisfy in practice. Instead, we have chosen to constrain the similarities between the entities in each domain as a bridge for knowledge transfer. However, while it is easy to measure the similarities between entities in the same domain, interdomain entity similarities cannot be computed directly.
The overlapping entities are mapped to the same feature space through domain adaptation techniques, while the nonoverlapping entities are connected by diffusion kernel completion. Thus, the similarities between all users in both domains can be measured. Furthermore, constraining the user features using these similarities may lead to a better optimization result. The optimization problem is formalized as

min_{U,V} L(X, f(U, V)) + λ R(U, V) + λ_o R_o(U)

where R_o(U) is the regularization term for the entity similarity constraints derived from overlapping users and λ_o ≥ 0 is the regularization tradeoff parameter. The KerKT method consists of five steps, as shown in Fig. 2.
1) The user features and item features are extracted separately from the source and target domains, and the two sets of user features are aligned with the same feature space through overlapping users.

2) The item features are regulated according to the original rating matrices and the aligned user feature matrices.

3) The user and item feature matrices resulting from the previous two steps are used to measure the user and item similarities in one domain.

4) Kernel-induced completion is conducted to measure the interdomain user similarities.

5) The user/item features are retrained based on the constraints of the entity similarities; then, recommendations are made.

We have selected a specific algorithm to perform each step, but other suitable feature extraction or domain adaptation algorithms could be used as substitutes. The proposed domain adaptation method is contained in Steps 1 and 2, while the kernel-induced completion method is contained in Steps 3 and 4. The matrix factorization method, with constraints on both intradomain and interdomain entity correlations, is contained in Step 5.

B. KerKT Method
Our proposed KerKT method comprises five steps.

1) Step 1 (Extracting and Aligning User Features in Both Domains): In this step, the source rating matrix X_s and the target rating matrix X_t are separately factorized, which results in the user feature matrix U_s for the source domain and U_t for the target domain. Recall that the users in the source and target domains partially overlap. Accordingly, each user feature matrix can be divided into two parts: one containing the overlapping users, the other containing the nonoverlapping users. The overlapping user feature matrix for the source domain is denoted as U_{s,o}, and U_{s,n} denotes the nonoverlapping matrix. The same goes for the target domain, i.e., U_{t,o} and U_{t,n}.
Assuming that the overlapping users have similar tastes or preferences in both domains, we can use them as a bridge to transfer knowledge. However, as mentioned in Section I, even the same user's rating patterns may not be completely the same in the two different domains. Data sparsity exacerbates this condition and may lead to two different factorized user feature vectors with different physical meanings. Hence, setting the similarity of the overlapping user entities to 1 may lead to inaccurate similarity measurements, which would eventually negatively impact the effectiveness of the knowledge transfer in the following steps. Therefore, before using the entity correspondences as a strong condition, we need to ensure that the overlapping users in both the domains are represented in the same feature space. This is referred to as "subspace alignment" in transfer learning.
The aim is to map the user feature spaces of two overlapping users into a common subspace where the domain shift has been eliminated, so the overlapping users ultimately share the same feature space across both the domains. In the source domain, the j th column of the overlapping user feature matrix is the representation of the j th user feature. We use a marginal probabilistic distribution of the j th column to represent the characteristics of the user features in each matrix. Thus, the goal is to minimize the differences between the marginal probabilistic distributions of the user features for the source domain and the target domain. If the marginal probability distributions of one user feature are the same in both the domains, then the two user features are considered to have the same physical meaning. In this way, we can align the two user feature spaces. In our previous research, we provided a definition for information consistent trifactorization. However, here, since the scenario and the matrix factorization model are different, this definition has been refined into a definition for consistent matrix factorization with partially overlapping users in the following.
Definition 1 (Consistent Matrix Factorization With Partially Overlapping Users): Given a source rating matrix X_s ∈ R^{M_s×N_s} and a target rating matrix X_t ∈ R^{M_t×N_t}, X_s and X_t can be factorized as follows:

X_s ≈ [U_{s,o}; U_{s,n}] V_s^T,  X_t ≈ [U_{t,o}; U_{t,n}] V_t^T

where U_{s,o} and U_{t,o} are the overlapping user feature matrices in the source domain and the target domain and U_{s,n} and U_{t,n} are the nonoverlapping user feature matrices, respectively. If both factorizations satisfy the following equation, then they are consistent matrix factorizations:

P(U_{s,o}) = P(U_{t,o})

where P(U_{s,o}) and P(U_{t,o}) represent the marginal probability distributions of U_{s,o} and U_{t,o}. Thus, the user feature spaces in both the source and target domains are aligned. Solving a matrix factorization optimization problem that satisfies the above-mentioned constraints is almost impossible. However, according to Definition 1, a pair of mapping functions Φ_s and Φ_t for those two matrices can be found to achieve the following:

P(Φ_s(U_{s,o})) = P(Φ_t(U_{t,o}))

A geodesic flow kernel (GFK) is a domain adaptation strategy to find a space that two different feature spaces can be projected into, thus eliminating the divergence of the two distributions. We can use this strategy to find the mapping functions that align the two user feature spaces formed by the overlapping users. Once the GFK operators Φ_s and Φ_t are determined from (U^{(0)}_{s,o}, U^{(0)}_{t,o}), they can be used through the following mapping functions:

U^{(1)}_{s,o} = Φ_s(U^{(0)}_{s,o}),  U^{(1)}_{t,o} = Φ_t(U^{(0)}_{t,o})

where Φ_s and Φ_t are the GFK operators. More details can be found in [28] and [36].
With the divergence eliminated through these mappings, the new representations of the overlapping users will satisfy the conditions in Definition 1. Nonoverlapping users also need to be projected onto the same feature space. The mapping functions Φ_s and Φ_t are used for this purpose:

U^{(1)}_s = Φ_s(U^{(0)}_s),  U^{(1)}_t = Φ_t(U^{(0)}_t)

where U^{(1)}_s and U^{(1)}_t are the aligned user feature matrices after mapping, and Φ_s and Φ_t are the mapping functions using GFK.
How the new user feature spaces are derived and how U^{(1)}_s and U^{(1)}_t are learned are summarized in Algorithm 1.

Algorithm 1 Consistent User Feature Extraction
Input: X_s, the source rating matrix; X_t, the target rating matrix; W^{(s,t)}_u, the overlapping user indicator matrix;
Output: U^{(1)}_s, the aligned user feature matrix in the source domain; U^{(1)}_t, the aligned user feature matrix in the target domain;
1: Factorize X_s and get the user feature matrix U^{(0)}_s = [U^{(0)}_{s,o}; U^{(0)}_{s,n}] as in equation (3)
2: Factorize X_t and get the user feature matrix U^{(0)}_t = [U^{(0)}_{t,o}; U^{(0)}_{t,n}] as in equation (3)
3: Compute the GFK operators between U^{(0)}_{s,o} and U^{(0)}_{t,o} as in [28]
4: Obtain the mapping functions Φ_s and Φ_t as in equation (9)
5: return U^{(1)}_s = Φ_s(U^{(0)}_s), U^{(1)}_t = Φ_t(U^{(0)}_t)
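The GFK operators in Algorithm 1 involve geodesic flows on a Grassmann manifold and are beyond a short snippet. As a rough stand-in for the mapping functions Φ_s and Φ_t, the sketch below aligns the overlapping-user blocks with an orthogonal Procrustes rotation and then applies the same map to every source user; this is our own simplification for illustration, not the paper's GFK construction:

```python
import numpy as np

def align_user_features(U_s, U_t, n_overlap):
    """Stand-in for Phi_s/Phi_t: learn an orthogonal rotation R on the
    overlapping blocks (rows 0..n_overlap-1) that minimizes ||A R - B||_F,
    then apply it to all source users; target features are kept fixed."""
    A, B = U_s[:n_overlap], U_t[:n_overlap]
    Uo, _, Vt = np.linalg.svd(A.T @ B)
    R = Uo @ Vt                       # orthogonal Procrustes solution
    return U_s @ R, U_t               # aligned U_s^(1), U_t^(1)

# Synthetic check: source features are a rotated copy of the target ones.
rng = np.random.default_rng(1)
U_tgt = rng.standard_normal((6, 3))
theta = 0.7                           # synthetic domain shift (a rotation)
Rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
U_src = np.vstack([U_tgt[:3], rng.standard_normal((2, 3))]) @ Rot.T
U_s1, U_t1 = align_user_features(U_src, U_tgt, n_overlap=3)
```

When the shift really is a rotation, the aligned overlapping rows of `U_s1` coincide with those of `U_tgt`, which is the consistency property Definition 1 asks for.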

2) Step 2 (Item Feature Regulation in Both Domains):
In matrix factorization, the user feature matrix and the item feature matrix are both low-rank matrices that map users and items to the same k-dimensional feature space. So, once the user feature spaces are aligned, the item feature matrices should be regulated to the new k-dimensional feature space. The new item feature matrices are obtained by minimizing the distance between the approximations of the low-rank matrices and the original data in the rating matrix. A Frobenius norm is used to measure the distance. The cost function in the source domain is as follows; the one in the target domain has the same form:

J_v(V_s) = (1/2) ||I_s ∘ (X_s − U^{(1)}_s V_s^T)||_F^2 + (λ_{V_s}/2) ||V_s||_F^2

where λ_{V_s} is the regularization parameter. The item feature matrices are learned by optimizing

V^{(1)}_s = argmin_{V_s} J_v(V_s)

Gradient descent is used for this optimization. The update rule is

V_s ← V_s − η_{V_s} ∂J_v(V_s)/∂V_s

where the learning rate is η_{V_s}. V^{(1)}_t can be obtained through the same process.
This step is summarized in Algorithm 2.
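Step 2 amounts to re-fitting only the item factors against the frozen, aligned user factors. A minimal NumPy sketch of this regulation loop (the data, λ, and learning rate below are illustrative):

```python
import numpy as np

def regulate_items(X, Ind, U, lam=0.05, lr=0.01, iters=1500, seed=0):
    """Learn V minimizing 0.5||Ind*(X - U V^T)||_F^2 + 0.5*lam*||V||_F^2
    by gradient descent, holding the aligned user features U fixed."""
    rng = np.random.default_rng(seed)
    V = 0.1 * rng.standard_normal((X.shape[1], U.shape[1]))
    for _ in range(iters):
        E = Ind * (X - U @ V.T)          # residual on observed entries only
        V += lr * (E.T @ U - lam * V)    # update rule V <- V - lr * dJ_v/dV
    return V
```

Because U stays fixed, the problem is convex in V, so plain gradient descent reliably reaches the regulated item features.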

3) Step 3 (Entity Similarity Measures in One Domain):
This step calculates the user and item similarities in one domain. Since the rating matrix is very sparse, making this calculation directly from the rating matrix can lead to inaccurate results. Hence, using the PMF formulation introduced in Section III-A, one rating X_ij is generated from a user latent feature vector U_{i*} and an item latent feature vector V_{j*}. Thus, the source rating matrix and the target rating matrix can be factorized as

X̂_s = U_s V_s^T,  X̂_t = U_t V_t^T

This is a dimensionality reduction and data compression process, as users/items are mapped to a lower k-dimensional feature space (usually k ≪ M, k ≪ N). Once complete, users and items are represented as full k-dimensional feature matrices, and the user/item similarities can be calculated from the user/item feature matrices.

Algorithm 2 Item Feature Regularization
Input: X_s, the source rating matrix; U^{(1)}_s, the user feature matrix
Output: V^{(1)}_s, the regularized item feature matrix
1: Initialize V^{(1)}_s
2: while not converged do
3: Compute the gradient ∂J_v(V_s)/∂V_s
4: Update V_s as in equation (15)
5: Update J_v(V_s) as in equation (13)
6: end while
7: return V^{(1)}_s
Similarity measurements are easy with user and item feature spaces in one domain since the feature spaces are homogeneous. There are many suitable choices for performing these calculations, such as cosine similarity, Pearson's similarity, Euclidean measurement, or the radial basis function (RBF) measurement. The choice depends on the situation and the characteristics of the domain. For example, cosine similarity is very popular and effective for word count and text similarity measurements due to the advantages of using angles rather than distance. Pearson's measurement tends to be more effective in memory-based collaborative filtering methods owing to its emphasis on averages. In this problem, we are measuring user similarity from a user feature matrix where the feature values are real numbers, so the following RBF measurement is the most appropriate:

W_ij = exp(−||U_{i*} − U_{j*}||^2 / σ^2)

where σ^2 is set to the median of all the nonzero values of ||U_{i*} − U_{j*}||^2.
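The RBF similarity with the median heuristic for σ² can be written compactly (a small sketch; the feature matrix here is synthetic):

```python
import numpy as np

def rbf_similarity(U):
    """W_ij = exp(-||U_i* - U_j*||^2 / sigma^2), with sigma^2 set to the
    median of the nonzero squared pairwise distances (median heuristic)."""
    sq = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(sq[sq > 0])
    return np.exp(-sq / sigma2)

rng = np.random.default_rng(3)
W = rbf_similarity(rng.standard_normal((5, 3)))   # 5 users, 3 latent factors
```

The result is a symmetric matrix with ones on the diagonal and values in (0, 1] elsewhere, exactly the form needed for the graph construction in Step 4.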

4) Step 4 (Kernel-Induced Completion of Interdomain User Similarity):
In interdomain user similarity measurement, the user feature spaces are not the same and the user features are heterogeneous, which means that their similarities cannot be calculated directly. However, given the first three steps, some user similarities between the source and target domains are now known. The overlapping entity indicator matrix W^{(s,t)} contains the observed overlapping user information. Hence, a full user similarity matrix can be constructed as

W_u = [ W^{(s,s)}_u  W^{(s,t)}_u ; W^{(t,s)}_u  W^{(t,t)}_u ]

where W^{(s,s)}_u and W^{(t,t)}_u represent the user similarities in the source and target domains, respectively, and W^{(s,t)}_u = (W^{(t,s)}_u)^T represents the interdomain user similarities. In Steps 1 and 2, the two feature spaces of the overlapping users were aligned to eliminate feature space divergence. Therefore, it is reasonable to set the similarity of each observed overlapping user pair to 1 in W^{(s,t)}_u. For now, the similarities between the overlapping users are the only known entries in the interdomain similarity matrix. We need to complete W^{(s,t)}_u using the information in W_u. Note that, here, the nonoverlapping users and their features are heterogeneous. Thus, their similarities cannot be computed directly.
This matrix completion problem has a strong connection to a bipartite edge completion problem [37]. In Step 3, the user similarities are all measured within one domain. As a result, we have fully connected nodes in the graph representations of both the source and target domains, as indicated by the red and blue nodes in Fig. 3. The overlapping users are shown as purple nodes in the graph, and they act as a "bridge" coupling the two graphs. Completing the user similarity matrix W_u requires filling in all the missing edges in the entire graph. The subscript u is omitted below to simplify the notation.
In network propagation, a random walk is a good way to reach all the nodes. As shown in Fig. 3, one user entity, denoted as a node x in the source domain, is fully connected with all the other nodes in the source domain through the weights W^{(s,s)}_{xp}, p ∈ U_s; the same holds for a node y in the target domain through W^{(t,t)}_{yq}, q ∈ U_t. If node p in the source domain and node q in the target domain are overlapping users, i.e., they are the same user, then W^{(s,t)}_{pq} = 1, and the missing edge between x and y can be estimated through the paths x → p → q → y:

W^{(s,t)}_{xy} = Σ_{p∈U_s} Σ_{q∈U_t} W^{(s,s)}_{xp} W^{(s,t)}_{pq} W^{(t,t)}_{qy}

The above-mentioned equation can be treated as a one-step random walk from both the source domain and the target domain. Generally, M steps of random walk are taken in total from the source and target sides, and all the possible steps are added together to complete the final graph:

W^{(s,t)} ← Σ_{m=0}^{M} Σ_{n=0}^{M} (β_s^m / m!) (W^{(s,s)})^m W^{(s,t)} (β_t^n / n!) (W^{(t,t)})^n

However, the goal in this problem is to find the similarities between all the users in both domains. A finite number of random walk steps may not identify all the possible relationships, but it would be more likely to associate all the indirectly connected users if M were infinite. Hence, the diffusion kernel completion method [38], which corresponds to the limit M → ∞, is used to complete the user similarity matrix

$$\hat{W}^{(s,t)} = e^{\beta_s W^{(s,s)}} W^{(s,t)} e^{\beta_t W^{(t,t)}} \quad (19)$$
where β s and β t are two positive scalars to regulate the weights of the source and target domains.
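To make the diffusion kernel completion in (19) concrete, the following is a minimal NumPy sketch (the toy matrices, the β values, and the helper name `diffusion_complete` are illustrative, not from the paper's implementation). Since the intradomain similarity matrices are symmetric, the matrix exponential can be computed through an eigendecomposition:

```python
import numpy as np

def diffusion_complete(W_ss, W_tt, W_st, beta_s=0.1, beta_t=0.1):
    """Complete the interdomain block as e^(beta_s W_ss) @ W_st @ e^(beta_t W_tt),
    i.e., an infinite, exponentially damped sum of random-walk steps."""
    def sym_expm(W, beta):
        # W is symmetric: W = Q diag(vals) Q^T, so e^(beta W) = Q diag(e^(beta vals)) Q^T
        vals, vecs = np.linalg.eigh(W)
        return (vecs * np.exp(beta * vals)) @ vecs.T
    return sym_expm(W_ss, beta_s) @ W_st @ sym_expm(W_tt, beta_t)

# Toy graph: 3 source users, 3 target users, one observed overlap (source 0 ~ target 0).
W_ss = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
W_tt = np.array([[1.0, 0.0, 0.6], [0.0, 1.0, 0.0], [0.6, 0.0, 1.0]])
W_st = np.zeros((3, 3)); W_st[0, 0] = 1.0
W_hat = diffusion_complete(W_ss, W_tt, W_st)
# Nonoverlapping pairs reachable through the overlap now receive nonzero similarity,
# e.g., source user 1 ~ target user 0 and source user 0 ~ target user 2.
```

Users with no path through an overlapping user (here, source user 2 and target user 1) keep zero interdomain similarity, which matches the random-walk interpretation above.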

5) Step 5 (Collective Matrix Factorization With User Similarity Constraints):
With all similarities measured and all the pairs of nodes connected, a fully connected graph can be constructed. A very common strategy for increasing computational speed is to remove edges, leaving a sparse graph. This approach tends to achieve better performance empirically, as it emphasizes local information with high similarity while ignoring information that is likely to be false. Hence, the k-nearest neighbors of each node are retained in a similar way to the original memory-based collaborative filtering strategy.
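This sparsification step can be sketched as follows (the function name and the toy similarity matrix are our own; the paper does not specify an implementation): each node keeps only its k most similar neighbors, and the result is symmetrized so the graph stays undirected.

```python
import numpy as np

def knn_sparsify(W, k):
    """Zero out all but each node's k largest off-diagonal similarities,
    then symmetrize: an edge survives if either endpoint selected it."""
    W = W.copy().astype(float)
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        top = np.argsort(W[i])[-k:]   # indices of the k most similar neighbors
        S[i, top] = W[i, top]
    return np.maximum(S, S.T)

W = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
S = knn_sparsify(W, k=1)
# Only the strongest edge per node remains: (0, 1) with 0.9 and (2, 3) with 0.8.
```

Weak edges such as (1, 2) are dropped, which is exactly the "emphasize local, high-similarity information" behavior described above.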
In the scenario of this paper, only overlapping users are observed; the items are nonoverlapping. The users have both intradomain and interdomain similarities, but the items only have intradomain similarities. Based on the assumption that "similar users have similar tastes and will thus choose similar items to consume," both the intradomain and interdomain similarities are used as prior knowledge to constrain the proposed matrix factorization. In terms of the intradomain similarities, although the data in the target domain are very sparse, they are still very valuable for measuring the similarities between users/items and thereby constraining the matrix factorization. As for the interdomain similarities, users in the target domain are correlated not only with users in their own domain but also, via the overlapping users, with users in the source domain. As a consequence, the preferences of source-domain users who are similar to target-domain users are transferred as knowledge to improve the performance of the recommender system.
These constraints mean that users who are similar tend to have similar latent factors. We have used the regularization form in [22]

$$R_o(U) = \mathrm{tr}(U^T L U) \quad (20)$$

where $L$ denotes a Laplacian matrix and $L = D - W$. $W$ is the user similarity matrix, and $D$ is a diagonal matrix defined as $D_{ii} = \sum_j W_{ij}$. Note that, although the form of regularization is quite similar, our method is different from [33]. In [33], the similarities between entities in the source domain are directly used in the target domain as prior knowledge without considering the inconsistencies between the features in each domain. By contrast, in our method, the similarities between entities in the target domain are learned through the domain adaptation technique, which ensures that the features are consistent, and through the diffusion kernel technique, which makes the similarity constraints more accurate and complete. Our proposed constraints are both more flexible and more reasonable to satisfy in practice. This goal is achieved by minimizing the following objective function:

$$\min_{U_s, V_s, U_t, V_t} \; \alpha \|X_s - U_s V_s^T\|_F^2 + (1-\alpha) \|X_t - U_t V_t^T\|_F^2 + \lambda_u \mathrm{tr}(U^T L_u U) + \lambda_v \mathrm{tr}(V^T L_v V) + \lambda \left( \|U_s\|_F^2 + \|V_s\|_F^2 + \|U_t\|_F^2 + \|V_t\|_F^2 \right) \quad (21)$$

where tr is the trace of a matrix, $\alpha \in (0, 1)$ is a tradeoff parameter to balance the source and target domain data, and $\lambda_u$, $\lambda_v$, and $\lambda$ are the regularization parameters that control the influence of the constraints on the user similarities, the item similarities, and the algorithm complexity, respectively. Details on how these parameters affect the proposed method are presented in Section V. Using gradient descent, the objective function is minimized with the update rules in (22). By updating $U_s$, $V_s$, $U_t$, and $V_t$ iteratively, we arrive at a final optimized approximation $\hat{X}_t = U_t V_t^T$. Recommendations are given according to the rating predictions in the target domain.
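A minimal sketch of this collective factorization in NumPy follows. The gradients are our own derivation from the objective, the name `cmf_with_similarity` and all hyperparameter values are illustrative, and the item-similarity constraint is omitted for brevity:

```python
import numpy as np

def cmf_with_similarity(X_s, X_t, W_u, k=4, alpha=0.5,
                        lam_u=0.01, lam=0.01, lr=0.01, iters=500):
    """Jointly factorize the source and target rating matrices while a
    graph-Laplacian penalty tr(U^T L U) pulls users that W_u marks as
    similar (within or across domains) toward similar latent factors."""
    n_s, m_s = X_s.shape
    n_t, m_t = X_t.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_s + n_t, k))   # stacked [U_s; U_t]
    V_s = rng.normal(scale=0.1, size=(m_s, k))
    V_t = rng.normal(scale=0.1, size=(m_t, k))
    L = np.diag(W_u.sum(axis=1)) - W_u               # Laplacian L = D - W
    for _ in range(iters):
        U_s, U_t = U[:n_s], U[n_s:]
        E_s = U_s @ V_s.T - X_s                      # source residual
        E_t = U_t @ V_t.T - X_t                      # target residual
        gU = np.vstack([alpha * E_s @ V_s,
                        (1.0 - alpha) * E_t @ V_t])
        gU += lam_u * (L + L.T) @ U + lam * U        # similarity + ridge terms
        gV_s = alpha * E_s.T @ U_s + lam * V_s
        gV_t = (1.0 - alpha) * E_t.T @ U_t + lam * V_t
        U -= lr * gU
        V_s -= lr * gV_s
        V_t -= lr * gV_t
    return U[n_s:] @ V_t.T                           # predicted target ratings
```

Because $U$ is stacked across domains, the single Laplacian term couples source and target users through the interdomain block of $W_u$, which is how the transferred knowledge enters the target-domain prediction.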

C. Small-Scale Example
To better illustrate our method, this section outlines a small-scale example. Suppose the source domain of a recommender system contains five users, denoted as $U_s = \{u_1, u_2, u_3, u_4, u_5\}$, and nine items, denoted as $I_s = \{i_1, i_2, i_3, i_4, i_5, i_6, i_7, i_8, i_9\}$. The target domain contains six users, denoted as $U_t = \{u_6, u_7, u_8, u_9, u_{10}, u_{11}\}$, and nine items, denoted as $I_t = \{j_1, j_2, j_3, j_4, j_5, j_6, j_7, j_8, j_9\}$. Four of the nine items in the source domain, $i_6, i_7, i_8, i_9$, correspond to four of the nine items in the target domain, $j_1, j_2, j_3, j_4$, respectively, as shown in Fig. 4.
Step 1 (Extracting and Aligning Features): With the lower dimension K set to 4, the source rating matrix $X_s$ and the target rating matrix $X_t$ are factorized into the user feature matrices $U_s^{(0)}, U_t^{(0)}$ and the item feature matrices $V_s^{(0)}, V_t^{(0)}$.
Step 2 (Domain Adaptation): The example in Fig. 5 shows the domain adaptation process using GFK. In this small-scale example, there is not sufficient data to generate meaningful distributions. Therefore, we have used the item feature matrices resulting from our Task 2 experiment in Section V. Fig. 5(a) shows the distribution of one item feature for the items that overlap in the source and target domains. It is apparent that a divergence exists between these two distributions. Fig. 5(b) shows the adjustment to the feature after domain adaptation. Here, the feature distributions for the source domain and the target domain are far better aligned.
Step 3 (Similarity Measures): This step shows how the item similarity matrix $W_v$ is calculated, given that item overlaps exist in our example. With the item feature matrices from the source domain and the target domain, the item similarities within the same domain can be calculated directly through the RBF measurement. Together with the overlap information, the blocks of the item similarity matrix $W_v$, namely $W_v^{(s,s)}$, $W_v^{(t,t)}$, and $W_v^{(s,t)}$, are shown at the top of the next page.
Step 4 (Kernel-Induced Completion): The similarity matrix $W_v^{(s,t)}$ is completed according to (19). In more detail, low-rank eigendecompositions are computed for both $W_v^{(s,s)} \approx Q_s D_s Q_s^T$ and $W_v^{(t,t)} \approx Q_t D_t Q_t^T$, where $D_s$ and $D_t$ contain the $k_s$ and $k_t$ leading eigenvalues of $W_v^{(s,s)}$ and $W_v^{(t,t)}$, and $Q_s$ and $Q_t$ are the corresponding eigenvectors.
Diffusion completion is then conducted as

$$\hat{W}_v^{(s,t)} = Q_s e^{\beta_s D_s} Q_s^T \, W_v^{(s,t)} \, Q_t e^{\beta_t D_t} Q_t^T.$$

The final result of $\hat{W}_v^{(s,t)}$ in this example is shown at the top of the next page.
Step 5 (Collective Matrix Factorization): With the user and item similarity matrix available, the target rating matrix is approximated as indicated in (21), and recommendations can be given accordingly.

V. EXPERIMENTS AND ANALYSIS
This section presents the experimental results and the related analysis. The data sets and evaluation metrics are introduced first, followed by the experimental settings and baseline methods. Then, we present the results of the empirical experiments, with a parameter analysis to conclude this section.

A. Data Sets and Evaluation Metrics
Our method was tested under the conditions that the source and target domains share some overlapping users and/or items. For a fair comparison, we chose movies and books as the recommendation subjects, two commonly used categories in previous research on cross-domain recommender systems. Four real-world data sets were used in our experiments: Movielens, Netflix, AmazonBook [39], and Douban [40]. Each of these data sets is publicly available and has been tested on single-domain recommendation in a variety of situations, but rarely in cross-domain recommendation. Our experiments thus help to fill the gap in tests for this specific problem setting. The statistical information for these data sets is presented in Table I.
From AmazonBook, we removed all users who had given exactly the same rating to every book, as such data are not effective for constructing a recommender system [28]. The Movielens20M ratings were normalized to the range {1, 2, 3, 4, 5}. The following four cross-domain recommendation tasks, summarized in Table II, were designed for the experiments.
Task 1: Movie → movie, user-overlap, Movielens20M.
Task 2: Movie → movie, item-overlap, Netflix.
Task 3: Book → book, item-overlap, AmazonBook.
Task 4: Movie → book, user-overlap, Douban.
In the first three tasks, we used the data from one data set and split the entities into the source domain and the target domain to simulate entity overlaps. The fourth task was designed for Douban, a real-world rating website where users can rate movies, books, and music. We now take Task 1 as an example to describe how the data were selected; the process for Tasks 2 and 3 was similar but with overlapping items. For the source domain data, we filtered out the users who had given fewer than 20 ratings in total and the items that had received fewer than 10 ratings. We randomly selected 2000 items and 2000 users, constraining the sparsity to 2% to ensure that the source domain data were relatively dense. From these 2000 users, we randomly chose 200 as overlapping users. We then randomly selected 1800 users with no correspondence to the 2000 users in the source domain; together, these composed the 2000 users in the target domain data. In addition, we randomly chose 2000 items for the target domain that had no intersection with the items in the source domain. We tested the target domain data with three sparsity ratios to compare the algorithms under different circumstances.
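The entity-sampling procedure for Task 1 can be sketched as follows. This is a simplified illustration with hypothetical user IDs and a helper name of our own; the real selection also enforces the rating-count filters and the 2% sparsity constraint described above:

```python
import random

def sample_domains(all_users, n_per_domain=2000, n_overlap=200, seed=0):
    """Pick the source-domain users, mark a random subset of them as
    overlapping, and fill the rest of the target domain with users that
    have no correspondence to any source-domain user."""
    rng = random.Random(seed)
    source = rng.sample(sorted(all_users), n_per_domain)
    overlap = rng.sample(source, n_overlap)
    remaining = sorted(set(all_users) - set(source))
    target = overlap + rng.sample(remaining, n_per_domain - n_overlap)
    return source, target, overlap

# 10000 hypothetical user IDs; 2000 per domain with 200 shared.
source, target, overlap = sample_domains(range(10000))
```

By construction, the two domains intersect in exactly the 200 overlapping users, matching the simulated partial-overlap setting.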
Mean absolute error (MAE) and root mean square error (RMSE) were used as the evaluation metrics

$$\mathrm{MAE} = \frac{1}{|Y|} \sum_{(u,v) \in Y} \left| X_{uv} - \hat{X}_{uv} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{|Y|} \sum_{(u,v) \in Y} \left( X_{uv} - \hat{X}_{uv} \right)^2}$$

where $X_{uv}$ and $\hat{X}_{uv}$ are the true and predicted ratings, $Y$ is the test set, and $|Y|$ is the size of the test set. The smaller the errors, the better the performance.
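As a quick reference, the two metrics can be computed as follows (the helper name `mae_rmse` and the toy ratings are ours):

```python
import numpy as np

def mae_rmse(X_true, X_pred, test_set):
    """MAE and RMSE over the test set Y of (user, item) index pairs."""
    diffs = np.array([X_true[u, v] - X_pred[u, v] for u, v in test_set])
    return np.mean(np.abs(diffs)), np.sqrt(np.mean(diffs ** 2))

X_true = np.array([[5.0, 3.0], [4.0, 1.0]])
X_pred = np.array([[4.0, 3.0], [4.0, 3.0]])
mae, rmse = mae_rmse(X_true, X_pred, [(0, 0), (0, 1), (1, 1)])
# diffs are 1, 0, -2, so MAE = 1.0 and RMSE = sqrt(5/3)
```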

B. Experimental Settings and Baselines
Three nontransfer learning methods were chosen for comparison: Pearson's correlation coefficient (PCC) [41], the flexible mixture model (FMM) [42], and PMF [35], along with three cross-domain recommendation methods: CBT [26], RMGM [27], and PMF transfer learning (PMFTL) [33]. PCC is a classical memory-based collaborative filtering method. FMM is a graphical model designed to allow one user/item to be clustered into several groups simultaneously; empirically, FMM has proven to be more effective at providing recommendations to users with few historical ratings. RMGM is a cross-domain recommendation method that evolved out of the single-domain FMM. CBT is also a cross-domain recommendation method. Both of these methods were designed for scenarios with no overlapping users or items. PMFTL is a transfer learning method for cross-domain scenarios with entity overlap, as proposed in [33]; for a fair comparison, we removed the active-learning module from the originally proposed method. PMFTL was developed on the basis of PMF with partially overlapping entities and has more relaxed constraints than TCF [29]. TCF was designed for problems where the users and items have a one-to-one mapping, and its constraints are strict: one constraint requires that the user and item feature matrices in the source and target domains be exactly the same, whereas PMFTL uses the similarities estimated in the source domain directly as constraints in the target domain. Since TCF cannot be used to solve the problem presented in this paper, we did not select it for comparison.
User-based collaborative filtering was used for PCC, with the number of neighboring users set to 50.
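For reference, a minimal sketch of the PCC similarity underlying this baseline, computed over two users' co-rated items (the function name is ours, not from [41]):

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation between two users' rating vectors over co-rated items."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Perfectly aligned tastes give similarity 1; opposite tastes give -1.
```

In the experiments, each user's 50 most PCC-similar users form the neighborhood from which ratings are predicted.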

C. Results
The results with these four data sets are shown in Tables III-VI, and a visual comparison is shown in Fig. 6. KerKT delivered the best performance of all the comparison methods on all four cross-domain recommendation tasks. This verifies the conclusion that using overlapping entities as a bridge for transferring knowledge is useful in cross-domain recommendation.

1) Comparison With Nontransfer Learning Methods: KerKT significantly outperformed all the nontransfer learning recommendation techniques.

2) Comparison With Cross-Domain Recommendation Methods for Nonoverlapping Entities: RMGM showed improved recommendation precision over its basis, FMM, but the improvement was sometimes not significant (see Table VI). CBT did not always improve the performance of the recommender system and sometimes suffered from negative transfer, indicating that CBT is not stable when transferring knowledge (see Table IV). Neither of these methods uses nonoverlapping entity information explicitly; rather, they extract cluster-based knowledge to share between the source and target domains. KerKT outperformed both these methods in all the recommendation tasks. Methods designed for scenarios with nonoverlapping entities can be applied as a substitute to solve the problem proposed in this paper, but the results show that they had no advantage because they did not use the overlap information.

3) Comparison With Cross-Domain Recommendation Methods for Partially Overlapping Entities: Methods designed for partially overlapping entities are rare, and methods developed for fully overlapping entities cannot be used to solve the problem proposed in this paper. Hence, only PMFTL, which was developed on the basis of PMF with partially overlapping entities, serves as a representative of the cross-domain recommendation methods for partially overlapping entities.

D. Complexity Analysis
This complexity analysis covers each step of the proposed method KerKT. For simplicity, the dimensions of the user and item features in both the domains have all been set to k. The time complexity of each step is listed as follows.
Step 1: $O(kn^2)$ for each iteration to update $U_s^{(0)}$, $V_s^{(0)}$, $U_t^{(0)}$, and $V_t^{(0)}$ and to approximate $X_s$ and $X_t$ in the matrix factorization, and $O(k^2 n)$ for the domain adaptation process to adjust the feature matrices of the overlapping entities.
Step 2: $O(kn^2)$ for each iteration to update the feature matrices of the nonoverlapping entities.
Step 3: $O(kn^2)$ to calculate the similarity matrices of the users and items.
Step 4: $O(n^3)$ for the diffusion kernel completion.
Step 5: $O(kn^2)$ for each iteration to update (22). Since the numbers of iterations in the iterative steps are finite, KerKT's total complexity is $O(n^3)$. We have also listed the time taken for each method to complete Task 2 in Table IX. This experiment was conducted with 200 overlapping entities and 0.5% sparsity on a computer with 16-GB memory and a 2.2-GHz Intel Core i7. The nontransfer methods were faster, which is due to their general simplicity: they do not take the cross-domain data into consideration.
Most of the time was spent on the user and item similarity matrix calculations. However, some parallel computing could be used to speed up these computations. Alternatively, these matrices could be precalculated and stored  so as not to affect the speed of online recommendation. Overall, the time complexity analysis shows that the KerKT method can be used with large-scale data sets and for online e-commerce or business-to-business systems.

E. Parameter Analysis
There are three important parameters in KerKT: $\lambda_u$, $\lambda_v$, and $\lambda$. Each is a tradeoff parameter in (21). For simplicity, we have only presented the results for the Movielens data set. This experiment was conducted with a sparsity ratio of 99.0% and 200 overlapping entities. MAE and RMSE were used as the metrics. The results are presented in Figs. 7 and 8.
To analyze the parameters λ u and λ v , we set parameter λ to 0.0001. From Fig. 7, we can see that the MAE and RMSE change with different settings for λ u and λ v . These parameters reflect the influence of the user and item similarities on the matrix factorization, while the parameter λ restricts the complexity of the algorithm to avoid overfitting. We used a grid search to find the optimized settings for each of these parameters, λ, λ u , and λ v , which resulted in a setting of 0.01 for all.
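The grid search over the three parameters can be sketched as below. The callback `train_eval` stands in for training KerKT with a given parameter triple and returning its MAE; both the name and the grid values are illustrative:

```python
from itertools import product

def grid_search(train_eval, grid=(0.0001, 0.001, 0.01, 0.1)):
    """Evaluate every (lam, lam_u, lam_v) combination and keep the best."""
    return min(product(grid, repeat=3), key=lambda p: train_eval(*p))

# Example with a stand-in objective whose optimum is at 0.01 for all three:
best = grid_search(lambda lam, lam_u, lam_v:
                   abs(lam - 0.01) + abs(lam_u - 0.01) + abs(lam_v - 0.01))
```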

VI. CONCLUSION AND FURTHER STUDY
This paper presents a novel cross-domain recommendation method for knowledge transfer, called KerKT. This method exploits overlapping entities as a bridge between the source and target domains and is applicable to e-commerce websites, such as Amazon, where book rating data are very dense but data in other categories are sparse. Unlike previous research, KerKT does not require that the entities be fully overlapped; it performs well in scenarios with partially overlapping entities. One advantage of this method is that it aligns the latent features of the entities extracted from the original ratings matrix. This fixes shifts in the entity feature space caused by user preference deviations between the domains. Furthermore, the entity similarity matrix is completed through diffusion kernel completion to tackle the inconsistencies caused by heterogeneous feature spaces between two domains. The similarity matrix is extended into matrix factorization with more flexible constraints to integrate the overlapping entity information. Experimental results from a comparison with six nontransfer learning and cross-domain recommendation methods show that KerKT achieved the best performance. Even with a small ratio of overlapping entities, it was still possible to transfer knowledge from the source domain to the target domain.
There is practical significance in studying and developing cross-domain recommender systems. Smart BizSeeker, a B2B recommender system, aims to recommend appropriate business partners to businesses in Australia [43]. In a future study, we plan to implement our proposed method into their system. Furthermore, there are still some interesting issues to be explored. For example, how to choose the best source domain if several domains are available? And which sparsity ratio in the target domain benefits transfer learning the most? In future work, we intend to apply our method to other kinds of data beyond ratings, such as Web browser records and social media records.