Off-policy Learning over Heterogeneous Information for Recommendation

Publisher:
ACM
Publication Type:
Conference Proceeding
Citation:
WWW 2022 - Proceedings of the ACM Web Conference 2022, 2022, pp. 2348-2359
Issue Date:
2022-04-25
Abstract:
Reinforcement learning has recently become an active topic in recommender system research, where logged data recording users' feedback on items is used to discover the policy. Off-policy learning, which refers to optimizing a policy with access only to such logged feedback data, has been a popular research topic in reinforcement learning. However, the log entries are biased: the logs over-represent actions favored by the recommender system, since the user feedback contains only partial information limited to the particular items exposed to the user. As a result, a policy learned from such offline logged data tends to be biased away from the true behaviour policy. In this paper, we propose a novel off-policy learning approach augmented by meta-paths for recommendation, which, to the best of our knowledge, is the first of its kind. We argue that a heterogeneous information network (HIN), which provides rich contextual information on both the item and the user side, can scale the contribution of the logged data toward unbiased target policy learning. To this end, we develop a new HIN-augmented target policy model (HINpolicy), which explicitly leverages contextual information to scale the generated reward for the target policy. In addition, equipped with the HINpolicy model, our solution adaptively receives HIN-augmented corrections for counterfactual risk minimization and ultimately yields an effective policy that maximizes the long-run rewards for recommendation. Finally, we extensively evaluate our method through a series of simulations and on large-scale real-world datasets, obtaining favorable results compared with state-of-the-art methods.
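The core mechanism the abstract alludes to, reweighting logged feedback by the ratio between the target and logging policies (counterfactual risk minimization) while scaling the reward with side information, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's HINpolicy model: it assumes a softmax target policy over a small item catalog, a uniform logging policy, importance-weight clipping, and a hypothetical per-interaction `scale` argument standing in for the HIN-derived reward correction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 5

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ips_step(theta, items, rewards, log_probs, scale=None, clip=10.0, lr=0.1):
    """One REINFORCE-style update on the inverse-propensity-scored
    (counterfactual) objective E_log[w * scaled_reward], where
    w = pi_theta(a) / pi_log(a) is the clipped importance weight and
    `scale` is a hypothetical per-interaction reward multiplier standing
    in for a context-derived (e.g. HIN-based) correction."""
    if scale is None:
        scale = np.ones(len(items))
    grad = np.zeros_like(theta)
    for a, r, p_log, s in zip(items, rewards, log_probs, scale):
        pi = softmax(theta)
        w = min(pi[a] / p_log, clip)          # clipped importance weight
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0                 # d log pi(a) / d theta for softmax
        grad += w * (s * r) * grad_log_pi     # reweighted, scaled reward signal
    return theta + lr * grad / len(items)

# Toy logged feedback: items shown by a uniform logging policy, with clicks.
items = rng.integers(0, n_items, size=100)
rewards = (items == 3).astype(float)          # only item 3 earns a reward
log_probs = np.full(100, 1.0 / n_items)

theta = np.zeros(n_items)
for _ in range(200):
    theta = ips_step(theta, items, rewards, log_probs)
print(softmax(theta))                          # probability mass shifts toward item 3
```

In this toy setup the learned target policy concentrates on the rewarded item even though the logging policy chose items uniformly; passing a non-uniform `scale` vector would further modulate how strongly each logged interaction contributes, which is the role the abstract assigns to the HIN-based correction.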