Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation.

Wang, S; Chen, X; McAuley, J; Cripps, S; Yao, L

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type: Journal Article
Citation: IEEE Trans Neural Netw Learn Syst, 2023, PP, (99)
Issue Date: 2023-11-16

Closed Access

Filename	Description	Size
1686721.pdf	Published version	1.62 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, S
dc.contributor.author	Chen, X
dc.contributor.author	McAuley, J
dc.contributor.author	Cripps, S https://orcid.org/0000-0003-3207-172X
dc.contributor.author	Yao, L
dc.date.accessioned	2024-05-13T01:40:18Z
dc.date.available	2024-05-13T01:40:18Z
dc.date.issued	2023-11-16
dc.identifier.citation	IEEE Trans Neural Netw Learn Syst, 2023, PP, (99)
dc.identifier.issn	2162-237X
dc.identifier.issn	2162-2388
dc.identifier.uri	http://hdl.handle.net/10453/178865
dc.description.abstract	Recent advances in recommender systems have demonstrated the potential of reinforcement learning (RL) to handle the dynamic evolution processes between users and recommender systems. However, training an optimal RL agent is generally impractical given the sparse user feedback data typical of recommender systems. To address the lack of interaction data in current RL-based recommender systems, we propose to learn a general model-agnostic counterfactual synthesis (MACS) policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesize counterfactual states while preserving significant information in the original state relevant to the user's interests, building upon two different training approaches we designed: learning with expert demonstrations and joint training. As a result, each counterfactual data point is synthesized based on the current recommendation agent's interaction with the environment, so that it adapts to users' dynamic interests. We integrate the proposed policy with deep deterministic policy gradient (DDPG), soft actor-critic (SAC), and twin delayed DDPG (TD3) in an adaptive pipeline with a recommendation agent that can generate counterfactual data to improve recommendation performance. The empirical results on both online simulation and offline datasets demonstrate the effectiveness and generalization of our counterfactual synthesis policy and verify that it improves the performance of RL recommendation agents.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof	IEEE Trans Neural Netw Learn Syst
dc.relation.isbasedon	10.1109/TNNLS.2023.3329808
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation.
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Provost
utslib.copyright.status	closed_access	*
dc.date.updated	2024-05-13T01:40:15Z
pubs.issue	99
pubs.publication-status	Published online
pubs.volume	PP
utslib.citation.issue	99

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/178865