SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.

Chen, B; Cao, Z; Bai, Q

SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.

Chen, B Cao, Z

Bai, Q

Permalink

Publisher:: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type:: Journal Article
Citation:: IEEE Trans Neural Netw Learn Syst, 2024, PP, (99)
Issue Date:: 2024-04-22

Closed Access

	Filename	Description	Size
	1725481.pdf	Published version	4.34 MB		View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, B
dc.contributor.author	Cao, Z https://orcid.org/0000-0003-3656-0328
dc.contributor.author	Bai, Q
dc.date.accessioned	2024-08-07T05:26:47Z
dc.date.available	2024-08-07T05:26:47Z
dc.date.issued	2024-04-22
dc.identifier.citation	IEEE Trans Neural Netw Learn Syst, 2024, PP, (99)
dc.identifier.issn	2162-237X
dc.identifier.issn	2162-2388
dc.identifier.uri	http://hdl.handle.net/10453/180313
dc.description.abstract	It is challenging to train an efficient learning procedure with multiagent reinforcement learning (MARL) when the number of agents increases as the observation space exponentially expands, especially in large-scale multiagent systems. In this article, we proposed a scalable attentive transfer framework (SATF) for efficient MARL, which achieved goals faster and more accurately in homogeneous and heterogeneous combat tasks by transferring learned knowledge from a small number of agents (4) to a large number of agents (up to 64). To reduce and align the dimensionality of the observed state variations caused by increasing numbers of agents, the proposed SATF deployed a novel state representation network with a self-attention mechanism, known as dynamic observation representation network (DorNet), to extract the dominant observed information with excellent cost-effectiveness. The experiments on the MAgent platform showed that the SATF outperformed the distributed MARL (independent Q-learning (IQL) and A2C) in task sequences from 8 to 64 agents. The experiments on StarCraft II showed that the SATF demonstrated superior performance relative to the centralized training with decentralized execution MARL (QMIX) by presenting shorter training steps, achieving a desired win rate of up to approximately 90% when increasing the number of agents from 4 to 32. The findings of our study showed the great potential for enhancing the efficiency of MARL training in large-scale agent combat missions.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof	IEEE Trans Neural Netw Learn Syst
dc.relation.isbasedon	10.1109/TNNLS.2024.3387397
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2024-08-07T05:26:42Z
pubs.issue	99
pubs.publication-status	Published online
pubs.volume	PP
utslib.citation.issue	99

Abstract:

It is challenging to train an efficient learning procedure with multiagent reinforcement learning (MARL) when the number of agents increases as the observation space exponentially expands, especially in large-scale multiagent systems. In this article, we proposed a scalable attentive transfer framework (SATF) for efficient MARL, which achieved goals faster and more accurately in homogeneous and heterogeneous combat tasks by transferring learned knowledge from a small number of agents (4) to a large number of agents (up to 64). To reduce and align the dimensionality of the observed state variations caused by increasing numbers of agents, the proposed SATF deployed a novel state representation network with a self-attention mechanism, known as dynamic observation representation network (DorNet), to extract the dominant observed information with excellent cost-effectiveness. The experiments on the MAgent platform showed that the SATF outperformed the distributed MARL (independent Q-learning (IQL) and A2C) in task sequences from 8 to 64 agents. The experiments on StarCraft II showed that the SATF demonstrated superior performance relative to the centralized training with decentralized execution MARL (QMIX) by presenting shorter training steps, achieving a desired win rate of up to approximately 90% when increasing the number of agents from 4 to 32. The findings of our study showed the great potential for enhancing the efficiency of MARL training in large-scale agent combat missions.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180313