Coordination, Detection and Collaboration in Deep Multi-Agent Reinforcement Learning

Publication Type: Thesis
Issue Date: 2023
Numerous real-world problems, including the control of autonomous self-driving cars, drones, and robot swarms, are typically formulated as cooperative multi-agent reinforcement learning (MARL) tasks. In cooperative MARL, one key challenge is handling the enormous joint action space, which grows exponentially with the number of agents in the task. For instance, nine agents each selecting from six actions already yield a joint action space of over ten million possibilities. Effective MARL methods must be capable of operating over such large joint action spaces, particularly where traditional single-agent reinforcement learning methods are not practically viable.

In the first part of the thesis, we examine the use of dynamic coordination graphs for MARL in a method called Dynamical Q-value Coordination Graph (QCGraph). QCGraph aims to dynamically represent and generalise the joint value function of all agents by factorising it over subsets of agents and aggregating the global information those subsets provide. This allows agents to learn coordinated behaviour while still supporting decentralised execution.

The second part of the thesis investigates the detection of distributional shift in reinforcement learning, which arises when learning from offline experiences that may be sub-optimal. Specifically, we explore Dual Behavioural Regularised Actor Critic (DBR), which trains agents to gravitate towards “good” experiences whilst avoiding “bad” ones. This is achieved by constructing two behavioural policies, one modelling “good” experiences and one modelling “bad” ones, and constraining the policy learning process so that the distributional shift remains bounded. Beyond applying the general concepts of DBR to multi-agent reinforcement learning, we can also make use of the experiences of other agents. We explore this in Multi-Agent regularised Q-learning, where we examine mechanisms for correcting for the differences in shared experiences across agents based on their underlying policy distributions.

In the final part of the thesis, we explore the task of collaboration, where we allow agents to behave in ways that purposely neglect the coordination imposed by the mixing networks commonly used in MARL algorithms. To examine this, we first explore neural network ensemble approaches that can learn disjointly and provide theoretical guarantees within the boosting framework. We then apply this ensemble learning approach to MARL in a method we call Greedy UnMixing, which combines the mixer and individual agent networks such that the mixing network can be “unmixed”, thereby allowing agents to become explicitly uncoordinated.
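As a rough illustration of the first part, the following minimal Python sketch (not the thesis implementation; the pairwise structure, variable names, and random payoffs are illustrative assumptions) shows why the joint action space becomes unmanageable and how a coordination-graph-style factorisation over small subsets of agents evaluates a joint action without enumerating that space.

    import itertools
    import numpy as np

    n_agents, n_actions = 9, 6
    print(n_actions ** n_agents)  # 10,077,696 joint actions for just nine agents

    # Factorised joint value in the spirit of coordination graphs:
    # per-agent utilities plus payoffs over small subsets (here, pairs) of agents.
    rng = np.random.default_rng(0)
    utilities = rng.normal(size=(n_agents, n_actions))
    pair_payoffs = {pair: rng.normal(size=(n_actions, n_actions))
                    for pair in itertools.combinations(range(n_agents), 2)}

    def joint_q(joint_action):
        """Factorised estimate of the joint value for one joint action."""
        q = sum(utilities[i, a] for i, a in enumerate(joint_action))
        q += sum(pair_payoffs[(i, j)][joint_action[i], joint_action[j]]
                 for (i, j) in pair_payoffs)
        return q

    print(joint_q(tuple(rng.integers(0, n_actions, size=n_agents))))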
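For the second part, here is a hedged sketch of the dual behavioural regularisation idea: a penalty that pulls the learned policy towards a behaviour policy fitted on “good” experiences and pushes it away from one fitted on “bad” experiences. The use of KL divergence, the coefficients, and the function names are assumptions for illustration; the thesis' DBR objective may differ.

    import numpy as np

    def kl(p, q, eps=1e-8):
        """KL divergence between two discrete distributions."""
        p, q = np.asarray(p) + eps, np.asarray(q) + eps
        return float(np.sum(p * np.log(p / q)))

    def dbr_penalty(policy, good_behaviour, bad_behaviour, alpha=1.0, beta=1.0):
        """Regulariser added to an actor loss: stay close to the behaviour policy
        fitted on "good" experiences and away from the one fitted on "bad"
        experiences, keeping distributional shift from the dataset bounded."""
        return alpha * kl(policy, good_behaviour) - beta * kl(policy, bad_behaviour)

    pi = [0.5, 0.3, 0.2]        # current policy over three actions
    pi_good = [0.6, 0.3, 0.1]   # behaviour policy from high-return data
    pi_bad = [0.1, 0.2, 0.7]    # behaviour policy from low-return data
    print(dbr_penalty(pi, pi_good, pi_bad))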
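For the experience-sharing idea explored in Multi-Agent regularised Q-learning, a tabular sketch of one possible correction mechanism: re-weighting a shared transition by the ratio of the learning agent's and the source agent's action probabilities. The importance-ratio form, the clipping, and all names are illustrative assumptions rather than the thesis' exact method.

    import numpy as np

    def shared_td_update(q, transition, pi_learner, pi_source,
                         gamma=0.99, lr=0.1, max_weight=10.0):
        """Tabular TD update for the learning agent using a transition logged by
        another agent, re-weighted by the ratio of the two policies'
        probabilities of the logged action (clipped to bound variance)."""
        s, a, r, s_next = transition
        weight = min(pi_learner[s, a] / max(pi_source[s, a], 1e-8), max_weight)
        td_error = r + gamma * np.max(q[s_next]) - q[s, a]
        q[s, a] += lr * weight * td_error
        return q

    # Tiny usage example: 2 states, 2 actions, one shared transition (s=0, a=0, r=1, s'=1).
    q = np.zeros((2, 2))
    pi_learner = np.array([[0.9, 0.1], [0.5, 0.5]])
    pi_source = np.array([[0.2, 0.8], [0.5, 0.5]])
    print(shared_td_update(q, (0, 0, 1.0, 1), pi_learner, pi_source))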
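Finally, a toy sketch of the “unmixing” intuition behind Greedy UnMixing: a gate that interpolates between a state-dependent mixing of individual agent utilities and their plain, uncoordinated combination, so that when the gate is fully open the mixer is bypassed and each agent can act greedily on its own values. The gate, the linear stand-in for the mixing network, and all names are illustrative assumptions, not the thesis architecture.

    import numpy as np

    def q_tot(agent_qs, joint_action, state_weight, unmix_gate):
        """Blend a state-weighted mixed joint value with the plain sum of
        individual utilities; unmix_gate = 1 "unmixes" the mixer entirely."""
        chosen = np.array([q[a] for q, a in zip(agent_qs, joint_action)])
        mixed = float(np.dot(state_weight, chosen))  # stand-in for a mixing network
        independent = float(chosen.sum())            # uncoordinated combination
        return (1.0 - unmix_gate) * mixed + unmix_gate * independent

    agent_qs = [np.array([0.2, 1.0]), np.array([0.7, 0.1])]  # two agents, two actions
    print(q_tot(agent_qs, (1, 0), state_weight=np.array([0.3, 1.5]), unmix_gate=0.0))
    print(q_tot(agent_qs, (1, 0), state_weight=np.array([0.3, 1.5]), unmix_gate=1.0))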