Coordination, Detection and Collaboration in Deep Multi-Agent Reinforcement Learning

Publication Type: Thesis
Issue Date: 2023
Numerous real-world problems, including the control of autonomous self-driving cars, drones, and robot swarms, are typically formulated as cooperative multi-agent reinforcement learning (MARL) tasks. In cooperative MARL, one key challenge is handling the enormous joint action space, which grows exponentially with the number of agents in the task. For instance, nine agents each selecting from six actions already yield a joint action space of over ten million possibilities. Effective MARL methods must be capable of operating over such large joint action spaces, particularly where traditional single-agent reinforcement learning methods are not practically viable.

In the first part of the thesis, we examine the use of dynamic coordination graphs for MARL in a method called Dynamical Q-value Coordination Graph (QCGraph). QCGraph aims to dynamically represent and generalise the joint value function of all agents by factorising it over subsets of agents and aggregating the global information those subsets provide. This allows agents to learn coordinated behaviour while still supporting decentralised execution.

The second part of the thesis investigates the detection of distributional shift in reinforcement learning, which arises when learning from offline experiences that may be sub-optimal. Specifically, we explore Dual Behavioural Regularised Actor Critic (DBR), which trains agents to gravitate towards “good” experiences whilst avoiding “bad” ones. This is achieved by constructing two behavioural policies, one modelling “good” experiences and one modelling “bad” ones, and constraining the policy learning process so that the distributional shift remains bounded. Beyond applying the general concepts of DBR to multi-agent reinforcement learning, we can also make use of the experiences of other agents. We explore this in Multi-Agent regularised Q-learning, where we examine mechanisms for correcting for the differences in shared experiences across agents based on their underlying policy distributions.

In the final part of the thesis, we explore the task of collaboration, where we allow agents to behave in ways that purposely neglect the coordination imposed by the mixing networks commonly used in MARL algorithms. To examine this, we first explore neural network ensemble approaches that can learn disjointly and provide theoretical guarantees within the boosting framework. We then apply this ensemble learning approach to MARL in a method we call Greedy UnMixing, which combines the mixer and individual agent networks such that the mixing network can be “unmixed”, thereby allowing agents to become explicitly uncoordinated.
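As a rough illustration of the first part, the following minimal Python sketch (not the thesis implementation; the pairwise structure, variable names, and random payoffs are illustrative assumptions) shows why the joint action space becomes unmanageable and how a coordination-graph-style factorisation over small subsets of agents evaluates a joint action without enumerating that space.

    import itertools
    import numpy as np

    n_agents, n_actions = 9, 6
    print(n_actions ** n_agents)  # 10,077,696 joint actions for just nine agents

    # Factorised joint value in the spirit of coordination graphs:
    # per-agent utilities plus payoffs over small subsets (here, pairs) of agents.
    rng = np.random.default_rng(0)
    utilities = rng.normal(size=(n_agents, n_actions))
    pair_payoffs = {pair: rng.normal(size=(n_actions, n_actions))
                    for pair in itertools.combinations(range(n_agents), 2)}

    def joint_q(joint_action):
        """Factorised estimate of the joint value for one joint action."""
        q = sum(utilities[i, a] for i, a in enumerate(joint_action))
        q += sum(pair_payoffs[(i, j)][joint_action[i], joint_action[j]]
                 for (i, j) in pair_payoffs)
        return q

    print(joint_q(tuple(rng.integers(0, n_actions, size=n_agents))))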
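For the second part, here is a hedged sketch of the dual behavioural regularisation idea: a penalty that pulls the learned policy towards a behaviour policy fitted on “good” experiences and pushes it away from one fitted on “bad” experiences. The use of KL divergence, the coefficients, and the function names are assumptions for illustration; the thesis' DBR objective may differ.

    import numpy as np

    def kl(p, q, eps=1e-8):
        """KL divergence between two discrete distributions."""
        p, q = np.asarray(p) + eps, np.asarray(q) + eps
        return float(np.sum(p * np.log(p / q)))

    def dbr_penalty(policy, good_behaviour, bad_behaviour, alpha=1.0, beta=1.0):
        """Regulariser added to an actor loss: stay close to the behaviour policy
        fitted on "good" experiences and away from the one fitted on "bad"
        experiences, keeping distributional shift from the dataset bounded."""
        return alpha * kl(policy, good_behaviour) - beta * kl(policy, bad_behaviour)

    pi = [0.5, 0.3, 0.2]        # current policy over three actions
    pi_good = [0.6, 0.3, 0.1]   # behaviour policy from high-return data
    pi_bad = [0.1, 0.2, 0.7]    # behaviour policy from low-return data
    print(dbr_penalty(pi, pi_good, pi_bad))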
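For the experience-sharing idea explored in Multi-Agent regularised Q-learning, a tabular sketch of one possible correction mechanism: re-weighting a shared transition by the ratio of the learning agent's and the source agent's action probabilities. The importance-ratio form, the clipping, and all names are illustrative assumptions rather than the thesis' exact method.

    import numpy as np

    def shared_td_update(q, transition, pi_learner, pi_source,
                         gamma=0.99, lr=0.1, max_weight=10.0):
        """Tabular TD update for the learning agent using a transition logged by
        another agent, re-weighted by the ratio of the two policies'
        probabilities of the logged action (clipped to bound variance)."""
        s, a, r, s_next = transition
        weight = min(pi_learner[s, a] / max(pi_source[s, a], 1e-8), max_weight)
        td_error = r + gamma * np.max(q[s_next]) - q[s, a]
        q[s, a] += lr * weight * td_error
        return q

    # Tiny usage example: 2 states, 2 actions, one shared transition (s=0, a=0, r=1, s'=1).
    q = np.zeros((2, 2))
    pi_learner = np.array([[0.9, 0.1], [0.5, 0.5]])
    pi_source = np.array([[0.2, 0.8], [0.5, 0.5]])
    print(shared_td_update(q, (0, 0, 1.0, 1), pi_learner, pi_source))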
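Finally, a toy sketch of the “unmixing” intuition behind Greedy UnMixing: a gate that interpolates between a state-dependent mixing of individual agent utilities and their plain, uncoordinated combination, so that when the gate is fully open the mixer is bypassed and each agent can act greedily on its own values. The gate, the linear stand-in for the mixing network, and all names are illustrative assumptions, not the thesis architecture.

    import numpy as np

    def q_tot(agent_qs, joint_action, state_weight, unmix_gate):
        """Blend a state-weighted mixed joint value with the plain sum of
        individual utilities; unmix_gate = 1 "unmixes" the mixer entirely."""
        chosen = np.array([q[a] for q, a in zip(agent_qs, joint_action)])
        mixed = float(np.dot(state_weight, chosen))  # stand-in for a mixing network
        independent = float(chosen.sum())            # uncoordinated combination
        return (1.0 - unmix_gate) * mixed + unmix_gate * independent

    agent_qs = [np.array([0.2, 1.0]), np.array([0.7, 0.1])]  # two agents, two actions
    print(q_tot(agent_qs, (1, 0), state_weight=np.array([0.3, 1.5]), unmix_gate=0.0))
    print(q_tot(agent_qs, (1, 0), state_weight=np.array([0.3, 1.5]), unmix_gate=1.0))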