Federated Multi-Agent Deep Reinforcement Learning for Resource Allocation of Vehicle-to-Vehicle Communications

Li, X; Lu, L; Ni, W; Jamalipour, A; Zhang, D; Du, H

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Vehicular Technology, 2022, 71, (8), pp. 8810-8824
Issue Date: 2022-08-01

Closed Access

	Filename	Description	Size	Format
	Federated Multi-Agent Deep Reinforcement Learning for Resource Allocation of Vehicle-to-Vehicle Communications.pdf	Published version	2.61 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, X
dc.contributor.author	Lu, L
dc.contributor.author	Ni, W https://orcid.org/0000-0002-4933-594X
dc.contributor.author	Jamalipour, A
dc.contributor.author	Zhang, D
dc.contributor.author	Du, H
dc.date.accessioned	2023-04-11T04:16:31Z
dc.date.available	2023-04-11T04:16:31Z
dc.date.issued	2022-08-01
dc.identifier.citation	IEEE Transactions on Vehicular Technology, 2022, 71, (8), pp. 8810-8824
dc.identifier.issn	0018-9545
dc.identifier.issn	1939-9359
dc.identifier.uri	http://hdl.handle.net/10453/169575
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Vehicular Technology
dc.relation.isbasedon	10.1109/TVT.2022.3173057
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 10 Technology
dc.subject.classification	Automobile Design & Engineering
dc.title	Federated Multi-Agent Deep Reinforcement Learning for Resource Allocation of Vehicle-to-Vehicle Communications
dc.type	Journal Article
utslib.citation.volume	71
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	10 Technology
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2023-04-11T04:16:30Z
pubs.issue	8
pubs.publication-status	Published
pubs.volume	71
utslib.citation.issue	8

Abstract:

Dynamic topology, fast-changing channels, and the time sensitivity of safety-related services present challenges to the status quo of resource allocation for cellular-underlaying vehicle-to-vehicle (V2V) communications. In this paper, we investigate a novel federated multi-agent deep reinforcement learning (FedMARL) approach for the decentralized joint optimization of channel selection and power control for V2V communication. The approach takes advantage of both deep reinforcement learning (DRL) and federated learning (FL), satisfying the reliability and delay requirements of V2V communication while maximizing the transmit rates of cellular links. Specifically, we implement each V2V agent as a dueling double deep Q-network (D3QN) and design the reward function so that the V2V agents are trained collaboratively. As a result, each agent individually optimizes its channel selection and power level based on its local observations, including the instantaneous channel state information (CSI) of the corresponding V2V link, the instantaneous co-channel interference from the cellular link, the previous channel selections of nearby V2V pairs, and the queue backlog at the V2V transmitter. We also incorporate FL to alleviate the training instability induced by the cooperative multi-agent environment. The local DRL models of the different V2V agents are federated periodically, which addresses each agent's partial observability of the overall network status and accelerates multi-agent training. Simulations show that the proposed FedMARL scheme outperforms the baselines in terms of the cellular sum-rate and the V2V packet delivery rate.
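
The three building blocks named in the abstract (dueling aggregation, the double-DQN target, and periodic federation of local models) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture, the aggregation weights, or the federation period, so the example assumes equal-weight parameter averaging and the commonly used mean-subtracted dueling form. All function and parameter names below are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: each agent's model is a dict mapping a
# layer name to a flat list of parameters.
Params = Dict[str, List[float]]


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: List[float],
                      q_target_next: List[float],
                      done: bool = False) -> float:
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def federated_average(local_models: List[Params]) -> Params:
    """Equal-weight, element-wise mean of the agents' parameters.

    Assumption: plain averaging; the paper may weight agents differently.
    """
    if not local_models:
        raise ValueError("need at least one local model")
    n = len(local_models)
    first = local_models[0]
    return {
        name: [sum(m[name][i] for m in local_models) / n
               for i in range(len(first[name]))]
        for name in first
    }


if __name__ == "__main__":
    # Two hypothetical V2V agents with a single two-parameter layer each.
    agents = [{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]
    global_model = federated_average(agents)
    # After a federation round, every agent would continue local training
    # from a copy of the averaged parameters.
    agents = [dict(global_model) for _ in agents]
    print(global_model)
```

In a full training loop, each agent would run several local D3QN updates on its own observations between federation rounds; the number of local steps per round is a tunable assumption here, not a value taken from the paper.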

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/169575