Bayesian Optimization Enhanced Deep Reinforcement Learning for Trajectory Planning and Network Formation in Multi-UAV Networks

Gong, S; Wang, M; Gu, B; Zhang, W; Hoang, DT; Niyato, D

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Vehicular Technology, 2023, PP, (99), pp. 1-16
Issue Date: 2023-01-01

Embargoed

Filename	Description	Size	Format
Bayesian Optimization Enhanced Deep Reinforcement Learning for Trajectory Planning and Network Formation in Multi-UAV Networks.pdf	Accepted version	2.56 MB	Adobe PDF

This item is currently unavailable due to the publisher's embargo.

The embargo period expires on 29 Mar 2025

Full metadata record

Field	Value	Language
dc.contributor.author	Gong, S
dc.contributor.author	Wang, M
dc.contributor.author	Gu, B
dc.contributor.author	Zhang, W
dc.contributor.author	Hoang, DT
dc.contributor.author	Niyato, D
dc.date.accessioned	2023-08-27T08:55:05Z
dc.date.available	2023-08-27T08:55:05Z
dc.date.issued	2023-01-01
dc.identifier.citation	IEEE Transactions on Vehicular Technology, 2023, PP, (99), pp. 1-16
dc.identifier.issn	0018-9545
dc.identifier.issn	1939-9359
dc.identifier.uri	http://hdl.handle.net/10453/171838
dc.description.abstract	In this paper, we employ multiple UAVs coordinated by a base station (BS) to help the ground users (GUs) to offload their sensing data. Different UAVs can adapt their trajectories and network formation to expedite data transmissions via multi-hop relaying. The trajectory planning aims to collect all GUs' data, while the UAVs' network formation optimizes the multi-hop UAV network topology to minimize the energy consumption and transmission delay. The joint network formation and trajectory optimization is solved by a two-step iterative approach. Firstly, we devise the adaptive network formation scheme by using a heuristic algorithm to balance the UAVs' energy consumption and data queue size. Then, with the fixed network formation, the UAVs' trajectories are further optimized by using multi-agent deep reinforcement learning without knowing the GUs' traffic demands and spatial distribution. To improve the learning efficiency, we further employ Bayesian optimization to estimate the UAVs' flying decisions based on historical trajectory points. This helps avoid inefficient action explorations and improves the convergence rate in the model training. The simulation results reveal close spatial-temporal couplings between the UAVs' trajectory planning and network formation. Compared with several baselines, our solution can better exploit the UAVs' cooperation in data offloading, thus improving energy efficiency and delay performance.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Vehicular Technology
dc.relation.isbasedon	10.1109/TVT.2023.3262778
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 10 Technology
dc.subject.classification	Automobile Design & Engineering
dc.subject.classification	40 Engineering
dc.subject.classification	46 Information and computing sciences
dc.title	Bayesian Optimization Enhanced Deep Reinforcement Learning for Trajectory Planning and Network Formation in Multi-UAV Networks
dc.type	Journal Article
utslib.citation.volume	PP
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	10 Technology
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	embargoed	*
utslib.copyright.embargo	2025-03-29T00:00:00+1000Z
dc.date.updated	2023-08-27T08:55:01Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Abstract:

In this paper, we employ multiple UAVs coordinated by a base station (BS) to help the ground users (GUs) to offload their sensing data. Different UAVs can adapt their trajectories and network formation to expedite data transmissions via multi-hop relaying. The trajectory planning aims to collect all GUs' data, while the UAVs' network formation optimizes the multi-hop UAV network topology to minimize the energy consumption and transmission delay. The joint network formation and trajectory optimization is solved by a two-step iterative approach. Firstly, we devise the adaptive network formation scheme by using a heuristic algorithm to balance the UAVs' energy consumption and data queue size. Then, with the fixed network formation, the UAVs' trajectories are further optimized by using multi-agent deep reinforcement learning without knowing the GUs' traffic demands and spatial distribution. To improve the learning efficiency, we further employ Bayesian optimization to estimate the UAVs' flying decisions based on historical trajectory points. This helps avoid inefficient action explorations and improves the convergence rate in the model training. The simulation results reveal close spatial-temporal couplings between the UAVs' trajectory planning and network formation. Compared with several baselines, our solution can better exploit the UAVs' cooperation in data offloading, thus improving energy efficiency and delay performance.
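The abstract does not specify how Bayesian optimization is formulated in the paper, so the following is only a generic sketch of the idea, not the authors' method. It assumes a single UAV choosing a normalized heading in [0, 1], a zero-mean Gaussian process with a squared-exponential kernel fitted to past (heading, reward) pairs, and an upper-confidence-bound rule for picking the next heading. The function names, the kernel, the length scale, the noise level, and the `beta` weight are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)  # (K + noise*I)^{-1} k_*, one column per query
    mean = solved.T @ y_obs
    # Prior variance is 1 for this kernel; subtract the part explained by the data.
    var = 1.0 - np.sum(k_oq * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def suggest_heading(x_obs, y_obs, candidates, beta=2.0):
    """Pick the candidate heading with the highest upper confidence bound.

    Hypothetical helper: x_obs are past normalized headings, y_obs the
    rewards observed for them (both assumed, not taken from the paper).
    """
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    return candidates[np.argmax(mean + beta * std)]


if __name__ == "__main__":
    past_headings = np.array([0.1, 0.5, 0.9])  # made-up history
    past_rewards = np.array([0.0, 1.0, 0.0])   # made-up rewards
    grid = np.linspace(0.0, 1.0, 101)
    print("suggested heading:", suggest_heading(past_headings, past_rewards, grid))
```

In a learning loop, a suggestion like this could be used to bias action exploration toward headings the surrogate model considers promising, which is the role the abstract attributes to Bayesian optimization. How the paper combines it with the multi-agent policy is not stated in this record.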

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171838