Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks

Li, K; Ni, W; Tovar, E; Guizani, M

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2020 International Wireless Communications and Mobile Computing (IWCMC), 2020, 00, pp. 958-963
Issue Date: 2020-07-27

Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, K
dc.contributor.author	Ni, W https://orcid.org/0000-0002-4933-594X
dc.contributor.author	Tovar, E
dc.contributor.author	Guizani, M
dc.date	2020-06-15
dc.date.accessioned	2021-06-10T01:34:13Z
dc.date.available	2021-06-10T01:34:13Z
dc.date.issued	2020-07-27
dc.identifier.citation	2020 International Wireless Communications and Mobile Computing (IWCMC), 2020, 00, pp. 958-963
dc.identifier.isbn	978-1-7281-3129-0
dc.identifier.issn	2376-6492
dc.identifier.issn	2376-6506
dc.identifier.uri	http://hdl.handle.net/10453/149479
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2020 International Wireless Communications and Mobile Computing (IWCMC)
dc.relation.ispartof	International Wireless Communications and Mobile Computing
dc.relation.isbasedon	10.1109/iwcmc48107.2020.9148316
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Deep Reinforcement Learning for Real-Time Trajectory Planning in UAV Networks
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Limassol, Cyprus
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2021-06-10T01:33:52Z
pubs.finish-date	2020-06-19
pubs.publication-status	Published
pubs.start-date	2020-06-15
pubs.volume	00

Abstract:

In Unmanned Aerial Vehicle (UAV)-enabled wireless powered sensor networks, a UAV can be employed to charge the ground sensors remotely via Wireless Power Transfer (WPT) and to collect their sensory data. This paper focuses on trajectory planning of the UAV for aerial data collection and WPT, with the aim of minimizing buffer overflow at the ground sensors and unsuccessful transmissions due to lossy airborne channels. The network state comprises the battery levels and buffer lengths of the ground sensors, the channel conditions, and the location of the UAV. The flight trajectory planning problem is formulated as a Partially Observable Markov Decision Process (POMDP), in which the UAV has only partial observation of the network states. In practice, a UAV-enabled sensor network has a large number of network states and actions in the POMDP, and up-to-date knowledge of the network states is not available at the UAV. To address these issues, we propose an onboard deep reinforcement learning algorithm that optimizes the real-time trajectory planning of the UAV given outdated knowledge of the network states.
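
The partial-observability setting described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's formulation or its learning algorithm: the class `ToySensorNetwork`, the policy `stalest_first`, the helper `run_episode`, and every constant (buffer and battery capacities, arrival and success probabilities, one packet per visit) are hypothetical choices made for illustration only. It shows one possible way that a UAV could hold outdated knowledge of sensors it has not visited recently, and how buffer overflow and failed transmissions could be counted as a penalty.

```python
import random
from dataclasses import dataclass


@dataclass
class ToySensorNetwork:
    """Hypothetical toy model; all constants are illustrative assumptions."""
    num_sensors: int = 4
    buffer_cap: int = 5        # assumed maximum packets per sensor buffer
    battery_cap: int = 5       # assumed maximum battery units per sensor
    arrival_prob: float = 0.5  # assumed chance of a new packet per step
    success_prob: float = 0.8  # assumed chance an uplink transmission succeeds
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.buffers = [0] * self.num_sensors                   # true state
        self.batteries = [self.battery_cap] * self.num_sensors  # true state
        self.uav_at = 0
        # What the UAV last saw at each sensor, and how many steps ago.
        self.last_seen = [(0, self.battery_cap)] * self.num_sensors
        self.age = [0] * self.num_sensors

    def observation(self):
        """Partial observation: UAV position plus possibly stale readings."""
        return (self.uav_at, tuple(self.last_seen), tuple(self.age))

    def step(self, target):
        """Fly to sensor `target`, charge it, and try to collect one packet."""
        loss = 0
        # New data arrives everywhere; a packet hitting a full buffer is lost.
        for i in range(self.num_sensors):
            if self.rng.random() < self.arrival_prob:
                if self.buffers[i] < self.buffer_cap:
                    self.buffers[i] += 1
                else:
                    loss += 1
        self.uav_at = target
        # Wireless power transfer: the visited sensor gains one battery unit.
        self.batteries[target] = min(self.battery_cap,
                                     self.batteries[target] + 1)
        # Transmission costs one battery unit and may fail on a lossy channel.
        if self.buffers[target] > 0 and self.batteries[target] > 0:
            self.batteries[target] -= 1
            if self.rng.random() < self.success_prob:
                self.buffers[target] -= 1
            else:
                loss += 1
        # Only the visited sensor's reading is refreshed; the rest grow stale.
        self.age = [a + 1 for a in self.age]
        self.last_seen[target] = (self.buffers[target], self.batteries[target])
        self.age[target] = 0
        return self.observation(), -loss


def stalest_first(obs):
    """Baseline policy (not learned): visit the sensor seen longest ago."""
    _, _, age = obs
    return max(range(len(age)), key=lambda i: age[i])


def run_episode(env, policy, steps=50):
    """Roll out `policy` and return the accumulated (non-positive) reward."""
    obs = env.observation()
    total = 0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(ToySensorNetwork(), stalest_first))
```

In a learning setup, the observation tuple would be the input to the agent and the negative loss would be its reward; the fixed `stalest_first` rule stands in here only so that the sketch runs end to end.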

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/149479