LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network

Li, K; Ni, W; Dressler, F

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Type: Journal Article
Citation: IEEE Internet of Things Journal, 2022, 9, (6), pp. 4179-4189
Issue Date: 2022-03-15

Closed Access

Filename	Description	Size	Format
LSTM-Characterized_Deep_Reinforcement_Learning_for_Continuous_Flight_Control_and_Resource_Allocation_in_UAV-Assisted_Sensor_Network.pdf	Published version	2.65 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, K
dc.contributor.author	Ni, W https://orcid.org/0000-0002-4933-594X
dc.contributor.author	Dressler, F
dc.date.accessioned	2023-04-05T04:41:08Z
dc.date.available	2023-04-05T04:41:08Z
dc.date.issued	2022-03-15
dc.identifier.citation	IEEE Internet of Things Journal, 2022, 9, (6), pp. 4179-4189
dc.identifier.issn	2327-4662
dc.identifier.uri	http://hdl.handle.net/10453/169190
dc.language	English
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
dc.relation.ispartof	IEEE Internet of Things Journal
dc.relation.isbasedon	10.1109/JIOT.2021.3102831
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0805 Distributed Computing, 1005 Communications Technologies
dc.title	LSTM-Characterized Deep Reinforcement Learning for Continuous Flight Control and Resource Allocation in UAV-Assisted Sensor Network
dc.type	Journal Article
utslib.citation.volume	9
utslib.for	0805 Distributed Computing
utslib.for	1005 Communications Technologies
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2023-04-05T04:41:06Z
pubs.issue	6
pubs.publication-status	Published
pubs.volume	9
utslib.citation.issue	6

Abstract:

Unmanned aerial vehicles (UAVs) can be employed to collect sensory data in remote wireless sensor networks (WSNs). Due to the UAV's maneuvering, scheduling a sensor device to transmit data can overflow the data buffers of the unscheduled ground devices. Moreover, lossy airborne channels can result in packet reception errors at the scheduled sensor. This article proposes a new deep reinforcement learning-based flight resource allocation framework (DeFRA) to minimize the overall data packet loss in a continuous action space. DeFRA is based on deep deterministic policy gradient (DDPG); it optimally controls the instantaneous heading and speed of the UAV and selects the ground device for data collection. Furthermore, a state characterization layer, leveraging long short-term memory (LSTM), is developed to predict the network dynamics that result from time-varying airborne channels and energy arrivals at the ground devices. To validate the effectiveness of DeFRA, experimental data collected from a real-world UAV testbed and an energy harvesting WSN are used to train the actions of the UAV. Numerical results demonstrate that the proposed DeFRA achieves fast convergence while reducing the packet loss by over 15% compared to existing deep reinforcement learning solutions.
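
The two loss sources named in the abstract (buffer overflow at unscheduled devices and reception errors on the scheduled link) and the continuous action (heading, speed, selected device) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' DeFRA implementation: the names `Device`, `decode_action`, and `step`, the buffer capacity, the maximum speed, and the exponential distance-based success probability are all hypothetical choices made for illustration, and no DDPG training or LSTM layer is included.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Device:
    x: float
    y: float
    buffer: int = 0      # packets currently queued
    capacity: int = 5    # hypothetical buffer size


def decode_action(raw, n_devices, max_speed=15.0):
    """Map a raw actor output in [-1, 1]^3 to (heading, speed, device index)."""
    h, s, d = (max(-1.0, min(1.0, v)) for v in raw)
    heading = (h + 1.0) * math.pi            # radians in [0, 2*pi]
    speed = (s + 1.0) / 2.0 * max_speed      # m/s in [0, max_speed] (assumed)
    index = min(int((d + 1.0) / 2.0 * n_devices), n_devices - 1)
    return heading, speed, index


def step(uav_xy, devices, raw_action, arrivals, rng, dt=1.0, ref_dist=100.0):
    """Advance one time slot; return (new UAV position, packets lost)."""
    heading, speed, chosen = decode_action(raw_action, len(devices))
    x = uav_xy[0] + speed * dt * math.cos(heading)
    y = uav_xy[1] + speed * dt * math.sin(heading)
    lost = 0
    for i, dev in enumerate(devices):
        if i == chosen and dev.buffer > 0:
            dist = math.hypot(dev.x - x, dev.y - y)
            p_success = math.exp(-dist / ref_dist)  # toy channel model (assumed)
            if rng.random() >= p_success:
                lost += 1        # reception error on the scheduled link
            dev.buffer -= 1      # toy simplification: no retransmission
        dev.buffer += arrivals[i]                   # newly sensed packets
        if dev.buffer > dev.capacity:
            lost += dev.buffer - dev.capacity       # buffer overflow
            dev.buffer = dev.capacity
    return (x, y), lost


if __name__ == "__main__":
    rng = random.Random(0)
    devices = [Device(0.0, 0.0, buffer=5), Device(10.0, 0.0, buffer=5)]
    position, lost = step((0.0, 0.0), devices, (-1.0, -1.0, -1.0), [1, 1], rng)
    print(position, lost)
```

In this toy setting, a learning agent would receive the negative of `lost` as its per-slot reward, so that a policy maximizing the return minimizes the overall packet loss; how the paper actually defines its reward and state is not given in this record.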

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/169190