Multi-Agent Multi-Armed Bandit Learning for Online Management of Edge-Assisted Computing
- Publisher: Institute of Electrical and Electronics Engineers (IEEE)
- Publication Type: Journal Article
- Citation: IEEE Transactions on Communications, 2021, 69, (12), pp. 8188-8199
- Issue Date: 2021-12-01
Closed Access
| Filename | Description | Size |
|---|---|---|
| Multi-Agent_Multi-Armed_Bandit_Learning_for_Online_Management_of_Edge-Assisted_Computing.pdf | Published version | 1.39 MB |
This item is closed access and not available.
By orchestrating the resources of edge and core networks, the delays of edge-assisted computing can be reduced. Offloading scheduling is challenging, however, especially in the presence of many edge devices with randomly varying link and computing conditions. This paper presents a new online learning-based approach to offloading scheduling, where multi-agent multi-armed bandit (MA-MAB) learning is designed to exploit the randomly varying conditions and asymptotically minimize the computing delay. We first propose a combinatorial bandit upper confidence bound (CB-UCB) algorithm, where users collectively feed back the observed delays of all edge devices and links. The optimistic bound of the delay is derived to facilitate centralized offloading scheduling for all users. In addition, we put forth a distributed bandit upper confidence bound (DB-UCB) algorithm, where users take random turns to make conflict-free, distributed selections of edge devices. The optimistic confidence bound of each user is developed so that the user's selection depends only on its own observations and decisions. Furthermore, we establish the asymptotic optimality of the proposed algorithms by proving the sublinearity of their regrets, and show that the random turns the users take to make decisions do not compromise the asymptotic optimality of the DB-UCB algorithm, as corroborated by numerical simulations.
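To make the two algorithmic ideas in the abstract concrete, the following is a minimal Python sketch of the centralized CB-UCB flavor, not the paper's actual implementation. The class name, the standard `sqrt(2 ln t / n)` exploration bonus, and the greedy conflict-free assignment (standing in for whatever combinatorial oracle the paper uses, e.g. a minimum-delay matching) are all assumptions; the sketch also assumes at least as many edge devices as users so a conflict-free assignment exists.

```python
import math

class CBUCBScheduler:
    """Sketch of centralized combinatorial bandit UCB (CB-UCB): every user
    feeds back its observed delay, and a central scheduler assigns each
    user an edge device using an optimistic delay bound.
    Assumption: n_devices >= n_users, so conflict-free assignment exists."""

    def __init__(self, n_users, n_devices):
        self.n_users, self.n_devices = n_users, n_devices
        self.counts = [[0] * n_devices for _ in range(n_users)]   # plays of (user, device)
        self.means = [[0.0] * n_devices for _ in range(n_users)]  # empirical mean delays
        self.t = 0                                                # global round counter

    def _index(self, u, k):
        # Optimistic (lower) confidence bound on delay: since delay is
        # minimized, optimism subtracts the exploration bonus.
        if self.counts[u][k] == 0:
            return float("-inf")  # force every (user, device) pair to be tried once
        return self.means[u][k] - math.sqrt(2.0 * math.log(self.t) / self.counts[u][k])

    def select(self):
        # Greedy conflict-free assignment (a simplification of the
        # paper's combinatorial scheduling step).
        self.t += 1
        taken, assignment = set(), [None] * self.n_users
        for u in range(self.n_users):
            free = [k for k in range(self.n_devices) if k not in taken]
            assignment[u] = min(free, key=lambda k: self._index(u, k))
            taken.add(assignment[u])
        return assignment

    def update(self, assignment, delays):
        # Collective feedback: delays[u] is the delay user u observed.
        for u, k in enumerate(assignment):
            self.counts[u][k] += 1
            self.means[u][k] += (delays[u] - self.means[u][k]) / self.counts[u][k]
```

The distributed DB-UCB variant can be sketched the same way, again under assumed names and the same assumed index formula: each user keeps only its own statistics, and a random turn order per round keeps selections conflict-free without users exchanging observations.

```python
import math
import random

class DBUCBUser:
    """Sketch of one distributed bandit UCB (DB-UCB) user: it keeps only
    its own counts and delay estimates, never other users' observations."""

    def __init__(self, n_devices):
        self.counts = [0] * n_devices
        self.means = [0.0] * n_devices
        self.t = 0

    def select(self, free_devices):
        # Pick the still-unclaimed device with the smallest optimistic
        # delay bound, based solely on this user's own history.
        self.t += 1
        def index(k):
            if self.counts[k] == 0:
                return float("-inf")
            return self.means[k] - math.sqrt(2.0 * math.log(self.t) / self.counts[k])
        return min(free_devices, key=index)

    def update(self, k, delay):
        self.counts[k] += 1
        self.means[k] += (delay - self.means[k]) / self.counts[k]

def round_of_play(users, n_devices):
    # Users take a random turn order each round; each removes its pick
    # from the free set, so the selections are conflict-free even though
    # no observations are shared between users.
    order = random.sample(range(len(users)), len(users))
    free, picks = set(range(n_devices)), {}
    for u in order:
        picks[u] = users[u].select(free)
        free.discard(picks[u])
    return picks
```

A driver loop would alternate `select()` (or `round_of_play()`) with observed delays fed back through `update()`; the sublinear-regret guarantees are properties of the paper's analysis, which these sketches do not attempt to reproduce.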