Online Reinforcement Learning for Beam Tracking and Rate Adaptation in Millimeter-Wave Systems

Krunz, M; Aykin, I; Sarkar, S; Akgun, B

Publisher: IEEE COMPUTER SOC
Publication Type: Journal Article
Citation: IEEE Transactions on Mobile Computing, 2024, 23, (2), pp. 1830-1845
Issue Date: 2024-02-01

Closed Access

	Filename	Description	Size	Format
	1633972.pdf	Published version	2.15 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Krunz, M
dc.contributor.author	Aykin, I
dc.contributor.author	Sarkar, S
dc.contributor.author	Akgun, B
dc.date.accessioned	2024-08-21T05:21:18Z
dc.date.available	2024-08-21T05:21:18Z
dc.date.issued	2024-02-01
dc.identifier.citation	IEEE Transactions on Mobile Computing, 2024, 23, (2), pp. 1830-1845
dc.identifier.issn	1536-1233
dc.identifier.issn	1558-0660
dc.identifier.uri	http://hdl.handle.net/10453/180462
dc.language	English
dc.publisher	IEEE COMPUTER SOC
dc.relation.ispartof	IEEE Transactions on Mobile Computing
dc.relation.isbasedon	10.1109/TMC.2023.3243910
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0805 Distributed Computing, 0906 Electrical and Electronic Engineering, 1005 Communications Technologies
dc.subject.classification	Networking & Telecommunications
dc.subject.classification	4006 Communications engineering
dc.subject.classification	4604 Cybersecurity and privacy
dc.subject.classification	4606 Distributed computing and systems software
dc.title	Online Reinforcement Learning for Beam Tracking and Rate Adaptation in Millimeter-Wave Systems
dc.type	Journal Article
utslib.citation.volume	23
utslib.for	0805 Distributed Computing
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	1005 Communications Technologies
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2024-08-21T05:21:16Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	23
utslib.citation.issue	2

Abstract:

In this article, we propose MAMBA, a restless multi-armed bandit framework for beam tracking in directional millimeter-wave (mmW) cellular systems. Instead of relying on explicit control messages, MAMBA uses the ACK/NACK packets that user equipments (UEs) send to the base station (BS) as part of the hybrid automatic repeat request (HARQ) procedure. These packets are used to measure the quality of the current downlink beam and to select a new downlink beam, along with an appropriate modulation and coding scheme (MCS), for future transmissions. At its core, MAMBA implements an online reinforcement learning technique called adaptive Thompson sampling (ATS), which determines a good beam and associated MCS for upcoming transmissions. To evaluate MAMBA's performance, we conduct extensive simulations and over-the-air (OTA) experiments in the 28 GHz band using phased-array antennas. We study fixed- as well as adaptive-rate variants of MAMBA and compare them with four other beam tracking strategies: a beam selection scheme similar to the one used in 5G NR (called 'static oracle'), a theoretically optimal but practically infeasible beam tracking scheme (called 'dynamic oracle'), an ϵ-greedy algorithm (Mohamed 2021), and the Unimodal Beam Alignment (UBA) algorithm (Hashemi et al. 2018). Our results show that MAMBA achieves a 182% throughput gain over the 'static oracle' and comes reasonably close to the throughput of the 'dynamic oracle'. Compared to UBA, MAMBA achieves a 25-35% throughput gain, depending on UE mobility. Finally, when operated at a fixed MCS, MAMBA/ATS achieves a 21% gain over the ϵ-greedy algorithm at the lowest applied MCS index and a 255% gain at the highest MCS index.
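The abstract describes ATS only at a high level. As a rough illustration of the general idea (choosing a joint beam/MCS arm from HARQ ACK/NACK feedback), here is a minimal Python sketch using discounted Beta-Bernoulli Thompson sampling. This is a swapped-in, generic technique and not the authors' ATS. The class `DiscountedBetaTS`, the discount factor `gamma`, the rate table `RATES`, and the ACK probabilities in `ACK_PROB` are all hypothetical. Each arm's reward is taken to be its rate when the packet is ACKed and zero when it is NACKed. Discounting every posterior toward the prior on each slot is one common way to cope with restless arms whose quality drifts even when they are not being used.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DiscountedBetaTS:
    """Thompson sampling over joint (beam, MCS) arms driven by HARQ ACK/NACK.

    Hypothetical sketch, not the authors' ATS: each arm keeps a Beta posterior
    on its ACK probability, and all posteriors are discounted toward the
    uniform prior every slot so that stale evidence fades as the channel
    changes (one common way to handle restless, nonstationary arms).
    """

    num_beams: int
    rates: list[float]  # hypothetical relative rate per MCS index
    gamma: float = 0.99  # assumed discount factor (memory of roughly 1/(1-gamma) slots)
    rng: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        n_arms = self.num_beams * len(self.rates)
        self.alpha = [1.0] * n_arms  # ACK pseudo-counts, Beta(1, 1) prior
        self.beta = [1.0] * n_arms  # NACK pseudo-counts

    def _arm(self, beam: int, mcs: int) -> int:
        return beam * len(self.rates) + mcs

    def select(self) -> tuple[int, int]:
        """Draw an ACK probability per arm; pick the largest sampled throughput."""
        best, best_value = (0, 0), -1.0
        for beam in range(self.num_beams):
            for mcs, rate in enumerate(self.rates):
                k = self._arm(beam, mcs)
                value = rate * self.rng.betavariate(self.alpha[k], self.beta[k])
                if value > best_value:
                    best, best_value = (beam, mcs), value
        return best

    def update(self, beam: int, mcs: int, ack: bool) -> None:
        """Decay every posterior toward the prior, then record the new feedback."""
        for k in range(len(self.alpha)):
            self.alpha[k] = 1.0 + self.gamma * (self.alpha[k] - 1.0)
            self.beta[k] = 1.0 + self.gamma * (self.beta[k] - 1.0)
        k = self._arm(beam, mcs)
        if ack:
            self.alpha[k] += 1.0
        else:
            self.beta[k] += 1.0


# Hypothetical static environment: 3 beams x 3 MCS levels.
RATES = [1.0, 2.0, 4.0]
ACK_PROB = [
    [0.90, 0.60, 0.10],  # beam 0
    [0.99, 0.95, 0.80],  # beam 1: best arm is (1, 2), expected rate 3.2
    [0.70, 0.30, 0.05],  # beam 2
]


def simulate(slots: int = 3000, seed: int = 0) -> tuple[dict[tuple[int, int], int], float]:
    """Run the agent against simulated ACK/NACK feedback; return arm counts and mean rate."""
    env_rng = random.Random(seed)
    agent = DiscountedBetaTS(len(ACK_PROB), RATES, rng=random.Random(seed + 1))
    counts: dict[tuple[int, int], int] = {}
    delivered = 0.0
    for _ in range(slots):
        beam, mcs = agent.select()
        ack = env_rng.random() < ACK_PROB[beam][mcs]  # simulated HARQ feedback
        agent.update(beam, mcs, ack)
        delivered += RATES[mcs] if ack else 0.0  # NACKed packets deliver nothing
        counts[(beam, mcs)] = counts.get((beam, mcs), 0) + 1
    return counts, delivered / slots


if __name__ == "__main__":
    counts, throughput = simulate()
    print("most selected (beam, MCS):", max(counts, key=counts.get))
    print("mean delivered rate:", round(throughput, 2))
```

Passing a single-entry `rates` list reduces the sketch to beam selection at one fixed MCS, which loosely mirrors the fixed-rate setting mentioned in the abstract. The discount in `update` lets unplayed arms drift back toward the prior so they are occasionally re-explored. This trades some steady-state throughput for the ability to track a moving UE.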

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180462