Online Reinforcement Learning for Beam Tracking and Rate Adaptation in Millimeter-Wave Systems
- Publisher:
- IEEE COMPUTER SOC
- Publication Type:
- Journal Article
- Citation:
- IEEE Transactions on Mobile Computing, 2024, 23, (2), pp. 1830-1845
- Issue Date:
- 2024-02-01
Closed Access
Filename | Description | Size
---|---|---
1633972.pdf | Published version | 2.15 MB
This item is closed access and not available.
In this article, we propose MAMBA, a restless multi-armed bandit framework for beam tracking in directional millimeter-wave (mmW) cellular systems. Instead of relying on explicit control messages, MAMBA utilizes the ACK/NACK packets transmitted by user equipments (UEs) to the base station (BS) as part of the hybrid automatic repeat request (HARQ) procedure. These packets are used to measure the quality of the currently operating downlink beam and to select a new downlink beam, along with an appropriate modulation and coding scheme (MCS), for future transmissions. At its core, MAMBA implements an online reinforcement learning technique called adaptive Thompson sampling (ATS), which determines a good beam and associated MCS to be used for the upcoming transmissions. To evaluate MAMBA's performance, we conduct extensive simulations and over-the-air (OTA) experiments over the 28 GHz band using phased-array antennas. We study fixed- as well as adaptive-rate variants of MAMBA, and contrast them with four other beam tracking strategies: a beam selection scheme similar to the one used in 5G NR (called 'static oracle'), a theoretically optimal but practically infeasible beam tracking scheme (called 'dynamic oracle'), an ϵ-greedy algorithm (Mohamed 2021), and the Unimodal Beam Alignment (UBA) algorithm (Hashemi et al. 2018). Our results show that MAMBA achieves a 182% throughput gain over the 'static oracle' and comes reasonably close to the throughput of the 'dynamic oracle'. Compared to UBA, MAMBA achieves a 25-35% gain in throughput, depending on UE mobility. Finally, when operated at a fixed MCS, MAMBA/ATS achieves a 21% gain over the ϵ-greedy algorithm at the lowest applied MCS index, and a 255% gain at the highest MCS index.
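The abstract describes selecting a (beam, MCS) pair from implicit HARQ ACK/NACK feedback via Thompson sampling. The following is a minimal illustrative sketch of that idea using standard Beta-Bernoulli Thompson sampling; the paper's actual adaptive Thompson sampling (ATS) and restless-bandit details are not given in the abstract, and the class name, beam count, and MCS rates below are hypothetical.

```python
import random

class BeamMcsThompsonSampler:
    """Sketch: Thompson sampling over (beam, MCS) arms driven by ACK/NACK.

    Illustrative only -- a plain Beta-Bernoulli sampler, not the paper's
    ATS algorithm. Each arm keeps a Beta posterior over its ACK probability;
    the selected arm maximizes sampled ACK probability times the MCS rate
    (a proxy for expected throughput).
    """

    def __init__(self, num_beams, mcs_rates):
        # One arm per (beam index, MCS index) combination.
        self.arms = [(b, m) for b in range(num_beams)
                     for m in range(len(mcs_rates))]
        self.rates = mcs_rates  # hypothetical per-MCS rates (e.g., bits/symbol)
        # Uniform Beta(1, 1) prior on each arm's ACK probability.
        self.alpha = {arm: 1.0 for arm in self.arms}
        self.beta = {arm: 1.0 for arm in self.arms}

    def select(self):
        """Sample an ACK probability per arm; pick the arm with the
        highest sampled expected throughput."""
        def sampled_throughput(arm):
            p = random.betavariate(self.alpha[arm], self.beta[arm])
            return p * self.rates[arm[1]]
        return max(self.arms, key=sampled_throughput)

    def update(self, arm, ack):
        """Fold in HARQ feedback: ACK counts as a success, NACK as a failure."""
        if ack:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
```

A BS-side loop would call `select()` before each downlink transmission, apply the chosen beam and MCS, and feed the resulting ACK/NACK back through `update()`, so no explicit beam-measurement control messages are needed.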