Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings

Abstract : Setting up the future Internet of Things (IoT) networks will require to support more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, to handle the decentralized decision-making of Spectrum Access, applied to IoT networks; as well as learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and are dynamically changing channel. In the studied scenario, stochastic MAB learning provides a up to 16% gain in term of successful transmission probabilities, and has near optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
Liste complète des métadonnées
Contributor : Rémi Bonnefoi <>
Submitted on : Saturday, August 19, 2017 - 9:31:18 PM
Last modification on : Thursday, March 21, 2019 - 2:50:30 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License


  • HAL Id : hal-01575419, version 1



Rémi Bonnefoi, Lilian Besson, Christophe Moy, Emilie Kaufmann, Jacques Palicot. Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings. CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Sep 2017, Lisbon, Portugal. ⟨hal-01575419v1⟩



Record views


Files downloads