Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings

Abstract: Setting up future Internet of Things (IoT) networks will require supporting more and more communicating devices. We show that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, UCB1 and Thompson Sampling, for handling the decentralized decision-making of spectrum access in IoT networks, and we study how learning performance scales with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices into such networks, even when all end-devices are intelligent and dynamically change channels. In the studied scenario, stochastic MAB learning provides up to a 16% gain in terms of successful transmission probability, and performs near-optimally even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
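The abstract names two classical index policies, UCB1 and Thompson Sampling. As a rough illustration of how an end-device could apply them to channel selection, here is a minimal self-contained Python sketch; it is not the authors' simulator, and the per-channel success probabilities in the demo are hypothetical.

```python
import math
import random

class UCB1:
    """UCB1: pick the channel maximizing empirical mean + sqrt(2 ln t / n)."""
    def __init__(self, n_channels):
        self.counts = [0] * n_channels     # transmissions attempted per channel
        self.successes = [0] * n_channels  # successful transmissions per channel
        self.t = 0                         # total number of transmissions

    def choose(self):
        self.t += 1
        # Try each channel once before using the confidence index.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        def index(k):
            mean = self.successes[k] / self.counts[k]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[k])
        return max(range(len(self.counts)), key=index)

    def update(self, channel, success):
        self.counts[channel] += 1
        self.successes[channel] += int(success)

class ThompsonSampling:
    """Thompson Sampling with Beta(1, 1) priors on each channel's success rate."""
    def __init__(self, n_channels):
        self.alpha = [1] * n_channels  # 1 + number of successes
        self.beta = [1] * n_channels   # 1 + number of failures

    def choose(self):
        # Sample a plausible success rate per channel, play the best sample.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, channel, success):
        if success:
            self.alpha[channel] += 1
        else:
            self.beta[channel] += 1

# Toy usage: one device learning over 4 channels with hypothetical success rates.
if __name__ == "__main__":
    rates = [0.1, 0.5, 0.7, 0.9]  # hypothetical per-channel success probabilities
    for Policy in (UCB1, ThompsonSampling):
        policy = Policy(len(rates))
        wins = 0
        for _ in range(10000):
            k = policy.choose()
            success = random.random() < rates[k]
            policy.update(k, success)
            wins += success
        print(f"{Policy.__name__}: {wins / 10000:.3f} empirical success rate")
```

In the paper's setting each end-device runs such a policy independently, so the reward process seen by any one device is non-stationary and non-i.i.d.; the stochastic policies above are nonetheless reported to perform near-optimally there.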

https://hal.archives-ouvertes.fr/hal-01575419
Contributor: Lilian Besson
Submitted on: Monday, July 2, 2018 - 7:15:03 AM
Last modification on: Monday, June 3, 2019 - 11:16:57 AM

Files

BBMKP_CROWNCOM_2017.pdf (file produced by the author(s))

Licence

Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License

Citation

Rémi Bonnefoi, Lilian Besson, Christophe Moy, Emilie Kaufmann, Jacques Palicot. Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings. CROWNCOM 2017 - 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks, Sep 2017, Lisbon, Portugal. pp.173-185, ⟨10.1007/978-3-319-76207-4_15⟩. ⟨hal-01575419v2⟩
