Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Nicolas Galichet 1, 2, Michèle Sebag 2, Olivier Teytaud 2
2 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness compared with UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared with UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
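The two arm-quality notions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the conditional value at risk is estimated as the mean of the worst ceil(alpha * n) observed rewards, it omits any exploration term, and the function names and the reward histories are hypothetical.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (assumed estimator)."""
    if not rewards:
        raise ValueError("need at least one reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]  # lower tail of the observed rewards
    return sum(worst) / k


def best_arm_by_cvar(history, alpha):
    """Arm whose observed rewards have the highest empirical CVaR."""
    return max(history, key=lambda arm: empirical_cvar(history[arm], alpha))


def best_arm_by_min(history):
    """Arm with the maximal observed minimum (the MIN-style criterion)."""
    return max(history, key=lambda arm: min(history[arm]))


# Hypothetical reward histories: "risky" has the higher mean but a worse tail.
history = {
    "safe": [0.4, 0.5, 0.45, 0.5],
    "risky": [0.0, 1.0, 1.0, 1.0],
}

print(best_arm_by_cvar(history, alpha=1.0))   # alpha = 1 reduces to the plain mean
print(best_arm_by_cvar(history, alpha=0.25))  # small alpha looks only at the worst rewards
print(best_arm_by_min(history))
```

With alpha = 1 the criterion is the ordinary empirical mean; as alpha shrinks, only the worst observations count, and the choice coincides with the maximal-minimum criterion on this data.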
Document type:
Conference paper
Cheng Soon Ong and Tu Bao Ho. Asian Conference on Machine Learning 2013, Nov 2013, Canberra, Australia. 29, pp.245-260, 2013, Journal of Machine Learning Research : Workshop and Conference Proceedings

https://hal.archives-ouvertes.fr/hal-00924062
Contributor: Nicolas Galichet
Submitted on: Monday, 6 January 2014 - 15:27:30
Last modified on: Thursday, 5 April 2018 - 12:30:12
Document(s) archived on: Thursday, 10 April 2014 - 17:20:30

Files

acml2013_marab.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00924062, version 2
  • arXiv: 1401.1123

Citation

Nicolas Galichet, Michèle Sebag, Olivier Teytaud. Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits. Cheng Soon Ong and Tu Bao Ho. Asian Conference on Machine Learning 2013, Nov 2013, Canberra, Australia. 29, pp.245-260, 2013, Journal of Machine Learning Research : Workshop and Conference Proceedings. 〈hal-00924062v2〉
