Conference paper (Year: 2013)

Thompson Sampling for Bayesian Bandits with Resets

Paolo Viappiani

Abstract

Multi-armed bandit problems are challenging sequential decision problems that have been widely studied, as they constitute a mathematical framework that abstracts many different decision problems in fields such as machine learning, logistics, industrial optimization, and the management of clinical trials. In this paper we address a non-stationary environment in which the expected rewards evolve dynamically, considering a particular type of drift, which we call resets, where the arm qualities are re-initialized from time to time. We compare different arm selection strategies in simulations, focusing on a Bayesian method based on Thompson sampling (a simple, yet effective, technique for trading off exploration and exploitation).
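To make the setting concrete, the sketch below shows standard Beta-Bernoulli Thompson sampling running in an environment whose arm qualities are occasionally re-initialized (resets). This is a minimal illustration, not the paper's experimental setup: the parameters (number of arms, horizon, reset probability) are illustrative assumptions, and the agent here is the plain Thompson sampler, without any of the reset-aware variants the paper compares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation parameters (not taken from the paper).
n_arms, horizon, reset_prob = 5, 10_000, 0.001

# True Bernoulli success probabilities; a "reset" re-draws them.
true_means = rng.uniform(size=n_arms)

# Beta(1, 1) priors over each arm's success probability.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

total_reward = 0.0
for t in range(horizon):
    # Environment drift: with small probability, re-initialize all arm qualities.
    if rng.random() < reset_prob:
        true_means = rng.uniform(size=n_arms)

    # Thompson sampling: draw one sample per arm from its posterior,
    # then play the arm whose sampled mean is largest.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))

    # Observe a Bernoulli reward and update the chosen arm's posterior.
    reward = float(rng.random() < true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
    total_reward += reward

print(f"average reward: {total_reward / horizon:.3f}")
```

Because the posteriors above keep accumulating evidence, the plain sampler adapts slowly after a reset; a reset-aware strategy would, for example, re-initialize or discount the posteriors when a reset is suspected.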
Main file
Viappiani_BanditsReset_ADT2013.pdf (369.33 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01215254, version 1 (16-01-2019)

Identifiers

HAL Id: hal-01215254
DOI: 10.1007/978-3-642-41575-3_31

Cite

Paolo Viappiani. Thompson Sampling for Bayesian Bandits with Resets. The 3rd International Conference on Algorithmic Decision Theory, ADT 2013, Nov 2013, Bruxelles, Belgium. pp.399-410, ⟨10.1007/978-3-642-41575-3_31⟩. ⟨hal-01215254⟩
