Targeting a simple statistical bandit problem

Abstract: This manuscript deals with the estimation of the optimal rule and its mean reward in a simple bandit setting where, at each round, the player is given a context, chooses one of two actions based on the context and all past observations, and receives a reward corresponding to the action undertaken. The player focuses on the mean reward and tries to narrow her confidence interval for it, but it happens that she can also estimate her regret. Inference hinges on the targeted learning methodology. A simulation study illustrates the results of the manuscript.
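
The setting described in the abstract can be illustrated with a minimal simulation sketch. This is not the manuscript's targeted learning procedure: the reward model `mean_reward`, the two-bin summary of the context, and the epsilon-greedy choice in `play` are all hypothetical choices made only to show a player who sees a context, picks one of two actions using past observations, receives a reward, and accumulates regret against the optimal rule.

```python
import random


def mean_reward(action, context):
    """Hypothetical conditional mean reward (an assumption, not from the paper)."""
    return context if action == 1 else 0.5


def play(n_rounds=2000, epsilon=0.1, seed=0):
    """Simulate a two-action contextual bandit with an epsilon-greedy player.

    Returns the average observed reward and the cumulative regret, where
    regret is measured against the rule that picks the action with the
    larger conditional mean reward at each context.
    """
    rng = random.Random(seed)
    # Running reward sums and counts per (context bin, action).
    sums = {(b, a): 0.0 for b in (0, 1) for a in (0, 1)}
    counts = {(b, a): 0 for b in (0, 1) for a in (0, 1)}
    total_reward = 0.0
    regret = 0.0
    for _ in range(n_rounds):
        context = rng.random()        # context drawn uniformly on [0, 1)
        b = int(context >= 0.5)       # coarse summary of the context
        untried = [a for a in (0, 1) if counts[(b, a)] == 0]
        if untried:
            action = untried[0]       # try each action once per bin
        elif rng.random() < epsilon:
            action = rng.randrange(2)  # explore uniformly at random
        else:
            # Exploit: pick the action with the best past average in this bin.
            action = max((0, 1), key=lambda a: sums[(b, a)] / counts[(b, a)])
        # Binary reward drawn with the chosen action's conditional mean.
        reward = 1.0 if rng.random() < mean_reward(action, context) else 0.0
        sums[(b, action)] += reward
        counts[(b, action)] += 1
        total_reward += reward
        best = max(mean_reward(0, context), mean_reward(1, context))
        regret += best - mean_reward(action, context)
    return total_reward / n_rounds, regret


if __name__ == "__main__":
    average, cumulative_regret = play()
    print(f"average reward: {average:.3f}, cumulative regret: {cumulative_regret:.1f}")
```

In this toy model the regret can be computed exactly only because the simulator knows `mean_reward`; the manuscript's point is that the player, who does not know it, can still estimate her regret.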
Document type: Preprint, working paper
2016

https://hal.archives-ouvertes.fr/hal-01359222
Contributor: Antoine Chambaz
Submitted on: Friday, September 2, 2016 - 08:25:38
Last modified on: Thursday, March 16, 2017 - 01:07:45
Document(s) archived on: Sunday, December 4, 2016 - 09:40:53

File

bandits_HAL.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01359222, version 1

Citation

Antoine Chambaz, Wenjing Zheng. Targeting a simple statistical bandit problem. 2016. <hal-01359222>
