Abstract: This manuscript deals with the estimation of the optimal rule and its mean
reward in a simple bandit setting where, at each round, the player is given a
context, chooses one of two actions based on the context and all past
observations, and receives a reward corresponding to the action undertaken.
The player focuses on the mean reward of the optimal rule and aims to build a narrow
confidence interval for it; as a by-product, she can also estimate her regret.
Inference hinges on the targeted learning methodology. A simulation study
illustrates the results of the manuscript.