Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards

Thomas Bonald 1, * Alexandre Proutière 2, 3
* Corresponding author
2 DYOGENE - Dynamics of Geometric Networks
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract: We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent and uniformly distributed over $[0,1]$. Rewards 1 and 0 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first $m$ failures, respectively, where $m$ is a fixed parameter. This two-target algorithm achieves a long-term average regret in $\sqrt{2n}$ for a large parameter $m$ and a known time horizon $n$. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in $2\sqrt{n}$. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizons.
Keywords: bandits
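The two-target rule described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names `two_target_regret` and `default_targets` are hypothetical, the target values (first target about $(n/2)^{1/3}$, second target about $m\sqrt{n/2}$) are an assumption not stated in the abstract, and regret is measured against an ideal arm of mean reward 1, which is also an assumption.

```python
import math
import random


def default_targets(n, m):
    """Hypothetical target choice (an assumption, not given in the abstract)."""
    first = math.floor((n / 2) ** (1 / 3))      # successes required before the 1st failure
    second = math.floor(m * math.sqrt(n / 2))   # successes required before the m-th failure
    return first, second


def two_target_regret(n, m, target1, target2, rng):
    """Simulate one run of length n; return n minus the total reward collected.

    Regret is taken relative to an ideal arm with mean reward 1 (assumption).
    """
    total_reward = 0
    t = 0
    while t < n:
        theta = rng.random()   # mean reward of a fresh arm, uniform on [0, 1)
        successes = 0
        failures = 0
        exploit = False
        while t < n:
            reward = 1 if rng.random() < theta else 0
            t += 1
            total_reward += reward
            if exploit:
                continue       # arm already accepted: keep playing it
            if reward == 1:
                successes += 1
                continue
            failures += 1
            # First target: enough successes before the first failure?
            if failures == 1 and successes < target1:
                break          # discard the arm and draw a new one
            # Second target: enough successes before the m-th failure?
            if failures == m:
                if successes < target2:
                    break      # discard the arm and draw a new one
                exploit = True
    return n - total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 10000, 3
    t1, t2 = default_targets(n, m)
    runs = [two_target_regret(n, m, t1, t2, rng) for _ in range(20)]
    print("average regret:", sum(runs) / len(runs))
    print("sqrt(2n):", math.sqrt(2 * n), " 2*sqrt(n):", 2 * math.sqrt(n))
```

For scale, with $n = 10000$ the two asymptotic regret levels quoted in the abstract are $\sqrt{2n} \approx 141$ and $2\sqrt{n} = 200$. A simulation with small $m$ and a finite horizon should not be expected to match the asymptotic value exactly.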
Document type:
Conference paper
NIPS 2013 - Neural Information Processing Systems Conference, Dec 2013, Lake Tahoe, Nevada, United States. pp.8, 2013

https://hal.archives-ouvertes.fr/hal-00920045
Contributor: Alexandre Proutiere
Submitted on: Tuesday, 17 December 2013 - 16:26:27
Last modified on: Thursday, 29 September 2016 - 01:22:04
Document(s) archived on: Saturday, 8 April 2017 - 07:33:11

File

nips2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00920045, version 1

Citation

Thomas Bonald, Alexandre Proutière. Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards. NIPS 2013 - Neural Information Processing Systems Conference, Dec 2013, Lake Tahoe, Nevada, United States. pp.8, 2013. <hal-00920045>

Metrics

Record views: 254

Document downloads: 130