Conference paper, 2013

Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards

Abstract

We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent and uniformly distributed over $[0,1]$. Rewards 0 and 1 are referred to as a failure and a success, respectively. We propose a novel algorithm in which the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the $m$-th failure, where $m$ is a fixed parameter. This two-target algorithm achieves a long-term average regret of $\sqrt{2n}$ for a large parameter $m$ and a known time horizon $n$. This regret is optimal and strictly less than the regret of the best previously known algorithms, which is $2\sqrt{n}$. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments illustrate the performance of the algorithm for finite time horizons.
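The abstract only outlines the two-target rule, so the following is a minimal simulation sketch, not the authors' implementation. The target values `t1` (successes required before the first failure) and `t2` (successes required before the $m$-th failure) are left as free parameters here; the paper derives specific values from the horizon $n$ and the parameter $m$.

```python
import random

def two_target_bandit(n, m, t1, t2, rng=random.random):
    """Sketch of a two-target strategy for the infinite-armed
    Bernoulli bandit with Uniform[0,1] mean rewards.

    t1, t2: illustrative targets (the paper derives them from n and m).
    Returns the total reward collected over the horizon n.
    """
    total_reward = 0
    t = 0
    while t < n:
        p = rng()  # draw a fresh arm: mean reward ~ Uniform[0,1]
        successes, failures = 0, 0
        accepted = True
        # Test phase: play the arm until m failures, checking both targets.
        while t < n and failures < m:
            r = 1 if rng() < p else 0
            total_reward += r
            t += 1
            if r:
                successes += 1
            else:
                failures += 1
                # First target: at least t1 successes before the 1st failure.
                if failures == 1 and successes < t1:
                    accepted = False
                    break
        # Second target: at least t2 successes before the m-th failure.
        if accepted and failures == m and successes < t2:
            accepted = False
        if accepted:
            # Exploit the accepted arm for the rest of the horizon.
            while t < n:
                total_reward += 1 if rng() < p else 0
                t += 1
    return total_reward
```

With $m = 1$ the two targets collapse into one and the rule reduces to a single-target (run-of-successes) test; the second target is what filters out arms that pass the first test by luck.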


Main file

nips2013.pdf (250.83 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00920045, version 1 (17-12-2013)

Identifiers

  • HAL Id: hal-00920045, version 1

Cite

Thomas Bonald, Alexandre Proutière. Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards. NIPS 2013 - Neural Information Processing Systems Conference, Dec 2013, Lake Tahoe, Nevada, United States. pp.8. ⟨hal-00920045⟩