# Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards

Affiliation: DYOGENE - Dynamics of Geometric Networks (CNRS - Centre National de la Recherche Scientifique: UMR 8548, Inria Paris-Rocquencourt, DI-ENS - Département d'informatique de l'École normale supérieure)
### Abstract

We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent and uniformly distributed over $[0,1]$. Rewards 0 and 1 are referred to as a failure and a success, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first $m$ failures, respectively, where $m$ is a fixed parameter. This two-target algorithm achieves a long-term average regret of $\sqrt{2n}$ for a large parameter $m$ and a known time horizon $n$. This regret is optimal and strictly less than the regret achieved by the best previously known algorithms, which is $2\sqrt{n}$. The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments illustrate the performance of the algorithm for finite time horizons.
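The abstract's two-target rule can be sketched in simulation. The following is a minimal, hedged sketch, not the paper's exact algorithm: the function name, the discard-and-commit logic, and in particular the target values `l1` and `l2` are illustrative assumptions chosen only to match the general shape described above (pass the first target by reaching `l1` successes before the first failure, then commit to the arm if it reaches `l2` total successes before the $m$-th failure).

```python
import random

def run_two_target(n, m=10, seed=0):
    """Simulate a two-target-style strategy on an infinite-armed
    Bernoulli bandit with Uniform[0,1] mean rewards.

    The targets l1 and l2 below are illustrative guesses, not the
    paper's tuned values. Returns the total regret over horizon n,
    measured against the best achievable mean reward of 1.
    """
    rng = random.Random(seed)
    l1 = max(1, round((n / (2 * m)) ** 0.5))  # first target (assumption)
    l2 = max(l1, round((m * n / 2) ** 0.5))   # second target (assumption)

    regret = 0.0
    arm = None        # [mean, successes, failures] of the arm under test
    committed = None  # mean of the arm exploited until the horizon, if any

    for _ in range(n):
        if committed is None and arm is None:
            arm = [rng.random(), 0, 0]  # draw a fresh arm
        p = committed if committed is not None else arm[0]
        reward = 1 if rng.random() < p else 0
        regret += 1.0 - p
        if committed is not None:
            continue  # exploiting: nothing left to decide
        if reward:
            arm[1] += 1
        else:
            arm[2] += 1
            if arm[2] == 1 and arm[1] < l1:
                arm = None  # failed the first target: discard the arm
            elif arm[2] == m:
                if arm[1] >= l2:
                    committed = arm[0]  # passed both targets: exploit
                arm = None
    return regret
```

Since each step contributes at most 1 to the regret, the returned value always lies in $[0, n]$; a run with a moderate horizon should end well below that ceiling once a good arm is committed.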


https://hal.archives-ouvertes.fr/hal-00920045
Contributor: Alexandre Proutière
Submitted on: Tuesday, December 17, 2013 - 4:26:27 PM
Last modification on: Friday, December 20, 2019 - 10:14:11 AM
Long-term archiving on: Saturday, April 8, 2017 - 7:33:11 AM

### File

nips2013.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-00920045, version 1

### Citation

Thomas Bonald, Alexandre Proutière. Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards. NIPS 2013 - Neural Information Processing Systems Conference, Dec 2013, Lake Tahoe, Nevada, United States. pp.8. ⟨hal-00920045⟩
