Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Thibaut Cuvelier; Richard Combes; Eric Gourdin

doi:10.1145/3410220.3453926

Conference paper, Year: 2021

Thibaut Cuvelier

Role: Author
PersonId : 173056
IdHAL : thibaut-cuvelier
ORCID : 0000-0002-4233-2316

CentraleSupélec

Richard Combes

Role: Author
PersonId : 14877
IdHAL : richard-combes
ORCID : 0000-0003-3954-7241
IdRef : 171607732

Laboratoire des signaux et systèmes

Eric Gourdin

Role: Author
PersonId : 952670

Orange Labs [Issy les Moulineaux]

Abstract

We consider combinatorial semi-bandits over a set $\mathcal{X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$ after $T$ rounds, where $m = \max_{x \in \mathcal{X}} \mathbf{1}^\top x$. However, ESCB has computational complexity $O(|\mathcal{X}|)$, which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$ and asymptotic computational complexity $O(\delta_T^{-1} \mathrm{poly}(d))$, where $\delta_T$ is a function which vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over $\mathcal{X}$ can be solved up to a given approximation ratio, AESCB is implementable in polynomial time $O(\delta_T^{-1} \mathrm{poly}(d))$ by repeatedly maximizing a linear function over $\mathcal{X}$ subject to a linear budget constraint, and we show how to solve these maximization problems efficiently. Additional algorithms, proofs and numerical experiments are given in the complete version of this work.
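The reduction mentioned in the abstract, from maximizing an optimistic index to repeated budgeted linear maximization, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the index form a^T x + sqrt(b^T x), the direction of the budget constraint (b^T x >= s), the toy vectors a and b, and all function names. Both solvers below use brute-force enumeration over m-item sets, so this is not a polynomial-time implementation and not the paper's AESCB; it only shows that sweeping the budget s over the achievable values of b^T x recovers the same maximizer as direct enumeration, while a coarser grid yields an approximation.

```python
from itertools import combinations
from math import sqrt


def index(a, b, x):
    """Assumed optimistic index a^T x + sqrt(b^T x) for a set x of item indices."""
    return sum(a[i] for i in x) + sqrt(sum(b[i] for i in x))


def brute_force_index_max(a, b, m):
    """Maximize the index by enumerating all C(d, m) sets (exponential in general)."""
    return max(combinations(range(len(a)), m), key=lambda x: index(a, b, x))


def budgeted_linear_max(a, b, m, s):
    """Maximize a^T x over m-sets subject to b^T x >= s.

    Brute-force stand-in for an efficient budgeted solver.
    Returns None when no set meets the budget.
    """
    feasible = [x for x in combinations(range(len(a)), m)
                if sum(b[i] for i in x) >= s]
    if not feasible:
        return None
    return max(feasible, key=lambda x: sum(a[i] for i in x))


def budget_sweep_index_max(a, b, m, budgets):
    """Maximize the index using only calls to budgeted linear maximization."""
    best = None
    for s in budgets:
        x = budgeted_linear_max(a, b, m, s)
        if x is not None and (best is None or index(a, b, x) > index(a, b, best)):
            best = x
    return best


# Hypothetical instance: d = 5 items, each decision is a set of m = 2 items.
a = [0.9, 0.8, 0.5, 0.4, 0.1]  # stand-in for empirical mean rewards
b = [0.5, 1.0, 1.5, 2.0, 4.0]  # stand-in for exploration bonuses (larger = less sampled)
m = 2

# Exact grid: every achievable value of b^T x (enumerated here only for the demo).
budgets = sorted({sum(b[i] for i in x) for x in combinations(range(len(a)), m)})

x_direct = brute_force_index_max(a, b, m)
x_sweep = budget_sweep_index_max(a, b, m, budgets)
print(x_direct, x_sweep)
```

With the exact grid, the sweep is guaranteed to match direct enumeration: when s equals b^T x* for an index maximizer x*, the budgeted solution has a linear value at least a^T x* and a budget value at least s, so its index is at least as large. Using fewer budget values trades accuracy for fewer solver calls.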

Keywords

Bandits; Combinatorial Bandits; Combinatorial Optimization

Domains

Discrete Mathematics [cs.DM]; Combinatorics [math.CO]; Optimization and Control [math.OC]

Main file

sigmet297-cuvelierA.pdf (663.76 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03201526

Submitted on: Monday, April 19, 2021, 06:09:58

Last modified on: Monday, March 18, 2024, 03:05:40

Long-term archiving on: Tuesday, July 20, 2021, 18:10:39

Dates and versions

hal-03201526, version 1 (19-04-2021)

Identifiers

HAL Id: hal-03201526, version 1
DOI: 10.1145/3410220.3453926

Cite

Thibaut Cuvelier, Richard Combes, Eric Gourdin. Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits. SIGMETRICS 2021, ACM, Jun 2021, Virtual Event, China. ⟨10.1145/3410220.3453926⟩. ⟨hal-03201526⟩

Collections

CNRS SUP_LSS SUP_TELECOMS CENTRALESUPELEC TDS-MACS UNIV-PARIS-SACLAY GS-ENGINEERING GS-COMPUTER-SCIENCE

76 Views

64 Downloads
