A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

Abstract : We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new, efficient algorithm that combines the idea of leveraging forced collisions for implicit communication and that of performing matching eliminations. We give a finite-time analysis of our algorithm, bounding its regret by O((log T)^{1+\kappa}) for any fixed \kappa>0. If the optimal assignment of players to arms is unique, we further show that it attains the optimal O(log(T)) regret, solving an open question raised at NeurIPS 2018.
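To make the setting in the abstract concrete, the following is a minimal sketch of the collision model and of what an "optimal assignment of players to arms" means when arm means differ among players. It is not the algorithm proposed in the paper. The function names, the Bernoulli reward distribution, and the example matrix `means` are all assumptions introduced for illustration only.

```python
import itertools
import random


def collision_rewards(choices, means, rng):
    """Simulate one round; choices[m] is the arm pulled by player m.

    Any player sharing its arm with another player collides and gets 0.
    Otherwise the reward is Bernoulli with mean means[m][arm]
    (the Bernoulli choice is an assumption of this sketch).
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: zero reward
        else:
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards


def best_assignment(means):
    """Brute-force the collision-free assignment with the largest total mean.

    Exhaustive search is only meant for tiny illustrative instances.
    """
    n_players = len(means)
    n_arms = len(means[0])

    def total(arms):
        return sum(means[m][a] for m, a in enumerate(arms))

    best = max(itertools.permutations(range(n_arms), n_players), key=total)
    return list(best), total(best)


# Hypothetical heterogeneous instance: means[m][k] is the mean of arm k for player m.
means = [
    [0.9, 0.5, 0.1],
    [0.8, 0.2, 0.3],
]
assignment, value = best_assignment(means)
```

In this hypothetical instance both players individually prefer arm 0, so they cannot both play their favourite arm without colliding. The assignment with the largest total mean gives arm 1 to the first player and arm 0 to the second, which is why the heterogeneous setting calls for reasoning about matchings rather than about each player's best arms in isolation.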
Document type :
Preprints, Working Papers, ...

Cited literature [18 references]

Contributor : Emilie Kaufmann
Submitted on : Wednesday, June 5, 2019 - 10:09:01 PM
Last modification on : Monday, June 24, 2019 - 11:30:57 AM


Files produced by the author(s)


  • HAL Id : hal-02006069, version 2
  • ARXIV : 1902.01239


Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet. A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players. 2019. ⟨hal-02006069v2⟩


