Learning with bandit feedback in potential games

Johanne Cohen 1 Amélie Héliou 2, 3, 4 Panayotis Mertikopoulos 5
1 GALaC - LRI - Graphes, Algorithmes et Combinatoire (LRI)
LRI - Laboratoire de Recherche en Informatique
2 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
5 POLARIS - Performance analysis and optimization of LARge Infrastructures and Systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players’ side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for ε-approximations of Nash equilibria if we introduce an exploration factor ε > 0 that guarantees that action choice probabilities never fall below ε. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1.
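To make the bandit setting concrete, here is a minimal sketch of exponential weights with an explicit exploration floor. It is not the authors' exact algorithm. Each player keeps a score per action, plays a logit (softmax) distribution mixed with uniform exploration so that no action probability drops below ε, observes only its realized payoff, and updates the score of the action it played with an importance-weighted estimate (payoff divided by the probability of that action). The constant step size, the 2x2 coordination game used as a potential game, the fixed seed, and the names `choice_probs` and `exp_weights_bandit` are illustrative assumptions.

```python
import numpy as np


def choice_probs(score, eps):
    """Logit map mixed with uniform exploration: every action gets probability >= eps."""
    assert eps * len(score) <= 1.0, "exploration floor must be feasible"
    w = np.exp(score - score.max())  # numerically stable softmax
    logit = w / w.sum()
    return eps + (1.0 - eps * len(score)) * logit


def exp_weights_bandit(payoffs, eps=0.05, step=0.1, n_iter=5000, seed=0):
    """Exponential weights with bandit feedback in a 2-player finite game (sketch).

    payoffs[i][a, b] is player i's payoff in [0, 1] when player 0 plays a and
    player 1 plays b. The constant step size is an assumption made for simplicity.
    Returns each player's final mixed strategy.
    """
    rng = np.random.default_rng(seed)
    n0, n1 = payoffs[0].shape
    scores = [np.zeros(n0), np.zeros(n1)]
    for _ in range(n_iter):
        # Both players choose simultaneously from their current distributions.
        probs = [choice_probs(s, eps) for s in scores]
        actions = [rng.choice(len(p), p=p) for p in probs]
        for i in range(2):
            u = payoffs[i][actions[0], actions[1]]  # realized payoff: the only feedback
            estimate = np.zeros_like(scores[i])
            # Importance weighting makes the estimate unbiased for the payoff vector;
            # the eps floor bounds it by 1/eps.
            estimate[actions[i]] = u / probs[i][actions[i]]
            scores[i] += step * estimate
    return [choice_probs(s, eps) for s in scores]


# Hypothetical 2x2 coordination game: both players get 1 if they match, else 0.
# It is a potential game whose potential equals the common payoff.
coord = np.eye(2)
x0, x1 = exp_weights_bandit([coord, coord])
print(np.round(x0, 3), np.round(x1, 3))
```

With ε fixed, the strategies settle near a pure equilibrium, but each non-chosen action keeps probability about ε. This matches the ε-approximate equilibria described above. Letting ε decrease over time would remove this residual exploration.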
Document type : Conference papers

Contributor : Panayotis Mertikopoulos
Submitted on : Tuesday, November 21, 2017 - 1:12:16 PM
Last modification on : Tuesday, April 2, 2019 - 2:52:12 AM


  • HAL Id : hal-01643352, version 1


Johanne Cohen, Amélie Héliou, Panayotis Mertikopoulos. Learning with bandit feedback in potential games. NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Dec 2017, Long Beach, CA, United States. ⟨hal-01643352⟩


