Learning with bandit feedback in potential games

This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players’ side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for ε-approximations of Nash equilibria if we introduce an exploration factor ε > 0 that guarantees that action choice probabilities never fall below ε. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1.
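The bandit scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-player common-payoff game `COORDINATION`, the function name `run_exp_weights_bandit`, the 1/sqrt(n) step size, and the importance-weighted payoff estimate are all assumptions made for illustration. Exploration is implemented by mixing with the uniform distribution, so the probability floor here is `eps` divided by the number of actions rather than `eps` itself.

```python
import numpy as np

# Hypothetical 2x2 common-payoff game (a simple potential game): both players
# receive COORDINATION[a0, a1]. The values are illustrative only.
COORDINATION = np.array([[1.0, 0.0],
                         [0.0, 0.5]])


def softmax(scores):
    """Exponential-weights (logit) map, shifted by the max for stability."""
    z = np.exp(scores - scores.max())
    return z / z.sum()


def mixed_strategy(scores, eps):
    """Exponential weights mixed with uniform exploration of weight eps."""
    return (1.0 - eps) * softmax(scores) + eps / scores.size


def run_exp_weights_bandit(payoffs, n_rounds=5000, eps=0.05, seed=0):
    """Two-player exponential weights with bandit feedback (a sketch).

    payoffs[i][a0, a1] is player i's payoff when the players choose a0, a1.
    Each player observes only the payoff of the action actually played.
    """
    rng = np.random.default_rng(seed)
    scores = [np.zeros(payoffs[0].shape[i]) for i in range(2)]
    for n in range(1, n_rounds + 1):
        step = 1.0 / np.sqrt(n)  # assumed decreasing step size
        strategies = [mixed_strategy(y, eps) for y in scores]
        actions = [rng.choice(x.size, p=x) for x in strategies]
        for i in range(2):
            realized = payoffs[i][actions[0], actions[1]]
            # Importance-weighted estimate: only the played action is updated,
            # scaled by the inverse of its selection probability.
            scores[i][actions[i]] += step * realized / strategies[i][actions[i]]
    return [mixed_strategy(y, eps) for y in scores]
```

Calling `run_exp_weights_bandit([COORDINATION, COORDINATION])` returns the two players' final mixed strategies. With a fixed `eps`, every action keeps probability at least `eps / 2` in this 2x2 game, which is why only approximate equilibria can be reached unless the exploration factor is decreased over time.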

Domains

Optimization and Control [math.OC]

Contributor: Panayotis Mertikopoulos

https://hal.science/hal-01643352

Submitted on: Tuesday, November 21, 2017, 13:12:16

Last modified on: Thursday, April 4, 2024, 21:08:01

Dates and versions

hal-01643352, version 1 (21-11-2017)

Identifiers

HAL Id: hal-01643352, version 1

Cite

Johanne Cohen, Amélie Héliou, Panayotis Mertikopoulos. Learning with bandit feedback in potential games. NIPS '17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Dec 2017, Long Beach, CA, United States. ⟨hal-01643352⟩


