Nash and the Bandit Approach for Adversarial Portfolios - HAL Open Archive
Conference paper. Year: 2014

Nash and the Bandit Approach for Adversarial Portfolios

Abstract

In this paper we study the use of a portfolio of policies for adversarial problems. We use two different portfolios of policies and apply them to the game of Go. The first portfolio is composed of different versions of the GnuGo agent; the second is composed of fixed random seeds. First, we demonstrate that learning an offline combination of these policies using the notion of Nash equilibrium generates a stronger opponent. Second, we show that such distributions can also be learned online through a bandit approach. The advantages of our approach are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity (no computational overhead), and (iv) increased performance. Because games on mobile devices are increasingly important, designing artificial intelligence for small computational budgets is crucial; our approach is particularly well suited to mobile devices, since it creates a stronger opponent simply by biasing the distribution over the policies, and moreover it generalizes quite well.
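As an illustration only (not the authors' code), the two ingredients described in the abstract can be sketched as follows, assuming the pairwise results between the policies of the two portfolios are summarized in a zero-sum payoff matrix; the names `nash_portfolio` and `Exp3Portfolio` are hypothetical, and the bandit part uses a standard EXP3 update rather than the paper's exact formulation.

```python
import numpy as np

def nash_portfolio(payoffs, iterations=10000):
    """Approximate the row player's Nash mixed strategy by fictitious play.

    payoffs[i, j] is assumed to be the win rate of row policy i against
    column policy j (zero-sum setting).
    """
    n_rows, n_cols = payoffs.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixture so far.
        row_counts[np.argmax(payoffs @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ payoffs)] += 1
    # The empirical frequencies give the offline distribution over policies.
    return row_counts / row_counts.sum()

class Exp3Portfolio:
    """Online (bandit) distribution over policies; rewards assumed in [0, 1]."""
    def __init__(self, n_policies, gamma=0.1):
        self.gamma = gamma
        self.weights = np.ones(n_policies)
        self.probs = np.full(n_policies, 1.0 / n_policies)

    def select(self):
        k = len(self.weights)
        # Mix the exponential weights with uniform exploration.
        self.probs = ((1 - self.gamma) * self.weights / self.weights.sum()
                      + self.gamma / k)
        return np.random.choice(k, p=self.probs)

    def update(self, arm, reward):
        k = len(self.weights)
        # Importance-weighted reward estimate, as in standard EXP3.
        self.weights[arm] *= np.exp(self.gamma * reward / (self.probs[arm] * k))
```

In this sketch, the offline Nash-Portfolio is computed once from the payoff matrix, while the Exp3Portfolio adapts its distribution game by game from the observed win/loss outcomes against the current opponent.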
Main file
nashrand3.pdf (300.62 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01077628, version 1 (03-11-2014)

Identifiers

Cite

David L. Saint-Pierre, Olivier Teytaud. Nash and the Bandit Approach for Adversarial Portfolios. CIG 2014 - Computational Intelligence in Games, IEEE, Aug 2014, Dortmund, Germany. pp. 7, ⟨10.1109/CIG.2014.6932897⟩. ⟨hal-01077628⟩
213 Views
323 Downloads
