Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities

Hugo Gilbert; Bruno Zanuttini; Paolo Viappiani; Paul Weng; Esther Nicart

Conference paper, Year: 2016

Hugo Gilbert

Role: Author
PersonId: 778820
ORCID: 0000-0001-9729-2959

DECISION

Bruno Zanuttini

Role: Author
PersonId: 952903

MAD Team - GREYC Laboratory - UMR6072

Paolo Viappiani

Role: Author
PersonId: 9572
IdHAL: paolo-viappiani
ORCID: 0000-0002-7922-3877
IdRef: 178446521

DECISION

Paul Weng

Role: Author

SYSU-CMU Joint Institute of Engineering

SYSU-CMU Shunde International Joint Research Institute

Esther Nicart

Role: Author

Cordon Electronics DS2i

MAD Team - GREYC Laboratory - UMR6072

Abstract

In reinforcement learning, policies are typically evaluated according to the expectation of cumulated rewards. Researchers in decision theory have argued that more sophisticated decision criteria can better model the preferences of a decision maker. In particular, Skew-Symmetric Bilinear (SSB) utility functions generalize von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. In this paper, we adopt an SSB utility function to compare policies in the reinforcement learning setting. We provide a model-free SSB reinforcement learning algorithm, SSB Q-learning, and prove its convergence towards a policy that is ε-optimal according to SSB. The proposed algorithm is an adaptation of fictitious play [Brown, 1951] combined with techniques from stochastic approximation [Borkar, 1997]. We also present experimental results that evaluate our approach in a variety of settings.
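To make the SSB criterion concrete, here is a minimal sketch of how an SSB functional compares two probability distributions over a finite outcome set. It follows the standard definition from decision theory, which this record does not spell out: a skew-symmetric matrix `phi` is extended bilinearly, and `p` is preferred to `q` when `phi(p, q) > 0`. The function name `ssb_value`, the toy outcomes, and the two `phi` matrices are illustrative assumptions; this is not the paper's SSB Q-learning algorithm.

```python
import numpy as np


def ssb_value(p, q, phi):
    """Return phi(p, q) = sum over x, y of p[x] * q[y] * phi[x, y]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Skew-symmetry: phi[x, y] == -phi[y, x], hence phi[x, x] == 0.
    if not np.allclose(phi, -phi.T):
        raise ValueError("phi must be skew-symmetric")
    return float(p @ phi @ q)


# Hypothetical toy setting: three outcomes with made-up utilities.
utilities = np.array([0.0, 1.0, 2.0])

# EU as a special case: phi[x, y] = u(x) - u(y).
phi_eu = utilities[:, None] - utilities[None, :]

# A non-EU choice: phi[x, y] = sign(u(x) - u(y)), so phi(p, q) is
# P(outcome of p beats outcome of q) minus P(outcome of q beats p).
phi_sign = np.sign(phi_eu)

p = np.array([0.2, 0.0, 0.8])
q = np.array([0.0, 1.0, 0.0])

eu_gap = ssb_value(p, q, phi_eu)      # equals EU(p) - EU(q)
sign_gap = ssb_value(p, q, phi_sign)  # p is preferred to q when positive
```

With `phi_eu` the comparison reduces to a difference of expected utilities, which illustrates in what sense SSB utilities generalize EU; other skew-symmetric matrices, such as `phi_sign`, encode criteria that need not reduce to expected utility.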

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

Gilbert.UAI.2016.pdf (500.78 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01356085

Submitted on: Wednesday, August 24, 2016, 17:31:42

Last modified on: Wednesday, March 20, 2024, 16:20:04

Long-term archiving on: Friday, November 25, 2016, 13:35:20

Dates and versions

hal-01356085, version 1 (24-08-2016)

Identifiers

HAL Id: hal-01356085, version 1

Cite

Hugo Gilbert, Bruno Zanuttini, Paolo Viappiani, Paul Weng, Esther Nicart. Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities. 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016), Jun 2016, New York City, United States. ⟨hal-01356085⟩

Collections

UPMC CNRS GREYC GREYC-MAD LIP6 COMUE-NORMANDIE ENSICAEN UNICAEN SORBONNE-UNIVERSITE SU-SCIENCES

412 views

124 downloads
