Conference paper, 2016

Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities

Abstract

In reinforcement learning, policies are typically evaluated according to the expectation of cumulated rewards. Researchers in decision theory have argued that more sophisticated decision criteria can better model the preferences of a decision maker. In particular, Skew-Symmetric Bilinear (SSB) utility functions generalize von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. In this paper, we adopt an SSB utility function to compare policies in the reinforcement learning setting. We provide a model-free SSB reinforcement learning algorithm, SSB Q-learning, and prove its convergence towards a policy that is optimal according to SSB. The proposed algorithm is an adaptation of fictitious play [Brown, 1951] combined with techniques from stochastic approximation [Borkar, 1997]. We also present experimental results evaluating our approach in a variety of settings.
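The abstract compares policies via an SSB utility: a skew-symmetric kernel phi scores pairwise outcomes, and lottery p is preferred to q iff the bilinear score Phi(p, q) is positive. A minimal sketch of this comparison (hypothetical names, not the paper's code), using probabilistic dominance as an illustrative kernel, under the assumption that lotteries are finite outcome-to-probability maps:

```python
import itertools

def phi(x, y):
    # Skew-symmetric kernel (illustrative choice): +1 if outcome x beats y,
    # -1 if it loses, 0 on ties. This encodes probabilistic dominance,
    # a preference that SSB utilities capture but expected utility cannot.
    return (x > y) - (x < y)

def ssb_value(p, q):
    # Phi(p, q) = sum over (x, y) of p(x) * q(y) * phi(x, y);
    # lottery p is preferred to q iff Phi(p, q) > 0.
    return sum(px * qy * phi(x, y)
               for (x, px), (y, qy) in itertools.product(p.items(), q.items()))

# Two reward lotteries (outcome -> probability), chosen so that this SSB
# criterion and expected utility disagree.
p = {0: 0.45, 3: 0.55}  # expected value 1.65
q = {2: 1.0}            # expected value 2.0

print(ssb_value(p, q))                    # positive: p wins 55% of comparisons
print(ssb_value(p, q) + ssb_value(q, p))  # skew-symmetry: sums to 0
```

Here EU prefers q (higher mean) while the SSB criterion prefers p (it beats q with probability 0.55), illustrating the kind of rational behavior EU cannot accommodate.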
Main file: Gilbert.UAI.2016.pdf (500.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01356085, version 1 (24-08-2016)

Identifiers

  • HAL Id: hal-01356085, version 1

Cite

Hugo Gilbert, Bruno Zanuttini, Paolo Viappiani, Paul Weng, Esther Nicart. Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities. 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016), Jun 2016, New York City, United States. ⟨hal-01356085⟩
412 views
124 downloads
