Conference papers

Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities

Hugo Gilbert 1 Bruno Zanuttini 2 Paolo Viappiani 1 Paul Weng 3, 4 Esther Nicart 5, 2
1 DECISION
LIP6 - Laboratoire d'Informatique de Paris 6
2 Equipe MAD - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In reinforcement learning, policies are typically evaluated according to the expectation of cumulated rewards. Researchers in decision theory have argued that more sophisticated decision criteria can better model the preferences of a decision maker. In particular, Skew-Symmetric Bilinear (SSB) utility functions generalize von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. In this paper, we adopt an SSB utility function to compare policies in the reinforcement learning setting. We provide a model-free SSB reinforcement learning algorithm, SSB Q-learning, and prove its convergence towards a policy that is ε-optimal according to SSB. The proposed algorithm is an adaptation of fictitious play [Brown, 1951] combined with techniques from stochastic approximation [Borkar, 1997]. We also present some experimental results which evaluate our approach in a variety of settings.
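As a rough illustration of the SSB criterion mentioned in the abstract (not the paper's SSB Q-learning algorithm), the sketch below compares two finite distributions p and q with a skew-symmetric matrix phi, where p is preferred to q when the bilinear form sum over x, y of p(x) q(y) phi(x, y) is positive. The function names `ssb_value` and `eu_phi` and the example matrices are assumptions made for this sketch. It shows that setting phi(x, y) = u(x) - u(y) recovers the expected-utility comparison, and that a different skew-symmetric phi can encode a preference cycle, which no expected-utility function can represent.

```python
import numpy as np


def ssb_value(p, q, phi):
    """Bilinear SSB form: sum_x sum_y p[x] * q[y] * phi[x, y].

    p, q : probability vectors over the same finite outcome set.
    phi  : skew-symmetric matrix, phi[x, y] == -phi[y, x].
    A positive result means p is preferred to q under phi.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    phi = np.asarray(phi, dtype=float)
    if not np.allclose(phi, -phi.T):
        raise ValueError("phi must be skew-symmetric")
    return float(p @ phi @ q)


def eu_phi(u):
    """Skew-symmetric matrix phi[x, y] = u[x] - u[y] built from a utility vector u.

    With this choice, ssb_value(p, q, phi) equals E_p[u] - E_q[u],
    so the SSB comparison reduces to expected utility.
    """
    u = np.asarray(u, dtype=float)
    return u[:, None] - u[None, :]


# Expected utility as a special case (hypothetical utilities and lotteries).
u = [0.0, 1.0, 3.0]
p = [0.5, 0.5, 0.0]   # E_p[u] = 0.5
q = [0.0, 0.0, 1.0]   # E_q[u] = 3.0
eu_gap = ssb_value(p, q, eu_phi(u))   # 0.5 - 3.0 = -2.5, so q is preferred

# A cyclic preference over three sure outcomes a, b, c (hypothetical phi).
phi_cycle = np.array([[0, 1, -1],
                      [-1, 0, 1],
                      [1, -1, 0]], dtype=float)
a, b, c = np.eye(3)                   # degenerate lotteries on each outcome
cycle = (ssb_value(a, b, phi_cycle),  # a preferred to b
         ssb_value(b, c, phi_cycle),  # b preferred to c
         ssb_value(c, a, phi_cycle))  # c preferred to a
```

Because phi is skew-symmetric, `ssb_value(p, q, phi)` always equals `-ssb_value(q, p, phi)`, so the sign of a single evaluation settles the comparison in both directions.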
Cited literature [22 references]

https://hal.archives-ouvertes.fr/hal-01356085
Contributor : Mad Greyc
Submitted on : Wednesday, August 24, 2016 - 5:31:42 PM
Last modification on : Thursday, November 21, 2019 - 12:00:07 AM
Archived on : Friday, November 25, 2016 - 1:35:20 PM

File

Gilbert.UAI.2016.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01356085, version 1

Citation

Hugo Gilbert, Bruno Zanuttini, Paolo Viappiani, Paul Weng, Esther Nicart. Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities. 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016), Jun 2016, New York City, United States. ⟨hal-01356085⟩
