Conference papers

Direct Value Learning: a Rank-Invariant Approach to Reinforcement Learning

Basile Mayeur (1, 2), Riad Akrour (1, 2), Michèle Sebag (3, 1, 2)
(2) TAO - Machine Learning and Optimisation (CNRS - Centre National de la Recherche Scientifique: UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique)
Abstract: Taking inspiration from inverse reinforcement learning, the proposed Direct Value Learning for Reinforcement Learning (DIVA) approach uses light priors to generate inappropriate behaviors, and uses the corresponding state sequences to directly learn a value function. When the transition model is known, this value function directly defines a (nearly) optimal controller. Otherwise, the value function is extended to the state-action space using off-policy learning. The experimental validation of DIVA on the mountain car problem, under the assumption that the target state is known, shows the robustness of the approach compared to SARSA. The experimental validation on the bicycle problem shows that DIVA still finds good policies when this assumption is relaxed.
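The abstract compresses two mechanisms: learning a value function from the state sequences of inappropriate behaviors, and, when the transition model is known, using that value function directly as a controller. The Python sketch below illustrates one possible reading of those two steps on a toy problem; it is not the authors' implementation. Everything concrete in it is an assumption: the 1-D dynamics (`transition`), the feature map (`features`), the drift-away controller standing in for "inappropriate behavior" (`bad_trajectory`), and the pairwise hinge-loss ranking update (`learn_value`), which asks that a state visited earlier in an inappropriate run be ranked above the state visited right after it.

```python
import random

# Hypothetical toy problem (not from the paper): states x in [LOW, HIGH], target at HIGH.
LOW, HIGH = 0.0, 10.0
ACTIONS = (-1.0, 0.0, 1.0)


def transition(x: float, a: float) -> float:
    """Assumed known deterministic transition model: move by a, clipped to [LOW, HIGH]."""
    return min(HIGH, max(LOW, x + a))


def features(x: float) -> list[float]:
    """Hypothetical feature map phi(x); the learned value is V(x) = w . phi(x)."""
    z = x / HIGH
    return [z, z * z]


def value(w: list[float], x: float) -> float:
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def bad_trajectory(rng: random.Random, length: int = 20) -> list[float]:
    """Assumed 'inappropriate' behavior: a noisy controller drifting away from the target."""
    x = rng.uniform(LOW, HIGH)
    states = [x]
    for _ in range(length):
        x = transition(x, -rng.uniform(0.2, 1.0))
        states.append(x)
    return states


def learn_value(trajectories: list[list[float]], epochs: int = 50,
                lr: float = 0.1, margin: float = 1.0) -> list[float]:
    """Pairwise ranking update (assumed): V(earlier state) should exceed
    V(next state) by `margin` along every inappropriate trajectory."""
    w = [0.0, 0.0]
    pairs = [(t[i], t[i + 1]) for t in trajectories for i in range(len(t) - 1)]
    for _ in range(epochs):
        for earlier, later in pairs:
            if value(w, earlier) - value(w, later) < margin:  # hinge constraint violated
                diff = [a - b for a, b in zip(features(earlier), features(later))]
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def greedy_action(w: list[float], x: float) -> float:
    """With a known model, pick the action whose successor state has the highest V.
    Only the ordering of V matters here, not its scale."""
    return max(ACTIONS, key=lambda a: value(w, transition(x, a)))


if __name__ == "__main__":
    rng = random.Random(0)
    w = learn_value([bad_trajectory(rng) for _ in range(30)])
    x = 3.0
    for _ in range(10):
        x = transition(x, greedy_action(w, x))
    print("learned weights:", w, "final state:", x)
```

Because the greedy controller only compares V across successor states, applying any strictly increasing transform to V leaves every chosen action unchanged; this is one way to read "rank-invariant" in the title, offered here as an interpretation rather than the paper's definition.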

Cited literature: 16 references
Contributor: Basile Mayeur
Submitted on: Thursday, December 4, 2014 - 2:22:53 PM
Last modification on: Thursday, November 5, 2020 - 9:08:02 AM
Long-term archiving on: Monday, March 9, 2015 - 5:58:32 AM


Files produced by the author(s)


Public Domain


HAL Id: hal-01090982, version 1



Basile Mayeur, Riad Akrour, Michèle Sebag. Direct Value Learning: a Rank-Invariant Approach to Reinforcement Learning. Autonomously Learning Robots, workshop at NIPS 2014, Gerhard Neumann (TU-Darmstadt) and Joelle Pineau (McGill University) and Peter Auer (Uni Leoben) and Marc Toussaint (Uni Stuttgart), Dec 2014, Montreal, Canada. ⟨hal-01090982⟩


