Conference papers

Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

Paul Weng 1, Bruno Zanuttini 2

1 LIP6 - Laboratoire d'Informatique de Paris 6
2 Equipe MAD - Laboratoire GREYC - UMR 6072 (GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen)
Abstract: To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, by querying a tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results which demonstrate that our approach is efficient in practice.
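The abstract's core loop — value iteration over reward-weight vectors, with tutor queries resolving comparisons the polytope cannot decide — can be sketched as follows. This is a hedged toy illustration, not the paper's implementation: the tiny MDP, the two-atom reward ordering (only r0 <= r1 is known), the simulated tutor, and the memoization of answers are all assumptions made for the example (the actual algorithm instead turns each answer into a new linear constraint that shrinks the reward polytope).

```python
GAMMA = 0.9

# Toy MDP (hypothetical): transitions[s][a] = (next_state, reward_atom),
# where reward_atom is an index into the unknown reward vector, or None
# for zero reward. Action "big" yields atom 1 once; "small" yields atom 0
# on two consecutive steps, so the two Q-vectors are incomparable under
# the ordering constraint alone.
TRANSITIONS = {
    "s":   {"big": ("end", 1), "small": ("mid", 0)},
    "mid": {"go": ("end", 0)},
    "end": {"stay": ("end", None)},
}

# Vertices of the feasible reward polytope {0 <= r0 <= r1 <= 1}:
# only the ordering of the two reward atoms is known, not their values.
VERTICES = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def dominates(v, w):
    """True if v.r >= w.r for every reward function r in the polytope
    (checked at the polytope's vertices, which suffices by linearity)."""
    return all(
        sum((vi - wi) * ri for vi, wi, ri in zip(v, w, vtx)) >= -1e-9
        for vtx in VERTICES
    )

def interactive_vi(true_r=(0.2, 1.0), n_iter=60):
    """Value iteration where V(s) is a vector of discounted weights on
    each reward atom (so V(s).r is the scalar value once r is fixed).
    When two Q-vectors are incomparable on the polytope, a simulated
    tutor -- who knows the hidden true_r -- is queried; answers are
    memoized here as a simplification."""
    V = {s: (0.0, 0.0) for s in TRANSITIONS}
    answered = {}  # memoized tutor answers, keyed by the vector pair
    queries = 0
    for _ in range(n_iter):
        newV = {}
        for s, acts in TRANSITIONS.items():
            best = None
            for a, (s2, atom) in acts.items():
                # Q-vector: indicator of the reward atom received now,
                # plus the discounted successor value vector.
                q = tuple(
                    (1.0 if i == atom else 0.0) + GAMMA * V[s2][i]
                    for i in range(2)
                )
                if best is None or dominates(q, best):
                    best = q
                elif not dominates(best, q):
                    # Incomparable on the polytope: ask the tutor.
                    key = (q, best)
                    if key not in answered:
                        queries += 1
                        answered[key] = (
                            sum(qi * ri for qi, ri in zip(q, true_r))
                            >= sum(bi * ri for bi, ri in zip(best, true_r))
                        )
                    if answered[key]:
                        best = q
            newV[s] = best
        V = newV
    return V, queries
```

On this toy instance a single query suffices: "big" gives weight vector (0, 1) and "small" gives (1.9, 0), which the ordering r0 <= r1 alone cannot rank, so the tutor's one answer settles the policy at state "s".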

Submitted on : Wednesday, February 5, 2014 - 5:33:15 PM
Last modification on : Tuesday, May 14, 2019 - 11:05:55 AM
Long-term archiving on: Sunday, April 9, 2017 - 8:59:47 AM




  • HAL Id : hal-00942290, version 1


Paul Weng, Bruno Zanuttini. Interactive Value Iteration for Markov Decision Processes with Unknown Rewards. IJCAI '13 - Twenty-Third international joint conference on Artificial Intelligence, Aug 2013, Beijing, China. pp.2415-2421. ⟨hal-00942290⟩


