Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

Paul Weng; Bruno Zanuttini

Conference paper, Year: 2013


Paul Weng

Role: Author

DECISION

Bruno Zanuttini

Role: Author
PersonId : 952903

MAD team, GREYC laboratory, UMR6072

Abstract

To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, by querying a tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results which demonstrate that our approach is efficient in practice.
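
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm; it only shows the idea of deciding a preference from the reward ordering alone and asking a tutor otherwise. Everything named here is an assumption made for illustration: a sequence of rewards is summarized as a vector counting how often each of the k ordered rewards occurs, the unknown reward values w are assumed to satisfy 0 <= w_1 <= ... <= w_k <= 1, and `compare`, `prefer`, and the `tutor` callable are hypothetical names. Under that assumption the feasible set is a polytope whose vertices are the step vectors (0, ..., 0, 1, ..., 1), so u is at least as good as v for every feasible w exactly when all suffix sums of u - v are non-negative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def suffix_sums(d: Vector) -> List[float]:
    """Return [d[i] + d[i+1] + ... + d[-1] for each i]."""
    total = 0.0
    out: List[float] = []
    for x in reversed(d):
        total += x
        out.append(total)
    return out[::-1]


def compare(u: Vector, v: Vector, eps: float = 1e-9) -> Optional[int]:
    """Compare two count vectors using only the reward ordering.

    Returns 1 if u is at least as good as v for every reward vector w
    with 0 <= w_1 <= ... <= w_k <= 1 (an assumed normalization),
    -1 if v is at least as good as u for every such w,
    and None if the ordering alone cannot decide.
    """
    sums = suffix_sums([a - b for a, b in zip(u, v)])
    if all(s >= -eps for s in sums):
        return 1
    if all(s <= eps for s in sums):
        return -1
    return None


def prefer(
    u: Vector,
    v: Vector,
    tutor: Callable[[Vector, Vector], bool],
    log: List[Tuple[Vector, Vector, int]],
) -> Vector:
    """Return the preferred vector, querying the tutor only when needed.

    Each tutor answer is recorded in `log`; this sketch does not feed
    the answers back into the feasible set.
    """
    result = compare(u, v)
    if result is None:
        result = 1 if tutor(u, v) else -1
        log.append((u, v, result))
    return u if result == 1 else v


if __name__ == "__main__":
    # Hypothetical hidden reward values, known only to the simulated tutor.
    hidden = [0.0, 0.4, 1.0]

    def tutor(a: Vector, b: Vector) -> bool:
        value = lambda x: sum(w * c for w, c in zip(hidden, x))
        return value(a) >= value(b)

    queries: List[Tuple[Vector, Vector, int]] = []
    # Decided without a query: swapping a low reward for a high one.
    print(prefer([0, 1, 1], [1, 1, 0], tutor, queries), len(queries))
    # Undecided by the ordering alone, so the tutor is asked once.
    print(prefer([0, 2, 0], [1, 0, 1], tutor, queries), len(queries))
```

In a fuller treatment, each recorded answer would be added as a linear constraint on w, shrinking the polytope so that later comparisons need fewer queries; checking dominance would then require solving a linear program rather than the closed-form suffix-sum test used here.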

Domains

Multi-agent systems [cs.MA] Artificial intelligence [cs.AI] Modeling and simulation Machine learning [cs.LG]

Main file

Weng.IJCAI.2013.pdf (253.47 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-00942290

Submitted on: Wednesday, February 5, 2014, 17:33:15

Last modified on: Wednesday, March 20, 2024, 16:20:04

Long-term archiving on: Sunday, April 9, 2017, 08:59:47

Dates and versions

hal-00942290, version 1 (05-02-2014)

Identifiers

HAL Id: hal-00942290, version 1

Cite

Paul Weng, Bruno Zanuttini. Interactive Value Iteration for Markov Decision Processes with Unknown Rewards. IJCAI '13 - Twenty-Third International Joint Conference on Artificial Intelligence, Aug 2013, Beijing, China. pp. 2415-2421. ⟨hal-00942290⟩


Collections

UPMC CNRS GREYC GREYC-MAD LIP6 COMUE-NORMANDIE TDS-MACS ENSICAEN UNICAEN SORBONNE-UNIVERSITE SU-SCIENCES

333 views

175 downloads
