Optimistic planning for belief-augmented Markov decision processes

Raphael Fonteneau; Lucian Busoniu; Rémi Munos

Conference paper, Year: 2013



Raphael Fonteneau

Role: Author
PersonId : 862040

Department of Electrical Engineering and Computer Science

Lucian Busoniu

Role: Author
PersonId : 933138

Centre de Recherche en Automatique de Nancy

Rémi Munos

Role: Author
PersonId : 836863

Sequential Learning

Abstract

This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to settings where the transition model of the MDP is initially unknown and is learned progressively through interactions with the environment. Knowledge about the unknown MDP is represented as a probability distribution over all possible transition models, using Dirichlet distributions. The BOP algorithm plans in the belief-augmented state space, constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal as the budget parameter tends to infinity. Preliminary empirical validations show promising performance.
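The belief-augmented state described in the abstract can be illustrated with a minimal sketch. It assumes a finite MDP with integer-indexed states and actions and a uniform Dirichlet prior; the names `BeliefState`, `uniform_prior`, `observe`, and `posterior_mean` are hypothetical and do not come from the paper. The sketch covers only the posterior update and the resulting expected transition probabilities, not the BOP planning procedure itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Dirichlet counts indexed as counts[state][action][next_state] (assumed layout).
Counts = Tuple[Tuple[Tuple[float, ...], ...], ...]


@dataclass(frozen=True)
class BeliefState:
    """Hypothetical belief-augmented state: physical state plus Dirichlet counts."""
    state: int
    counts: Counts


def uniform_prior(n_states: int, n_actions: int, alpha: float = 1.0) -> Counts:
    """Build a Dirichlet prior with the same pseudo-count alpha for every transition."""
    row = tuple(alpha for _ in range(n_states))
    return tuple(tuple(row for _ in range(n_actions)) for _ in range(n_states))


def observe(belief: BeliefState, action: int, next_state: int) -> BeliefState:
    """Return the successor belief state after observing one transition.

    The Dirichlet posterior update adds 1 to the count of the observed
    (state, action, next_state) triple; the input belief is left unchanged.
    """
    s = belief.state
    row = list(belief.counts[s][action])
    row[next_state] += 1.0
    per_action = list(belief.counts[s])
    per_action[action] = tuple(row)
    counts = list(belief.counts)
    counts[s] = tuple(per_action)
    return BeliefState(next_state, tuple(counts))


def posterior_mean(counts: Counts, state: int, action: int) -> List[float]:
    """Expected transition probabilities under the Dirichlet posterior."""
    row = counts[state][action]
    total = sum(row)
    return [c / total for c in row]


# Usage: two states, one action, one observed transition 0 -> 1.
b0 = BeliefState(state=0, counts=uniform_prior(n_states=2, n_actions=1))
b1 = observe(b0, action=0, next_state=1)
probs = posterior_mean(b1.counts, state=0, action=0)
```

Storing the counts as nested tuples keeps each belief state immutable and hashable, which is convenient when belief states are nodes of a planning tree; this is a design choice of the sketch, not something stated in the source.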

Domains

Artificial Intelligence [cs.AI]

Main file

adprl.pdf (464.82 KB)

Origin: Files produced by the author(s)

Contributor: Rémi Munos

https://hal.science/hal-00840202

Submitted on: Monday, July 1, 2013, 21:26:20

Last modified on: Monday, September 11, 2023, 17:20:03

Long-term archiving on: Wednesday, April 5, 2017, 05:41:14

Dates and versions

hal-00840202 , version 1 (01-07-2013)

Identifiers

HAL Id: hal-00840202, version 1

Cite

Raphael Fonteneau, Lucian Busoniu, Rémi Munos. Optimistic planning for belief-augmented Markov decision processes. IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Apr 2013, Singapore, Singapore. pp.CDROM. ⟨hal-00840202⟩


Collections

UNIV-LILLE3 CNRS INRIA CRAN LAGIS CRISTAL UNIV-LORRAINE INRIA2 CRISTAL-SEQUEL

307 views

258 downloads
