Optimistic planning for belief-augmented Markov decision processes

Raphael Fonteneau (1), Lucian Busoniu (2), Rémi Munos (3)
(3) SEQUEL - Sequential Learning, Inria Lille - Nord Europe
Affiliations: LIFL - Laboratoire d'Informatique Fondamentale de Lille; LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal; Inria Lille - Nord Europe
Abstract: This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [10], [9] to settings where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented as a probability distribution over all possible transition models using Dirichlet distributions, and BOP plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal as the budget parameter grows to infinity. Preliminary empirical validations show promising performance.
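The belief representation described in the abstract (a Dirichlet posterior over transition models, concatenated with the physical state) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the class and method names (`DirichletBelief`, `update`, `mean_transition`, `augmented_state`) are hypothetical, and only the posterior-mean model is shown, not the optimistic planning itself.

```python
class DirichletBelief:
    """Posterior over an unknown MDP transition model.

    For each (state, action) pair we keep Dirichlet counts over next
    states. Observing a transition increments one count; the posterior
    mean of the Dirichlet gives the expected transition probabilities.
    """

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s][a][s'] all start at the symmetric Dirichlet prior
        self.counts = [[[prior] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def update(self, s, a, s_next):
        """Bayesian update after observing the transition (s, a) -> s_next."""
        self.counts[s][a][s_next] += 1.0

    def mean_transition(self, s, a):
        """Posterior-mean probability vector over next states for (s, a)."""
        c = self.counts[s][a]
        total = sum(c)
        return [x / total for x in c]

    def augmented_state(self, s):
        """Belief-augmented state: the physical state paired with the belief
        (here, the full tensor of Dirichlet counts, frozen into tuples)."""
        belief = tuple(tuple(tuple(row) for row in a) for a in self.counts)
        return (s, belief)
```

With a uniform prior of 1, two observations of the same transition shift the posterior mean toward the observed next state while keeping a full probability vector, which is exactly what lets a planner reason about future learning.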
Document type: Conference papers

Cited literature: 39 references

https://hal.archives-ouvertes.fr/hal-00840202
Contributor: Rémi Munos
Submitted on: Monday, July 1, 2013 - 9:26:20 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Wednesday, April 5, 2017 - 5:41:14 AM

File: adprl.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-00840202, version 1

Citation

Raphael Fonteneau, Lucian Busoniu, Rémi Munos. Optimistic planning for belief-augmented Markov decision processes. IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2013), Apr 2013, Singapore. pp. CDROM. ⟨hal-00840202⟩
