Aggregating optimistic planning trees for solving Markov decision processes

Gunnar Kedenburg (1), Raphael Fonteneau (1, 2), Remi Munos (1)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : This paper addresses the problem of online planning in Markov decision processes using a generative model and under a budget constraint. We propose a new algorithm, ASOP, which is based on the construction of a forest of single successor state planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are explored using a "safe" optimistic planning strategy which combines the optimistic principle (in order to explore the most promising part of the search space first) and a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We report numerical results on a benchmark problem showing that ASOP performs as well as state-of-the-art optimistic planning algorithms.
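The decision-making step mentioned in the abstract (aggregating the individual trees and recommending an immediate action) can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so this sketch assumes that each tree yields one value estimate per root action and that the estimates are combined by a plain average. The function name `recommend_action` and the dictionary layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def recommend_action(tree_estimates):
    """Average per-tree action values and return the best action.

    tree_estimates: list of dicts, one per tree, mapping a root action
    to that tree's value estimate (hypothetical layout, assumed here).
    An action missing from a tree contributes 0 to its average.
    """
    if not tree_estimates:
        raise ValueError("need at least one tree")

    # Sum the estimates of each action over all trees.
    totals = defaultdict(float)
    for estimates in tree_estimates:
        for action, value in estimates.items():
            totals[action] += value

    # Assumed aggregation rule: simple mean over the forest.
    n_trees = len(tree_estimates)
    averages = {action: total / n_trees for action, total in totals.items()}

    # Recommend the action with the highest averaged value.
    best = max(averages, key=averages.get)
    return best, averages


forest = [
    {"left": 0.5, "right": 1.0},   # estimates from the first tree
    {"left": 0.25, "right": 0.75}, # estimates from the second tree
]
best_action, averaged = recommend_action(forest)
```

In this toy forest, `right` has the higher average in both trees, so it is the recommended action. How each tree is explored under the optimistic and safety principles is left out of the sketch.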
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00923681
Contributor : Rémi Munos
Submitted on : Friday, January 3, 2014 - 7:04:52 PM
Last modification on : Thursday, May 2, 2019 - 2:24:11 PM
Long-term archiving on: Thursday, April 3, 2014 - 10:40:54 PM

File

nips13a.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00923681, version 1

Citation

Gunnar Kedenburg, Raphael Fonteneau, Remi Munos. Aggregating optimistic planning trees for solving Markov decision processes. Advances in Neural Information Processing Systems, 2013, United States. pp. 2382-2390. ⟨hal-00923681⟩
