Optimistic Planning for Markov Decision Processes

Lucian Busoniu 1, *, Remi Munos 2
* Corresponding author
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence of the state space size. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates that the algorithm works better when there are fewer near-optimal policies and less uniform transition probabilities.
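
The planning loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes rewards in [0, 1], a discount factor gamma, and a hypothetical `model(state, action)` function returning a finite list of `(probability, next_state, reward)` triples for every state and action. The names `Node`, `q_bounds`, `v_bounds`, `optimistic_leaves`, `plan` and `toy_model` are invented for illustration, and the leaf-selection rule (largest reach probability times gamma to the power of the depth) is an assumption about how a node's impact might be scored.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    depth: int = 0
    prob: float = 1.0  # probability of reaching this node from the root
    children: dict = field(default_factory=dict)  # action -> [(p, reward, Node)]

def q_bounds(node, action, gamma):
    """Lower and upper bounds on the value of taking `action` at `node`."""
    lo = up = 0.0
    for p, r, child in node.children[action]:
        c_lo, c_up = v_bounds(child, gamma)
        lo += p * (r + gamma * c_lo)
        up += p * (r + gamma * c_up)
    return lo, up

def v_bounds(node, gamma):
    """Bounds on the optimal value at `node`; a leaf gets [0, 1 / (1 - gamma)]."""
    if not node.children:
        return 0.0, 1.0 / (1.0 - gamma)
    qs = [q_bounds(node, a, gamma) for a in node.children]
    return max(q[0] for q in qs), max(q[1] for q in qs)

def optimistic_leaves(node, gamma):
    """Leaves reached by always following the action with the best upper bound."""
    if not node.children:
        return [node]
    best = max(node.children, key=lambda a: q_bounds(node, a, gamma)[1])
    leaves = []
    for _, _, child in node.children[best]:
        leaves.extend(optimistic_leaves(child, gamma))
    return leaves

def plan(model, actions, root_state, n, gamma=0.9):
    """Expand n >= 1 nodes, then return the root action with the best lower bound."""
    root = Node(root_state)
    for _ in range(n):
        # Assumed selection rule: expand the optimistic leaf with the
        # largest prob * gamma**depth.
        leaf = max(optimistic_leaves(root, gamma),
                   key=lambda s: s.prob * gamma ** s.depth)
        for a in actions:
            leaf.children[a] = [
                (p, r, Node(s2, leaf.depth + 1, leaf.prob * p))
                for p, s2, r in model(leaf.state, a)
            ]
    return max(actions, key=lambda a: q_bounds(root, a, gamma)[0])

def toy_model(state, action):
    """Hypothetical one-state MDP: "right" pays 1 with probability 0.8."""
    if action == "right":
        return [(0.8, "A", 1.0), (0.2, "A", 0.0)]
    return [(1.0, "A", 0.0)]

best_action = plan(toy_model, ["left", "right"], "A", n=10)
```

The sketch recomputes all bounds from scratch at every expansion to stay short; a practical version would cache them and update only along the path to the expanded leaf.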

https://hal.archives-ouvertes.fr/hal-00756736
Contributor : Lucian Busoniu
Submitted on : Friday, November 23, 2012 - 4:10:41 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Sunday, February 24, 2013 - 3:54:13 AM

File

aistats12.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00756736, version 1

Citation

Lucian Busoniu, Remi Munos. Optimistic Planning for Markov Decision Processes. 15th International Conference on Artificial Intelligence and Statistics, AISTATS-12, Apr 2012, La Palma, Canary Islands, Spain. pp.182-189. ⟨hal-00756736⟩
