Monte-Carlo Graph Search: the Value of Merging Similar States

Edouard Leurent 1, 2 Odalric-Ambrym Maillard 1, 3
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
3 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and a limited computational budget. Although the transitions of the underlying MDP have a graph structure, popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not merge similar states that are reached via different trajectories, and instead represent them in separate branches of the tree. In this work, we propose a graph-based planning algorithm that takes this state similarity into account. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty; this bound improves on the original tree-based bound in MDPs where trajectories overlap, and recovers it otherwise. We then show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.
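The contrast between a tree and a graph representation can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy deterministic grid MDP, the names `ACTIONS`, `step` and `count_nodes`, and the choice to merge only exactly identical states are all assumptions made for illustration. The sketch enumerates every action sequence up to a given depth and compares the number of tree nodes (one per sequence) with the number of distinct states reached (one graph node per state).

```python
from itertools import product

# Hypothetical toy MDP (an assumption, not taken from the paper):
# deterministic moves on an integer grid.
ACTIONS = {"right": (1, 0), "up": (0, 1)}


def step(state, action):
    """Deterministic transition: shift the state by the action's offset."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def count_nodes(root, depth):
    """Return (tree_nodes, graph_nodes) for all action sequences up to `depth`.

    A tree keeps one node per action sequence; a graph keyed by state keeps
    one node per distinct state, merging trajectories that end in the same place.
    """
    tree_nodes = 0
    graph_nodes = set()
    for d in range(depth + 1):
        for sequence in product(ACTIONS, repeat=d):
            state = root
            for action in sequence:
                state = step(state, action)
            tree_nodes += 1          # every sequence is its own tree node
            graph_nodes.add(state)   # identical states collapse into one node
    return tree_nodes, len(graph_nodes)


if __name__ == "__main__":
    print(count_nodes((0, 0), 3))   # (15, 10)
    print(count_nodes((0, 0), 10))  # (2047, 66)
```

In this toy example, "right then up" and "up then right" reach the same state, so the tree grows exponentially with depth while the number of distinct states grows only quadratically. A planner that shares value estimates across such merged states has fewer quantities to estimate, which is the intuition behind the overlap-dependent bound mentioned in the abstract.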
Document type :
Conference papers

https://hal.inria.fr/hal-03004124
Contributor : Edouard Leurent
Submitted on : Friday, November 13, 2020 - 3:08:40 PM
Last modification on : Friday, November 27, 2020 - 2:20:12 PM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03004124, version 1

Citation

Edouard Leurent, Odalric-Ambrym Maillard. Monte-Carlo Graph Search: the Value of Merging Similar States. ACML 2020 - 12th Asian Conference on Machine Learning, Nov 2020, Bangkok / Virtual, Thailand. pp. 577-602. ⟨hal-03004124⟩
