Skip to Main content Skip to Navigation
Conference papers

Continuous Upper Confidence Trees with Polynomial Exploration - Consistency

David Auger 1 Adrien Couetoux 2 Olivier Teytaud 2, 3
1 AlCAAP - Algorithmique, Combinatoire Analytique et Applications
PRISM - Parallélisme, Réseaux, Systèmes, Modélisation
3 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : Upper Confidence Trees (UCT) are now a well known algorithm for sequential decision making; it is a provably consistent variant of Monte-Carlo Tree Search. However, the consistency is only proved in a the case where both the action space is finite. We here propose a proof in the case of fully observable Markov Decision Processes with bounded horizon, possibly including infinitely many states and infinite action spaces and arbitrary stochastic transition kernels. We illustrate the consistency on two benchmark problems, one being a legacy toy problem, the other a more challenging one, the famous energy unit commitment problem.
Complete list of metadatas

Cited literature [14 references]  Display  Hide  Download

https://hal.inria.fr/hal-00835352
Contributor : Adrien Couetoux <>
Submitted on : Tuesday, June 18, 2013 - 3:26:31 PM
Last modification on : Wednesday, September 16, 2020 - 5:04:54 PM
Long-term archiving on: : Thursday, September 19, 2013 - 4:11:19 AM

File

doublePWConf.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00835352, version 1

Collections

Citation

David Auger, Adrien Couetoux, Olivier Teytaud. Continuous Upper Confidence Trees with Polynomial Exploration - Consistency. ECML/PKKD 2013, Sep 2013, Prague, Czech Republic. pp.194-209. ⟨hal-00835352⟩

Share

Metrics

Record views

610

Files downloads

1393