From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning

Rémi Munos

Report, Year: 2014

Role: Author
PersonId: 836863

Sequential Learning

Abstract

This work covers several aspects of the "optimism in the face of uncertainty" principle applied to large-scale optimization problems under a finite numerical budget. The initial motivation for the research reported here originated from the empirical success of the so-called Monte-Carlo Tree Search method, popularized in computer Go and further extended to many other games as well as to optimization and planning problems. Our objective is to contribute to the development of theoretical foundations of the field by characterizing the complexity of the underlying optimization problems and designing efficient algorithms with performance guarantees. The main idea presented here is that it is possible to decompose a complex decision-making problem (such as an optimization problem in a large search space) into a sequence of elementary decisions, where each decision of the sequence is solved using a (stochastic) multi-armed bandit (a simple mathematical model for decision making in stochastic environments). This so-called hierarchical bandit approach (where the reward observed by a bandit in the hierarchy is itself the return of another bandit at a deeper level) possesses the nice feature of starting the exploration with a quasi-uniform sampling of the space, then focusing progressively on the most promising areas, at different scales, according to the evaluations observed so far, and eventually performing a local search around the global optima of the function. The performance of the method is assessed in terms of the optimality of the returned solution as a function of the number of function evaluations. Our main contribution to the field of function optimization is a class of hierarchical optimistic algorithms designed for general search spaces (such as metric spaces, trees, graphs, and Euclidean spaces), with different algorithmic instantiations depending on whether the evaluations are noisy or noiseless and whether some measure of the "smoothness" of the function is known or unknown. The performance of the algorithms depends on the local behavior of the function around its global optima, expressed in terms of the quantity of near-optimal states measured with some metric. If this local smoothness of the function is known, then one can design very efficient optimization algorithms (with a convergence rate independent of the space dimension), and when it is not known, one can build adaptive techniques that can, in some cases, perform almost as well as when it is known.
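As a rough illustration of the optimistic principle in the simplest setting the abstract mentions (noiseless evaluations, known smoothness), the following Python sketch maximizes a function on an interval by repeatedly refining the cell with the highest upper bound. It is a minimal sketch under stated assumptions, not the author's algorithm: the function name `optimistic_optimize`, the ternary splitting rule, and the use of a Lipschitz constant `lipschitz` as the smoothness measure are all choices made here for illustration.

```python
import heapq


def optimistic_optimize(f, budget, lipschitz, lo=0.0, hi=1.0):
    """Maximize f on [lo, hi] using at most `budget` evaluations.

    Illustrative assumptions (not from the abstract): f is noiseless and
    f(x*) - f(x) <= lipschitz * |x* - x| for a maximizer x*.
    Returns the best evaluated point and its value.
    """
    if budget < 1:
        raise ValueError("budget must allow at least one evaluation")

    def upper_bound(value, a, b):
        # Optimistic value of a cell: center value plus the largest
        # possible increase within half the cell width.
        return value + lipschitz * 0.5 * (b - a)

    center = 0.5 * (lo + hi)
    value = f(center)
    evals = 1
    best_x, best_val = center, value

    # Max-heap of leaf cells, stored as (-upper_bound, a, b, center, value).
    leaves = [(-upper_bound(value, lo, hi), lo, hi, center, value)]

    # Each expansion costs two new evaluations (the middle child reuses
    # the parent's center value).
    while evals + 2 <= budget:
        _, a, b, c, v = heapq.heappop(leaves)  # most optimistic leaf
        third = (b - a) / 3.0
        children = [
            (a, a + third, None),
            (a + third, b - third, v),  # same center as the parent
            (b - third, b, None),
        ]
        for ca, cb, known in children:
            cc = 0.5 * (ca + cb)
            if known is None:
                cv = f(cc)
                evals += 1
                if cv > best_val:
                    best_x, best_val = cc, cv
            else:
                cv = known
            heapq.heappush(leaves, (-upper_bound(cv, ca, cb), ca, cb, cc, cv))

    return best_x, best_val


if __name__ == "__main__":
    x, val = optimistic_optimize(lambda t: 1.0 - abs(t - 0.3), budget=41, lipschitz=1.0)
    print(x, val)
```

The sketch starts from a single coarse evaluation and then spends its budget on the cells that could still contain the maximum, which mirrors the "quasi-uniform sampling, then progressive focusing" behavior described in the abstract. Handling noisy evaluations or unknown smoothness would require a different bound and is not attempted here.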

Keywords

Monte-Carlo Tree Search; Optimism in the face of uncertainty; Bandit theory; Upper Confidence Bounds

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]

Main file

FTML2014.pdf (6.61 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-00747575

Submitted on: Tuesday, February 4, 2014, 16:01:31

Last modified on: Wednesday, April 17, 2024, 10:45:39

Long-term archiving on: Monday, May 5, 2014, 07:20:54

Dates and versions

hal-00747575, version 1 (31-10-2012)

hal-00747575, version 2 (12-07-2013)

hal-00747575, version 3 (16-09-2013)

hal-00747575, version 4 (09-10-2013)

hal-00747575, version 5 (04-02-2014)

Identifiers

HAL Id: hal-00747575, version 5

Cite

Rémi Munos. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. 2014. ⟨hal-00747575v5⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS CRISTAL INRIA2 CRISTAL-SEQUEL TDS-MACS LARA

4529 views

11662 downloads
