Optimistic planning of deterministic systems

Jean-Francois Hren; Rémi Munos

Conference paper. Year: 2008

Jean-Francois Hren

Role: Author

Sequential Learning

Rémi Munos

Role: Author
PersonId : 836863

Sequential Learning

Abstract

If one possesses a model of a controlled deterministic system, then from any state, one may consider the set of all possible reachable states starting from that state and using any sequence of actions. This forms a tree whose size is exponential in the planning time horizon. Here we ask the question: given finite computational resources (e.g. CPU time), which may not be known ahead of time, what is the best way to explore this tree, such that once all resources have been used, the algorithm would be able to propose an action (or a sequence of actions) whose performance is as close as possible to optimality? The performance with respect to optimality is assessed in terms of the regret (with respect to the sum of discounted future rewards) resulting from choosing the action returned by the algorithm instead of an optimal action. In this paper we investigate an optimistic exploration of the tree, where the most promising states are explored first, and compare this approach to a naive uniform exploration. Bounds on the regret are derived both for uniform and optimistic exploration strategies. Numerical simulations illustrate the benefit of optimistic planning.
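The optimistic exploration described in the abstract can be sketched as a best-first search over action sequences. The following is a minimal illustration, not the authors' implementation. It rests on several assumptions the abstract does not state: rewards lie in [0, 1], the discount factor `gamma` is strictly below 1, a node's optimistic score is its discounted reward so far plus `gamma**depth / (1 - gamma)` (the largest return it could still collect), and the returned action is the first action of the explored sequence with the highest discounted reward. The names `optimistic_plan`, `step`, and `budget` are hypothetical.

```python
import heapq
import itertools


def optimistic_plan(state, actions, step, budget, gamma=0.9):
    """Return the first action of the best sequence found within `budget` expansions.

    step(state, action) must return (next_state, reward), with reward
    assumed to lie in [0, 1] (an assumption of this sketch).
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    # Heap entries: (-upper_bound, tie, discounted_sum, depth, state, first_action)
    frontier = [(-1.0 / (1.0 - gamma), next(tie), 0.0, 0, state, None)]
    best_sum, best_action = float("-inf"), None

    for _ in range(budget):
        if not frontier:
            break
        # Expand the leaf whose optimistic upper bound is largest.
        _, _, u, depth, s, first = heapq.heappop(frontier)
        for a in actions:
            s_next, r = step(s, a)
            u_next = u + (gamma ** depth) * r
            first_next = a if first is None else first
            if u_next > best_sum:
                best_sum, best_action = u_next, first_next
            # Upper bound: rewards so far plus the maximal discounted remainder.
            bound = u_next + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(
                frontier, (-bound, next(tie), u_next, depth + 1, s_next, first_next)
            )
    return best_action
```

Because the search can be stopped after any number of expansions and still return an action, it fits the setting where the computational budget is not known ahead of time. A uniform strategy would instead expand leaves in breadth-first order, ignoring the bound.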

Domains

Machine Learning [cs.LG]

Main file

ewrl08.pdf (334.26 KB)

Origin: Files produced by the author(s)

Contributor: Rémi Munos

https://hal.science/hal-00830182

Submitted on: Tuesday, June 4, 2013, 15:22:23

Last modified on: Friday, March 24, 2023, 14:52:57

Long-term archiving on: Thursday, September 5, 2013, 04:23:14

Dates and versions

hal-00830182, version 1 (04-06-2013)

Identifiers

HAL Id: hal-00830182, version 1

Cite

Jean-Francois Hren, Rémi Munos. Optimistic planning of deterministic systems. European Workshop on Reinforcement Learning, 2008, France. pp.151-164. ⟨hal-00830182⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS INRIA2

1265 Views

1307 Downloads
