Preprint, working paper. Year: 2010

Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

Abstract

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focusing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm for solving KL-optimistic extended value iteration. When implemented within the structure of UCRL2, the near-optimal method introduced by [Auer et al., 2008], this algorithm also achieves bounded regrets in the undiscounted case. We however provide some geometric arguments, as well as a concrete illustration on a simulated example, to explain the observed improved practical behavior, particularly when the MDP has reduced connectivity. To analyze this new algorithm, termed KL-UCRL, we also rely on recent deviation bounds for the KL divergence which compare favorably with the L1 deviation bounds used in previous works.
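To make the optimistic inner step concrete, the following is a minimal numerical sketch of the KL-constrained linear maximization that extended value iteration solves at each state-action pair: maximize q·V over probability vectors q subject to KL(p̂ ‖ q) ≤ ε, where p̂ is the estimated transition distribution and V the current value estimates. This sketch uses a generic SciPy solver rather than the efficient dedicated algorithm derived in the paper; the function name kl_optimistic_q, the fixed radius eps (which in KL-UCRL would shrink with the number of visits to the state-action pair), and the toy numbers are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.optimize import minimize

def kl_optimistic_q(p_hat, V, eps):
    """Maximize q @ V over the simplex subject to KL(p_hat || q) <= eps.

    Generic numerical sketch (SLSQP); the paper derives a faster,
    specialized routine for this inner step of extended value iteration.
    """
    n = len(p_hat)

    def neg_obj(q):
        # Negated linear objective, since SciPy minimizes.
        return -np.dot(q, V)

    def kl_slack(q):
        # eps - KL(p_hat || q); terms with p_hat == 0 contribute nothing.
        mask = p_hat > 0
        return eps - np.sum(p_hat[mask] * np.log(p_hat[mask] / np.maximum(q[mask], 1e-12)))

    constraints = [
        {"type": "eq", "fun": lambda q: np.sum(q) - 1.0},  # q sums to one
        {"type": "ineq", "fun": kl_slack},                 # KL(p_hat || q) <= eps
    ]
    bounds = [(1e-12, 1.0)] * n
    res = minimize(neg_obj, x0=p_hat.copy(), bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x

# Toy example: optimistic transition probabilities for a 3-state MDP.
p_hat = np.array([0.6, 0.3, 0.1])   # estimated transition probabilities
V = np.array([0.0, 1.0, 5.0])       # current value estimates
q_opt = kl_optimistic_q(p_hat, V, eps=0.05)
print(q_opt, q_opt @ V)             # probability mass shifts toward high-value states

This inner maximization is solved many times per sweep of extended value iteration, which is why an efficient dedicated algorithm matters; the generic solver above is only meant to make the optimization problem explicit.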
Main file: KLModelBased.pdf (297.5 KB). Origin: files produced by the author(s)

Dates and versions

hal-00476116 , version 1 (23-04-2010)
hal-00476116 , version 2 (17-06-2010)
hal-00476116 , version 3 (12-10-2010)

Identifiers

hal-00476116

Cite

Sarah Filippi, Olivier Cappé, Aurélien Garivier. Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence. 2010. ⟨hal-00476116v1⟩
