Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search

Alexander Zimin; Gergely Neu

Conference paper. Year: 2013


Alexander Zimin

Role: Author
PersonId: 961178

Institute of Science and Technology [Klosterneuburg, Austria]

Gergely Neu

Role: Author
PersonId: 961179

Sequential Learning

Abstract

We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret, defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space $A$ and that the state space $X$ has a layered structure with $L$ layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after $T$ episodes is $2\sqrt{L|X||A|T\log(|X||A|/L)}$ in the bandit setting and $2L\sqrt{T\log(|X||A|/L)}$ in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees substantially improve on previously known results under much milder assumptions and cannot be significantly improved under general assumptions.
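To get a feel for how the two guarantees scale, the bounds stated in the abstract can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `reps_regret_bounds` and its parameter names are hypothetical, and the logarithm is assumed to be the natural logarithm.

```python
import math


def reps_regret_bounds(num_layers, num_states, num_actions, num_episodes):
    """Evaluate the two regret bounds quoted in the abstract.

    Returns (bandit_bound, full_information_bound).
    Assumption: log denotes the natural logarithm.
    """
    ratio = num_states * num_actions / num_layers
    if ratio <= 1:
        # The bounds need log(|X||A|/L) > 0 to be meaningful.
        raise ValueError("need |X||A| > L for a positive log term")
    log_term = math.log(ratio)

    # Bandit setting: 2 * sqrt(L |X| |A| T log(|X||A|/L))
    bandit = 2.0 * math.sqrt(
        num_layers * num_states * num_actions * num_episodes * log_term
    )
    # Full information setting: 2 L * sqrt(T log(|X||A|/L))
    full_info = 2.0 * num_layers * math.sqrt(num_episodes * log_term)
    return bandit, full_info


if __name__ == "__main__":
    # Illustrative sizes only; these values are not taken from the paper.
    bandit, full_info = reps_regret_bounds(4, 20, 5, 10_000)
    print(f"bandit bound: {bandit:.1f}")
    print(f"full-information bound: {full_info:.1f}")
```

Both bounds grow as the square root of the number of episodes $T$, so quadrupling $T$ doubles each bound. The bandit bound exceeds the full information bound by a factor of $\sqrt{|X||A|/L}$.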

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]

Main file

ZN13.pdf (281.04 KB)

Origin: Files produced by the author(s)

Contributor: Gergely Neu

https://hal.science/hal-01079423

Submitted on: Saturday, November 1, 2014, 19:25:00

Last modified on: Friday, April 19, 2024, 16:02:08

Long-term archiving on: Monday, February 2, 2015, 17:00:35

Dates and versions

hal-01079423 , version 1 (01-11-2014)

Identifiers

HAL Id: hal-01079423, version 1

Cite

Alexander Zimin, Gergely Neu. Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search. Neural Information Processing Systems 26, Dec 2013, Lake Tahoe, United States. ⟨hal-01079423⟩


Collections

UNIV-LILLE3 CNRS INRIA LAGIS INRIA2 TDS-MACS

608 views

192 downloads
