Conference papers

Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search

Alexander Zimin 1 Gergely Neu 2
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret, defined as the difference between the total loss suffered by the learner and the total loss of the best stationary policy. We assume that the learner has access to a finite action space A and that the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is at most $2\sqrt{L|X||A|T\log(|X||A|/L)}$ in the bandit setting and $2L\sqrt{T\log(|X||A|/L)}$ in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions.
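The quantities in the abstract can be made concrete with a small sketch. It assumes a hypothetical toy layered MDP; the names `LAYERS`, `P`, `occupancy`, `expected_loss` and `regret_bounds` are illustrative and do not come from the paper. It shows a forward pass that computes the state-action occupancy measure q of a fixed policy (the layered structure is what makes a single pass sufficient), the expected loss of one episode as the inner product of q with the loss, and the two regret bounds evaluated numerically. It does not implement the Relative Entropy Policy Search update itself, and the convention of taking L to be the number of layers in which an action is chosen is an assumption.

```python
import math

# Hypothetical toy layered MDP: LAYERS[k] lists the states in layer k.
# Transitions go only from layer k to layer k + 1; the last layer is terminal.
LAYERS = [["x0"], ["x1a", "x1b"], ["xT"]]
ACTIONS = ["left", "right"]

# P[(x, a)][y]: probability of moving from x to y (in the next layer) after action a.
P = {
    ("x0", "left"): {"x1a": 0.8, "x1b": 0.2},
    ("x0", "right"): {"x1a": 0.1, "x1b": 0.9},
    ("x1a", "left"): {"xT": 1.0},
    ("x1a", "right"): {"xT": 1.0},
    ("x1b", "left"): {"xT": 1.0},
    ("x1b", "right"): {"xT": 1.0},
}


def occupancy(layers, policy, transitions):
    """Return q[(x, a)], the probability that an episode visits x and plays a.

    A single forward pass over the layers suffices because transitions
    only connect consecutive layers.
    """
    state_prob = {layers[0][0]: 1.0}  # episodes start in the single first-layer state
    q = {}
    for k in range(len(layers) - 1):
        next_prob = {y: 0.0 for y in layers[k + 1]}
        for x in layers[k]:
            for a, p_a in policy[x].items():
                q[(x, a)] = state_prob.get(x, 0.0) * p_a
                for y, p_y in transitions[(x, a)].items():
                    if y not in next_prob:
                        raise ValueError(f"transition {x} -> {y} skips a layer")
                    next_prob[y] += q[(x, a)] * p_y
        state_prob = next_prob
    return q


def expected_loss(q, loss):
    """Expected total loss of one episode: the inner product <q, loss>."""
    return sum(q[xa] * loss[xa] for xa in q)


def regret_bounds(T, L, num_states, num_actions):
    """Evaluate the bandit and full-information bounds quoted in the abstract.

    Assumes num_states * num_actions > L so that the logarithm is positive.
    """
    log_term = math.log(num_states * num_actions / L)
    bandit = 2 * math.sqrt(L * num_states * num_actions * T * log_term)
    full_info = 2 * L * math.sqrt(T * log_term)
    return bandit, full_info


policy = {
    "x0": {"left": 0.5, "right": 0.5},
    "x1a": {"left": 1.0, "right": 0.0},
    "x1b": {"left": 0.3, "right": 0.7},
}
q = occupancy(LAYERS, policy, P)
num_decision_layers = len(LAYERS) - 1  # our stand-in for L (assumed convention)

# Only ("x1b", "right") incurs a loss in this hypothetical example.
loss = {xa: 0.0 for xa in q}
loss[("x1b", "right")] = 1.0

print(sum(q.values()), num_decision_layers)
print(expected_loss(q, loss))
# |X| = 4 here counts every state, including the terminal one (also an assumption).
print(regret_bounds(T=1000, L=num_decision_layers, num_states=4, num_actions=len(ACTIONS)))
```

Since every episode passes through exactly one state in each decision layer, the entries of q within one layer sum to 1, so all entries of q together sum to L under this indexing. The explicit `ValueError` makes the "consecutive layers only" assumption fail loudly instead of silently producing a wrong occupancy measure.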

Cited literature: 23 references

https://hal.archives-ouvertes.fr/hal-01079423
Contributor: Gergely Neu
Submitted on: Saturday, November 1, 2014 - 7:25:00 PM
Last modification on: Tuesday, November 24, 2020 - 2:18:20 PM
Long-term archiving on: Monday, February 2, 2015 - 5:00:35 PM

File

ZN13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01079423, version 1


Citation

Alexander Zimin, Gergely Neu. Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search. Neural Information Processing Systems 26, Dec 2013, Lake Tahoe, United States. ⟨hal-01079423⟩


Metrics

Record views: 749
File downloads: 336