Navigating to the Best Policy in Markov Decision Processes

Aymen Al Marjani; Aurélien Garivier; Alexandre Proutiere

Proceedings. Year: 2021

Aymen Al Marjani

Role: Author
PersonId: 1118574

Unité de Mathématiques Pures et Appliquées

Aurélien Garivier

Role: Author
PersonId: 4986
IdHAL: aurelien-garivier
ORCID: 0000-0002-4906-9573
IdRef: 111902495

Unité de Mathématiques Pures et Appliquées

Alexandre Proutiere

Role: Author

KTH School of Electrical Engineering

Abstract

We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least 1 − δ. We further provide the first algorithm with an instance-specific sample complexity in this setting. This algorithm addresses the general case of communicating MDPs; we also propose a variant with a reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results on the generative setting [MP21], where the agent could, at each step, query the random outcome of any (state, action) pair. In contrast, we show here how to deal with the navigation constraints induced by the online setting. Our analysis relies on an ergodic theorem for non-homogeneous Markov chains, which we believe to be of broad interest in the analysis of Markov Decision Processes.
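To make the contrast between the generative setting and the online (navigation-constrained) setting concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than material from the paper: the three-state, two-action MDP `P`, the helper names, the round-robin query schedule, and the uniform-random exploration policy are all assumptions. The sketch only counts how often each (state, action) pair is sampled; it is not the authors' algorithm and contains no stopping or recommendation rule.

```python
import random
from collections import Counter

# Hypothetical communicating MDP (an assumption for illustration only):
# P[s][a] lists (next_state, probability) pairs.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(2, 1.0)]},
    2: {0: [(1, 1.0)], 1: [(2, 0.5), (0, 0.5)]},
}


def sample_next(s, a, rng):
    """Draw one next state from P(. | s, a)."""
    states, probs = zip(*P[s][a])
    return rng.choices(states, weights=probs, k=1)[0]


def generative_sampling(budget, rng):
    """Generative setting: any (state, action) pair may be queried at each step."""
    counts = Counter()
    pairs = [(s, a) for s in P for a in P[s]]
    for t in range(budget):
        s, a = pairs[t % len(pairs)]  # hypothetical round-robin schedule
        sample_next(s, a, rng)  # the outcome would feed a model estimate; only counted here
        counts[(s, a)] += 1
    return counts


def online_sampling(budget, rng, start=0):
    """Online setting: the agent can only act in the state it currently occupies."""
    counts = Counter()
    s = start
    for _ in range(budget):
        a = rng.choice(list(P[s]))  # hypothetical uniform exploration policy
        counts[(s, a)] += 1
        s = sample_next(s, a, rng)  # navigation constraint: move to the sampled state
    return counts
```

For example, `generative_sampling(600, random.Random(0))` returns exactly 100 queries per pair, whereas the counts returned by `online_sampling` depend on which states the trajectory actually reaches. That dependence on the trajectory is the navigation constraint the abstract refers to.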

Domains

Statistics [math.ST]

Main file

navigating_to_the_best_policy_.pdf (465.21 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03454436

Submitted on: Monday, 29 November 2021, 11:00:54

Last modified on: Thursday, 14 March 2024, 03:14:50

Dates and versions

hal-03454436, version 1 (29-11-2021)

Identifiers

HAL Id: hal-03454436, version 1

Cite

Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere. Navigating to the Best Policy in Markov Decision Processes. 2021. ⟨hal-03454436⟩

Collections

ENS-LYON CNRS UDL ANR
