Is the Bellman residual a bad proxy?

Matthieu Geist; Bilal Piot; Olivier Pietquin

Conference paper, Year: 2017


Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

Laboratoire Interdisciplinaire des Environnements Continentaux

Bilal Piot

Role: Author

DeepMind [London]

Olivier Pietquin

Role: Author
PersonId: 4024
IdHAL: olivier-pietquin
ORCID: 0000-0002-5386-465X
IdRef: 142821861

Sequential Learning

Abstract

This paper aims to compare, theoretically and empirically, two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth considering.
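To make the two criteria concrete, here is a minimal sketch in Python, assuming a small finite MDP with known transitions P, rewards R, a discount gamma and a state distribution nu. For a given stochastic policy it computes criterion i), the mean value $\nu^\top v_\pi$, and criterion ii), the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ with $(T_* v)(s) = \max_a [r(s,a) + \gamma \sum_{s'} P(s'|s,a) v(s')]$. The random MDP generator, all helper names (random_mdp, evaluate, greedy_optimal_policy, ...) and the parameter values are hypothetical illustrations, not the authors' code or experimental protocol; the sketch only evaluates the two criteria and does not implement the paper's optimization of the residual over a policy class.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical generator: dense random transitions, rewards in [0, 1).
    Not the experimental protocol used in the paper."""
    rng = random.Random(seed)
    P, R = [], []  # P[s][a][s'] = P(s' | s, a), R[s][a] = r(s, a)
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_from_v(P, R, v, gamma):
    """Q_v(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) v(s')."""
    return [[R[s][a] + gamma * sum(p * vn for p, vn in zip(P[s][a], v))
             for a in range(len(R[s]))]
            for s in range(len(R))]


def evaluate(P, R, pi, gamma, tol=1e-12):
    """Approximate v_pi by iterating the contraction T_pi until it stabilizes."""
    v = [0.0] * len(R)
    while True:
        q = q_from_v(P, R, v, gamma)
        new_v = [sum(pa * qa for pa, qa in zip(pi[s], q[s])) for s in range(len(R))]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def greedy_optimal_policy(P, R, gamma, tol=1e-12):
    """Value iteration (iterate T_*), then return a deterministic greedy policy."""
    v = [0.0] * len(R)
    while True:
        q = q_from_v(P, R, v, gamma)
        new_v = [max(q_s) for q_s in q]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            break
        v = new_v
    return [[1.0 if a == q_s.index(max(q_s)) else 0.0 for a in range(len(q_s))]
            for q_s in q]


def mean_value(v, nu):
    """Criterion i): nu^T v_pi, to be maximized."""
    return sum(n * x for n, x in zip(nu, v))


def bellman_residual(P, R, v, nu, gamma):
    """Criterion ii): ||T_* v - v||_{1,nu}, to be minimized."""
    q = q_from_v(P, R, v, gamma)
    return sum(n * abs(max(q[s]) - v[s]) for s, n in enumerate(nu))


# Usage: compare a uniform policy with the greedy (near-)optimal one.
n_states, n_actions, gamma = 5, 3, 0.9
P, R = random_mdp(n_states, n_actions, seed=1)
nu = [1.0 / n_states] * n_states
uniform = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
greedy = greedy_optimal_policy(P, R, gamma)

v_u = evaluate(P, R, uniform, gamma)
v_g = evaluate(P, R, greedy, gamma)
print("uniform:", mean_value(v_u, nu), bellman_residual(P, R, v_u, nu, gamma))
print("greedy: ", mean_value(v_g, nu), bellman_residual(P, R, v_g, nu, gamma))
```

Both criteria agree at the optimum: for an optimal policy $T_* v_\pi = v_\pi$, so the residual vanishes, while for any non-optimal policy and a full-support $\nu$ it is positive, since $T_* v_\pi \geq T_\pi v_\pi = v_\pi$ componentwise with strict inequality in some state. Away from the optimum, however, the two criteria can rank policies differently, which is the question the paper studies.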

Domains

Machine Learning [stat.ML]

Main file

rps_nips17_cr_full.pdf (1.4 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-01629739

Submitted on: Monday, November 6, 2017, 17:15:32

Last modified on: Thursday, April 11, 2024, 13:10:04

Dates and versions

hal-01629739, version 1 (06-11-2017)

Identifiers

HAL Id: hal-01629739, version 1

Cite

Matthieu Geist, Bilal Piot, Olivier Pietquin. Is the Bellman residual a bad proxy? NIPS 2017 - Advances in Neural Information Processing Systems, Dec 2017, Long Beach, United States. pp.1-13. ⟨hal-01629739⟩


Collections

INSU CNRS INRIA CRISTAL UNIV-LORRAINE INRIA2 CRISTAL-SEQUEL LIEC-UL OTELO-UL UNIV-LILLE

349 views

614 downloads
