A Fitted-Q Algorithm for Budgeted MDPs

Nicolas Carrara 1, 2 Romain Laroche 3 Jean-Léon Bouraoui 4, 1 Tanguy Urvoy 1 Olivier Pietquin 5, 2
2 SEQUEL - Sequential Learning, Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (UMR 9189)
Abstract: We address the problem of budgeted/constrained reinforcement learning in continuous state-space using a batch of transitions. For this purpose, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). We carry out some preliminary benchmarks on a continuous 2-D world. They show that BFTQ performs as well as a penalized Fitted-Q algorithm while also allowing one to adapt the trained policy on-the-fly to a given budget, without the need to engineer reward penalties. We believe that the general principles used to design BFTQ could be used to extend other classical reinforcement learning algorithms to budget-oriented applications.
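The abstract contrasts BFTQ with a penalized Fitted-Q: instead of folding costs into the reward, a budgeted approach learns separate return and cost value functions and conditions the policy on the remaining budget. The sketch below illustrates that general principle only; it is not the authors' exact BFTQ algorithm, and the tabular backup, the toy batch, and all names (`fitted_q_iteration`, `act`, `GAMMA`) are illustrative assumptions.

```python
GAMMA = 0.9  # discount factor (illustrative choice)

def fitted_q_iteration(batch, actions, n_iters=20):
    """Hedged sketch of a budget-augmented fitted-Q backup.

    batch: list of (s, b, a, r, c, s2) transitions, where b is the
    remaining budget, r the reward, and c the cost paid by action a.
    Returns tabular Qr (return) and Qc (cost), keyed by (s, b, a).
    """
    Qr, Qc = {}, {}
    for _ in range(n_iters):
        new_Qr, new_Qc = {}, {}
        for (s, b, a, r, c, s2) in batch:
            b2 = max(b - c, 0)  # budget left after paying cost c
            # greedy next action: maximize return among budget-feasible actions
            feasible = [x for x in actions if Qc.get((s2, b2, x), 0.0) <= b2]
            pool = feasible or actions  # fall back if nothing is feasible
            a2 = max(pool, key=lambda x: Qr.get((s2, b2, x), 0.0))
            new_Qr[(s, b, a)] = r + GAMMA * Qr.get((s2, b2, a2), 0.0)
            new_Qc[(s, b, a)] = c + GAMMA * Qc.get((s2, b2, a2), 0.0)
        Qr, Qc = new_Qr, new_Qc
    return Qr, Qc

def act(Qr, Qc, s, b, actions):
    """Act greedily on expected return among actions whose expected cost fits b."""
    feasible = [a for a in actions if Qc.get((s, b, a), 0.0) <= b]
    pool = feasible or actions
    return max(pool, key=lambda a: Qr.get((s, b, a), 0.0))

# Toy one-step batch: a risky action pays cost 1 for reward 1; a safe one is free.
ACTIONS = ["risky", "safe"]
batch = [
    (0, 1, "risky", 1.0, 1.0, "end"),
    (0, 1, "safe", 0.1, 0.0, "end"),
    (0, 0, "risky", 1.0, 1.0, "end"),
    (0, 0, "safe", 0.1, 0.0, "end"),
]
Qr, Qc = fitted_q_iteration(batch, ACTIONS)
```

With budget 1 the greedy policy can afford the risky action; with budget 0 only the safe action is feasible, so the same trained Q-functions yield different behavior as the budget changes, without re-engineering any penalty.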
Document type:
Other publication
Workshop on Safety, Risk and Uncertainty in Reinforcement Learning. https://sites.google.com/view.. 2018

https://hal.archives-ouvertes.fr/hal-01867353
Contributor: Nicolas Carrara <>
Submitted on: Tuesday, September 4, 2018 - 11:42:34
Last modified on: Thursday, February 7, 2019 - 17:09:34
Document(s) archived on: Wednesday, December 5, 2018 - 14:33:03

File

ncarrara-saferl-uai-2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01867353, version 1

Citation

Nicolas Carrara, Romain Laroche, Jean-Léon Bouraoui, Tanguy Urvoy, Olivier Pietquin. A Fitted-Q Algorithm for Budgeted MDPs. Workshop on Safety, Risk and Uncertainty in Reinforcement Learning. https://sites.google.com/view.. 2018. 〈hal-01867353〉
