Dora Q-Learning - making better use of explorations
Abstract
Q-learning algorithms with eligibility traces (hereafter referred to as Q(λ)) record the stack of (state, action) pairs enacted during a learning episode, enabling any rewards observed to be back-propagated down the stack, thus speeding up learning. In standard Q(λ), the eligibility trace is cut (reset to an empty stack) after an explore action, so any good results found further on can take a long time to percolate back to the initial state. We present Dora, an adaptation of Q(λ) that makes better use of results found when exploring, and therefore learns consistently faster. In Dora, our aim is to avoid cutting the trace on an explore whenever possible. The idea is simple and natural, but to the best of our knowledge it has not been developed in this way before. The principle of Dora could be argued to resemble that of experience replay [Long-Ji Lin, 1991], but Dora is not model-based, has fewer parameters, and consumes less memory, whilst still giving excellent results.
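To make the trace-cutting behaviour concrete, below is a minimal sketch of tabular Watkins's Q(λ) with ε-greedy exploration. The `cut_trace_on_explore` flag contrasts the standard rule (clear the trace after a non-greedy action) with a Dora-style variant that keeps it. This is illustrative only: the abstract says Dora avoids cutting "if possible" without stating the exact criterion, so the sketch simply never cuts when the flag is off. The `env.reset()` / `env.step()` / `env.actions` interface and the names used are assumptions, not the paper's code.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumed environment interface: env.reset() -> state,
# env.step(action) -> (next_state, reward, done), env.actions -> list.
import random
from collections import defaultdict

def q_lambda_episode(env, Q, alpha=0.1, gamma=0.99, lam=0.9,
                     epsilon=0.1, cut_trace_on_explore=True):
    """Run one learning episode, mutating Q (a defaultdict(float)) in place."""
    trace = defaultdict(float)              # eligibility trace e(s, a)
    state = env.reset()
    done = False
    while not done:
        greedy = max(env.actions, key=lambda a: Q[(state, a)])
        action = random.choice(env.actions) if random.random() < epsilon else greedy

        next_state, reward, done = env.step(action)
        best_next = max(Q[(next_state, a)] for a in env.actions)
        delta = reward + gamma * best_next - Q[(state, action)]

        trace[(state, action)] += 1.0       # mark the pair just enacted
        for sa in list(trace):              # propagate the TD error down the trace
            Q[sa] += alpha * delta * trace[sa]
            trace[sa] *= gamma * lam        # decay eligibility

        if cut_trace_on_explore and action != greedy:
            trace.clear()                   # standard Watkins cut; Dora avoids this
        state = next_state

# Usage: Q = defaultdict(float); then call with
# cut_trace_on_explore=False for the Dora-style variant.
```

With the cut disabled, a reward found after an exploratory step updates every (state, action) pair still on the trace in the same episode, which is the mechanism behind the faster learning claimed above.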