Conference paper, Year: 2016

Dora Q-Learning - making better use of explorations

Abstract

Q-learning algorithms with eligibility traces (hereafter referred to as Q(λ)) record the stack of (state, action) pairs enacted during a learning episode, enabling any rewards observed to be back-propagated down the stack, thus speeding up learning. In standard Q(λ), after an explore action, the eligibility trace is cut (reset to an empty stack), meaning that any good results found further on can take a long time to percolate back to the initial state. We present here Dora, an adaptation of Q(λ) which makes better use of results found when exploring, and therefore learns consistently faster. In Dora, our aim is to avoid cutting the trace on an explore if possible. This idea is quite simple and natural, but to the best of our knowledge, it has not been developed in this way before. We note that the principle of Dora could be argued to resemble that of experience replay [Long-Ji Lin, 1991], but Dora is not model-based, has fewer parameters, and consumes less memory, whilst still giving excellent results.
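The contrast described in the abstract can be sketched in a few lines. The following Python fragment is a minimal, hypothetical tabular Q(λ) loop, not the authors' implementation: the environment interface (env.reset(), env.step()) and the cut_trace_on_explore flag are assumptions introduced only to show where standard Q(λ) and the Dora idea diverge; the paper's precise condition for keeping the trace "if possible" is not reproduced here.

```python
# Minimal tabular Q(lambda) sketch (assumed interface, not the authors' code).
# cut_trace_on_explore=True reproduces the standard behaviour described in the
# abstract (explore cuts the trace); False keeps the trace alive, Dora-style.

import random
from collections import defaultdict

def q_lambda_episode(env, Q, actions, alpha=0.1, gamma=0.95, lam=0.9,
                     epsilon=0.1, cut_trace_on_explore=True):
    """Run one learning episode; Q is e.g. a defaultdict(float) over (state, action)."""
    trace = defaultdict(float)              # eligibility e(s, a)
    s = env.reset()                          # assumed env interface
    done = False
    while not done:
        greedy = max(actions, key=lambda a: Q[(s, a)])
        explore = random.random() < epsilon
        a = random.choice(actions) if explore else greedy
        s2, r, done = env.step(a)            # assumed to return (next_state, reward, done)
        best_next = max(Q[(s2, b)] for b in actions) if not done else 0.0
        delta = r + gamma * best_next - Q[(s, a)]
        trace[(s, a)] += 1.0                 # current pair becomes eligible
        for sa in list(trace):
            Q[sa] += alpha * delta * trace[sa]   # back-propagate along the trace
            trace[sa] *= gamma * lam
        if explore and cut_trace_on_explore:
            trace.clear()                    # standard Q(lambda): explore cuts the trace
        s = s2
    return Q
```

With cut_trace_on_explore=True the loop behaves like the standard Q(λ) described in the abstract; passing False keeps the whole stack of visited pairs eligible, so a reward discovered after an exploratory step can still propagate back towards the initial state in a single update, which is the effect Dora aims for.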

Dates and versions

hal-01356078, version 1 (24-08-2016)

Identifiers

  • HAL Id: hal-01356078, version 1

Cite

Esther Nicart, Bruno Zanuttini, Bruno Grilhères, Fredéric Praca. Dora Q-Learning - making better use of explorations. 11es Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2016), Jul 2016, Grenoble, France. ⟨hal-01356078⟩
