Conference paper. Year: 2011

Speedy Q-learning

Abstract

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the slow convergence of the standard Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor γ, only T = O(log(n)/(ε^2 (1 - γ)^4)) steps are required for SQL to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1 - γ), and is thus tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered more efficient than incremental methods such as Q-learning.
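The abstract does not reproduce the update rule itself, so the following is only a minimal sketch of a synchronous speedy-Q-learning-style iteration. It is based on the two-backup update Q_{k+1} = Q_k + α_k (T_k Q_{k-1} - Q_k) + (1 - α_k)(T_k Q_k - T_k Q_{k-1}) with step size α_k = 1/(k+1) reported in the paper, where T_k is the empirical Bellman operator; the sample_next_state and reward callables are hypothetical stand-ins for a problem-specific generative model, and all other details are assumptions.

```python
import numpy as np

def speedy_q_learning(sample_next_state, reward, n_states, n_actions,
                      gamma=0.95, n_iterations=1000):
    """Synchronous speedy-Q-learning-style sketch (assumed details).

    sample_next_state(s, a) -> index of a sampled next state;
    reward(s, a) -> immediate reward. Both are hypothetical,
    problem-specific callables standing in for a generative model.
    """
    q_prev = np.zeros((n_states, n_actions))  # Q_{k-1}
    q = np.zeros((n_states, n_actions))       # Q_k

    for k in range(n_iterations):
        alpha = 1.0 / (k + 1)            # step size alpha_k = 1/(k+1)
        tq = np.empty_like(q)            # empirical backup of Q_k
        tq_prev = np.empty_like(q)       # same backup applied to Q_{k-1}
        for s in range(n_states):
            for a in range(n_actions):
                s_next = sample_next_state(s, a)  # one sample shared by
                r = reward(s, a)                  # both backups
                tq[s, a] = r + gamma * q[s_next].max()
                tq_prev[s, a] = r + gamma * q_prev[s_next].max()
        # SQL-style update: a shrinking step toward the backup of
        # Q_{k-1}, plus an aggressive (1 - alpha) step on the difference
        # of successive empirical backups.
        q_next = q + alpha * (tq_prev - q) + (1 - alpha) * (tq - tq_prev)
        q_prev, q = q, q_next
    return q
```

For instance, on a small tabular MDP with a known transition tensor P of shape (n_states, n_actions, n_states), one could pass sample_next_state = lambda s, a: rng.choice(n_states, p=P[s, a]) with a NumPy random generator rng. The aggressive (1 - α_k) weight on the difference of successive backups is what distinguishes this from standard Q-learning, which would take only a small α_k step on a single backup, and is what the paper credits for the faster convergence.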
Main file: speedy-QL_nips2011.pdf (122.14 KB). Origin: files produced by the author(s).

Dates and versions

hal-00830140, version 1 (04-06-2013)

Identifiers

  • HAL Id: hal-00830140, version 1

Cite

Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Speedy Q-learning. Advances in Neural Information Processing Systems, 2011, Spain. ⟨hal-00830140⟩
918 Views
786 Downloads
