Speedy Q-learning

Mohammad Gheshlaghi Azar 1, Rémi Munos 2, Mohammad Ghavamzadeh 2, Hilbert Kappen 3
1 Department of Medical Physics and Biophysics
2 SEQUEL - Sequential Learning (LIFL - Laboratoire d'Informatique Fondamentale de Lille; LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal; Inria Lille - Nord Europe)
3 Biophysics - Department of Medical Physics and Biophysics
Abstract : We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the slow convergence of the standard Q-learning algorithm. We prove a PAC bound on the performance of SQL, showing that for an MDP with n state-action pairs and discount factor γ, only T = O(log(n)/(ε^2 (1 - γ)^4)) steps are required for SQL to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1 - γ), and is thus tighter than the best available result for Q-learning. It is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered more efficient than incremental methods such as Q-learning.
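The abstract above does not spell out the update rule itself. As a rough illustration of the idea described in the paper, here is a minimal synchronous sketch of the SQL iteration on a small tabular MDP: each step applies an empirical Bellman backup to both the current and the previous Q-iterate (using the same sampled next state), takes a small step toward the backup of the older iterate, and adds an aggressive (1 - α_k)-weighted correction on the difference of the two backups. The function name and the tiny generative-model interface (explicit `P` and `R` arrays) are our own illustrative choices, not the paper's:

```python
import numpy as np

def speedy_q_learning(P, R, gamma, n_iters, rng=None):
    """Synchronous speedy Q-learning sketch on a finite MDP.

    P: (S, A, S) array of transition probabilities.
    R: (S, A) array of expected rewards.
    Returns the (S, A) action-value estimate after n_iters iterations.
    """
    rng = np.random.default_rng(rng)
    S, A = R.shape
    q_prev = np.zeros((S, A))  # Q_{k-1}
    q = np.zeros((S, A))       # Q_k
    for k in range(n_iters):
        alpha = 1.0 / (k + 1)  # decaying step size
        t_q_prev = np.empty((S, A))  # empirical Bellman backup of Q_{k-1}
        t_q = np.empty((S, A))       # empirical Bellman backup of Q_k
        for s in range(S):
            for a in range(A):
                # One sampled next state, reused for both iterates
                s2 = rng.choice(S, p=P[s, a])
                t_q_prev[s, a] = R[s, a] + gamma * q_prev[s2].max()
                t_q[s, a] = R[s, a] + gamma * q[s2].max()
        # SQL update: the (1 - alpha) weight on the backup difference is
        # what distinguishes it from plain Q-learning's small-step update
        q_next = q + alpha * (t_q_prev - q) + (1 - alpha) * (t_q - t_q_prev)
        q_prev, q = q, q_next
    return q
```

As a sanity check, on a one-state, one-action MDP with reward 1 and γ = 0.9, the optimal action value is 1/(1 - γ) = 10, and the iterate approaches that value.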
Document type :
Conference papers
Cited literature : 20 references
Contributor : Rémi Munos
Submitted on : Tuesday, June 4, 2013 - 2:54:11 PM
Last modification on : Tuesday, November 24, 2020 - 2:18:20 PM
Long-term archiving on : Thursday, September 5, 2013 - 4:22:34 AM
HAL Id : hal-00830140, version 1
Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Speedy Q-learning. Advances in Neural Information Processing Systems, 2011, Spain. ⟨hal-00830140⟩
