Speedy Q-learning

Mohammad Gheshlaghi Azar 1 Rémi Munos 2 Mohammad Ghavamzadeh 2 Hilbert Kappen 3
1 Department of Medical Physics and Biophysics
2 SEQUEL - Sequential Learning (LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe)
Abstract : We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the slow convergence of the standard Q-learning algorithm. We prove a PAC bound on the performance of SQL which shows that, for an MDP with n state-action pairs and discount factor γ, only T = O(log(n)/(ε^2 (1 - γ)^4)) steps are required for SQL to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1 - γ), and is thus tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered more efficient than incremental methods such as Q-learning.
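To make the algorithm concrete, below is a minimal Python sketch of a speedy Q-learning style iteration on a toy MDP with a generative model. It keeps the two most recent estimates Q_{k-1} and Q_k and combines their empirical Bellman backups with the learning rate α_k = 1/(k+1); the function names, the synchronous sampling scheme, and the random toy MDP are assumptions made for illustration here, not code taken from the paper or its accompanying file.

import numpy as np

def speedy_q_learning(P, R, gamma=0.95, num_iters=2000, seed=0):
    """Speedy Q-learning style iteration on a finite MDP.

    P: transition kernel, shape (S, A, S); R: reward table, shape (S, A).
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    q_prev = np.zeros((S, A))  # Q_{k-1}
    q_curr = np.zeros((S, A))  # Q_k
    for k in range(num_iters):
        alpha = 1.0 / (k + 1)
        # Draw one next state for every (state, action) pair (generative model).
        next_states = np.array(
            [[rng.choice(S, p=P[s, a]) for a in range(A)] for s in range(S)]
        )
        # Empirical Bellman operator applied to the two most recent estimates.
        t_q_prev = R + gamma * q_prev[next_states].max(axis=2)  # T_k Q_{k-1}
        t_q_curr = R + gamma * q_curr[next_states].max(axis=2)  # T_k Q_k
        # Q_{k+1} = Q_k + alpha (T_k Q_{k-1} - Q_k) + (1 - alpha)(T_k Q_k - T_k Q_{k-1})
        q_next = q_curr + alpha * (t_q_prev - q_curr) + (1 - alpha) * (t_q_curr - t_q_prev)
        q_prev, q_curr = q_curr, q_next
    return q_curr

# Toy usage: a random MDP with 5 states and 2 actions.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S, A = 5, 2
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into valid distributions
    R = rng.random((S, A))
    print(speedy_q_learning(P, R).round(3))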
Document type : Conference papers

Cited literature [20 references]

https://hal.archives-ouvertes.fr/hal-00830140
Contributor : Rémi Munos
Submitted on : Tuesday, June 4, 2013 - 2:54:11 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Thursday, September 5, 2013 - 4:22:34 AM

File

speedy-QL_nips2011.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00830140, version 1

Citation

Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert Kappen. Speedy Q-learning. Advances in Neural Information Processing Systems, 2011, Spain. ⟨hal-00830140⟩
