On the Sample Complexity of Reinforcement Learning with a Generative Model

Mohammad Gheshlaghi Azar 1, Rémi Munos 2, Hilbert Kappen 1
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor \gamma\in[0,1), only O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) samples are required to find an \epsilon-optimal estimate of the action-value function with probability 1-\delta. We also prove a matching lower bound of \Theta(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first result on the sample complexity of estimating the optimal (action-)value function in which the upper bound matches the lower bound of RL in terms of N, \epsilon, \delta and 1/(1-\gamma). Moreover, both our lower bound and our upper bound significantly improve on the state of the art in terms of 1/(1-\gamma).
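To give a feel for how the rate in the abstract scales, the following is a minimal sketch that evaluates the expression N\log(N/\delta)/((1-\gamma)^3\epsilon^2) numerically. The function name sample_bound and the constant factor c are hypothetical: the abstract states the bound only up to constants, so the returned numbers are useful for comparing settings, not as actual sample counts.

```python
import math


def sample_bound(n_pairs, epsilon, delta, gamma, c=1.0):
    """Evaluate c * N * log(N / delta) / ((1 - gamma)^3 * epsilon^2).

    n_pairs : number N of state-action pairs
    epsilon : target accuracy of the action-value estimate
    delta   : allowed failure probability
    gamma   : discount factor in [0, 1)
    c       : hypothetical constant factor (the abstract gives only the order)
    """
    if n_pairs < 1:
        raise ValueError("n_pairs must be at least 1")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    horizon_term = (1.0 - gamma) ** 3
    return c * n_pairs * math.log(n_pairs / delta) / (horizon_term * epsilon ** 2)


if __name__ == "__main__":
    # Compare how the rate grows as the discount factor approaches 1.
    for gamma in (0.9, 0.99):
        value = sample_bound(n_pairs=100, epsilon=0.1, delta=0.1, gamma=gamma)
        print(f"gamma={gamma}: bound (up to constants) = {value:.3e}")
```

Under this expression, halving \epsilon multiplies the value by 4, and moving \gamma from 0.9 to 0.99 multiplies it by 1000, which reflects the cubic dependence on 1/(1-\gamma).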
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00840331
Contributor : Rémi Munos
Submitted on : Tuesday, July 2, 2013 - 11:46:48 AM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Thursday, October 3, 2013 - 4:08:22 AM

File

RLcomplexity.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00840331, version 1

Citation

Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert Kappen. On the Sample Complexity of Reinforcement Learning with a Generative Model. International Conference on Machine Learning, 2012, United Kingdom. ⟨hal-00840331⟩
