Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path - HAL open archive
Journal article, Machine Learning, year: 2008


Andras Antos
  • Role: Author
Csaba Szepesvari
  • Role: Author
  • PersonId: 844057
Rémi Munos
  • Role: Author
  • PersonId: 836863

Abstract

We consider the problem of finding a near-optimal policy with value-function methods in continuous-space, discounted Markovian Decision Problems (MDPs) when the only available input is a single trajectory generated by following some policy. Since the state space is continuous, one must resort to function approximation. In this paper we study a policy iteration algorithm that iterates over action-value functions, where each iterate is obtained by empirical risk minimization with a loss function that penalizes large magnitudes of the Bellman residual. It turns out that when a linear parameterization is used, the algorithm is equivalent to least-squares policy iteration. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set, and the controllability properties of the MDP. To the best of our knowledge, this is the first theoretical result for off-policy control learning over continuous state spaces using a single trajectory.
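The abstract notes that, with a linear parameterization, the algorithm is equivalent to least-squares policy iteration (LSPI). The sketch below illustrates only that linear case, under clearly stated assumptions: a toy 5-state chain MDP with two actions, one-hot (tabular) features as the linear parameterization, a uniformly random behavior policy producing the single sample path, and hypothetical helper names (`step`, `phi`, `sample_path`, `lstdq`, `lspi`). Each policy-evaluation step is LSTD-Q computed from the same off-policy trajectory. It is not the authors' Bellman-residual estimator, and it does not reproduce their analysis for continuous state spaces.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 5-state chain, 2 actions.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    Reward 1.0 whenever the next state is the right end of the chain."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return s_next, float(s_next == N_STATES - 1)


def phi(s, a):
    """One-hot (tabular) features, a special case of a linear parameterization."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def sample_path(length, rng):
    """A single trajectory generated by a uniformly random behavior policy."""
    s, path = 0, []
    for _ in range(length):
        a = int(rng.integers(N_ACTIONS))
        s_next, r = step(s, a)
        path.append((s, a, r, s_next))
        s = s_next
    return path


def lstdq(path, policy):
    """LSTD-Q: evaluate `policy` off-policy from the behavior trajectory.
    Solves A w = b with A = sum phi (phi - GAMMA * phi')^T and b = sum r * phi,
    where phi' is the feature vector of (s_next, policy[s_next])."""
    d = N_STATES * N_ACTIONS
    A, b = np.zeros((d, d)), np.zeros(d)
    for s, a, r, s_next in path:
        f = phi(s, a)
        f_next = phi(s_next, policy[s_next])
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)  # assumes every (s, a) pair was visited


def lspi(path, max_iters=20):
    """Policy iteration that reuses the same sample path at every iteration."""
    policy = np.zeros(N_STATES, dtype=int)  # start from "always move left"
    for _ in range(max_iters):
        q = lstdq(path, policy).reshape(N_STATES, N_ACTIONS)  # q[s, a]
        new_policy = q.argmax(axis=1)  # greedy policy improvement
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, q


if __name__ == "__main__":
    policy, q = lspi(sample_path(2000, np.random.default_rng(0)))
    print(policy)  # expected: [1 1 1 1 1], i.e. "move right" everywhere
```

Tabular features keep this toy problem exactly solvable, so the learned values can be checked by hand (for instance, Q(4, right) = 1 / (1 - 0.9) = 10). In the continuous-state setting of the paper, `phi` would instead be a fixed, lower-dimensional feature map, and the quality of the computed policy would then depend, among other factors, on the approximation power and capacity of that function set.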
Main file: sapi_MLJ08.pdf (403.04 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00830201, version 1 (4 June 2013)

Identifiers

Cite

Andras Antos, Csaba Szepesvari, Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 2008, 71, pp. 89-129. ⟨10.1007/s10994-007-5038-2⟩. ⟨hal-00830201⟩