Faster Rates for Policy Learning

Abstract: This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Four examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.
Document type: Preprint, working paper
2017

Cited literature: 29 references

https://hal.archives-ouvertes.fr/hal-01511409
Contributor: Antoine Chambaz
Submitted on: Thursday, 20 April 2017, 22:20:01
Last modified on: Thursday, 20 July 2017, 09:28:38
Document(s) archived on: Friday, 21 July 2017, 14:11:59

Files

fasterRatesForPolicyLearning.p...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01511409, version 1
  • arXiv: 1704.06431

Citation

Alexander Luedtke, Antoine Chambaz. Faster Rates for Policy Learning. 2017. 〈hal-01511409〉

Metrics

Record views: 128
Document downloads: 63