Towards an Understanding of Default Policies in Multitask Policy Optimization - HAL open archive
Conference paper. Year: 2022

Towards an Understanding of Default Policies in Multitask Policy Optimization

Jack Parker-Holder
  • Role: Author
  • PersonId : 1118668
Aldo Pacchiano
  • Role: Author
  • PersonId : 1118669

Abstract

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms, with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
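As a rough illustration of the regularized objective the abstract describes, the sketch below scores a policy at a single state by its expected action value minus a penalty for deviating from a default policy. The choice of a KL penalty, the single-state (bandit-like) setting, and all names (`kl_divergence`, `regularized_objective`, `regularized_optimum`, `alpha`) are assumptions made for illustration; this is not the algorithm derived in the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def regularized_objective(policy, default_policy, action_values, alpha):
    """Expected action value under `policy` minus alpha * KL(policy || default_policy).

    Assumption: the deviation penalty is a KL divergence weighted by `alpha`.
    """
    expected_value = sum(p * q for p, q in zip(policy, action_values))
    return expected_value - alpha * kl_divergence(policy, default_policy)

def regularized_optimum(default_policy, action_values, alpha):
    """Maximizer of the objective above: pi(a) proportional to pi0(a) * exp(Q(a) / alpha)."""
    weights = [p0 * math.exp(q / alpha) for p0, q in zip(default_policy, action_values)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-action example: a uniform default policy and made-up action values.
default_policy = [0.5, 0.5]
action_values = [1.0, 0.0]
best = regularized_optimum(default_policy, action_values, alpha=1.0)
print(best, regularized_objective(best, default_policy, action_values, alpha=1.0))
```

In this toy setting a larger `alpha` keeps the maximizer closer to the default policy, while a smaller `alpha` moves it toward the greedy action, which is one way to see why the quality of the default policy matters for optimization.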
Main file
2111.02994.pdf (748.23 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03455465 , version 1 (29-11-2021)

Identifiers

  • HAL Id : hal-03455465 , version 1

Cite

Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano. Towards an Understanding of Default Policies in Multitask Policy Optimization. 25th International Conference on Artificial Intelligence and Statistics, Mar 2022, Online, France. ⟨hal-03455465⟩
60 views
97 downloads
