Conference papers

Towards an Understanding of Default Policies in Multitask Policy Optimization

Abstract : Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms, with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
Document type : Conference papers
Contributor : Michael Arbel
Submitted on : Monday, November 29, 2021 - 4:31:23 PM
Last modification on : Friday, November 18, 2022 - 9:27:22 AM




  • HAL Id : hal-03455465, version 1



Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano. Towards an Understanding of Default Policies in Multitask Policy Optimization. 25th International Conference on Artificial Intelligence and Statistics, Mar 2022, Online, France. ⟨hal-03455465⟩


