Path Integral Policy Improvement with Covariance Matrix Adaptation

Freek Stulp (1, 2), Olivier Sigaud (3)
1 Flowers - Flowing Epigenetic Robots and Systems, Inria Bordeaux - Sud-Ouest
2 U2IS - Unité d'Informatique et d'Ingénierie des Systèmes
Abstract: There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. At the conceptual level, we compare PI2 to other members of the same family, namely Cross-Entropy Methods and CMA-ES. The comparison suggests the derivation of a novel algorithm, which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
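The shared idea the abstract refers to, probability-weighted averaging, can be sketched in a few lines: sample parameter perturbations, weight each sample by its exponentiated negative cost so that low-cost samples dominate, and take the weighted mean as the updated parameter vector. The sketch below is a hypothetical, minimal illustration of this family of updates (not the paper's exact PI2-CMA algorithm); the reward-weighted covariance update at the end gestures at CMA-ES-style adaptation of the exploration noise, and the function name, temperature `h`, and ridge term are all assumptions made for this example.

```python
import numpy as np

def probability_weighted_update(theta, sigma, cost_fn, n_samples=50, h=10.0, rng=None):
    """One iteration of probability-weighted averaging (illustrative sketch).

    Samples perturbations of theta from N(theta, sigma), weights them by
    exponentiated negative cost (low cost -> high weight), and returns the
    weighted mean as the new parameter vector, together with a
    reward-weighted covariance in the spirit of CMA-ES-style adaptation.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.multivariate_normal(theta, sigma, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])
    # Normalize costs to [0, 1] so the temperature h is scale-free.
    c = (costs - costs.min()) / (np.ptp(costs) + 1e-12)
    weights = np.exp(-h * c)
    weights /= weights.sum()
    # Probability-weighted average of the sampled parameters.
    new_theta = weights @ samples
    # Reward-weighted covariance of the perturbations (plus a small ridge
    # to keep the matrix positive definite for the next sampling step).
    diffs = samples - theta
    new_sigma = (weights[:, None, None] * np.einsum('ni,nj->nij', diffs, diffs)).sum(axis=0)
    new_sigma += 1e-6 * np.eye(len(theta))
    return new_theta, new_sigma

# Usage: minimize a simple quadratic cost from a distant starting point.
theta = np.array([5.0, -3.0])
sigma = np.eye(2)
for _ in range(20):
    theta, sigma = probability_weighted_update(
        theta, sigma, lambda x: float(x @ x), rng=np.random.default_rng(0))
```

Note that no gradient of the cost is ever computed; the update is driven entirely by ranking sampled rollouts, which is what makes this family applicable to non-differentiable cost functions.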
Document type: Conference papers
Submitted on: Monday, February 18, 2013 - 10:57:31 AM
Last modification on: Thursday, March 21, 2019 - 1:05:18 PM


  • HAL Id: hal-00789391, version 1


Freek Stulp, Olivier Sigaud. Path Integral Policy Improvement with Covariance Matrix Adaptation. Proceedings of the 29th International Conference on Machine Learning (ICML), 2012, United Kingdom. pp.0-0. ⟨hal-00789391⟩


