Particle filter-based policy gradient for POMDPs

Pierre-Arnaud Coquelin ¹, Romain Deguest ², Rémi Munos ¹
¹ SEQUEL (Sequential Learning), LIFL (Laboratoire d'Informatique Fondamentale de Lille), LAGIS (Laboratoire d'Automatique, Génie Informatique et Signal), Inria Lille - Nord Europe
Abstract: Our setting is a Partially Observable Markov Decision Process (POMDP) with continuous state, observation, and action spaces. Decisions are based on a particle filter that estimates the belief state from past observations. We consider a policy gradient approach for optimizing a parameterized policy. To this end, we perform a sensitivity analysis of the performance measure with respect to the policy parameters, focusing on Finite Difference (FD) techniques. We show that the naive FD estimator suffers from variance explosion caused by the non-smoothness of the resampling procedure. We propose a more sophisticated FD method that overcomes this problem, and we establish its consistency.
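To make the variance issue concrete, here is a minimal Python sketch (not taken from the paper: the scalar linear-Gaussian dynamics, the linear policy a = θ · mean(belief), the reward, and all function names are illustrative assumptions). It runs a particle filter inside the control loop and estimates the policy gradient by naive finite differences with common random numbers; because multinomial resampling selects particle indices discontinuously in θ, the FD estimate becomes very noisy as the perturbation shrinks.

```python
import numpy as np

def run_episode(theta, rng_seed, T=50, N=100):
    """Simulate one POMDP episode where actions depend on a particle-filter
    belief estimate; return the cumulative reward.

    Hypothetical model (not from the paper): scalar linear-Gaussian state x,
    noisy observation y, policy a = theta * mean(particles). A fixed seed
    gives common random numbers across nearby values of theta.
    """
    rng = np.random.default_rng(rng_seed)
    x = 0.0                                  # true (hidden) state
    particles = rng.normal(0.0, 1.0, N)      # initial belief particles
    total_reward = 0.0
    for _ in range(T):
        a = theta * particles.mean()         # linear parameterized policy
        # environment transition and observation (assumed model)
        x = 0.9 * x + a + rng.normal(0.0, 0.1)
        y = x + rng.normal(0.0, 0.2)
        total_reward += -(x ** 2)            # reward penalizes distance from 0
        # particle filter update: propagate, weight, resample
        particles = 0.9 * particles + a + rng.normal(0.0, 0.1, N)
        w = np.exp(-0.5 * ((y - particles) / 0.2) ** 2)
        w /= w.sum()
        # Multinomial resampling is a piecewise-constant function of the
        # weights, hence non-smooth in theta: the source of the FD blow-up.
        idx = rng.choice(N, size=N, p=w)
        particles = particles[idx]
    return total_reward

def naive_fd_gradient(theta, eps=1e-3, seed=0):
    """Naive central finite difference with common random numbers. Resampling
    indices can jump when theta changes by eps, so the estimator's variance
    explodes as eps -> 0 despite the shared noise."""
    jp = run_episode(theta + eps, rng_seed=seed)
    jm = run_episode(theta - eps, rng_seed=seed)
    return (jp - jm) / (2 * eps)

if __name__ == "__main__":
    grads = [naive_fd_gradient(-0.5, seed=s) for s in range(20)]
    print("FD gradient estimates:", np.mean(grads), "+/-", np.std(grads))
```

Common random numbers would normally cancel most of the simulation noise between the two perturbed runs; the spread printed above stays large precisely because the discrete resampling step breaks that cancellation, which is the failure mode the paper's refined FD method is designed to avoid.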
Document type: Conference paper


https://hal.archives-ouvertes.fr/hal-00830173
Contributor: Rémi Munos
Submitted on: June 4, 2013

File: gradient_POMDP_nips08.pdf (produced by the authors)

Identifiers

  • HAL Id: hal-00830173, version 1

Citation

Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos. Particle filter-based policy gradient for POMDPs. Advances in Neural Information Processing Systems, 2008, Canada. ⟨hal-00830173⟩
