Entropy-based adaptive exploit-explore coefficient for Monte-Carlo path planning - HAL open archive
Conference paper Year: 2020

Entropy-based adaptive exploit-explore coefficient for Monte-Carlo path planning

Abstract

Efficient path planning for autonomous vehicles in cluttered environments is a challenging sequential decision-making problem under uncertainty. In this context, this paper implements a partially observable stochastic shortest path (PO-SSP) planning problem for autonomous urban navigation of Unmanned Aerial Vehicles (UAVs). To solve this planning problem, the POMCP-GO algorithm is used, which is a goal-oriented variant of POMCP, one of the fastest state-of-the-art online solvers for partially observable environments based on Monte-Carlo planning. This algorithm relies on the Upper Confidence Bounds (UCB1) algorithm as its action selection strategy. UCB1 depends on an exploration constant that is typically adjusted empirically. Its best value varies significantly between planning problems, and hence an exhaustive search is required to find the most suitable value. Applied to a complex path planning problem, this exhaustive search may be extremely time-consuming. Moreover, it is not suitable for real applications where online planning is needed. Therefore, this paper explores the use of an adaptive exploration coefficient for action selection during planning. A Monte-Carlo value backup approximation is also applied, which is shown empirically to accelerate policy value convergence. Simulation results show that the use of the adaptive exploration coefficient within a user-defined interval achieves better convergence and success rates than most hand-tuned fixed coefficients in that interval, although it never achieves the same results as the best fixed coefficient. Therefore, a compromise must be made between the desired quality of the results and the time one is willing to spend on the exhaustive search for the best coefficient value before planning.
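
To make the role of the exploration constant concrete, the following is a minimal Python sketch of standard UCB1 action selection, followed by one possible way to adapt the coefficient within a user-defined interval. It is not taken from the paper: the abstract does not state the adaptive rule, so the entropy-to-coefficient mapping below (softmax over value estimates, linear interpolation between `c_min` and `c_max`) is purely an illustrative assumption, as are all function and parameter names. The sketch also assumes a reward-maximizing setting, whereas a shortest-path planner would typically minimize cost.

```python
import math


def ucb1_select(q_values, counts, c):
    """Return the index of the action with the highest UCB1 score.

    q_values[a]: current mean value estimate of action a (higher is better)
    counts[a]:   number of times action a has been tried
    c:           exploration coefficient
    """
    # Untried actions are selected first, as in standard UCB1.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def normalized_entropy(q_values, temperature=1.0):
    """Shannon entropy of a softmax over q_values, scaled to [0, 1]."""
    if len(q_values) < 2:
        return 0.0
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    probs = [w / z for w in weights]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(q_values))


def adaptive_coefficient(q_values, c_min, c_max, temperature=1.0):
    """Hypothetical rule (an assumption, not the paper's formula):
    interpolate the coefficient in [c_min, c_max] by normalized entropy."""
    return c_min + (c_max - c_min) * normalized_entropy(q_values, temperature)
```

Under this assumed rule, value estimates that do not discriminate between actions (high entropy) push the coefficient toward `c_max` and favor exploration, while a clearly dominant action (low entropy) pushes it toward `c_min`. A planner could call `adaptive_coefficient` at each tree node and pass the result to `ucb1_select` in place of a fixed constant.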

Domains

Other
Main file
Carmo_26589.pdf (468.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03125159, version 1 (29-01-2021)

Identifiers

  • HAL Id: hal-03125159, version 1
  • OATAO: 26589

Cite

Ana Raquel Carmo, Jean-Alexis Delamer, Yoko Watanabe, Rodrigo Ventura, Caroline Ponzoni Carvalho Chanel. Entropy-based adaptive exploit-explore coefficient for Monte-Carlo path planning. 10th International Conference on Prestigious Applications of Intelligent Systems (PAIS 2020), a subconference of the 24th European Conference on Artificial Intelligence (ECAI 2020), Aug 2020, Virtual, Spain. pp.1-8. ⟨hal-03125159⟩
