Conference papers

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

Mathieu Seurin 1, 2, 3, 4 Florian Strub 5 Philippe Preux 1, 2, 3, 4 Olivier Pietquin 6
1 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidance has thus been developed to alleviate the resulting exploration problem. Such methods usually incentivize agents to look for new states through novelty signals. Yet they encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to states with relevant actions. While most actions consistently change the state when used, e.g., moving the agent, some actions are only effective in specific states, e.g., opening a door or grabbing an object. DoWhaM detects and rewards actions that seldom affect the environment. We evaluate DoWhaM on the procedurally generated environment MiniGrid against state-of-the-art methods. Experiments consistently show that DoWhaM greatly reduces sample complexity, establishing a new state of the art in MiniGrid.
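The idea in the abstract, rewarding actions that seldom affect the environment, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name ActionUsefulnessBonus, the per-action counters, and the bonus formula (1 minus the action's empirical success rate) are all assumptions made for illustration, and the formula in the paper may differ.

```python
from collections import defaultdict


class ActionUsefulnessBonus:
    """Toy intrinsic reward (hypothetical): favor actions that rarely change the state."""

    def __init__(self):
        self.used = defaultdict(int)       # times each action was taken
        self.effective = defaultdict(int)  # times the action changed the state

    def reward(self, state, action, next_state):
        changed = state != next_state
        self.used[action] += 1
        if changed:
            self.effective[action] += 1
        if not changed:
            return 0.0  # an action with no effect earns no bonus
        # Assumed bonus: 1 minus the fraction of uses that changed the state,
        # so rarely effective actions earn more when they finally work.
        return 1.0 - self.effective[action] / self.used[action]


bonus = ActionUsefulnessBonus()

# "forward" changes the state every time it is used, so its bonus is zero.
r_move = bonus.reward((0, 0), "forward", (0, 1))

# "open" does nothing three times (no door), then succeeds once.
for _ in range(3):
    bonus.reward((0, 1), "open", (0, 1))
r_open = bonus.reward((0, 2, "closed"), "open", (0, 2, "open"))

print(r_move, r_open)
```

Under these assumptions, an action that works on every use earns no bonus, while an action that succeeds on one of four attempts earns a larger one, which mirrors the contrast the abstract draws between moving and opening a door.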
Document type :
Conference papers
Complete list of metadata
Contributor : Mathieu Seurin
Submitted on : Monday, June 14, 2021 - 9:19:15 AM
Last modification on : Tuesday, January 11, 2022 - 1:27:05 PM
Long-term archiving on : Thursday, September 16, 2021 - 8:21:05 AM


Files produced by the author(s)


  • HAL Id : hal-03259315, version 1



Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin. Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness. International Joint Conference on Artificial Intelligence (IJCAI), Aug 2021, Montreal, Canada. ⟨hal-03259315⟩


