A Coherence Approach to Learning from Reward: Application to the Reactive Navigation of a Simulated Mobile Robot
Abstract
In this paper, a new kind of learning agent, the Constraint-based Memory Unit (CbMU), is described. The framework is the incremental construction of a complex behavior from a set of basic tasks and a set of perceptive constraints that must be fulfilled to achieve the behavior; the decision problem may be non-Markovian. At each time step, one of the basic tasks is executed, so that the complex behavior is a temporal sequence of elementary tasks.
A CbMU can be modelled as an adaptive switch which learns to choose, among its set of output channels, the one to be activated (given its perceptive data and a short-term memory) in order to respect a particular constraint. An output channel may be linked either to the firing of a basic task or to the activation of another CbMU; this allows a hierarchical decision process involving different levels of context.
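The switch-and-channel structure described above can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: the class names (`Channel`, `CbMU`), the preference vector, and the `step` method are all assumptions made for exposition.

```python
class Channel:
    """An output channel: fires a basic task or activates a nested CbMU."""
    def __init__(self, target):
        self.target = target  # a callable basic task, or another CbMU

    def activate(self, perception):
        if isinstance(self.target, CbMU):
            return self.target.step(perception)  # hierarchical activation
        return self.target(perception)           # fire the basic task


class CbMU:
    """Adaptive switch: chooses one output channel per step, given the
    current perception and a short-term memory of recent perceptions.
    The preference vector standing in for the learned choice is a
    placeholder for the paper's learning procedure."""
    def __init__(self, channels, memory_size=5):
        self.channels = channels
        self.memory = []                           # short-term memory
        self.memory_size = memory_size
        self.preferences = [0.0] * len(channels)   # learned channel values

    def step(self, perception):
        self.memory.append(perception)
        self.memory = self.memory[-self.memory_size:]
        # pick the channel currently judged best for the constraint
        i = max(range(len(self.channels)),
                key=lambda k: self.preferences[k])
        return self.channels[i].activate(perception)
```

Because a `Channel` may wrap another `CbMU`, nesting these units directly yields the hierarchical decision process mentioned above.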
The dynamics of the system is learned by means of a perceptive graph, and the cycles detected by the short-term memory of a CbMU are used as sub-goals to build internal contexts. The learning procedure of a CbMU is a reinforcement-learning-inspired algorithm based on a heuristic which does not need internal parameters. It is achieved by a consistency law between the binary values of the connected nodes of the perceptive graph, inspired by the minimax algorithm of AI.
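A minimax-style consistency law over binary node values can be sketched as a fixed-point propagation. This is a hedged reading of the abstract: the graph structure, the min/max node roles, and the update rule below are illustrative assumptions, not the paper's exact formulation.

```python
def propagate(graph, values, roles):
    """Propagate binary values through a perceptive graph until
    consistent: a 'max' node takes value 1 if any successor is 1,
    a 'min' node takes value 1 only if all successors are 1.

    graph  : dict mapping node -> list of successor nodes
    values : dict mapping node -> 0/1 (leaves must be preset)
    roles  : dict mapping internal node -> 'max' or 'min'
    """
    changed = True
    while changed:
        changed = False
        for node, succs in graph.items():
            if not succs:               # leaf: value is given, not derived
                continue
            vals = [values[s] for s in succs]
            new = max(vals) if roles[node] == "max" else min(vals)
            if values.get(node) != new:
                values[node] = new
                changed = True
    return values
```

The appeal of such a rule, consistent with the abstract's claim, is that it needs no tunable internal parameters: node values follow purely from the graph's connectivity and the leaf observations.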
In this article, an example of programming with CbMUs is given, using a simulated Khepera robot. The objective is to build a goal-reaching behavior, formulated as a high-level strategy composed of logical rules over perceptive primitives. Four CbMUs are created, each dedicated to the exploitation of particular perceptive data, and five basic tasks are used.
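A high-level strategy of logical rules over perceptive primitives might look like the sketch below. The predicate names and task names are entirely hypothetical; the paper's actual primitives, four CbMUs, and five basic tasks are not specified in this abstract.

```python
def strategy(percepts):
    """Toy rule-based strategy for goal reaching (illustrative only).
    `percepts` maps perceptive-primitive names to booleans."""
    if percepts["goal_visible"] and not percepts["obstacle_ahead"]:
        return "go_to_goal"      # hypothetical basic task
    if percepts["obstacle_ahead"]:
        return "avoid_obstacle"  # hypothetical basic task
    return "wander"              # default exploratory task
```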