Apprentissage par renforcement en environnement non stationnaire

Erwan Lecarpentier

Thesis, Year: 2020

Reinforcement Learning in Non-Stationary Environments

Erwan Lecarpentier

Role: Author

DTIS, ONERA, Université de Toulouse [Toulouse]

Abstract

How should an agent act when the evolution of its environment is uncertain? In this dissertation, we give a Reinforcement Learning perspective on solving non-stationary problems. The question is addressed from three different angles. First, we study the planning vs. re-planning trade-off of tree search algorithms in stationary Markov Decision Processes. We propose a method that lowers the computational requirements of such an algorithm while keeping theoretical guarantees on its performance. Secondly, we study the case of environments that evolve gradually over time. This hypothesis is expressed through a mathematical framework called Lipschitz Non-Stationary Markov Decision Processes. We derive a risk-averse planning algorithm that provably converges to the minimax policy in this setting. Thirdly, we consider abrupt temporal evolution in the setting of lifelong Reinforcement Learning. We propose a non-negative transfer method based on the theoretical study of the optimal Q-function's Lipschitz continuity with respect to the task space. This approach accelerates learning in new tasks. Overall, this dissertation proposes answers to the question of solving Non-Stationary Markov Decision Processes under three different settings.

Keywords

Reinforcement learning; Planning; Markov decision process; Non-stationary Markov decision process; Lifelong learning

Domains

Automatic Control / Robotics; Machine Learning [cs.LG]

Main file

DTIS20148.1600269665.pdf (3.58 MB)

Origin: Files produced by the author(s)

Contributor: Cécile André

https://hal.science/tel-02962985

Submitted on: Friday, October 9, 2020, 16:05:13

Last modified on: Friday, April 5, 2024, 14:22:02

Long-term archiving on: Sunday, January 10, 2021, 18:54:44

Dates and versions

tel-02962985, version 1 (09-10-2020)

Identifiers

HAL Id: tel-02962985, version 1

Cite

Erwan Lecarpentier. Apprentissage par renforcement en environnement non stationnaire. Automatic Control / Robotics. Institut Supérieur de l'Aéronautique et de l'Espace (ISAE), 2020. French. ⟨NNT : ⟩. ⟨tel-02962985⟩

Collections

ONERA TDS-MACS
