Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems - Archive ouverte HAL
Conference paper, 2006

Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems

Thomas Degris
Olivier Sigaud
Pierre-Henri Wuillemin

Abstract

Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems using Factored Markov Decision Processes (FMDPs). However, these algorithms need perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial and error and with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques that build structured representations of the problem. We describe SPITI, an instantiation of SDYNA that uses incremental decision tree induction to learn the structure of a problem, combined with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction algorithms.

Dates and versions

hal-01336925 , version 1 (24-06-2016)

Identifiers

Cite

Thomas Degris, Olivier Sigaud, Pierre-Henri Wuillemin. Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems. The 23rd International Conference on Machine Learning, Jun 2006, Pittsburgh, Pennsylvania, United States. pp.257-264, ⟨10.1145/1143844.1143877⟩. ⟨hal-01336925⟩