Reward Function Learning for Dialogue Management

Layla El Asri; Romain Laroche; Olivier Pietquin

doi:10.3233/978-1-61499-096-3-95

Conference paper, Year: 2012


Layla El Asri

Role: Author
PersonId : 932394

IMS : Information, Multimodalité & Signal

Romain Laroche

Role: Author

Orange Labs [Issy les Moulineaux]

Olivier Pietquin

Role: Author
PersonId : 4024
IdHAL : olivier-pietquin
ORCID : 0000-0002-5386-465X
IdRef : 142821861

IMS : Information, Multimodalité & Signal

Abstract

This paper addresses the problem of defining, from data, a reward function in a Reinforcement Learning (RL) problem. The problem is studied in the context of Spoken Dialogue Systems (SDS), which are interfaces enabling users to interact in natural language. A new methodology is proposed which, from system evaluation, apportions rewards over the system's state space. A corpus of dialogues is collected on-line and then evaluated by experts, who assign a numerical performance score to each dialogue according to the quality of dialogue management. The approach described in this paper infers, from these scores, a locally distributed reward function which can be used on-line. Two algorithms achieving this goal are proposed. These algorithms are tested on an SDS, and it is shown that in both cases the resulting numerical rewards are close to the performance scores, and thus that it is possible to extract relevant information from performance evaluation to optimise on-line learning.
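To make the idea of apportioning dialogue-level scores over the state space concrete, here is a minimal sketch. It is not one of the two algorithms proposed in the paper, which the abstract does not detail. It assumes, purely for illustration, that each dialogue is a sequence of discrete state labels and that one scalar reward per state is fitted by least squares so that the rewards summed along a dialogue approximate its expert score. The state names, the toy corpus, the scores, and the functions `fit_state_rewards` and `dialogue_return` are all hypothetical.

```python
from collections import Counter


def fit_state_rewards(dialogues, scores, epochs=5000, lr=0.05):
    """Fit one reward per state so that the rewards summed over a
    dialogue approximate that dialogue's expert score (least squares,
    minimised by per-dialogue gradient steps).

    Illustrative assumption only, not the method of the paper.
    """
    states = sorted({s for d in dialogues for s in d})
    rewards = {s: 0.0 for s in states}
    # How many times each state is visited in each dialogue.
    visit_counts = [Counter(d) for d in dialogues]
    for _ in range(epochs):
        for counts, score in zip(visit_counts, scores):
            predicted = sum(rewards[s] * n for s, n in counts.items())
            error = predicted - score
            # Gradient of 0.5 * error**2 with respect to rewards[s] is error * n.
            for s, n in counts.items():
                rewards[s] -= lr * error * n
    return rewards


def dialogue_return(dialogue, rewards):
    """Sum of the per-state rewards collected along one dialogue."""
    return sum(rewards[s] for s in dialogue)


# Hypothetical toy corpus: each dialogue is a sequence of state labels.
dialogues = [
    ["greet", "ask", "confirm", "close"],
    ["greet", "ask", "repeat", "ask", "confirm", "close"],
    ["greet", "ask", "repeat", "ask", "repeat", "ask", "close"],
]
# Hypothetical expert scores, one per dialogue (higher is better).
scores = [5.0, 3.0, 1.0]

rewards = fit_state_rewards(dialogues, scores)
for dialogue, score in zip(dialogues, scores):
    print(score, round(dialogue_return(dialogue, rewards), 3))
```

In this toy setting the fitted rewards can be handed out state by state during a dialogue, whereas the expert score only exists once the dialogue has ended. That is the sense in which a locally distributed reward function is usable on-line.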

Domains

Machine Learning [cs.LG]

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00749430

Submitted on: Wednesday, 7 November 2012, 15:14:08

Last modified on: Tuesday, 14 February 2023, 03:37:58

Dates and versions

hal-00749430, version 1 (07-11-2012)

Identifiers

HAL Id: hal-00749430, version 1
DOI: 10.3233/978-1-61499-096-3-95

Cite

Layla El Asri, Romain Laroche, Olivier Pietquin. Reward Function Learning for Dialogue Management. STAIRS 2012, Aug 2012, Montpellier, France. pp.95-106, ⟨10.3233/978-1-61499-096-3-95⟩. ⟨hal-00749430⟩


Collections

SUPELEC CENTRALESUPELEC
