Reward-based online learning in non-stationary environments: adapting a P300-speller with a ``Backspace'' key

Abstract: We adapt a policy gradient approach to the problem of reward-based online learning of a non-invasive EEG-based ``P300''-speller. We first clarify the nature of the P300-speller classification problem and present a general regularized gradient ascent formula. We then show that when the reward is immediate and binary (namely ``bad response'' or ``good response''), each update is expected to improve the classifier accuracy, whether the actual response is correct or not. We also estimate the robustness of the method to occasional mistaken rewards, i.e., we show that the learning efficacy decreases at most linearly with the rate of invalid rewards. The effectiveness of our approach is tested in a series of simulations reproducing the conditions of real experiments. We show in a first experiment that a systematic improvement of the spelling rate is obtained for all subjects in the absence of initial calibration. In a second experiment, we consider the case of online recovery following unforeseen impairments. Combined with a specific failure detection algorithm, the spelling error information (typically contained in a ``backspace'' hit) is shown to be useful for the policy gradient to adapt the P300 classifier to the new situation, provided the feedback is reliable enough (namely, a reliability greater than 70%).
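The abstract's core mechanism — an online gradient update driven by an immediate binary reward — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the feature dimensions, learning rate, signal model (a fixed additive bump on the target row, standing in for the P300 evoked response), and the softmax-linear policy are all assumptions made for the example. The update is the standard REINFORCE rule `w += eta * r * grad(log pi(a|X))` with reward r = +1 for a good response and -1 for a bad one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K candidate symbols per trial, each with a
# d-dimensional EEG-like feature vector; the target row carries an
# additive "P300-like" component (an assumption for this sketch).
K, d, eta = 6, 8, 0.1
w = np.zeros(d)  # linear scoring weights, learned online

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def trial(w, rng):
    """One spelling trial: returns (reward, policy gradient, correct?)."""
    X = rng.normal(0.0, 1.0, (K, d))      # noisy features for K candidates
    target = rng.integers(K)
    X[target] += 1.0                      # evoked-response bump (assumed)
    p = softmax(X @ w)                    # softmax-linear policy
    a = rng.choice(K, p=p)                # sample the spelled symbol
    r = 1.0 if a == target else -1.0      # immediate binary reward
    # Gradient of log pi(a|X) w.r.t. w for a softmax-linear policy:
    grad = X[a] - p @ X
    return r, grad, a == target

def accuracy(w, rng, n=500):
    return np.mean([trial(w, rng)[2] for _ in range(n)])

acc_before = accuracy(w, rng)             # near chance (1/K) with w = 0
for _ in range(2000):                     # online reward-based updates
    r, grad, _ = trial(w, rng)
    w += eta * r * grad                   # REINFORCE with binary reward
acc_after = accuracy(w, rng)
```

Note that the update improves the classifier on both outcomes, mirroring the abstract's claim: a +1 reward pulls `w` toward the chosen (correct) features, while a -1 reward pushes it away from the chosen (incorrect) ones.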

https://hal.archives-ouvertes.fr/hal-01196513
Contributor: Liva Ralaivola
Submitted on: Wednesday, September 9, 2015 - 11:01:06 PM
Last modification on: Monday, March 4, 2019 - 2:04:23 PM

Identifiers

  • HAL Id: hal-01196513, version 1

Citation

Emmanuel Daucé, Timothée Proix, Liva Ralaivola. Reward-based online learning in non-stationary environments: adapting a P300-speller with a ``Backspace'' key. IJCNN 2015, Jul 2015, Killarney, Ireland. ⟨hal-01196513⟩
