Conference paper. Year: 2012

Which Temporal Difference Learning algorithm best reproduces dopamine activity in a multi-choice task?

Abstract

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) corresponding to the error signal in Temporal Difference (TD) learning algorithms. This hypothesis has been reinforced by numerous studies showing the relevance of TD learning algorithms for describing the role of the basal ganglia in classical conditioning. However, recent recordings of DA neurons during multi-choice tasks have led to contradictory interpretations of whether the DA RPE signal is action dependent. Thus the precise TD algorithm (i.e., Actor-Critic, Q-learning or SARSA) that best describes DA signals remains unknown. Here we simulate and precisely analyze these TD algorithms on a multi-choice task performed by rats. We find that the DA activity previously reported in this task is best fitted by a TD error that has not fully converged, and that converged faster than the observed behavioral adaptation.
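For context, the three candidate algorithms named in the abstract differ only in the bootstrap term of their prediction error. In standard textbook form (following Sutton and Barto; the paper's exact parameterization may differ), with reward r_t, discount factor gamma, state value V and action value Q:

\[
\delta_t^{\text{Actor-Critic}} = r_t + \gamma V(s_{t+1}) - V(s_t)
\]
\[
\delta_t^{\text{SARSA}} = r_t + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
\]
\[
\delta_t^{\text{Q-learning}} = r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)
\]

SARSA's error depends on the action actually chosen on the next step, whereas Q-learning's depends only on the best available action and the Actor-Critic's only on state values; this is the action-dependence distinction at stake in the interpretation of the DA recordings.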
Main file
BellotSAB2012_submission2.pdf (385.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00731475, version 1 (12-09-2012)

Identifiers

Cite

Jean Bellot, Olivier Sigaud, Mehdi Khamassi. Which Temporal Difference Learning algorithm best reproduces dopamine activity in a multi-choice task?. International Conference on Simulation of Adaptive Behaviour (SAB 2012), Aug 2012, Odense, Denmark. pp.289-298, ⟨10.1007/978-3-642-33093-3_29⟩. ⟨hal-00731475⟩