Conference paper. Year: 2012

Which Temporal Difference Learning algorithm best reproduces dopamine activity in a multi-choice task?

Abstract

The activity of dopaminergic (DA) neurons has been hypothesized to encode a reward prediction error (RPE) corresponding to the error signal in Temporal Difference (TD) learning algorithms. This hypothesis has been reinforced by numerous studies showing the relevance of TD learning algorithms for describing the role of the basal ganglia in classical conditioning. However, recent recordings of DA neurons during multi-choice tasks have led to contradictory interpretations of whether the DA RPE signal is action dependent. Thus the precise TD algorithm (i.e., Actor-Critic, Q-learning or SARSA) that best describes DA signals remains unknown. Here we simulate and precisely analyze these TD algorithms on a multi-choice task performed by rats. We find that the DA activity previously reported in this task is best fitted by a TD error that has not fully converged, and that converged faster than the observed behavioral adaptation.
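For context, the three candidate algorithms named in the abstract differ only in the bootstrap term of their prediction error. In standard textbook form (following Sutton and Barto; the paper's exact parameterization may differ), with reward r_t, discount factor gamma, state value V and action value Q:

\[
\delta_t^{\text{Actor-Critic}} = r_t + \gamma V(s_{t+1}) - V(s_t)
\]
\[
\delta_t^{\text{SARSA}} = r_t + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
\]
\[
\delta_t^{\text{Q-learning}} = r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)
\]

SARSA's error depends on the action actually chosen on the next step, whereas Q-learning's depends only on the best available action and the Actor-Critic's only on state values; this is the action-dependence distinction at stake in the interpretation of the DA recordings.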
Main file
BellotSAB2012_submission2.pdf (385.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00731475, version 1 (12-09-2012)

Identifiers

Cite

Jean Bellot, Olivier Sigaud, Mehdi Khamassi. Which Temporal Difference Learning algorithm best reproduces dopamine activity in a multi-choice task?. International Conference on Simulation of Adaptive Behaviour (SAB 2012), Aug 2012, Odense, Denmark. pp.289-298, ⟨10.1007/978-3-642-33093-3_29⟩. ⟨hal-00731475⟩