G. Alexander, M. Crutcher, and M. Delong, Chapter 6 Basal ganglia-thalamocortical circuits: Parallel substrates for motor, oculomotor, "prefrontal" and "limbic" functions, Prog Brain Res, vol.85, pp.119-165, 1990.
DOI : 10.1016/S0079-6123(08)62678-3

W. Alexander and J. Brown, Medial prefrontal cortex as an action-outcome predictor, Nature Neuroscience, vol.14, issue.10, pp.1338-1344, 2011.
DOI : 10.1038/nn.2921
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183374/pdf

C. Amiez, J. Joseph, and E. Procyk, Primate anterior cingulate cortex and adaptation of behaviour, In: S. Dehaene, J. Duhamel, M. Hauser, and G. Rizzolatti (eds.), From monkey brain to human brain, 2005.

C. Amiez, J. Joseph, and E. Procyk, Anterior cingulate error-related activity is modulated by predicted reward, European Journal of Neuroscience, vol.21, issue.12, pp.3447-3452, 2005.
DOI : 10.1007/s00221-002-1353-9
URL : https://hal.archives-ouvertes.fr/inserm-00132130

C. Amiez, J. Joseph, and E. Procyk, Reward Encoding in the Monkey Anterior Cingulate Cortex, Cerebral Cortex, vol.16, issue.7, pp.1040-1055, 2006.
DOI : 10.1093/cercor/bhj046
URL : https://hal.archives-ouvertes.fr/inserm-00132137

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

D. Badre and A. Wagner, Selection, Integration, and Conflict Monitoring, Neuron, vol.41, issue.3, pp.473-487, 2004.
DOI : 10.1016/S0896-6273(03)00851-1
URL : https://doi.org/10.1016/s0896-6273(03)00851-1

H. Bayer and P. Glimcher, Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal, Neuron, vol.47, issue.1, pp.129-141, 2005.
DOI : 10.1016/j.neuron.2005.05.020
URL : https://doi.org/10.1016/j.neuron.2005.05.020

T. Behrens, M. Woolrich, M. Walton, and M. Rushworth, Learning the value of information in an uncertain world, Nature Neuroscience, vol.10, issue.9, pp.1214-1221, 2007.
DOI : 10.1038/nn1954

M. Botvinick, T. Braver, D. Barch, C. Carter, and J. Cohen, Conflict monitoring and cognitive control., Psychological Review, vol.108, issue.3, pp.624-652, 2001.
DOI : 10.1037/0033-295X.108.3.624

J. Brown and T. Braver, Learned Predictions of Error Likelihood in the Anterior Cingulate Cortex, Science, vol.307, issue.5712, pp.1118-1121, 2005.
DOI : 10.1126/science.1105783

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz, Regret minimization under partial monitoring, Math Oper Res, vol.31, 2006.
DOI : 10.1109/itw.2006.1633784
URL : https://hal.archives-ouvertes.fr/hal-00007538

R. Chavarriaga, T. Strösslin, D. Sheynikhovich, and W. Gerstner, A Computational Model of Parallel Navigation Systems in Rodents, Neuroinformatics, vol.3, issue.3, pp.223-265, 2005.
DOI : 10.1385/NI:3:3:223

J. Cohen, S. McClure, and A. Yu, Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.362, issue.1481, pp.933-942, 2007.
DOI : 10.1098/rstb.2007.2098
URL : http://rstb.royalsocietypublishing.org/content/royptb/362/1481/933.full.pdf

N. Daw, Y. Niv, and P. Dayan, Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, Nature Neuroscience, vol.8, issue.12, pp.1704-1711, 2005.
DOI : 10.1038/nn1560

N. Daw, J. O'Doherty, P. Dayan, B. Seymour, and R. Dolan, Cortical substrates for exploratory decisions in humans, Nature, vol.441, issue.7095, pp.876-879, 2006.
DOI : 10.1038/nature04766
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2635947/pdf

K. Doya, Metalearning and neuromodulation, Neural Networks, vol.15, issue.4-6, pp.495-506, 2002.
DOI : 10.1016/S0893-6080(02)00044-8

K. Doya, Modulators of decision making, Nature Neuroscience, vol.11, issue.4, pp.410-416, 2008.
DOI : 10.1038/nn2077

D. Durstewitz and J. Seamans, The Dual-State Theory of Prefrontal Cortex Dopamine Function with Relevance to Catechol-O-Methyltransferase Genotypes and Schizophrenia, Biological Psychiatry, vol.64, issue.9, pp.739-749, 2008.
DOI : 10.1016/j.biopsych.2008.05.015

K. Fuxe, T. Hökfelt, O. Johansson, G. Jonsson, P. Lidbrink et al., The origin of the dopamine nerve terminals in limbic and frontal cortex. Evidence for mesocortico dopamine neurons, Brain Research, vol.82, pp.349-355, 1974.

M. Frank, Dynamic Dopamine Modulation in the Basal Ganglia: A Neurocomputational Account of Cognitive Deficits in Medicated and Nonmedicated Parkinsonism, Journal of Cognitive Neuroscience, vol.17, issue.1, pp.51-72, 2005.
DOI : 10.1162/0898929052880093

M. Frank, B. Doll, J. Oas-Terpstra, and F. Moreno, Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation, Nature Neuroscience, vol.12, issue.8, pp.1062-1070, 2009.
DOI : 10.1038/nn.2342
URL : http://www.nature.com/neuro/journal/v13/n5/pdf/nn0510-649a.pdf

A. Garivier and E. Moulines, On upper-confidence bound policies for nonstationary bandit problems, 2008.
DOI : 10.1007/978-3-642-24412-4_16
URL : https://hal.archives-ouvertes.fr/hal-00281392

C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud, and M. Sebag, Multi-armed bandit, dynamic environments and meta-bandits, Online trading between exploration and exploitation, NIPS-2006 workshop, 2006.

C. Holroyd and M. Coles, The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity., Psychological Review, vol.109, issue.4, pp.679-709, 2002.
DOI : 10.1037/0033-295X.109.4.679

J. Houk, J. Adams, and A. Barto, A model of how the basal ganglia generate and use neural signals that predict reinforcement In: Models of information processing in the basal ganglia, pp.249-270, 1995.

M. Humphries and T. Prescott, The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward., Progress in Neurobiology, vol.90, issue.4, pp.385-417, 2010.
DOI : 10.1016/j.pneurobio.2009.11.003

S. Ishii, W. Yoshida, and J. Yoshimoto, Control of exploitation-exploration meta-parameter in reinforcement learning, Neural Networks, vol.15, issue.4-6, pp.665-687, 2002.
DOI : 10.1016/S0893-6080(02)00056-4

K. Johnston, H. Levin, M. Koval, and S. Everling, Top-Down Control-Signal Dynamics in Anterior Cingulate and Prefrontal Cortex Neurons following Task Switching, Neuron, vol.53, issue.3, pp.453-462, 2007.
DOI : 10.1016/j.neuron.2006.12.023
URL : https://doi.org/10.1016/j.neuron.2006.12.023

S. Kennerley, M. Walton, T. Behrens, M. Buckley, and M. Rushworth, Optimal decision making and the anterior cingulate cortex, Nature Neuroscience, vol.9, issue.7, pp.940-947, 2006.
DOI : 10.1038/nn1724

M. Khamassi, L. Lachèze, B. Girard, A. Berthoz, and A. Guillot, Actor-Critic Models of Reinforcement Learning in the Basal Ganglia: From Natural to Artificial Rats, Adaptive Behavior, vol.13, issue.2, pp.131-148, 2005.
DOI : 10.1111/j.1460-9568.2004.03095.x
URL : https://hal.archives-ouvertes.fr/hal-00016390

M. Khamassi, A. Mulder, E. Tabuchi, V. Douchamps, and S. Wiener, Anticipatory reward signals in ventral striatal neurons of behaving rats, European Journal of Neuroscience, vol.28, issue.9, pp.1849-1866, 2008.
DOI : 10.1007/11840541_33
URL : https://hal.archives-ouvertes.fr/hal-00618294

M. Khamassi, S. Lallée, P. Enel, E. Procyk, and P. Dominey, Robot Cognitive Control with a Neurophysiologically Inspired Reinforcement Learning Model, Frontiers in Neurorobotics, vol.5, 2011.
DOI : 10.3389/fnbot.2011.00001
URL : https://hal.archives-ouvertes.fr/hal-00688931

N. Kolling, T. Behrens, R. Mars, and M. Rushworth, Neural Mechanisms of Foraging, Science, vol.336, issue.6077, pp.95-98, 2012.
DOI : 10.1126/science.1216930
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3440844/pdf

F. Kouneiher, S. Charron, and E. Koechlin, Motivation and cognitive control in the human prefrontal cortex, Nature Neuroscience, vol.12, issue.7, pp.939-945, 2009.
DOI : 10.1038/nn.2321

J. Krichmar, The Neuromodulatory System: A Framework for Survival and Adaptive Behavior in a Challenging World, Adaptive Behavior, vol.16, issue.6, pp.385-399, 2008.
DOI : 10.1016/j.neuron.2005.04.026

G. Luksys, W. Gerstner, and C. Sandi, Stress, genotype and norepinephrine in the prediction of mouse behavior using reinforcement learning, Nature Neuroscience, vol.12, issue.9, pp.1180-1186, 2009.
DOI : 10.1038/nn.2374

A. MacDonald, J. Cohen, V. Stenger, and C. Carter, Dissociating the Role of the Dorsolateral Prefrontal and Anterior Cingulate Cortex in Cognitive Control, Science, vol.288, issue.5472, pp.1835-1838, 2000.
DOI : 10.1126/science.288.5472.1835

R. Mars, J. Sallet, M. Rushworth, and N. Yeung, Neural basis of motivational and cognitive control, 2011.
DOI : 10.7551/mitpress/9780262016438.001.0001

M. Matsumoto, K. Matsumoto, H. Abe, and K. Tanaka, Medial prefrontal cell activity signaling prediction errors of action values, Nature Neuroscience, vol.10, issue.5, pp.647-656, 2007.
DOI : 10.1038/nn1890

E. Miller and J. Cohen, An Integrative Theory of Prefrontal Cortex Function, Annual Review of Neuroscience, vol.24, issue.1, pp.167-202, 2001.
DOI : 10.1146/annurev.neuro.24.1.167

G. Morris, A. Nevet, D. Arkadir, E. Vaadia, and H. Bergman, Midbrain dopamine neurons encode decisions for future action, Nature Neuroscience, vol.9, issue.8, pp.1057-1063, 2006.
DOI : 10.1038/nn1743

T. Paus, Primate anterior cingulate cortex: Where motor control, drive and cognition interface, Nature Reviews Neuroscience, vol.2, issue.6, pp.417-424, 2001.
DOI : 10.1038/35077500

E. Procyk, Y. Tanaka, and J. Joseph, Anterior cingulate activity during routine and non-routine sequential behaviors in macaques, Nature Neuroscience, vol.3, issue.5, pp.502-508, 2000.
DOI : 10.1038/74880
URL : https://hal.archives-ouvertes.fr/hal-00131425

E. Procyk and J. Joseph, Characterization of serial order encoding in the monkey anterior cingulate sulcus, European Journal of Neuroscience, vol.14, issue.6, pp.1041-1046, 2001.
DOI : 10.1016/S0168-0102(00)00198-X
URL : https://hal.archives-ouvertes.fr/inserm-00132258

E. Procyk and P. Goldman-Rakic, Modulation of Dorsolateral Prefrontal Delay Activity during Self-Organized Behavior, Journal of Neuroscience, vol.26, issue.44, pp.11313-11323, 2006.
DOI : 10.1523/JNEUROSCI.2157-06.2006
URL : https://hal.archives-ouvertes.fr/inserm-00132158

R. Quilodran, M. Rothé, and E. Procyk, Behavioral Shifts and Action Valuation in the Anterior Cingulate Cortex, Neuron, vol.57, issue.2, pp.314-325, 2008.
DOI : 10.1016/j.neuron.2007.11.031
URL : https://hal.archives-ouvertes.fr/inserm-00906686

J. Reynolds, B. Hyland, and J. Wickens, A cellular mechanism of reward-related learning, Nature, vol.413, issue.6851, pp.67-70, 2001.
DOI : 10.1038/35092560

M. Rothé, R. Quilodran, J. Sallet, and E. Procyk, Coordination of High Gamma Activity in Anterior Cingulate and Lateral Prefrontal Cortical Areas during Adaptation, Journal of Neuroscience, vol.31, issue.31, pp.11110-11117, 2011.
DOI : 10.1523/JNEUROSCI.1016-11.2011

P. Rudebeck, T. Behrens, S. Kennerley, M. Baxter, M. Buckley et al., Frontal Cortex Subregions Play Distinct Roles in Choices between Actions and Stimuli, Journal of Neuroscience, vol.28, issue.51, pp.13775-13785, 2008.
DOI : 10.1523/JNEUROSCI.3541-08.2008
URL : http://www.jneurosci.org/content/jneuro/28/51/13775.full.pdf

M. Rushworth and T. Behrens, Choice, uncertainty and value in prefrontal and cingulate cortex, Nature Neuroscience, vol.11, issue.4, pp.389-397, 2008.
DOI : 10.1038/nn2066

J. Sallet, R. Quilodran, M. Rothé, J. Vezoli, J. Joseph et al., Expectations, gains, and losses in the anterior cingulate cortex, Cognitive, Affective, & Behavioral Neuroscience, vol.7, issue.4, pp.327-336, 2007.
DOI : 10.3758/CABN.7.4.327
URL : https://hal.archives-ouvertes.fr/inserm-00256218

K. Samejima, Y. Ueda, K. Doya, and M. Kimura, Representation of Action-Specific Reward Values in the Striatum, Science, vol.310, issue.5752, pp.1337-1340, 2005.
DOI : 10.1126/science.1115270

W. Schultz, P. Dayan, and P. Montague, A Neural Substrate of Prediction and Reward, Science, vol.275, issue.5306, pp.1593-1599, 1997.
DOI : 10.1126/science.275.5306.1593

N. Schweighofer and K. Doya, Meta-learning in Reinforcement Learning, Neural Networks, vol.16, issue.1, pp.5-9, 2003.
DOI : 10.1016/S0893-6080(02)00228-9

N. Schweighofer, S. Tanaka, and K. Doya, Serotonin and the Evaluation of Future Rewards: Theory, Experiments, and Possible Neural Mechanisms, Annals of the New York Academy of Sciences, vol.1104, issue.1, pp.289-300, 2007.
DOI : 10.1046/j.0953-816x.2001.01616.x

H. Seo and D. Lee, Temporal Filtering of Reward Signals in the Dorsal Anterior Cingulate Cortex during a Mixed-Strategy Game, Journal of Neuroscience, vol.27, issue.31, pp.8366-8377, 2007.
DOI : 10.1523/JNEUROSCI.2369-07.2007

H. Seo and D. Lee, Cortical mechanisms for reinforcement learning in competitive games, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.363, issue.1511, pp.3845-3857, 2008.
DOI : 10.1098/rstb.2008.0158
URL : http://rstb.royalsocietypublishing.org/content/royptb/363/1511/3845.full.pdf

H. Seo and D. Lee, Behavioral and Neural Changes after Gains and Losses of Conditioned Reinforcers, Journal of Neuroscience, vol.29, issue.11, pp.3627-3641, 2009.
DOI : 10.1523/JNEUROSCI.4726-08.2009
URL : http://www.jneurosci.org/content/jneuro/29/11/3627.full.pdf

K. Shima and J. Tanji, Role for Cingulate Motor Area Cells in Voluntary Movement Selection Based on Reward, Science, vol.282, issue.5392, pp.1335-1338, 1998.
DOI : 10.1126/science.282.5392.1335

R. Silton, W. Heller, D. Towers, A. Engels, J. Spielberg et al., The time course of activity in dorsolateral prefrontal cortex and anterior cingulate cortex during top-down attentional control, NeuroImage, vol.50, issue.3, pp.1292-1302, 2010.
DOI : 10.1016/j.neuroimage.2009.12.061

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

J. Sul, H. Kim, N. Huh, D. Lee, and M. Jung, Distinct Roles of Rodent Orbitofrontal and Medial Prefrontal Cortex in Decision Making, Neuron, vol.66, issue.3, pp.449-460, 2010.
DOI : 10.1016/j.neuron.2010.03.033

S. Tanaka, K. Doya, G. Okada, K. Ueda, Y. Okamoto et al., Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops, Nature Neuroscience, vol.7, issue.8, pp.887-893, 2004.
DOI : 10.1038/nn1279

A. Yu and P. Dayan, Uncertainty, Neuromodulation, and Attention, Neuron, vol.46, issue.4, pp.681-692, 2005.
DOI : 10.1016/j.neuron.2005.04.026
URL : https://doi.org/10.1016/j.neuron.2005.04.026