R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). A Bradford Book, 1998.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.

M. G. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton et al., Unifying count-based exploration and intrinsic motivation, Advances in Neural Information Processing Systems, pp.1471-1479, 2016.

Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, Proceedings of the 26th Annual International Conference on Machine Learning, pp.41-48, 2009.

M. E. Taylor and P. Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, Journal of Machine Learning Research, vol.10, pp.1633-1685, 2009.

M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui et al., Cognitive Developmental Robotics: A Survey, IEEE Transactions on Autonomous Mental Development, vol.1, issue.1, pp.1-44, 2009.

F. Guerin, Learning like a baby: a survey of artificial intelligence approaches, The Knowledge Engineering Review, vol.26, issue.02, pp.209-236, 2011.

M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, The arcade learning environment: An evaluation platform for general agents, Journal of Artificial Intelligence Research, vol.47, pp.253-279, 2013.

G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman et al., OpenAI Gym, 2016.

R. Smith, Open Dynamics Engine, 2005.

E. Todorov, T. Erez, and Y. Tassa, MuJoCo: A physics engine for model-based control, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.5026-5033, 2012.

J. Peters, K. Mülling, and Y. Altun, Relative Entropy Policy Search, Proceedings of the AAAI Conference on Artificial Intelligence, Atlanta, pp.1607-1612, 2010.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.

V. R. Konda and J. N. Tsitsiklis, Actor-Critic Algorithms, Advances in Neural Information Processing Systems, vol.13, pp.1008-1014, 1999.

P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup et al., Deep Reinforcement Learning that Matters, 2017.

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez et al., Continuous control with deep reinforcement learning, 2015.

R. Hafner and M. Riedmiller, Reinforcement learning in feedback control, Machine Learning, vol.84, pp.137-169, 2011.

S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.

M. Zimmer, Y. Boniface, and A. Dutech, Neural Fitted Actor-Critic, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350651

M. Riedmiller, Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, Lecture Notes in Computer Science, vol.3720, pp.317-328, 2005.

H. van Hasselt, Reinforcement Learning in Continuous State and Action Spaces, Reinforcement Learning, pp.207-251, 2012.

M. Zimmer, Apprentissage par renforcement développemental (Developmental reinforcement learning), 2018.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

R. M. French, Catastrophic forgetting in connectionist networks, Trends in Cognitive Sciences, vol.3, issue.4, pp.128-135, 1999.

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, vol.114, issue.13, pp.3521-3526, 2017.

P. Oudeyer and F. Kaplan, How can we define intrinsic motivation?, Proceedings of the 8th International Conference on Epigenetic Robotics, pp.93-101, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00420175

P. Wawrzyński, Learning to control a 6-degree-of-freedom walking robot, International Conference on Computer as a Tool, pp.698-705, 2007.

Y. Tassa, T. Erez, and E. Todorov, Synthesis and stabilization of complex behaviors through online trajectory optimization, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.4906-4913, 2012.

D. P. Kingma and J. L. Ba, Adam: a Method for Stochastic Optimization, International Conference on Learning Representations, pp.1-13, 2015.

M. Zimmer and S. Doncieux, Bootstrapping Q-Learning for Robotics from Neuro-Evolution Results, IEEE Transactions on Cognitive and Developmental Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01494744

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent et al., Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, vol.11, pp.625-660, 2010.

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick et al., Progressive Neural Networks, 2016.

C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha et al., PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017.