R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), 1998.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol. 518, no. 7540, pp. 529-533, 2015.
DOI : 10.1038/nature14236

M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui et al., Cognitive Developmental Robotics: A Survey, IEEE Transactions on Autonomous Mental Development, vol. 1, no. 1, pp. 12-34, 2009.
DOI : 10.1109/TAMD.2009.2021702

S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.

M. Riedmiller, Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method, in Lecture Notes in Computer Science, vol. 3720, pp. 317-328, 2005.
DOI : 10.1007/11564096_32

C. Igel and M. Hüsken, Improving the Rprop learning algorithm, International Symposium on Neural Computation, pp. 115-121, 2000.

M. Riedmiller and H. Braun, RPROP – A Fast Adaptive Learning Algorithm, International Symposium on Computer and Information Science VII, 1992.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems 12, pp. 1057-1063, 1999.

N. Hansen and A. Ostermeier, Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol. 9, no. 2, pp. 159-195, 2001.
DOI : 10.1162/106365601750190398

I. Grondman, L. Buşoniu, G. A. D. Lopes, and R. Babuška, A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 1291-1307, 2012.
DOI : 10.1109/TSMCC.2012.2218595

URL : https://hal.archives-ouvertes.fr/hal-00756747

V. R. Konda and J. N. Tsitsiklis, Actor-Critic Algorithms, Advances in Neural Information Processing Systems, pp. 1008-1014, 1999.

H. van Hasselt and M. A. Wiering, Reinforcement Learning in Continuous Action Spaces, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 272-279, 2007.
DOI : 10.1109/ADPRL.2007.368199

M. W. Spong, The swing up control problem for the acrobot, IEEE Control Systems Magazine, vol. 15, no. 1, pp. 49-55, 1995.

M. Riedmiller, J. Peters, and S. Schaal, Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 254-261, 2007.
DOI : 10.1109/ADPRL.2007.368196

R. Smith, Open Dynamics Engine, 2005.

H. van Hasselt, Reinforcement Learning in Continuous State and Action Spaces, in Reinforcement Learning: State-of-the-Art, pp. 207-251, 2012.
DOI : 10.1007/978-3-642-27645-3_7