[. References and . Colas, GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms, International Conference on Machine Learning, 2018.

. Henderson, Approximately optimal approximate reinforcement learning, Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, vol.2, pp.1-13, 2002.

R. Vijay, J. N. Konda, . Tsitsiklis, and . Lillicrap, Josef Maatyas. Random optimization. Automation and Remote control, Continuous control with deep reinforcement learning. International Conference on Learning Representations, vol.13, p.16, 1965.

. Sutton, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proceedings of the 31st International Conference on Machine Learning, vol.12, pp.1057-1063, 1999.

R. S. Sutton and . Sutton, Learning to predict by the methods of temporal differences, Machine learning, vol.3, issue.1, pp.9-44, 1988.

W. Hasselt, M. A. Hasselt, . Wiering, and . Zimmer, Developmental reinforcement learning through sensorimotor space enlargement, The 8th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pp.272-279, 2007.