GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms, International Conference on Machine Learning, 2018. ,
Approximately optimal approximate reinforcement learning, Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, vol.2, pp.1-13, 2002. ,
Josef Maatyas. Random optimization. Automation and Remote control, Continuous control with deep reinforcement learning. International Conference on Learning Representations, vol.13, p.16, 1965. ,
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proceedings of the 31st International Conference on Machine Learning, vol.12, pp.1057-1063, 1999. ,
Learning to predict by the methods of temporal differences, Machine learning, vol.3, issue.1, pp.9-44, 1988. ,
Developmental reinforcement learning through sensorimotor space enlargement, The 8th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pp.272-279, 2007. ,