Linear least-squares algorithms for temporal difference learning, 1996.
A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man and Cybernetics, vol.42, issue.6, pp.1291-1307, 2012.
Completely derandomized self-adaptation in evolution strategies, Evolutionary computation, vol.9, issue.2, pp.159-195, 2001.
Improving the Rprop learning algorithm, International Symposium on Neural Computation, pp.115-121, 2000.
GNU Octave version 4.0.0 manual: a high-level interactive language for numerical computations, 2015.
Actor-Critic Algorithms, Neural Information Processing Systems, pp.1008-1014, 1999.
Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.
RPROP - A Fast Adaptive Learning Algorithm, International Symposium on Computer and Information Science VII, 1992.
Evaluation of policy gradient methods and variants on the cart-pole benchmark, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.254-261, 2007.
Reinforcement Learning, 1998.
DOI: 10.1016/B978-012526430-3/50003-9
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems 12, pp.1057-1063, 1999.
Reinforcement Learning in Continuous State and Action Spaces, Reinforcement Learning, pp.207-251, 2012.
DOI: 10.1007/978-3-642-27645-3_7
Reinforcement Learning in Continuous Action Spaces, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.272-279, 2007.
DOI: 10.1109/ADPRL.2007.368199