Bradtke, S. J. and Barto, A. G., Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.

Grondman, I., Buşoniu, L., Lopes, G. A. D., and Babuška, R., A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol.42, issue.6, pp.1291-1307, 2012.

Hansen, N. and Ostermeier, A., Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.

Igel, C. and Hüsken, M., Improving the Rprop learning algorithm, Proceedings of the Second International Symposium on Neural Computation, pp.115-121, 2000.

Eaton, J. W., Bateman, D., Hauberg, S., and Wehbring, R., GNU Octave version 4.0.0 manual: a high-level interactive language for numerical computations, 2015.

Konda, V. R. and Tsitsiklis, J. N., Actor-Critic Algorithms, Advances in Neural Information Processing Systems, pp.1008-1014, 1999.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., et al., Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.

Riedmiller, M. and Braun, H., RPROP - A Fast Adaptive Learning Algorithm, Proceedings of the International Symposium on Computer and Information Science VII, 1992.

Riedmiller, M., Peters, J., and Schaal, S., Evaluation of policy gradient methods and variants on the cart-pole benchmark, Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.254-261, 2007.

Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, 1998.
DOI : 10.1016/B978-012526430-3/50003-9

Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y., Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems 12, pp.1057-1063, 1999.

V. Hasselt and H. , Reinforcement Learning in Continuous State and Action Spaces, Reinforcement Learning, pp.207-251, 2012.
DOI : 10.1007/978-3-642-27645-3_7

van Hasselt, H. and Wiering, M. A., Reinforcement Learning in Continuous Action Spaces, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.272-279, 2007.
DOI : 10.1109/ADPRL.2007.368199