J. S. Albus, A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Journal of Dynamic Systems, Measurement, and Control, vol.97, pp.220-227, 1975.
DOI : 10.1115/1.3426922

C. W. Anderson, Strategy Learning with Multilayer Connectionist Representations, Proceedings of the Fourth International Workshop on Machine Learning, pp.103-114, 1987.
DOI : 10.1016/B978-0-934613-41-5.50014-3

C. W. Anderson, Approximating a policy can be easier than approximating a value function, Technical report, Colorado State University, 2000.

C. G. Atkeson, Using local trajectory optimizers to speed up global optimization in dynamic programming, Advances in Neural Information Processing Systems 6, 1994.

C. G. Atkeson and J. C. Santamaría, A comparison of direct and model-based reinforcement learning, International Conference on Robotics and Automation, 1997.

L. C. Baird III, Advantage updating, Technical Report WL-TR-93-1146, Wright Laboratory, 1993. Available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145.

L. C. Baird III, Residual algorithms: Reinforcement learning with function approximation, Machine Learning: Proceedings of the Twelfth International Conference, 1995.

L. C. Baird III and A. H. Klopf, Reinforcement learning with high-dimensional, continuous actions, Technical Report WL-TR-93-1147, Wright Laboratory, 1993. Available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145.

A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, vol.39, issue.3, pp.930-945, 1993.

A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol.72, issue.1-2, pp.81-138, 1995.
DOI : 10.1016/0004-3702(94)00011-O

A. G. Barto, R. S. Sutton, and C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.13, issue.5, pp.835-846, 1983.
DOI : 10.1109/TSMC.1983.6313077

J. Baxter, A. Tridgell, and L. Weaver, Experiments in parameter learning using temporal differences, ICCA Journal, vol.21, issue.2, pp.84-99, 1998.

R. Bellman, Dynamic Programming, Princeton University Press, 1957.

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 1995.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

G. Boone, Efficient reinforcement learning: model-based Acrobot control, Proceedings of International Conference on Robotics and Automation, pp.229-234, 1997.
DOI : 10.1109/ROBOT.1997.620043

G. Boone, Minimum-time control of the Acrobot, Proceedings of International Conference on Robotics and Automation, pp.3281-3287, 1997.
DOI : 10.1109/ROBOT.1997.606789

J. A. Boyan and A. W. Moore, Generalization in reinforcement learning: Safely approximating the value function, Advances in Neural Information Processing Systems 7, 1995.

R. H. Crites and A. G. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning, vol.33, pp.235-262, 1998.

P. Dayan, The Convergence of TD(λ) for General λ, Machine Learning, vol.8, pp.341-362, 1992.
DOI : 10.1007/978-1-4615-3618-5_7

P. Dayan and T. J. Sejnowski, TD(λ) converges with probability 1, Machine Learning, vol.14, pp.295-301, 1994.

K. Doya, Temporal difference learning in continuous time and space, Advances in Neural Information Processing Systems 8, pp.1073-1079, 1996.

K. Doya, Reinforcement Learning in Continuous Time and Space, Neural Computation, vol.12, issue.1, pp.219-245, 2000.
DOI : 10.1162/089976600300015961

S. V. Emelyanov, S. K. Korovin, and L. V. Levantovsky, Higher order sliding modes in binary control systems, Soviet Physics, Doklady, vol.31, issue.4, pp.291-293, 1986.

S. E. Fahlman, An empirical study of learning speed in backpropagation networks, Technical Report CMU-CS-88-162, Carnegie Mellon University, 1988.

A. F. Filippov, Differential equations with discontinuous right-hand side, American Mathematical Society Translations, Series 2, vol.42, pp.199-231, 1964.
DOI : 10.1090/trans2/042/13

C. Gaskett, D. Wettergreen, and A. Zelinsky, Q-Learning in Continuous State and Action Spaces, Proceedings of the 12th Australian Joint Conference on Artificial Intelligence, 1999.
DOI : 10.1007/3-540-46695-9_35

G. J. Gordon, Stable Function Approximation in Dynamic Programming, Machine Learning: Proceedings of the Twelfth International Conference, pp.261-268, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

M. Hardt, K. Kreutz-Delgado, J. W. Helton, and O. von Stryk, Obtaining minimum energy biped walking gaits with symbolic models and numerical optimal control, Workshop on Biomechanics meets Robotics: Modelling and Simulation of Motion, 1999.

R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.

T. Jaakkola, M. I. Jordan, and S. P. Singh, On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, Neural Computation, vol.6, issue.6, pp.1185-1201, 1994.
DOI : 10.1162/neco.1994.6.6.1185

L. P. Kaelbling, M. L. Littman, and A. W. Moore, Reinforcement learning: A survey, Journal of Artificial Intelligence Research, vol.4, pp.237-285, 1996.

Y. Koike and K. Doya, Multiple state estimation reinforcement learning for driving model: driver model of automobile, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), pp.504-509, 1999.
DOI : 10.1109/ICSMC.1999.815603

R. Lachner, M. H. Breitner, and H. J. Pesch, Real-time collision avoidance against wrong drivers: Differential game approach, numerical solution, and synthesis of strategies with neural networks, Proceedings of the Seventh International Symposium on Dynamic Games and Applications, 1996.

Y. LeCun, Learning processes in an asymmetric threshold network, Disordered Systems and Biological Organization, pp.233-240, 1986.

Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, Efficient BackProp, Neural Networks: Tricks of the Trade, pp.9-50, 1998.

J.-A. Meyer, S. Doncieux, D. Filliat, and A. Guillot, Evolutionary Approaches to Neural Control of Rolling, Walking, Swimming and Flying Animats or Robots, Biologically Inspired Robot Behavior Engineering.
DOI : 10.1007/978-3-7908-1775-1_1

URL : https://hal.archives-ouvertes.fr/hal-00655476

M. F. Møller, A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks, vol.6, pp.525-533, 1993.

J. Moody and C. Darken, Fast Learning in Networks of Locally-Tuned Processing Units, Neural Computation, vol.1, issue.2, pp.281-294, 1989.
DOI : 10.1162/neco.1989.1.2.281

J. Morimoto and K. Doya, Hierarchical reinforcement learning of low-dimensional subgoals and high-dimensional trajectories, Proceedings of the Fifth International Conference on Neural Information Processing, pp.850-853, 1998.

J. Morimoto and K. Doya, Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Proceedings of 17th International Conference on Machine Learning, pp.623-630, 2000.
DOI : 10.1016/S0921-8890(01)00113-0

R. Munos, A convergent reinforcement learning algorithm in the continuous case based on a finite difference method, International Joint Conference on Artificial Intelligence, 1997.

R. Munos, L. C. Baird, and A. W. Moore, Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), 1999.
DOI : 10.1109/IJCNN.1999.832721

R. Munos and A. Moore, Variable resolution discretization for high-accuracy solutions of optimal control problems, International Joint Conference on Artificial Intelligence, 1999.

R. Neuneier and H. G. Zimmermann, How to Train Neural Networks, Neural Networks: Tricks of the Trade, pp.373-423, 1998.

G. B. Orr and T. K. Leen, Weight space probability densities in stochastic learning: II. Transients and basin hopping times, Advances in Neural Information Processing Systems 5, 1993.

G. B. Orr and T. K. Leen, Using curvature information for fast stochastic search, Advances in Neural Information Processing Systems 9, 1997.

M. van de Panne, Control for simulated human and animal motion, Annual Reviews in Control, pp.189-199, 1998. Also published in the proceedings of the IFAC Workshop on Motion Control.
DOI : 10.1016/S1367-5788(00)90035-X

S. Pareigis, Adaptive choice of grid and time in reinforcement learning, Advances in Neural Information Processing Systems 10, pp.1036-1042, 1998.

B. A. Pearlmutter, Fast exact multiplication by the Hessian, Neural Computation, vol.6, pp.147-160, 1994.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1992.

J. Randløv and P. Alstrøm, Learning to drive a bicycle using reinforcement learning and shaping, Machine Learning: Proceedings of the Fifteenth International Conference (ICML'98), 1998.

M. Riedmiller and H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, 1993.
DOI : 10.1109/ICNN.1993.298623

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation, in D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp.318-362, 1986.
DOI : 10.1016/B978-1-4832-1446-7.50035-2

J. C. Santamaría, R. S. Sutton, and A. Ram, Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, Adaptive Behavior, vol.6, issue.2, pp.163-217, 1997.
DOI : 10.1177/105971239700600201

W. S. Sarle, Neural Network FAQ. Available via anonymous ftp from ftp

S. Schaal and C. G. Atkeson, Robot juggling: implementation of memory-based learning, IEEE Control Systems, vol.14, issue.1, pp.57-71, 1994.
DOI : 10.1109/37.257895

N. N. Schraudolph, Local gain adaptation in stochastic gradient descent, Proceedings of the 9th International Conference on Artificial Neural Networks, 1999.

J. R. Shewchuk, An introduction to the conjugate gradient method without the agonizing pain, Carnegie Mellon University, 1994.

K. Sims, Evolving 3D Morphology and Behavior by Competition, Artificial Life IV Proceedings, pp.28-39, 1994.
DOI : 10.1145/964965.808571

K. Sims, Evolving virtual creatures, Proceedings of the 21st annual conference on Computer graphics and interactive techniques , SIGGRAPH '94, pp.15-22, 1994.
DOI : 10.1145/192161.192167

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.380.2734

M. Spong, The swing up control problem for the Acrobot, IEEE Control Systems Magazine, vol.15, issue.1, pp.49-55, 1995.
DOI : 10.1109/37.341864

R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol.3, pp.9-44, 1988.
DOI : 10.1007/BF00115009

R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems 8, pp.1038-1044, 1996.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems 12, 1999.

G. Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995.
DOI : 10.1145/203330.203343

M. E. Timin, The robot auto racing simulator, 1995.

J. N. Tsitsiklis, On the convergence of optimistic policy iteration, Journal of Machine Learning Research, vol.3, pp.59-72, 2002.

J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large scale dynamic programming, Machine Learning, vol.22, pp.59-94, 1996.

J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.

T. L. Vincent and W. J. Grantham, Nonlinear and Optimal Control Systems, John Wiley & Sons, 1997.

S. E. Weaver, L. C. Baird, and M. M. Polycarpou, An analytical framework for local feedforward networks, IEEE Transactions on Neural Networks, vol.9, issue.3, pp.473-482, 1998.

S. E. Weaver, L. C. Baird, and M. M. Polycarpou, Preventing unlearning during on-line training of feedforward networks, Proceedings of the International Symposium on Intelligent Control, 1998.

J. Yoshimoto, S. Ishii, and M. Sato, Application of reinforcement learning to balancing of Acrobot, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), pp.516-521, 1999.
DOI : 10.1109/ICSMC.1999.815605

W. Zhang and T. G. Dietterich, High-performance job-shop scheduling with a time-delay TD(λ) network, Advances in Neural Information Processing Systems 8, pp.1024-1030, 1996.