A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Journal of Dynamic Systems, pp.220-227, 1975. ,

DOI : 10.1115/1.3426922

Strategy Learning with Multilayer Connectionist Representations, Proceedings of the Fourth International Workshop on Machine Learning, pp.103-114, 1987. ,

DOI : 10.1016/B978-0-934613-41-5.50014-3

Approximating a policy can be easier than approximating a value function, p.118, 2000. ,

Using local trajectory optimizers to speed up global optimization in dynamic programming, Advances in Neural Information Processing Systems 6, p.50, 1994. ,

A comparison of direct and model-based reinforcement learning, International Conference on Robotics and Automation, p.29, 1997. ,

Advantage updating Available from the Defense Technical Information Center, pp.22304-6145, 1993. ,

Residual algorithms: Reinforcement learning with function approximation, Machine Learning: Proceedings of the Twelfth International Conference, p.69, 1995. ,

Reinforcement learning with high-dimensional, continuous actions, Available from the Defense Technical Information Center, pp.22304-6145, 1993. ,

Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, vol.39, issue.3, pp.930-945, 1966. ,

Learning to act using real-time dynamic programming, Artificial Intelligence, vol.72, issue.1-2, pp.81-138, 1995. ,

DOI : 10.1016/0004-3702(94)00011-O

URL : http://doi.org/10.1016/0004-3702(94)00011-o

Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.13, issue.5, pp.835-846, 1983. ,

DOI : 10.1109/TSMC.1983.6313077

Experiments in parameter learning using temporal differences, ICCA Journal, vol.21, issue.22, pp.84-89, 1998. ,

Dynamic Programming, pp.14-35, 1957. ,

Dynamic Programming and Optimal Control, Athena Scientific, p.35, 1995. ,

Neuro-Dynamic Programming, Athena Scientific, vol.67, issue.74, p.114, 1996. ,

Neural Networks for Pattern Recognition, p.55, 1995. ,

Efficient reinforcement learning: model-based Acrobot control, Proceedings of International Conference on Robotics and Automation, pp.229-234, 1997. ,

DOI : 10.1109/ROBOT.1997.620043

Minimum-time control of the Acrobot, Proceedings of International Conference on Robotics and Automation, pp.3281-3287, 1997. ,

DOI : 10.1109/ROBOT.1997.606789

Generalization in reinforcement learning: Safely approximating the value function, Advances in Neural Information Processing Systems, p.72, 1995. ,

Elevator group control using multiple reinforcement learning agents, Machine Learning, pp.235-262, 1998. ,

The Convergence of TD(??) for General ??, Machine Learning, pp.341-362, 1992. ,

DOI : 10.1007/978-1-4615-3618-5_7

TD(?) converges with probability 1, Machine Learning, pp.295-301, 1994. ,

Temporal difference learning in continuous time and space, Advances in Neural Information Processing Systems, pp.1073-1079, 1996. ,

Reinforcement Learning in Continuous Time and Space, Neural Computation, vol.3, issue.1, pp.243-269, 2000. ,

DOI : 10.1109/9.580874

Higher order sliding modes in binary control systems, Soviet Physics, Doklady, vol.31, issue.4, pp.291-293, 1986. ,

An empirical study of learning speed in backpropagation networks, p.93, 1988. ,

Differential equations with discontinuous right-hand side, Trans. Amer. Math. Soc. Ser, vol.2, issue.42, pp.199-231, 1964. ,

DOI : 10.1090/trans2/042/13

Q-Learning in Continuous State and Action Spaces, Proceedings of 12th Australian Joint Conference on Artificial Intelligence, p.76, 1999. ,

DOI : 10.1007/3-540-46695-9_35

Stable Function Approximation in Dynamic Programming, Machine Learning: Proceedings of the Twelfth International Conference, pp.261-268, 1995. ,

DOI : 10.1016/B978-1-55860-377-6.50040-2

Obtaining minimum energy biped walking gaits with symbolic models and numerical optimal control, Workshop?Biomechanics meets Robotics , Modelling and Simulation of Motion, p.27, 1999. ,

Dynamic Programming and Markov Processes, p.41, 1960. ,

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, Neural Computation, vol.8, issue.6, pp.1185-1201, 1994. ,

DOI : 10.1214/aoms/1177729586

Reinforcement learning: A survey, Journal of Artificial Intelligence Research, vol.4, issue.11, pp.237-285, 1996. ,

Multiple state estimation reinforcement learning for driving model: driver model of automobile, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), pp.504-509, 1999. ,

DOI : 10.1109/ICSMC.1999.815603

Real-time collision avoidance against wrong drivers: Differential game approach, numerical solution , and synthesis of strategies with neural networks, Proceedings of the Seventh International Symposium on Dynamic Games and Applications, p.28, 1996. ,

Learning processes in an asymmetric threshold network, Disordered Systems and Biological Organization, pp.233-240, 1986. ,

Efficient BackProp, Neural Networks: Tricks of the Trade, pp.59-61, 1998. ,

Evolutionary Approaches to Neural Control of Rolling, Walking, Swimming and Flying Animats or Robots, Biologically Inspired Robot Behavior Engineering, p.28 ,

DOI : 10.1007/978-3-7908-1775-1_1

URL : https://hal.archives-ouvertes.fr/hal-00655476

A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks, vol.6, issue.59, pp.525-533, 1993. ,

Fast Learning in Networks of Locally-Tuned Processing Units, Neural Computation, vol.1, issue.2, pp.281-294, 1989. ,

DOI : 10.1109/MASSP.1987.1165576

Hierarchical reinforcement learning of low-dimensional subgoals and high-dimensional trajectories, Proceedings of the Fifth International Conference on Neural Information Processing, pp.850-853, 1998. ,

Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Proceedings of 17th International Conference on Machine Learning, pp.623-630, 2000. ,

DOI : 10.1016/S0921-8890(01)00113-0

A convergent reinforcement learning algorithm in the continuous case based on a finite difference method, International Joint Conference on Artificial Intelligence, p.50, 1997. ,

Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), p.134, 1999. ,

DOI : 10.1109/IJCNN.1999.832721

Variable resolution discretization for high-accuracy solutions of optimal control problems, International Joint Conference on Artificial Intelligence, p.105, 1999. ,

How to Train Neural Networks, Neural Networks: Tricks of the Trade, pp.31-94, 1998. ,

DOI : 10.1002/0471725315

Weight space probability densities in stochastic learning: II. transients and basin hopping times, Advances in Neural Information Processing Systems 5, p.61, 1993. ,

Using curvature information for fast stochastic search, Advances in Neural Information Processing Systems 9, p.93, 1997. ,

Control for simulated human and animal motion, Also published in proceedings of IFAC Workshop on Motion Control, pp.189-199, 1998. ,

DOI : 10.1016/S1367-5788(00)90035-X

Adaptive choice of grid and time in reinforcement learning, Advances in Neural Information Processing Systems 10, pp.1036-1042, 1998. ,

Fast exact multiplication by the Hessian, Neural Computation, vol.6, pp.147-160, 1994. ,

Numerical Recipes in C?The Art of Scientific Computing, p.85, 1992. ,

Learning to drive a bicycle using reinforcement learning and shaping, Machine Learning: Proceedings of the Fifteenth International Conference (ICML'98, p.117, 1998. ,

A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE International Conference on Neural Networks, p.93, 1993. ,

DOI : 10.1109/ICNN.1993.298623

Learning Internal Representations by Error Propagation ,

DOI : 10.1016/B978-1-4832-1446-7.50035-2

Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp.318-362, 1986. ,

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, Adaptive Behavior, vol.2, issue.5, pp.163-218, 1998. ,

DOI : 10.1177/105971239700600201

Neural Network FAQ Available via anonymous ftp from ftp ,

Robot juggling: implementation of memory-based learning, IEEE Control Systems, vol.14, issue.1, pp.57-71, 1994. ,

DOI : 10.1109/37.257895

Local gain adaptation in stochastic gradient descent, Proceedings of the 9th International Conference on Artificial Neural Networks, p.93, 1999. ,

An introduction to the conjugate gradient method without the agonizing pain, 1994. ,

Evolving 3D Morphology and Behavior by Competition, Artificial Life IV Proceedings, pp.28-39, 1994. ,

DOI : 10.1145/964965.808571

Evolving virtual creatures, Proceedings of the 21st annual conference on Computer graphics and interactive techniques , SIGGRAPH '94, pp.15-22, 1994. ,

DOI : 10.1145/192161.192167

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.380.2734

The swing up control problem for the Acrobot, IEEE Control Systems Magazine, vol.15, issue.1, pp.49-55, 0105. ,

DOI : 10.1109/37.341864

Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988. ,

DOI : 10.1007/BF00115009

Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems 8, pp.1038-1044, 1996. ,

Reinforcement Learning: An Introduction, pp.74-76, 1998. ,

Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems 12, p.127, 1999. ,

Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995. ,

DOI : 10.1145/203330.203343

The robot auto racing simulator, p.147, 1995. ,

On the convergence of optimistic policy iteration, Journal of Machine Learning Research, vol.3, pp.59-72, 2002. ,

Feature-based methods for large scale dynamic programming, Machine Learning, pp.59-94, 1996. ,

An analysis of temporaldifference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1975. ,

The Nature of Statistical Learning Theory, p.55, 1995. ,

Nonlinear and Optimal Control Systems, 1997. ,

An analytical framework for local feedforward networks, IEEE Transactions on Neural NetworksECECS, vol.9, issue.3, pp.473-482, 1998. ,

Preventing unlearning during on-line training of feedforward networks, Proceedings of the International Symposium of Intelligent Control, p.128, 1998. ,

Application of reinforcement learning to balancing of Acrobot, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), pp.516-521, 1999. ,

DOI : 10.1109/ICSMC.1999.815605

High-performance job-shop scheduling with a time-delay TD(?) network ,