P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, pp.235-256, 2002.

L. Busoniu and R. Munos, Optimistic planning for Markov decision processes, International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 22, pp.182-189, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756736

E. F. Camacho and C. Bordons, Model Predictive Control, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00683813

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Computers and Games, pp.72-83, 2007.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

B. Defourny, D. Ernst, and L. Wehenkel, Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees, Recent Advances in Reinforcement Learning - European Workshop on Reinforcement Learning (EWRL), pp.1-14, 2008.
DOI : 10.1007/978-3-540-89722-4_1

R. Fonteneau, L. Busoniu, and R. Munos, Optimistic planning for belief-augmented Markov Decision Processes, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013.
DOI : 10.1109/ADPRL.2013.6614992
URL : https://hal.archives-ouvertes.fr/hal-00840202

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

P. E. Hart, N. J. Nilsson, and B. Raphael, A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.
DOI : 10.1109/TSSC.1968.300136

J. F. Hren and R. Munos, Optimistic Planning of Deterministic Systems, Recent Advances in Reinforcement Learning, pp.151-164, 2008.
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182

J. E. Ingersoll, Theory of Financial Decision Making, 1987.

M. Kearns, Y. Mansour, and A. Y. Ng, A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Machine Learning, pp.193-208, 2002.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Machine Learning: ECML 2006, pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

R. Munos, From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, To appear in Foundations and Trends in Machine Learning, 2013.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575

S. A. Murphy, Optimal dynamic treatment regimes, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.65, issue.2, pp.331-366, 2003.
DOI : 10.1111/1467-9868.00389

J. Peters, S. Vijayakumar, and S. Schaal, Reinforcement learning for humanoid robotics, IEEE-RAS International Conference on Humanoid Robots, pp.1-20, 2003.

T. J. Walsh, S. Goschin, and M. L. Littman, Integrating sample-based planning and model-based reinforcement learning, AAAI Conference on Artificial Intelligence, 2010.