S. M. LaValle, Planning Algorithms, 2006.

M. J. Kearns, Y. Mansour, and A. Y. Ng, A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Machine Learning, vol.49, pp.193-208, 2002.

L. Péret and F. Garcia, Online Resolution Techniques, Markov Decision Processes in Artificial Intelligence, pp.153-183, 2010.
DOI : 10.1002/9781118557426.ch6

B. Defourny, D. Ernst, and L. Wehenkel, Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees, Recent Advances in Reinforcement Learning, ser. Lecture Notes in Computer Science, pp.1-14, 2008.
DOI : 10.1007/978-3-540-89722-4_1

J. M. Maciejowski, Predictive Control with Constraints, 2002.

E. F. Camacho and C. Bordons, Model Predictive Control, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00683813

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, pp.235-256, 2002.

P. Coquelin and R. Munos, Bandit algorithms for tree search, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI-07), 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 21, pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797

M. Likhachev, G. J. Gordon, and S. Thrun, Planning for Markov decision processes with sparse stochasticity, Advances in Neural Information Processing Systems 17, 2004.

H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus, Simulation-Based Algorithms for Markov Decision Processes, 2007.

J. Hren and R. Munos, Optimistic Planning of Deterministic Systems, Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL-08), pp.151-164, 2008.
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182

B. Defourny, D. Ernst, and L. Wehenkel, Planning under uncertainty, ensembles of disturbance trees and kernelized discrete action spaces, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.145-152, 2009.
DOI : 10.1109/ADPRL.2009.4927538

S. Bubeck and R. Munos, Open loop optimistic planning, Proceedings of the 23rd Annual Conference on Learning Theory (COLT-10), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, vol.2, 2007.

Cs. Szepesvári, Algorithms for Reinforcement Learning, 2010.

L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, ser. Automation and Control Engineering, 2010.

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Approximate dynamic programming with a fuzzy parameterization, Automatica, vol.46, issue.5, pp.804-814, 2010.
DOI : 10.1016/j.automatica.2010.02.006

B. Adams, H. Banks, H. Kwon, and H. Tran, Dynamic multidrug therapies for HIV: Optimal and STI control approaches, Mathematical Biosciences and Engineering, vol.1, issue.2, pp.223-241, 2004.

J. Lisziewicz, E. Rosenberg, and J. Lieberman, Control of HIV despite the Discontinuation of Antiretroviral Therapy, New England Journal of Medicine, vol.340, issue.21, pp.1683-1684, 1999.
DOI : 10.1056/NEJM199905273402114

D. Ernst, G. Stan, J. Gonçalves, and L. Wehenkel, Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, Proceedings of the 45th IEEE Conference on Decision and Control, 2006.
DOI : 10.1109/CDC.2006.377527
URL : https://hal.archives-ouvertes.fr/hal-00121732

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Cross-entropy optimization of control policies with adaptive basis functions, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.41, issue.1, 2011.