Finite-time analysis of the multiarmed bandit problem, Machine Learning, pp.235-256, 2002.
Optimistic planning for Markov decision processes, International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 22, pp.182-189, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756736
Model Predictive Control, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00683813
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Computers and Games, pp.72-83, 2007.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees, Recent Advances in Reinforcement Learning - European Workshop on Reinforcement Learning (EWRL), pp.1-14, 2008.
DOI : 10.1007/978-3-540-89722-4_1
Optimistic planning for belief-augmented Markov Decision Processes, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013.
DOI : 10.1109/ADPRL.2013.6614992
URL : https://hal.archives-ouvertes.fr/hal-00840202
Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266
A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.
DOI : 10.1109/TSSC.1968.300136
Optimistic Planning of Deterministic Systems, Recent Advances in Reinforcement Learning, pp.151-164, 2008.
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182
Theory of Financial Decision Making, 1987.
A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Machine Learning, pp.193-208, 2002.
Bandit Based Monte-Carlo Planning, Machine Learning: ECML 2006, pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, To appear in Foundations and Trends in Machine Learning, 2013.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575
Optimal dynamic treatment regimes, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.65, issue.2, pp.331-366, 2003.
DOI : 10.1111/1467-9868.00389
Reinforcement learning for humanoid robotics, IEEE-RAS International Conference on Humanoid Robots, pp.1-20, 2003.
Integrating sample-based planning and model-based reinforcement learning, AAAI Conference on Artificial Intelligence, 2010.