Near-optimal BRL using optimistic local transitions, International Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00755270
A Bayesian sampling approach to exploration in reinforcement learning, Uncertainty in Artificial Intelligence (UAI), pp.19-26, 2009.
Approaching Bayes-optimality using Monte-Carlo tree search, International Conference on Automated Planning and Scheduling (ICAPS), 2011.
Finite-time analysis of the multiarmed bandit problem, Machine Learning, pp.235-256, 2002.
Dynamic Programming, 1957.
R-max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol.3, pp.213-231, 2003.
Open loop optimistic planning, Conference on Learning Theory (COLT), pp.477-489, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119
Online optimization in X-armed bandits, Neural Information Processing Systems (NIPS), pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797
Optimistic planning for Markov decision processes, International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP 22, pp.182-189, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756736
Optimistic planning for sparsely stochastic systems, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp.48-55, 2011.
DOI : 10.1109/ADPRL.2011.5967375
URL : https://hal.archives-ouvertes.fr/hal-00830125
Smarter Sampling in Model-Based Bayesian Reinforcement Learning, Machine Learning and Knowledge Discovery in Databases, pp.200-214, 2010.
DOI : 10.1007/978-3-642-15880-3_19
Learning exploration/exploitation strategies for single trajectory reinforcement learning, European Workshop on Reinforcement Learning (EWRL), 2012.
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Computers and Games, pp.72-83, 2007.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Bayesian Q-learning, National Conference on Artificial Intelligence, pp.761-768, 1998.
Tree Exploration for Bayesian RL Exploration, 2008 International Conference on Computational Intelligence for Modelling Control & Automation, pp.1029-1034, 2008.
DOI : 10.1109/CIMCA.2008.32
URL : http://arxiv.org/abs/0902.0392
Rollout sampling approximate policy iteration, Machine Learning, pp.157-171, 2008.
DOI : 10.1007/978-3-540-87479-9_6
URL : http://arxiv.org/abs/0805.2027
Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes, 2002.
Dual control theory, Automation and Remote Control, pp.874-1039, 1960.
Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266
Multiarmed Bandit Allocation Indices, 1989.
DOI : 10.1002/9780470980033
Efficient Bayes-adaptive reinforcement learning using sample-based search, Neural Information Processing Systems (NIPS), 2012.
Optimistic Planning of Deterministic Systems, Recent Advances in Reinforcement Learning, pp.151-164, 2008.
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182
Theory of Financial Decision Making, 1987.
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Bandit Based Monte-Carlo Planning, Machine Learning: ECML 2006, pp.282-293, 2006.
DOI : 10.1007/11871842_29
Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.513-520, 2009.
DOI : 10.1145/1553374.1553441
Optimistic optimization of a deterministic function without the knowledge of its smoothness, Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830143
The optimistic principle applied to games, optimization and planning: Towards Foundations of Monte-Carlo Tree Search, 2012.
Optimal dynamic treatment regimes, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.65, issue.2, pp.331-366, 2003.
Logarithmic online regret bounds for undiscounted reinforcement learning, Neural Information Processing Systems (NIPS), 2007.
Reinforcement learning for humanoid robotics, IEEE-RAS International Conference on Humanoid Robots, pp.1-20, 2003.
An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.697-704, 2006.
DOI : 10.1145/1143844.1143932
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning (ECML), pp.317-328, 2005.
DOI : 10.1007/11564096_32
Monte-Carlo planning in large POMDPs, Neural Information Processing Systems (NIPS), 2010.
Variance-based rewards for approximate Bayesian reinforcement learning, Uncertainty in Artificial Intelligence (UAI), 2010.
A Bayesian framework for reinforcement learning, International Conference on Machine Learning (ICML), pp.943-950, 2000.
Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988.
DOI : 10.1007/BF00115009
Integrating sample-based planning and model-based reinforcement learning, AAAI Conference on Artificial Intelligence (AAAI), 2010.
Bandit-based planning and learning in continuous-action Markov decision processes, International Conference on Automated Planning and Scheduling (ICAPS), 2012.