ASAP-UCT: Abstraction of state-action pairs in UCT, Proc. of IJCAI, pp. 1509-1515, 2015.
Near-optimal regret bounds for reinforcement learning, JMLR, vol. 11, pp. 1563-1600, 2010.
Adaptive aggregation for reinforcement learning in average reward Markov decision processes, Annals of Operations Research, vol. 208, no. 1, pp. 321-336, 2013.