A. Anand, A. Grover, Mausam, and P. Singla, ASAP-UCT: Abstraction of state-action pairs in UCT, Proc. of IJCAI, pp.1509-1515, 2015.

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, JMLR, vol.11, pp.1563-1600, 2010.

R. Ortner, Adaptive aggregation for reinforcement learning in average reward Markov decision processes, Annals of Operations Research, vol.208, issue.1, pp.321-336, 2013.