P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

A. Auger, J. Bader, D. Brockhoff, and E. Zitzler, Theory of the hypervolume indicator, Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, FOGA '09, pp.87-102, 2009.
DOI : 10.1145/1527125.1527138
URL : https://hal.archives-ouvertes.fr/inria-00430540

L. Barrett and S. Narayanan, Learning all optimal policies with multiple criteria, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.41-47, 2008.
DOI : 10.1145/1390156.1390162
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.2715

V. Berthier, H. Doghmen, and O. Teytaud, Consistency Modifications for Automatically Tuned Monte-Carlo Tree Search, LION4, pp.111-124, 2010.
DOI : 10.1007/978-3-642-13800-3_9
URL : https://hal.archives-ouvertes.fr/inria-00437146

N. Beume, B. Naujoks, and M. Emmerich, SMS-EMOA: Multiobjective selection based on dominated hypervolume, European Journal of Operational Research, vol.181, issue.3, pp.1653-1669, 2007.
DOI : 10.1016/j.ejor.2006.08.008

N. Beume, C. M. Fonseca, M. López-Ibáñez, L. Paquete, and J. Vahrenhold, On the Complexity of Computing the Hypervolume Indicator, IEEE Transactions on Evolutionary Computation, vol.13, issue.5, pp.1075-1082, 2009.
DOI : 10.1109/TEVC.2009.2015575

G. Chaslot, L. Chatriot, C. Fiter, S. Gelly, J.-B. Hoock et al., Combining expert, offline, transient and online knowledge in Monte-Carlo exploration, 2008.

K. Chatterjee, Markov Decision Processes with Multiple Long-Run Average Objectives, FSTTCS Foundations of Software Technology and Theoretical Computer Science, vol.4855, pp.473-484, 2007.
DOI : 10.1007/978-3-540-77050-3_39
URL : http://arxiv.org/abs/1104.3489

P. Ciancarini and G. P. Favini, Monte-Carlo Tree Search techniques in the game of Kriegspiel, IJCAI'09, pp.474-479, 2009.

P. A. Coquelin and R. Munos, Bandit algorithms for tree search. arXiv preprint cs/0703062, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proc. Computers and Games, pp.72-83, 2006.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

K. Deb, Multi-objective optimization using evolutionary algorithms, pp.55-58, 2001.

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, PPSN VI, pp.849-858, 2000.

K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, Scalable multi-objective optimization test problems, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600), pp.825-830, 2002.
DOI : 10.1109/CEC.2002.1007032
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7531

M. Fleischer, The Measure of Pareto Optima: Applications to Multi-objective Metaheuristics, EMO'03, pp.519-533, 2003.
DOI : 10.1007/3-540-36970-8_37

Z. Gábor, Z. Kalmár, and C. Szepesvári, Multi-criteria reinforcement learning, ICML'98, pp.197-205, 1998.

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003

N. Hansen, The CMA evolution strategy: a comparing review, Towards a New Evolutionary Computation, pp.75-102, 2006.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

D. J. Lizotte, M. Bowling, and S. A. Murphy, Linear fitted-Q iteration with multiple reward functions, Journal of Machine Learning Research, vol.13, pp.3253-3295, 2012.

F. Maes, L. Wehenkel, and D. Ernst, Automatic Discovery of Ranking Formulas for Playing with Multi-armed Bandits, Recent Advances in Reinforcement Learning - 9th European Workshop, pp.5-17, 2011.
DOI : 10.1007/978-3-642-29946-9_5

S. Mannor and N. Shimkin, A geometric approach to multi-criterion reinforcement learning, Journal of Machine Learning Research, pp.325-360, 2004.

H. Nakhost and M. Müller, Monte-Carlo exploration for deterministic planning, IJCAI'09, pp.1766-1771, 2009.

S. Natarajan and P. Tadepalli, Dynamic preferences in multi-criteria reinforcement learning, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102427

C. H. Papadimitriou and M. Yannakakis, On the approximability of trade-offs and optimal access of Web sources, Proceedings 41st Annual Symposium on Foundations of Computer Science, pp.86-92, 2000.
DOI : 10.1109/SFCS.2000.892068

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

G. Tesauro, R. Das, H. Chan, J. Kephart, D. Levine et al., Managing power consumption and performance of computing systems using reinforcement learning, NIPS'07, pp.1-8, 2007.

J. D. Ullman, NP-complete scheduling problems, Journal of Computer and System Sciences, vol.10, issue.3, pp.384-393, 1975.
DOI : 10.1016/S0022-0000(75)80008-0
URL : http://doi.org/10.1016/s0022-0000(75)80008-0

P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, and E. Dekker, Empirical evaluation methods for multiobjective reinforcement learning algorithms, Machine Learning, vol.84, issue.1-2, pp.51-80, 2011.
DOI : 10.1007/s10994-010-5232-5

D. A. Van Veldhuizen, Multiobjective evolutionary algorithms: classifications, analyses, and new innovations, 1999.

W. Wang and M. Sebag, Multi-objective Monte-Carlo Tree Search, Asian Conference on Machine Learning, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758379

Y. Wang and S. Gelly, Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007 IEEE Symposium on Computational Intelligence and Games, pp.175-182, 2007.
DOI : 10.1109/CIG.2007.368095

Y. Wang, J. Audibert, and R. Munos, Algorithms for infinitely many-armed bandits, NIPS'08, pp.1-8, 2008.

J. Yu, R. Buyya, and K. Ramamohanarao, Workflow Scheduling Algorithms for Grid Computing, Studies in Computational Intelligence, vol.146, pp.173-214, 2008.
DOI : 10.1007/978-3-540-69277-5_7

E. Zitzler and L. Thiele, Multiobjective optimization using evolutionary algorithms - A comparative case study, PPSN V, pp.292-301, 1998.
DOI : 10.1007/BFb0056872