As a result, the following holds:
\[
\Pr(X_1 \leq m \cap X_2 \leq m) = 1 - (1 - p_1)^m + 1 - (1 - p_2)^m - \left(1 - (1 - p_1 - p_2)^m\right).
\]

Equation A.20 holds if $M_1$ and $M_2$ have been sampled during the first $m$ trials, which implies that the probability for Equation A.20 to hold is at least equal to the probability of sampling both tasks. Formally,
\[
\Pr\left(\hat{D}_{\max}(s, a) + \epsilon \geq D_{\max}(s, a)\right) \geq \Pr(X_1 \leq m \cap X_2 \leq m).
\]

Combining the two results gives
\[
\Pr\left(\hat{D}_{\max}(s, a) + \epsilon \geq D_{\max}(s, a)\right) \geq 1 - \delta,
\]
which concludes the proof.
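To make the expression concrete, here is a small worked instance; it assumes the inclusion-exclusion form stated above, with each of the $m$ trials sampling exactly one task ($M_1$ with probability $p_1$, $M_2$ with probability $p_2$), and the values $p_1 = 0.2$, $p_2 = 0.3$, $m = 10$ are illustrative only:
\[
\Pr(X_1 \leq 10 \cap X_2 \leq 10) = 1 - 0.8^{10} + 1 - 0.7^{10} - \left(1 - 0.5^{10}\right) \approx 0.865.
\]
Ten trials thus already suffice to sample both tasks with probability of about $0.87$ in this regime, and the probability approaches $1$ geometrically fast as $m$ grows.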
