Conference papers

Combining Online and Offline Knowledge in UCT

Sylvain Gelly 1 David Silver 2
1 TANC - Algorithmic number theory for cryptology
Inria Saclay - Ile de France, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
Abstract : The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
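
The third approach in the abstract, using an offline value function as prior knowledge in the search tree, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard UCB1 selection rule, and it assumes the prior is injected by initialising each action's value and visit count as if a number of simulations had already been seen. The names `ActionStats`, `init_with_prior`, `equivalent_experience`, `update` and `select_ucb1` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    value: float  # running mean of simulation outcomes for this action
    count: int    # number of simulations, real or virtual, behind that mean


def init_with_prior(prior_value: float, equivalent_experience: int) -> ActionStats:
    # Assumption: the offline estimate is treated as if it had already been
    # observed in `equivalent_experience` virtual simulations.
    return ActionStats(value=prior_value, count=equivalent_experience)


def update(stats: ActionStats, outcome: float) -> None:
    # Incremental mean update after one simulation through this action.
    stats.count += 1
    stats.value += (outcome - stats.value) / stats.count


def select_ucb1(actions: dict, c: float = 1.0):
    # Standard UCB1: mean value plus an exploration bonus that shrinks
    # as the action's count grows. Unvisited actions are tried first.
    total = sum(s.count for s in actions.values())

    def score(item):
        _, s = item
        if s.count == 0:
            return math.inf
        return s.value + c * math.sqrt(math.log(total) / s.count)

    return max(actions.items(), key=score)[0]


# Usage: two moves with equal virtual experience but different offline values.
node = {"a": init_with_prior(0.9, 10), "b": init_with_prior(0.1, 10)}
best = select_ucb1(node)
update(node[best], 1.0)
```

With equal counts the exploration bonus is identical for both moves, so the first selection follows the prior alone; as real simulations accumulate, their outcomes gradually outweigh the virtual ones.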

Cited literature [13 references]

https://hal.inria.fr/inria-00164003
Contributor : Sylvain Gelly
Submitted on : Thursday, July 19, 2007 - 1:51:04 PM
Last modification on : Wednesday, March 27, 2019 - 4:41:29 PM
Document(s) archived on : Thursday, April 8, 2010 - 11:37:50 PM

File

GellySilverICML2007.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : inria-00164003, version 1

Citation

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩

Metrics

Record views

3989

Files downloads

1836