Life In 19x19 http://www.lifein19x19.com/ |
|
Monte Carlo (upper confidence bounds applied to trees) http://www.lifein19x19.com/viewtopic.php?f=18&t=2184 |
Page 1 of 1 |
Author: | ChessGo [ Fri Oct 22, 2010 9:41 am ] |
Post subject: | Monte Carlo (upper confidence bounds applied to trees) |
Hello all! I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is. I added Monte Carlo to my engine, and now it plays even worse ![]() |
Author: | Li Kao [ Fri Oct 22, 2010 11:22 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
It's an expansion rule for monte carlo. You always need to decide from which point in the move tree you want to start the monte carlo playout. You can use the node where WinningProbability+SomeConstant/Sqrt(PlayoutsOfThatNode) is maximal. This tries to balance playing out the most promising node with exploring new untried regions of the tree. 1/Sqrt(PlayoutsOfThatNode) scales like the standard deviation. So the expression is something like ExpectancyValue+C*StandardDeviation, so the real value of the move is probably below the estimated value. There are several papers out there. And I think SL has some articles too. But MonteCarlo only works well if the moves in playouts aren't aren't evenly distributed, but "good looking" moves are played more often. edit: Those articles are a good starting point: http://senseis.xmp.net/?MonteCarlo http://senseis.xmp.net/?UCT |
Author: | emeraldemon [ Fri Oct 22, 2010 3:31 pm ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works: http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf |
Author: | ChessGo [ Sat Oct 23, 2010 6:19 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
Li Kao wrote: Those articles are a good starting point: http://senseis.xmp.net/?UCT Thank you. I didn't understand what author of the article (Magnus Persson?) wanted to say. It seems that English isn't his native language. Or I'm not a smart guy ![]() Code: The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits)) The question is why 5? |
Author: | ChessGo [ Sat Oct 23, 2010 6:22 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
emeraldemon wrote: The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works: http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf Thanks a lot! The questions still remain. Am I correct thinking that the program creates a tree search (with which depth, by the way) and then plays simulations for terminal nodes? |
Author: | Li Kao [ Sat Oct 23, 2010 8:15 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
One simple variant of MCT works like this: 1) From the existing tree choose the node that has the maximum UCT score 2) Run a MC playout from that node on 3) Add the first move of the MC playout as a new childnode repeat Important for high playing strength is that the playout is guided by a good heuristic. Another trick is that because go moves don't depend that much on order one can score the result of one playout not only for its first move but for other moves as well(see RAVE). |
Author: | emeraldemon [ Sat Oct 23, 2010 11:39 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this: Code: / \ /\ /\ And I run a simulation from the leftmost branch: Code: / \ /\ /\ \ / \ / ... I keep the first move of that simulation as the next node added to the tree: Code: / \ /\ /\ \ hope that helps, sorry I can't make the trees look right |
Author: | ChessGo [ Mon Oct 25, 2010 9:11 pm ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
Why winrate takes into account only wins? Code: winrate := Wins/Visits What is better: 3 wins, 0 draws, 7 loses or 2 wins, 6 draws, 2 loses ? |
Author: | ChessGo [ Mon Oct 25, 2010 11:50 pm ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
emeraldemon wrote: What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this: Sorry, I'm still not sure that UCT works good. For example, I have a position with two possible moves (a1 and b1): Code: O / \ (a1) (b1) First random game (using a1 as first move) returns 0, second (b1) returns 1: Code: O / \ (a1) (b1) (0/1)(1/1) Ok, let's try to explore b1 move (the program likes it, because uct of a1 = 0): Code: O / \ (a1) (b1) (0/1)(1/1) / | \ a2 b2 c2 a2 returns 0, b2 and c2 returns 1: Code: O / \ (a1) (b1) (0/1)(3/4) / | \ a2 b2 c2 0 1 1 So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0. |
Author: | Li Kao [ Tue Oct 26, 2010 12:57 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
ChessGo wrote: Why winrate takes into account only wins? Since the komi is usually half integral draws are not possible(at least with superko). But if you're using a game where there are draws you could count it as half a win. |
Author: | Mike Novack [ Tue Oct 26, 2010 6:22 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
ChessGo wrote: Sorry, I'm still not sure that UCT works good. .................... So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0. I think the problem is that we have over simplified the description? What you have concluded would be the case if the algorithm at each node were to CERTAINLY take the path currently higher. But suppose that were a matter of probability? The greater the difference between the two paths the more likely the higher will be taken but a smaller (but non zero) chance remains that the lower will be tried. Then if as you suggest that (currently) lower path is "the only move" eventually its count will increase until it isn't the lower count. Given enough tries. These algorithms work only when the number of trials is very large. And the devil in the details, performance might very well depend on just how that diffence in count affects the probablity of which path taken. |
Author: | Li Kao [ Tue Oct 26, 2010 7:04 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity. |
Author: | ChessGo [ Tue Oct 26, 2010 11:36 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
Li Kao wrote: To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity. You are correct. It contains two parts: Win probability and SQRT(ln...) Let me ask one more question please. Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT? |
Author: | Li Kao [ Tue Oct 26, 2010 11:46 am ] |
Post subject: | Re: Monte Carlo (upper confidence bounds applied to trees) |
ChessGo wrote: Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT? Adding a term for the error when choosing the move doesn't make sense. You don't want to play a move with an uncertain result. So winrate is definitely the better one of the two. It might even make sense to subtract something representing uncertainty so it prefers moves it has well analyzed. But since I've never actually written a MCTS program, I don't know about that. |
Page 1 of 1 | All times are UTC - 8 hours [ DST ] |
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |