Monte Carlo (upper confidence bounds applied to trees)

For discussing go computing, software announcements, etc.
Post Reply
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

Hello all!
I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is.

I added Monte Carlo to my engine, and now it plays even worse :(
User avatar
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Li Kao »

It's an expansion rule for monte carlo. You always need to decide from which point in the move tree you want to start the monte carlo playout.
You can use the node where WinningProbability+SomeConstant/Sqrt(PlayoutsOfThatNode) is maximal.
This tries to balance playing out the most promising node with exploring new untried regions of the tree. 1/Sqrt(PlayoutsOfThatNode) scales like the standard deviation. So the expression is something like ExpectancyValue+C*StandardDeviation, so the real value of the move is probably below the estimated value.
There are several papers out there. And I think SL has some articles too.

But MonteCarlo only works well if the moves in playouts aren't aren't evenly distributed, but "good looking" moves are played more often.

edit:
Those articles are a good starting point:
http://senseis.xmp.net/?MonteCarlo
http://senseis.xmp.net/?UCT
Sanity is for the weak.
User avatar
emeraldemon
Gosei
Posts: 1744
Joined: Sun May 02, 2010 1:33 pm
GD Posts: 0
KGS: greendemon
Tygem: greendemon
DGS: smaragdaemon
OGS: emeraldemon
Has thanked: 697 times
Been thanked: 287 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by emeraldemon »

The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

Li Kao wrote:Those articles are a good starting point:
http://senseis.xmp.net/?UCT


Thank you. I didn't understand what author of the article (Magnus Persson?) wanted to say. It seems that English isn't his native language. Or I'm not a smart guy :roll:

Code: Select all

The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with

 UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits))


The question is why 5?
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

emeraldemon wrote:The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf


Thanks a lot!

The questions still remain. Am I correct thinking that the program creates a tree search (with which depth, by the way) and then plays simulations for terminal nodes?
User avatar
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Li Kao »

One simple variant of MCT works like this:
1) From the existing tree choose the node that has the maximum UCT score
2) Run a MC playout from that node on
3) Add the first move of the MC playout as a new childnode
repeat

Important for high playing strength is that the playout is guided by a good heuristic.

Another trick is that because go moves don't depend that much on order one can score the result of one playout not only for its first move but for other moves as well(see RAVE).
Sanity is for the weak.
User avatar
emeraldemon
Gosei
Posts: 1744
Joined: Sun May 02, 2010 1:33 pm
GD Posts: 0
KGS: greendemon
Tygem: greendemon
DGS: smaragdaemon
OGS: emeraldemon
Has thanked: 697 times
Been thanked: 287 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by emeraldemon »

What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:


Code: Select all

 / \
/\ /\


And I run a simulation from the leftmost branch:

Code: Select all

 / \
/\ /\
\
/
\
/
...


I keep the first move of that simulation as the next node added to the tree:

Code: Select all

 / \
/\ /\
\


hope that helps, sorry I can't make the trees look right
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

Why winrate takes into account only wins?

Code: Select all

winrate := Wins/Visits


What is better:

3 wins, 0 draws, 7 loses

or

2 wins, 6 draws, 2 loses

?
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

emeraldemon wrote:What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:



Sorry, I'm still not sure that UCT works good.

For example, I have a position with two possible moves (a1 and b1):

Code: Select all

    O
  /   \
(a1) (b1)


First random game (using a1 as first move) returns 0, second (b1) returns 1:

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(1/1)


Ok, let's try to explore b1 move (the program likes it, because uct of a1 = 0):

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(1/1)
     / | \
   a2 b2 c2


a2 returns 0, b2 and c2 returns 1:

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(3/4)
     / | \
   a2 b2 c2
   0   1  1


So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.
User avatar
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Li Kao »

ChessGo wrote:Why winrate takes into account only wins?

Since the komi is usually half integral draws are not possible(at least with superko). But if you're using a game where there are draws you could count it as half a win.
Sanity is for the weak.
Mike Novack
Lives in sente
Posts: 1045
Joined: Mon Aug 09, 2010 9:36 am
GD Posts: 0
Been thanked: 182 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Mike Novack »

ChessGo wrote:

Sorry, I'm still not sure that UCT works good.
....................
So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.


I think the problem is that we have over simplified the description? What you have concluded would be the case if the algorithm at each node were to CERTAINLY take the path currently higher. But suppose that were a matter of probability? The greater the difference between the two paths the more likely the higher will be taken but a smaller (but non zero) chance remains that the lower will be tried. Then if as you suggest that (currently) lower path is "the only move" eventually its count will increase until it isn't the lower count.

Given enough tries. These algorithms work only when the number of trials is very large. And the devil in the details, performance might very well depend on just how that diffence in count affects the probablity of which path taken.
User avatar
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Li Kao »

To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.
Sanity is for the weak.
ChessGo
Beginner
Posts: 6
Joined: Fri Oct 22, 2010 9:36 am
GD Posts: 0

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by ChessGo »

Li Kao wrote:To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.


You are correct. It contains two parts: Win probability and SQRT(ln...)

Let me ask one more question please.

Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?
User avatar
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Monte Carlo (upper confidence bounds applied to trees)

Post by Li Kao »

ChessGo wrote:Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?

Adding a term for the error when choosing the move doesn't make sense. You don't want to play a move with an uncertain result. So winrate is definitely the better one of the two.
It might even make sense to subtract something representing uncertainty so it prefers moves it has well analyzed. But since I've never actually written a MCTS program, I don't know about that.
Sanity is for the weak.
Post Reply