It is currently Fri Apr 19, 2024 7:15 am

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 14 posts ] 
Author Message
Offline
 Post subject: Monte Carlo (upper confidence bounds applied to trees)
Post #1 Posted: Fri Oct 22, 2010 9:41 am 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
Hello all!
I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is.

I added Monte Carlo to my engine, and now it plays even worse :(

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #2 Posted: Fri Oct 22, 2010 11:22 am 
Lives in gote
User avatar

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
It's an expansion rule for monte carlo. You always need to decide from which point in the move tree you want to start the monte carlo playout.
You can use the node where WinningProbability+SomeConstant/Sqrt(PlayoutsOfThatNode) is maximal.
This tries to balance playing out the most promising node with exploring new untried regions of the tree. 1/Sqrt(PlayoutsOfThatNode) scales like the standard deviation. So the expression is something like ExpectancyValue+C*StandardDeviation, so the real value of the move is probably below the estimated value.
There are several papers out there. And I think SL has some articles too.

But MonteCarlo only works well if the moves in playouts aren't aren't evenly distributed, but "good looking" moves are played more often.

edit:
Those articles are a good starting point:
http://senseis.xmp.net/?MonteCarlo
http://senseis.xmp.net/?UCT

_________________
Sanity is for the weak.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #3 Posted: Fri Oct 22, 2010 3:31 pm 
Gosei
User avatar

Posts: 1744
Liked others: 702
Was liked: 288
KGS: greendemon
Tygem: greendemon
DGS: smaragdaemon
OGS: emeraldemon
The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #4 Posted: Sat Oct 23, 2010 6:19 am 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
Li Kao wrote:
Those articles are a good starting point:
http://senseis.xmp.net/?UCT


Thank you. I didn't understand what author of the article (Magnus Persson?) wanted to say. It seems that English isn't his native language. Or I'm not a smart guy :roll:

Code:
The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with

UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits))


The question is why 5?

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #5 Posted: Sat Oct 23, 2010 6:22 am 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
emeraldemon wrote:
The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf


Thanks a lot!

The questions still remain. Am I correct thinking that the program creates a tree search (with which depth, by the way) and then plays simulations for terminal nodes?

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #6 Posted: Sat Oct 23, 2010 8:15 am 
Lives in gote
User avatar

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
One simple variant of MCT works like this:
1) From the existing tree choose the node that has the maximum UCT score
2) Run a MC playout from that node on
3) Add the first move of the MC playout as a new childnode
repeat

Important for high playing strength is that the playout is guided by a good heuristic.

Another trick is that because go moves don't depend that much on order one can score the result of one playout not only for its first move but for other moves as well(see RAVE).

_________________
Sanity is for the weak.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #7 Posted: Sat Oct 23, 2010 11:39 am 
Gosei
User avatar

Posts: 1744
Liked others: 702
Was liked: 288
KGS: greendemon
Tygem: greendemon
DGS: smaragdaemon
OGS: emeraldemon
What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:


Code:
/ \
/\ /\


And I run a simulation from the leftmost branch:

Code:
/ \
/\ /\
\
/
\
/
...


I keep the first move of that simulation as the next node added to the tree:

Code:
/ \
/\ /\
\


hope that helps, sorry I can't make the trees look right

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #8 Posted: Mon Oct 25, 2010 9:11 pm 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
Why winrate takes into account only wins?

Code:
winrate := Wins/Visits


What is better:

3 wins, 0 draws, 7 loses

or

2 wins, 6 draws, 2 loses

?

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #9 Posted: Mon Oct 25, 2010 11:50 pm 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
emeraldemon wrote:
What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:



Sorry, I'm still not sure that UCT works good.

For example, I have a position with two possible moves (a1 and b1):

Code:
    O
  /   \
(a1) (b1)


First random game (using a1 as first move) returns 0, second (b1) returns 1:

Code:
    O
  /   \
(a1) (b1)
(0/1)(1/1)


Ok, let's try to explore b1 move (the program likes it, because uct of a1 = 0):

Code:
    O
  /   \
(a1) (b1)
(0/1)(1/1)
     / | \
   a2 b2 c2


a2 returns 0, b2 and c2 returns 1:

Code:
    O
  /   \
(a1) (b1)
(0/1)(3/4)
     / | \
   a2 b2 c2
   0   1  1


So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #10 Posted: Tue Oct 26, 2010 12:57 am 
Lives in gote
User avatar

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
ChessGo wrote:
Why winrate takes into account only wins?

Since the komi is usually half integral draws are not possible(at least with superko). But if you're using a game where there are draws you could count it as half a win.

_________________
Sanity is for the weak.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #11 Posted: Tue Oct 26, 2010 6:22 am 
Lives in sente

Posts: 1037
Liked others: 0
Was liked: 180
ChessGo wrote:


Sorry, I'm still not sure that UCT works good.
....................
So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.


I think the problem is that we have over simplified the description? What you have concluded would be the case if the algorithm at each node were to CERTAINLY take the path currently higher. But suppose that were a matter of probability? The greater the difference between the two paths the more likely the higher will be taken but a smaller (but non zero) chance remains that the lower will be tried. Then if as you suggest that (currently) lower path is "the only move" eventually its count will increase until it isn't the lower count.

Given enough tries. These algorithms work only when the number of trials is very large. And the devil in the details, performance might very well depend on just how that diffence in count affects the probablity of which path taken.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #12 Posted: Tue Oct 26, 2010 7:04 am 
Lives in gote
User avatar

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.

_________________
Sanity is for the weak.

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #13 Posted: Tue Oct 26, 2010 11:36 am 
Beginner

Posts: 6
Liked others: 0
Was liked: 0
Li Kao wrote:
To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.


You are correct. It contains two parts: Win probability and SQRT(ln...)

Let me ask one more question please.

Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?

Top
 Profile  
 
Offline
 Post subject: Re: Monte Carlo (upper confidence bounds applied to trees)
Post #14 Posted: Tue Oct 26, 2010 11:46 am 
Lives in gote
User avatar

Posts: 643
Location: Munich, Germany
Liked others: 115
Was liked: 102
Rank: KGS 3k
KGS: LiKao / Loki
ChessGo wrote:
Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?

Adding a term for the error when choosing the move doesn't make sense. You don't want to play a move with an uncertain result. So winrate is definitely the better one of the two.
It might even make sense to subtract something representing uncertainty so it prefers moves it has well analyzed. But since I've never actually written a MCTS program, I don't know about that.

_________________
Sanity is for the weak.

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 14 posts ] 

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: Bing [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group