I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is.
I added Monte Carlo to my engine, and now it plays even worse
Li Kao wrote:Those articles are a good starting point:
http://senseis.xmp.net/?UCT
Code: Select all
The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with
UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits))
emeraldemon wrote:The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:
http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf
Code: Select all
/ \
/\ /\
Code: Select all
/ \
/\ /\
\
/
\
/
...
Code: Select all
/ \
/\ /\
\
Code: Select all
winrate := Wins/Visitsemeraldemon wrote:What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:
Code: Select all
O
/ \
(a1) (b1)Code: Select all
O
/ \
(a1) (b1)
(0/1)(1/1)Code: Select all
O
/ \
(a1) (b1)
(0/1)(1/1)
/ | \
a2 b2 c2Code: Select all
O
/ \
(a1) (b1)
(0/1)(3/4)
/ | \
a2 b2 c2
0 1 1ChessGo wrote:Why winrate takes into account only wins?
ChessGo wrote:
Sorry, I'm still not sure that UCT works good.
....................
So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.
Li Kao wrote:To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.
ChessGo wrote:Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?