EricBackus wrote:
Does it really help to analyze every legal top-level move but not every legal response to those moves? You clearly don't want to force analysis of every response to every move at every level of the tree, because then you're just doing a brute-force search and you don't have anywhere close to the computing power you would need for that. But with only 10 playouts for each legal move, do you really get any kind of accurate estimate of the value of those moves?
My reasoning is that you don't need an accurate estimate, as the search works off error bounds rather than exact winrates. Consider:
- Move A: the one that the bot would pick as the best move with a winrate of 52%
- Move B: the "pro move" that the bot ignored, which if it were analysed would get a winrate of 51.5%
- Move C: pretty hopeless
On ten visits for every move, you might find:
- Move A: winrate looks like it's somewhere between 30% and 70%
- Move B: winrate looks like it's between 28% and 71%
- Move C: winrate looks like it's between 2% and 40%
Then moves A and B will both get a few extra playouts to refine the winrate estimate, but C won't. You end up being able to see an estimate for move B without the trouble of explicitly asking for it.
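To put rough numbers on that, here is a minimal sketch. It assumes each visit can be treated as an independent win/loss sample and uses a Wilson score interval for the error bounds. That is a simplification for illustration only: LZ averages network value outputs and ranks children by a PUCT-style score, not by an explicit confidence interval. The win counts are made up.

```python
import math

def wilson_interval(wins, visits, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    if visits == 0:
        return (0.0, 1.0)
    p = wins / visits
    denom = 1 + z * z / visits
    centre = (p + z * z / (2 * visits)) / denom
    half = z * math.sqrt(p * (1 - p) / visits + z * z / (4 * visits * visits)) / denom
    return (centre - half, centre + half)

# Hypothetical results after ten visits per move: (wins, visits).
results = {"A": (5, 10), "B": (5, 10), "C": (1, 10)}
bounds = {move: wilson_interval(w, n) for move, (w, n) in results.items()}

# An optimistic search spends its next visits on the highest upper bound,
# so sort the moves by that bound, best first.
by_upper = sorted(bounds, key=lambda m: bounds[m][1], reverse=True)

for move in by_upper:
    lo, hi = bounds[move]
    print(f"Move {move}: {lo:.0%} to {hi:.0%}")
```

With only ten samples the intervals for A and B are wide and overlap almost completely, so neither can be ruled out, while C's upper bound sits well below theirs and it falls to the back of the queue.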
EricBackus wrote:
But, of course, it is worth trying anyway just to see what happens.
Done (for LZ). I feel like the apocryphal plumber: the code changes only took a few minutes, but it was more than an hour's work to figure out that I needed to change the UCTNode::uct_select_child function. This is a dodgy first cut: the number 10 is hard-coded. If I get some more spare time, I'll add a command line option to change the number of playouts. Of course this is all open source, so no reason why someone else couldn't pick it up and run with it.
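For anyone who wants the idea without reading the C++, here is a rough Python sketch of the kind of change involved. It is not the actual LZ code: the `Child` class, `select_child`, `MIN_ROOT_VISITS` and `C_PUCT` are all hypothetical names, and the score is the generic AlphaZero-style PUCT formula, not LZ's exact one. The only point is the extra check at the root before the normal selection runs.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    move: str
    prior: float            # policy network probability for this move
    visits: int = 0
    value_sum: float = 0.0  # sum of value estimates backed up through this child

    def winrate(self):
        return self.value_sum / self.visits if self.visits else 0.0

MIN_ROOT_VISITS = 10  # hard-coded, like the first cut described above
C_PUCT = 0.8          # assumed exploration constant, not LZ's real setting

def select_child(children, is_root):
    # At the root only, force every legal move up to the minimum visit count
    # before the usual selection rule is allowed to choose.
    if is_root:
        for child in children:
            if child.visits < MIN_ROOT_VISITS:
                return child
    # Otherwise fall back to a generic PUCT score.
    parent_visits = sum(c.visits for c in children)

    def puct(c):
        return c.winrate() + C_PUCT * c.prior * math.sqrt(parent_visits) / (1 + c.visits)

    return max(children, key=puct)

# Usage: three root moves with very uneven priors still get ten visits each.
root_children = [Child("A", 0.90), Child("B", 0.09), Child("C", 0.01)]
for _ in range(30):
    picked = select_child(root_children, is_root=True)
    picked.visits += 1
    picked.value_sum += 0.5  # placeholder value in place of a network evaluation
```

Below the root the check is skipped, so responses to each move are still chosen by the normal rule and the tree does not blow up into a brute-force search.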
Initial results look promising but not spectacular. Using LZ network 242, in the medium term (5,000-10,000 playouts) you see a wider range of moves, but then it converges on what the unmodified LZ-242 would have done anyway. Using the "bubblesId" network packaged with Lizzie 0.7, there seems to be more variety. Over the weekend I'll play with some different positions and different networks, and post some examples here unless I get distracted by the next shiny thing.