Quality of random in MCTS ?

For discussing go computing, software announcements, etc.
Post Reply
Ferran
Lives in gote
Posts: 664
Joined: Mon Apr 22, 2019 1:04 am
Rank: OGS ddk
GD Posts: 0
KGS: Ferran
IGS: Ferran
OGS: Ferran
Has thanked: 177 times
Been thanked: 121 times

Quality of random in MCTS ?

Post by Ferran »

Question,

how good does the random generator have to be in a MCTS program? Specially one attached to a Neural Network? Does anyone have an idea?

Thanks. Take care.
一碁一会
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Quality of random in MCTS ?

Post by lightvector »

Not sure what you mean. Typically modern forms of MCTS that uses a neural network for policy and value are fundamentally deterministic. So they don't need a random number generator at all.

They get some nondeterminism from multithreading and GPUs, because threads don't always run in the same order and GPUs sometimes vary in their results depending on batching, but that's not core to the algorithm and doesn't involve any built-in random generators, that's just a quirk of computer architecture.

And you can explicitly add in randomness if you want, separate from the algorithm. For example at the very end, entirely separately from search, if you wish to randomize between moves that are evaluated as nearly equal so that it doesn't always play the same game. Or if you want to randomize the orientation of a neural net or whatever when you evaluate a node within the search. But none of that is important or necessary at all for the algorithm, which itself has no randomness - these are just extra things you can do on the side depending on what you want.

MCTS used to be based on monte-carlo rollouts but those got replaced with neural nets, and DeepMind unfortunately didn't consider to change the name of the algorithm once they did that replacement, and it's too-widely published now to change the name, so we're stuck with "monte-carlo" being part of the name of an essentially deterministic algorithm.

The heart of MCTS is to recursively treat each node as a multi-armed bandit optimization (https://en.wikipedia.org/wiki/Multi-armed_bandit), and we now understand that whether it uses any randomness has nothing to do with the core of the algorithm, it entirely depends on what addons you combine that core with (and whether those add-ons themselves happen to have any randomness). The bandit-based part has been enduring though, so way back in 2006-2007, if the original authors had been more prescient, they might have called it "Bandit-based tree search" rather than "Monte-carlo tree search".
Ferran
Lives in gote
Posts: 664
Joined: Mon Apr 22, 2019 1:04 am
Rank: OGS ddk
GD Posts: 0
KGS: Ferran
IGS: Ferran
OGS: Ferran
Has thanked: 177 times
Been thanked: 121 times

Re: Quality of random in MCTS ?

Post by Ferran »

I was under the impression that you still needed to randomly choose which leaves to explore to the end, though. There's some previous selection, sure, but there's a point when you need to pick a smaller subset randomly, isn't there?

Take care.
一碁一会
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Quality of random in MCTS ?

Post by lightvector »

No, there is a direct formula that says exactly which node to explore (given the stats so far). You apply this formula repeatedly this until you hit a node not in the tree yet, add the node, do one neural net eval, and then repeat. So there isn't any random selection.

Note that the stopping condition is hitting a node not in the tree rather than hitting a node that is the end of the game. So ever since neural nets being used for everything, you also don't roll out to the end of the game, you just incrementally read each branch a little further each time.
User avatar
Harleqin
Lives in sente
Posts: 921
Joined: Sat Mar 06, 2010 10:31 am
Rank: German 2 dan
GD Posts: 0
Has thanked: 401 times
Been thanked: 164 times

Re: Quality of random in MCTS ?

Post by Harleqin »

Thanks, I was wondering about that, too.

So, what does the »number of playouts« mean now? Number of evaluated nodes? And blind spots mean that some relevant nodes were not given high enough priority? By the policy net?
A good system naturally covers all corner cases without further effort.
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Quality of random in MCTS ?

Post by lightvector »

Yes, playouts means number of nodes in the tree now - it's the number of times you repeated the process of descending the tree down by following that formula that tells you the "most useful" path to explore next, hit the end of the tree, and added and evaluated one new node.

Blind spot is a phrase that has lost a lot of meaning, because people tend to just call most mistakes by a bot a blind spot now, regardless of what kind of mistake it is. Yes, it used to refer to mistakes due to having extremely low policy prior on a move, rather than value misevaluation or lack of reasonable amount of compute time.

The reason why people stopped using the term correctly is because while it may have been clearer when bots were slightly weaker "why" a particular mistake happens, as bots have gotten better it has gotten slightly harder, and also it seems like fewer people are bothering to understand the cause of any particular mistake. When there is a blind spot, it's often *not* the first move you find that is correct but that the bot doesn't want to play. Usually the bot sees that move fine and it's something deeper, and many iterations of interactivity are needed to find it. For example: https://imgur.com/a/H4D6t7k
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: Quality of random in MCTS ?

Post by Uberdude »

Great worked example of finding the blind spot, thanks!
Ferran
Lives in gote
Posts: 664
Joined: Mon Apr 22, 2019 1:04 am
Rank: OGS ddk
GD Posts: 0
KGS: Ferran
IGS: Ferran
OGS: Ferran
Has thanked: 177 times
Been thanked: 121 times

Re: Quality of random in MCTS ?

Post by Ferran »

@lightvector

Thanks. I've been checking a few things on my off time these days, and I think I have a path. And that I need to check the pseudocode for UCB better.

Happy New Year
一碁一会
Schachus
Lives with ko
Posts: 211
Joined: Wed Jul 01, 2015 11:02 am
Rank: KGS 1k EGF 2k
GD Posts: 0
KGS: Schachus12
Has thanked: 16 times
Been thanked: 62 times

Re: Quality of random in MCTS ?

Post by Schachus »

So, I understand correctly, for the first „playout“ you just take the move with the highest policy and evaluate the value-net? How does the second playout work? How is it determined from policy-values and (and maybe the value of that 1 playout?) whether to explore that first branch deeper or whether to go into the second highest policy move?
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Quality of random in MCTS ?

Post by lightvector »

If you want the actual formula, look at:
https://ai.stackexchange.com/questions/ ... -root-node
or search online yourself for many other tutorials and explanations. Note that the handling of what Q should be initialized to for a node with 0 playouts so far varies between bots, and there are some other messy formulas that go into that as well. (That post assumes it's initialized to 0, but that's not a given, you can initialize to other things, different bots will tune to whatever parameters maximize their strength).
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Quality of random in MCTS ?

Post by Kirby »

lightvector wrote:For example: https://imgur.com/a/H4D6t7k
This example is awesome.
be immersed
Post Reply