Quality of random in MCTS ?

Ferran · Post by **Ferran** » Mon Dec 27, 2021 6:19 am

Question,

how good does the random generator have to be in a MCTS program? Specially one attached to a Neural Network? Does anyone have an idea?

Thanks. Take care.

lightvector · Post by **lightvector** » Mon Dec 27, 2021 8:05 am

Not sure what you mean. Typically modern forms of MCTS that uses a neural network for policy and value are fundamentally deterministic. So they don't need a random number generator at all.

They get some nondeterminism from multithreading and GPUs, because threads don't always run in the same order and GPUs sometimes vary in their results depending on batching, but that's not core to the algorithm and doesn't involve any built-in random generators, that's just a quirk of computer architecture.

And you can explicitly add in randomness if you want, separate from the algorithm. For example at the very end, entirely separately from search, if you wish to randomize between moves that are evaluated as nearly equal so that it doesn't always play the same game. Or if you want to randomize the orientation of a neural net or whatever when you evaluate a node within the search. But none of that is important or necessary at all for the algorithm, which itself has no randomness - these are just extra things you can do on the side depending on what you want.

MCTS used to be based on monte-carlo rollouts but those got replaced with neural nets, and DeepMind unfortunately didn't consider to change the name of the algorithm once they did that replacement, and it's too-widely published now to change the name, so we're stuck with "monte-carlo" being part of the name of an essentially deterministic algorithm.

The heart of MCTS is to recursively treat each node as a multi-armed bandit optimization (https://en.wikipedia.org/wiki/Multi-armed_bandit), and we now understand that whether it uses any randomness has nothing to do with the core of the algorithm, it entirely depends on what addons you combine that core with (and whether those add-ons themselves happen to have any randomness). The bandit-based part has been enduring though, so way back in 2006-2007, if the original authors had been more prescient, they might have called it "Bandit-based tree search" rather than "Monte-carlo tree search".

Ferran · Post by **Ferran** » Mon Dec 27, 2021 11:49 am

I was under the impression that you still needed to randomly choose which leaves to explore to the end, though. There's some previous selection, sure, but there's a point when you need to pick a smaller subset randomly, isn't there?

Take care.

lightvector · Post by **lightvector** » Mon Dec 27, 2021 4:16 pm

No, there is a direct formula that says exactly which node to explore (given the stats so far). You apply this formula repeatedly this until you hit a node not in the tree yet, add the node, do one neural net eval, and then repeat. So there isn't any random selection.

Note that the stopping condition is hitting a node not in the tree rather than hitting a node that is the end of the game. So ever since neural nets being used for everything, you also don't roll out to the end of the game, you just incrementally read each branch a little further each time.

Harleqin · Post by **Harleqin** » Tue Dec 28, 2021 2:16 pm

Thanks, I was wondering about that, too.

So, what does the »number of playouts« mean now? Number of evaluated nodes? And blind spots mean that some relevant nodes were not given high enough priority? By the policy net?

lightvector · Post by **lightvector** » Tue Dec 28, 2021 4:01 pm

Yes, playouts means number of nodes in the tree now - it's the number of times you repeated the process of descending the tree down by following that formula that tells you the "most useful" path to explore next, hit the end of the tree, and added and evaluated one new node.

Blind spot is a phrase that has lost a lot of meaning, because people tend to just call most mistakes by a bot a blind spot now, regardless of what kind of mistake it is. Yes, it used to refer to mistakes due to having extremely low policy prior on a move, rather than value misevaluation or lack of reasonable amount of compute time.

The reason why people stopped using the term correctly is because while it may have been clearer when bots were slightly weaker "why" a particular mistake happens, as bots have gotten better it has gotten slightly harder, and also it seems like fewer people are bothering to understand the cause of any particular mistake. When there is a blind spot, it's often *not* the first move you find that is correct but that the bot doesn't want to play. Usually the bot sees that move fine and it's something deeper, and many iterations of interactivity are needed to find it. For example: https://imgur.com/a/H4D6t7k

Uberdude · Post by **Uberdude** » Tue Dec 28, 2021 5:03 pm

Great worked example of finding the blind spot, thanks!

Ferran · Post by **Ferran** » Wed Dec 29, 2021 2:32 am

@lightvector

Thanks. I've been checking a few things on my off time these days, and I think I have a path. And that I need to check the pseudocode for UCB better.

Happy New Year

Schachus · Post by **Schachus** » Wed Dec 29, 2021 6:06 am

So, I understand correctly, for the first „playout“ you just take the move with the highest policy and evaluate the value-net? How does the second playout work? How is it determined from policy-values and (and maybe the value of that 1 playout?) whether to explore that first branch deeper or whether to go into the second highest policy move?

lightvector · Post by **lightvector** » Wed Dec 29, 2021 11:58 am

If you want the actual formula, look at:
https://ai.stackexchange.com/questions/ ... -root-node
or search online yourself for many other tutorials and explanations. Note that the handling of what Q should be initialized to for a node with 0 playouts so far varies between bots, and there are some other messy formulas that go into that as well. (That post assumes it's initialized to 0, but that's not a given, you can initialize to other things, different bots will tune to whatever parameters maximize their strength).

Kirby · Post by **Kirby** » Wed Dec 29, 2021 12:19 pm

lightvector wrote:For example: https://imgur.com/a/H4D6t7k

This example is awesome.

Life In 19x19

Quality of random in MCTS ?

Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?

Re: Quality of random in MCTS ?