 Post subject: Deep learning and ladder blindness
Post #1 Posted: Sun Jul 22, 2018 3:40 am 
Lives with ko

Posts: 205
Liked others: 49
Was liked: 36
Rank: EGF 2k
KGS: MKyle
I was wondering about the implications of the ladder problem for the actual process of learning from self-play. Maybe someone with better expertise or insight can help me understand.

As I understand it, due to limitations of the "zero method" all of these wonderfully strong zero-based bots have a blindness for what will happen at the end of long ladders. But surely they have enough pattern-recognition prowess in their neural nets to recognise the kind of thing that happens at the start of a ladder. In that case, if the neural net's "thought process" could be translated into English (currently impossible, of course), wouldn't the bot be thinking something like: "This variation I'm reading leads to one of those mysterious ladder things. From my self-play I know that playing out one of these leads to either a nearly-won game or a nearly-lost game. If I try this out, I'm deciding the game on some kind of coin toss I don't understand!"?

If this is the case, then wouldn't bots avoid ladders pretty heavily when they think they have a >50% win rate, and seek them out when they think they have a <50% win rate? In their self-play experience, wouldn't a ladder represent a way to create a gamble out of an unfavourable-looking game, or a really bad thing to go near when the game looks okay? Wouldn't we expect to see quite different joseki/patterns depending on who is ahead or behind by a relatively small amount?

 Post subject: Re: Deep learning and ladder blindness
Post #2 Posted: Sun Jul 22, 2018 4:14 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
There are a few different strategies a net can come up with for the ladder problem. First and most important, with sufficient network depth it is possible to understand and predict the likely results of most ladders. This takes a lot of training and is very sensitive to various bugs/biases/defects, though, as the necessary connections are long and complicated. The original AGZ was probably the closest to this approach.

Then it is possible to arrive at the assumption that ladders are dangers that usually favor the opponent, and thus to avoid them. ELF is said to have evolved in this direction.

What you say is also possible, although it is not a good idea to assume the opponent shares your blindness. Playing out an unread ladder is unlikely to give an even chance, so keeping even 40% or 30% can be preferable. But since most bots train in self-play, this can create just such illusions. This was typical for early LZ, and was even strengthened by a special bias in its distributed match system: a quick win (typical of early ladders) was worth more than a slow win, because a match was abandoned quite easily if the early results favored one side strongly enough. There was even a time when LZ's ladder preference oscillated a bit, probably depending on whether the net in question saw more ladder wins or ladder losses in its training.

Also, since the net does not need to predict good moves, only interesting moves that deserve to be searched, assuming all ladders are favorable can also lead to good results when paired with search: all ladders get "looked at" and are played only if favorable. This is said to be the current LZ way (although there is some contradiction here, since training is done towards search results, and so towards working ladders only). But note that this has the drawback that only ladders at or near the root position are handled correctly, so the ones that come up deeper in the search remain a problem.

 Post subject: Re: Deep learning and ladder blindness
Post #3 Posted: Sat Aug 04, 2018 6:53 am 
Dies in gote

Posts: 23
Liked others: 7
Was liked: 3
I have been thinking about this intriguing problem a bit. I think humans have a meta-level logic that governs the tree-search engine. When a human sees a ladder pattern forming, he or she will direct the tree search to read the ladder to the end as a matter of top priority. The win rate in the middle of a ladder is undefined until the whole ladder is read out.

I believe AlphaGo (Lee, Master, Zero) has incorporated such a mechanism in its tree-search algorithm; otherwise the whole system has a very fragile, very glaring loophole. LZ has just such a loophole, and it is very easy to mislead LZ with a common 3-3 point invasion joseki.

 Post subject: Re: Deep learning and ladder blindness
Post #4 Posted: Sat Aug 04, 2018 7:58 am 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
chut wrote:
I believe AlphaGo (Lee, Master, Zero) has incorporated such a mechanism in its tree-search algorithm; otherwise the whole system has a very fragile, very glaring loophole. LZ has just such a loophole, and it is very easy to mislead LZ with a common 3-3 point invasion joseki.


According to the papers the AlphaGo team published, no, they didn't (and they would probably have published it). In addition, it would mean it's no longer a pure "zero" bot, so it's doubtful they would even try to implement something like this in AlphaGo Zero (or in AlphaZero).

 Post subject: Re: Deep learning and ladder blindness
Post #5 Posted: Sat Aug 04, 2018 8:10 am 
Dies in gote

Posts: 23
Liked others: 7
Was liked: 3
Maybe AlphaGo's much superior tree search is able to read deep enough to determine the result of a ladder often enough that it can adjust the network accordingly. In a ladder situation the network's evaluation is meaningless without such a deep tree search. Networks training against each other in self-play are otherwise the blind leading the blind, and they will never learn how to deal with a ladder properly.

I think that with less well-endowed training facilities we will need to augment the tree search with such a meta-level algorithm.

 Post subject: Re: Deep learning and ladder blindness
Post #6 Posted: Sat Aug 04, 2018 8:24 am 
Dies in gote

Posts: 23
Liked others: 7
Was liked: 3
I think this is a very intriguing problem. It is an all-or-nothing situation: if the MCTS is not deep enough, then certain patterns will never be learned. This would be worthy of a paper by itself. And the meta-level algorithm to guide the MCTS - I think that is a deep learning experiment worth pursuing.

 Post subject: Re: Deep learning and ladder blindness
Post #7 Posted: Sat Aug 04, 2018 10:40 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
chut wrote:
I have been thinking about this intriguing problem a bit. I think humans have a meta-level logic that governs the tree-search engine. When a human sees a ladder pattern forming, he or she will direct the tree search to read the ladder to the end as a matter of top priority. The win rate in the middle of a ladder is undefined until the whole ladder is read out.


Neural nets are like human unconscious parallel processing (intuition). Human conscious search is very memory-limited, which means that it is pretty much depth-first. That makes ladders and one-lane roads natural targets for it. Humans also have logic, which can allow us to focus or eliminate search. For instance, in a capturing race we can count liberties, which may be viewed as a kind of depth-first, non-alternating search which, once done, allows us to eliminate other searches, because we know that they are equivalent.
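To make the liberty-counting idea concrete, here is a minimal sketch in Python. The dict-based board and the coordinates are illustrative assumptions for the example, not how any actual bot stores the position.

Code:
# Flood-fill the chain containing a stone and collect its empty neighbours.
# Board: a dict mapping (row, col) to 'b' or 'w'; empty points are absent.

SIZE = 19

def liberties(board, start):
    """Number of liberties of the chain containing the stone at start."""
    color = board[start]
    chain, libs, stack = {start}, set(), [start]
    while stack:
        r, c = stack.pop()
        for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= n[0] < SIZE and 0 <= n[1] < SIZE):
                continue                     # off the board
            if n not in board:
                libs.add(n)                  # empty neighbour = liberty
            elif board[n] == color and n not in chain:
                chain.add(n)
                stack.append(n)
    return len(libs)

# A lone black stone in the corner has two liberties.
assert liberties({(0, 0): 'b'}, (0, 0)) == 2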

Adding human-style logic to a go-playing program may or may not be a good idea. A ladder module might work fine when it is triggered, but make the program less efficient when it is not. Currently programs are getting better rapidly without such alterations, so why bother?

BTW, I have only looked at a couple of Elf's reviews, but it often seems to perform deep search with little breadth (exploration). Perhaps its pattern recognition is now so good that the benefits of deep search for ladders and semeai are paying off.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Deep learning and ladder blindness
Post #8 Posted: Sat Aug 04, 2018 10:52 am 
Oza

Posts: 3723
Liked others: 20
Was liked: 4671
Humans don't read out most ladders. They just apply the rule of six. It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.

There are of course situations where things get messy, with more than one stone in the line of fire on the six lines, but such situations are uncommon, and even if the computer gets those cases wrong (and on average only half would be wrong anyway), no harm is done compared to the current capability.

 Post subject: Re: Deep learning and ladder blindness
Post #9 Posted: Sat Aug 04, 2018 11:04 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
John Fairbairn wrote:
Humans don't read out most ladders. They just apply the rule of six.


Yes, human logic has produced the rule of six and other rules to make our conscious processing more efficient. :)

Quote:
It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.


So one would think. :) Yet even when the programmer is a strong player or the programming team includes a strong player such modules have not been implemented in the top level bots. {shrug}

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Deep learning and ladder blindness
Post #10 Posted: Sat Aug 04, 2018 11:55 am 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
John Fairbairn wrote:
Humans don't read out most ladders. They just apply the rule of six. It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.


But then you need to apply it to every atari in the search. It's far from obvious that it would increase the strength at time parity.

 Post subject: Re: Deep learning and ladder blindness
Post #11 Posted: Sat Aug 04, 2018 12:17 pm 
Judan

Posts: 6727
Location: Cambridge, UK
Liked others: 436
Was liked: 3720
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
John Fairbairn wrote:
Humans don't read out most ladders. They just apply the rule of six.

This human doesn't! (and isn't even clear what said rule is, though imagines it's to do with the width of a channel that affects ladders).

 Post subject: Re: Deep learning and ladder blindness
Post #12 Posted: Sat Aug 04, 2018 12:21 pm 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
As I wrote above, it is possible for a net to guess the ladder result without search. It can create logic like "if the closest stone along this diagonal is w and is closer than the b stone along the neighbouring diagonal", etc. (the rule of six?).

It is not easy, and needs intermediate data/concepts (about "closest along a diagonal"), which in turn need cooperation between several layers to pass that intermediate information around. This is probably near the complexity limit of what simple gradient descent can find with a lot of training, but AGZ is rumored to achieve it.

A few people have experimented with hardcoding only those diagonal scans (as opposed to the "ladder capture / ladder escape" features of the original AlphaGo), and this was also successful.
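For illustration, here is a heavily simplified Python caricature of that kind of scan - a literal transcription of the quoted rule of thumb, not a real ladder reader. The dict board and the choice of the two diagonal starting points are assumptions made up for the example.

Code:
SIZE = 19

def first_stone_on_diagonal(board, start, step):
    """Colour of the first stone met walking from start (exclusive) in
    direction step, or None if the board edge is reached first."""
    r, c = start[0] + step[0], start[1] + step[1]
    while 0 <= r < SIZE and 0 <= c < SIZE:
        if (r, c) in board:
            return board[(r, c)]
        r, c = r + step[0], c + step[1]
    return None

def scan_favours_attacker(board, diag_starts, step, attacker):
    """Rule-of-thumb guess: the attacker is happy if, on both diagonals the
    ladder alternates between, their own stone (or the edge) comes first."""
    for start in diag_starts:
        nearest = first_stone_on_diagonal(board, start, step)
        if nearest is not None and nearest != attacker:
            return False      # a potential ladder breaker shows up first
    return True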

 Post subject: Re: Deep learning and ladder blindness
Post #13 Posted: Sat Aug 04, 2018 1:01 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
John Fairbairn wrote:
Humans don't read out most ladders. They just apply the rule of six.

This human doesn't! (and isn't even clear what said rule is, though imagines it's to do with the width of a channel that affects ladders).


[go]$$W Ladder path
$$ ---------------
$$ . . . . . . . . |
$$ . . . a X X . . |
$$ . . a S C O X . |
$$ . a S C C X . X |
$$ a S C C S a . . |
$$ S C C S a . . . |[/go]

(Diagram from Sensei's Library, https://senseis.xmp.net/?LadderBreaker )
The linear distance from "a" to "a" inclusive measures 6 points.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Deep learning and ladder blindness
Post #14 Posted: Sat Aug 04, 2018 1:06 pm 
Gosei

Posts: 1781
Liked others: 184
Was liked: 495
Kageyama, about ladders:

Attachment: kageyama.png

 Post subject: Re: Deep learning and ladder blindness
Post #15 Posted: Sat Aug 04, 2018 2:18 pm 
Gosei

Posts: 1596
Liked others: 891
Was liked: 533
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Bill Spight wrote:
John Fairbairn wrote:
It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.

So one would think. :) Yet even when the programmer is a strong player or the programming team includes a strong player such modules have not been implemented in the top level bots. {shrug}

AlphaGo had two input features that checked for the presence of ladders in this way. AlphaGo Zero removed those features because DeepMind wanted to show that it was possible to achieve top-level play without the assistance of hand-crafted features. Projects such as Leela Zero and ELF OpenGo happened to follow AlphaGo Zero in this respect, I believe partially because they were trying to confirm that DeepMind's work could be duplicated. I agree that adding ladder features back in would improve the systems' strength, and I wouldn't be surprised if one of the non-open systems has already done so.


This post by dfan was liked by: Bill Spight
 Post subject: Re: Deep learning and ladder blindness
Post #16 Posted: Sat Aug 04, 2018 2:40 pm 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Yes, you're right, I was mistaken.

Quote:
Features for policy/value network. Each position s was pre-processed into a set of 19×19 feature planes. The features that we use come directly from the raw representation of the game rules, indicating the status of each intersection of the Go board: stone colour, liberties (adjacent empty points of stone’s chain), captures, legality, turns since stone was played, and (for the value network only) the current colour to play. In addition, we use one simple tactical feature that computes the outcome of a ladder search [7].

Source: https://storage.googleapis.com/deepmind ... ePaper.pdf (page 8)

Note that this is the original AlphaGo.
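For anyone curious what such a "ladder search" might compute, here is a minimal Python sketch of a plain ladder read-out. It is only a sketch under simplifying assumptions (the running chain never captures attacker stones, and kos/snapbacks are ignored); the dict-based board is made up for the example and is not how AlphaGo represents positions.

Code:
SIZE = 19

def neighbors(p):
    r, c = p
    return [(r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE]

def chain_and_liberties(board, p):
    """Flood-fill the chain containing p; return (chain points, liberty points)."""
    color = board[p]
    chain, libs, stack = {p}, set(), [p]
    while stack:
        q = stack.pop()
        for n in neighbors(q):
            if n not in board:
                libs.add(n)
            elif board[n] == color and n not in chain:
                chain.add(n)
                stack.append(n)
    return chain, libs

def ladder_captured(board, runner, depth=0):
    """True if the chain at runner, currently in atari, dies in a plain ladder.
    The board is restored before returning."""
    _, libs = chain_and_liberties(board, runner)
    if len(libs) != 1 or depth > 100:
        return len(libs) == 0              # 0 liberties: dead; 2+ or too deep: treat as safe
    color = board[runner]
    enemy = 'w' if color == 'b' else 'b'
    escape = next(iter(libs))
    board[escape] = color                  # the runner extends at its only liberty
    _, new_libs = chain_and_liberties(board, escape)
    if len(new_libs) <= 1:
        result = True                      # still in atari: the ladder works
    elif len(new_libs) >= 3:
        result = False                     # the runner breaks out
    else:
        result = False                     # two liberties: the attacker tries either atari
        for atari in new_libs:
            board[atari] = enemy
            _, attacker_libs = chain_and_liberties(board, atari)
            if len(attacker_libs) >= 2 and ladder_captured(board, escape, depth + 1):
                result = True
            del board[atari]
            if result:
                break
    del board[escape]
    return result

# The canonical working-ladder start: a black runner hugged by four white
# stones gets driven across the empty board and caught.
start = {(1, 1): 'b', (0, 1): 'w', (1, 0): 'w', (1, 2): 'w', (2, 0): 'w'}
assert ladder_captured(start, (1, 1))
# A black stone sitting on the ladder path acts as a ladder breaker.
assert not ladder_captured({**start, (10, 10): 'b'}, (1, 1))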

 Post subject: Re: Deep learning and ladder blindness
Post #17 Posted: Sat Aug 04, 2018 3:50 pm 
Oza

Posts: 2414
Location: Tokyo, Japan
Liked others: 2350
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
People interested in this topic should have a look at lightvector's work on github. Personally I may understand about 15% of it. But he has done a lot on ladders and other distant relationships. Quite interesting!

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


This post by ez4u was liked by 2 people: Bill Spight, dfan
 Post subject: Re: Deep learning and ladder blindness
Post #18 Posted: Sat Aug 04, 2018 4:30 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
dfan wrote:
Bill Spight wrote:
John Fairbairn wrote:
It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.

So one would think. :) Yet even when the programmer is a strong player or the programming team includes a strong player such modules have not been implemented in the top level bots. {shrug}

AlphaGo had two input features that checked for the presence of ladders in this way.


I stand corrected. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Deep learning and ladder blindness
Post #19 Posted: Sat Aug 04, 2018 5:54 pm 
Dies in gote

Posts: 23
Liked others: 7
Was liked: 3
Life and death of a group and ladders are related problems. We still see mightily strong bots fall flat in situations that are obvious to a human. These throw a monkey wrench into the tree search, because the minimax evaluation of a whole branch can become invalid if we push the search a bit deeper. That means there is an inherent uncertainty in the evaluated win rate of a branch.

I am wondering: what guides the tree search now? What decides which branch to go deep into first? Humans are guided by meta-level knowledge of such situations, and human professionals do read out ladders to great depth and detail. There is a famous game of Lee Sedol's where he played out a failed ladder to his advantage (https://senseis.xmp.net/?LeeSedolHongChangSikLadderGame).

It does seem to me that we can't escape having a meta-level tree-search guidance system. That is probably worthy of a deep learning project.

 Post subject: Re: Deep learning and ladder blindness
Post #20 Posted: Sat Aug 04, 2018 6:24 pm 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Quote:
I am wondering: what guides the tree search now? What decides which branch to go deep into first?


A policy network. A bot like AlphaZero or LeelaZero has a neural network that gives its candidate moves; the choice of which one to explore depends on how much the policy network likes it, the previously estimated winrate of the branch, and the number of previous visits (to encourage exploration, there is a bonus for positions that have been explored less).
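In other words, it is the PUCT selection rule from the AlphaGo Zero paper. A minimal sketch of it in Python; the constant c_puct and the exact form of the bonus vary between engines such as LZ, so treat the numbers as placeholders.

Code:
import math

def puct_score(value, visits, parent_visits, prior, c_puct=1.5):
    """Estimated winrate plus an exploration bonus weighted by the policy prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return value + exploration

def select_child(children, parent_visits):
    """children: list of dicts with 'value' (mean winrate), 'visits' and 'prior'.
    Returns the child node to descend into next."""
    return max(children, key=lambda ch: puct_score(
        ch['value'], ch['visits'], parent_visits, ch['prior']))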
