AI joseki preferences

Higher level discussions, analysis of professional games, etc., go here.
John Fairbairn
Oza
Posts: 3724
Joined: Wed Apr 21, 2010 3:09 am
Has thanked: 20 times
Been thanked: 4672 times

AI joseki preferences

Post by John Fairbairn »

Picking up on a comment by Yoda on a Master game: he says the joseki in the upper right is now hardly played, and the preference among pros is now for the one in the lower left (which was also chosen by Master). A database search partly confirms that. Apart from a few sporadic examples in the early 1990s, this has been gaining ground since 2000. But the upper-right version is hardly dead, except in Japan. Perhaps Yoda has been unaware of the examples from Korea and China. Still, there is something in his intuition, and in his decision to comment, that seems to favour the lower-left version.



But if there is something in it, it is surely something subtle. Unless it's pure coincidence, how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few, and the older version would still dominate. If we assume it's from playing lots and lots of self-play games, we'd also have to assume that it tried both variations of a not very common joseki for a massive number of times, which is obviously possible but is it likely? Or do we infer that there has been human input of the latest josekis?

Just to deepen the mystery, human pros have played the new version against DeepZen quite a few times but DeepZen doesn't seem to choose it for itself (maybe because it keeps winning against the pros who play it?)
dfan
Gosei
Posts: 1599
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: AI joseki preferences

Post by dfan »

AlphaGo's value and policy networks are updated in response to every training position, and affect every position. It doesn't need to see an exact joseki to have its "opinion" of that joseki affected. (As a vastly oversimplified example, it could have learned to prefer hanes over wedges in general from looking at lots of other positions.) Given AlphaGo's design, I would be surprised if it had been explicitly fed particular josekis in an attempt to teach it something specific.
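dfan's generalization point can be illustrated with a deliberately tiny stand-in for a policy network. The sketch below is nothing like AlphaGo's actual convolutional networks: it is a plain logistic model over three invented shape features, and all the training data is made up. The point it demonstrates is only the generalization mechanism, that training on positions where "hane-like" moves led to wins makes the model prefer hane-like moves even in feature combinations it never saw.

```python
import math

# Toy stand-in for a policy network (not AlphaGo's real architecture):
# a logistic model scores moves from three hand-made shape features.
# Training data where "hane-like" moves led to wins pushes that feature's
# weight up, so the model prefers hane-like moves even in feature
# combinations it never saw during training.

# Features per move: [hane-like, wedge-like, contact-play]; label 1 = win.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]] * 50
y = [1, 0, 1, 0] * 50

w = [0.0, 0.0, 0.0]
for _ in range(500):                      # plain batch gradient descent
    grad = [0.0, 0.0, 0.0]
    for features, label in zip(X, y):
        p = 1 / (1 + math.exp(-sum(wi * f for wi, f in zip(w, features))))
        for i, f in enumerate(features):
            grad[i] += (label - p) * f
    w = [wi + 0.1 * g / len(X) for wi, g in zip(w, grad)]

def score(features):
    """The model's learned 'preference' for a move with these features."""
    return 1 / (1 + math.exp(-sum(wi * f for wi, f in zip(w, features))))

# Neither feature combination below appeared in the training data, yet the
# hane-like one is preferred: the preference generalized from other positions.
print(score([1, 0, 1]) > score([0, 1, 1]))   # prints True
```

The same mechanism, scaled up to millions of positions and far richer features, is why a network can have an "opinion" about a joseki it never encountered exactly.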
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: AI joseki preferences

Post by moha »

John Fairbairn wrote:how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few
I don't see why you rule out the easiest explanation: AG comes across this position, it reads ahead in both lines and sees one is slightly better (leads to better positions).
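moha's "reads ahead in both lines and sees one is slightly better" is, at its core, minimax over leaf evaluations. A minimal sketch, with an invented two-line game tree and made-up leaf win rates standing in for what a value network (or deeper search) would report:

```python
# Minimal sketch of moha's explanation: read ahead down both joseki lines,
# evaluate the resulting positions, and keep the line whose worst case is
# better. The tree shapes and leaf "win rates" are invented; in a real
# program the leaf values would come from a value network or deeper search.

def minimax(node, maximizing):
    """Classic minimax: our turn picks the max, the opponent's the min."""
    children = node.get("children")
    if not children:
        return node["value"]          # leaf: win-rate estimate for us
    pick = max if maximizing else min
    return pick(minimax(c, not maximizing) for c in children)

# After we choose a line, the opponent replies, so each line's root minimizes.
old_line = {"children": [{"value": 0.52}, {"value": 0.48}]}
new_line = {"children": [{"value": 0.55}, {"value": 0.50}]}

lines = {"old": minimax(old_line, False), "new": minimax(new_line, False)}
best = max(lines, key=lines.get)
print(best, lines[best])   # the "new" line survives the opponent's best reply
```

No joseki-specific knowledge is involved: the preference falls out of comparing what each line leads to.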
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AI joseki preferences

Post by Bill Spight »

moha wrote:
John Fairbairn wrote:how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few
I don't see why you rule out the easiest explanation: AG comes across this position, it reads ahead in both lines and sees one is slightly better (leads to better positions).
In which case it relies upon its evaluation function.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
alphaville
Dies with sente
Posts: 101
Joined: Sat Apr 22, 2017 10:28 pm
GD Posts: 0
Has thanked: 24 times
Been thanked: 16 times

Re: AI joseki preferences

Post by alphaville »

moha wrote:
John Fairbairn wrote:how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few
I don't see why you rule out the easiest explanation: AG comes across this position, it reads ahead in both lines and sees one is slightly better (leads to better positions).
I agree, the simple explanations are the most likely to be true.

AlphaGo always optimizes for winning the game, so the best way for us to learn from it is to try to understand why one variation is more likely to win the game compared to the other, that's all.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AI joseki preferences

Post by Bill Spight »

alphaville wrote:
moha wrote:
John Fairbairn wrote:how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few
I don't see why you rule out the easiest explanation: AG comes across this position, it reads ahead in both lines and sees one is slightly better (leads to better positions).
I agree, the simple explanations are the most likely to be true.

AlphaGo always optimizes for winning the game, so the best way for us to learn from it is to try to understand why one variation is more likely to win the game compared to the other, that's all.
But we cannot take AlphaGo's evaluations as gospel, particularly so early in the game.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: AI joseki preferences

Post by moha »

Bill Spight wrote:
moha wrote:
John Fairbairn wrote:how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case). It can't be through learning from existing pro games as there are too few
I don't see why you rule out the easiest explanation: AG comes across this position, it reads ahead in both lines and sees one is slightly better (leads to better positions).
In which case it relies upon its evaluation function.
Sure, but I felt John mentioned the evaluation function as something that already has some kind of (NN) bias for one line in the starting position, without deep search.

Which it still actually may, btw, EXACTLY because it is very early in the game. Isolated corner josekis may be somewhat well explored by selfplay (top of tree! but suboptimal globally) and thus the NN may already have some preference to AG's chosen line weighted in it. (This also slightly weakens my argument in the earlier search <> NN knowledge discussion, about Lee SeDol match game 3 move 16+, though I believe it still holds.)

But since even the slightest change in the global position can make a big difference to the best line AG chooses, I'm pretty sure it still relies on its deep search each time it encounters this starting position. (In the recent WeiqiTV review on the AGA channel, Fan Hui and a strong Chinese pro also point out how deep and accurate AG's reading is - worth a look.)
John Fairbairn
Oza
Posts: 3724
Joined: Wed Apr 21, 2010 3:09 am
Has thanked: 20 times
Been thanked: 4672 times

Re: AI joseki preferences

Post by John Fairbairn »

Sure, but I felt John mentioned the evaluation function as something that already has some kind of (NN) bias for one line in the starting position, without deep search.
Hardly likely since I don't even know what this means. I do know quite a few languages, but not this one. Sorry.

Making a rather wide-of-the-mark assumption about what another person was thinking (something we all do, of course) may be instructively analogous to the way people are assuming AG is "thinking." At any rate, I'm in Bill's camp of sceptics.

In any case, there is nothing absolute about a deep search, surely, as chess computers have shown. One program can impress us with a 50-move search that "proves" A won. A deeper search may show there was a resource and B could win. The next level of computer may suggest it was a draw.

But if deep (i.e. non-human) search really is the main criterion for AG's judgements, it would seem that it's not likely to be of any help to humans (even pros), except perhaps to prepare a few opening traps. Again, I think this is the thrust of the chess experience.

As to simple explanations, Occam's Razor could do its work by assuming a human fed in a joseki dictionary. What's simpler than that? They have done that in chess, too.

Finally, another plug for the humans: it seems they had already figured out the new version of the joseki was better without AG's help.
alphaville
Dies with sente
Posts: 101
Joined: Sat Apr 22, 2017 10:28 pm
GD Posts: 0
Has thanked: 24 times
Been thanked: 16 times

Re: AI joseki preferences

Post by alphaville »

John Fairbairn wrote:As to simple explanations, Occam's Razor could do its work by assuming a human fed in a joseki dictionary. What's simper than that? They have done that in chess, too.
To me, a simpler explanation is that AlphaGo just discovered this move by itself, during the self-play training stage. The reason I think this is simpler: any manual intervention (such as trying to bias it towards a particular joseki vs another) would risk to degrade performance.

On the other hand, it is possible that it learned this joseki automatically from the database of earlier human games, just as it learned all the other patterns that came up in that database. This is another simple explanation.
But I don't think that is what you meant by "human fed in a joseki dictionary"?
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: AI joseki preferences

Post by moha »

John Fairbairn wrote:In any case, there is nothing absolute about a deep search, surely, as chess computers have shown. One program can impress us with a 50-move search that "proves" A won. A deeper search may show there was a resource and B could win. The next level of computer may suggest it was a draw.
Sure, even the best (practical) searches can make mistakes, since complete search, exploring all moves, is not feasible. I also think the chess analogy is stronger now than before. In the past, go and chess were seen as completely different games needing very different AI approaches (with chess having it easy with brute-force search).

Now, I'm not sure about the current state of the art in computer chess, but a decade or so ago there were already initial attempts at forward pruning (ways to filter out some moves so that only the better ones are searched: null moves etc.). I would bet blindly that today's programs can only stay competitive by doing this at a much more aggressive level (deeper searches at the cost of rare oversights, somewhat more human-like). For go this was always necessary, since the number of candidates is higher. But now the situation is similar in both games: a strong program needs strong move selection (with NNs in go), good search techniques (still in their infancy in go), and very deep searches.
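The trade-off moha describes (search fewer moves per node in order to reach greater depth) is easy to quantify on a synthetic tree. In the sketch below the tree, the move-ordering heuristic, and the leaf values are all invented placeholders; real engines use far more sophisticated pruning (null-move pruning, late-move reductions, and so on). The node counts alone show why pruning buys depth.

```python
import random

# Synthetic demonstration of forward pruning: in a uniform tree with 8
# moves per node, full negamax visits every node, while keeping only the
# 2 most promising candidates (per a stand-in move-ordering heuristic)
# visits a tiny fraction. That saved budget is what a real engine
# reinvests in extra depth. All values are random placeholders.

def search(depth, branching, keep, counter, rng):
    counter[0] += 1                               # count visited nodes
    if depth == 0:
        return rng.uniform(-1, 1)                 # placeholder leaf eval
    heuristic = [rng.random() for _ in range(branching)]   # move ordering
    order = sorted(range(branching), key=heuristic.__getitem__, reverse=True)
    return max(-search(depth - 1, branching, keep, counter, rng)
               for _ in order[:keep])

full, pruned = [0], [0]
search(4, 8, 8, full, random.Random(0))     # no pruning: whole tree
search(4, 8, 2, pruned, random.Random(0))   # top-2 forward pruning
print(full[0], pruned[0])                   # prints 4681 31
```

With the same node budget as the unpruned 4-ply search, the pruned version could read roughly 12 plies deep, at the price of occasionally discarding the right move at the ordering step, which is exactly the "rare oversights" moha mentions.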
But if deep (i.e. non-human) search really is the main criterion for AG's judgements, it would seem that it's not likely to be of any help to humans (even pros), except perhaps to prepare a few opening traps. Again, I think this is the thrust of the chess experience.
On this I somewhat agree, but seeing a stronger player's moves can still be useful, even if I cannot imitate the way it calculated them. And that top-level go is mostly about deeper searches was already noticeable among 9Ps, even before AG.
As to simple explanations, Occam's Razor could do its work by assuming a human fed in a joseki dictionary. What's simper than that?
I'm afraid you are using the razor the wrong way. :) Since AG already needs its deep search for all other cases, it seems even simpler not to assume any further addition: it plays this corner by the same method as any other position.
dfan
Gosei
Posts: 1599
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: AI joseki preferences

Post by dfan »

John Fairbairn wrote:But if deep (i.e. non-human) search really is the main criterion for AG's judgements, it would seem that it's not likely to be of any help to humans (even pros), except perhaps to prepare a few opening traps. Again, I think this is the thrust of the chess experience.
One interesting thing about AlphaGo is that its policy network does an excellent job of predicting the result of its deep search, and the policy network is effectively looking at patterns rather than doing search. So that gives me hope that there is something there that is useful in training human intuition.
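dfan's claim can be phrased as a measurable quantity: the top-1 agreement rate between the policy's ranking and the move the deeper evaluation ends up preferring. The sketch below is synthetic in every respect: the "deep" result is modeled as the policy prior plus noise, and none of the numbers are AlphaGo measurements. It only shows what the statistic looks like.

```python
import random

# Synthetic sketch: how often does a pattern-based policy's top-ranked
# move match the move a deeper evaluation settles on? Here the "deep"
# result is modeled as the policy prior plus Gaussian noise; all numbers
# are invented, not AlphaGo measurements.

def agreement_rate(trials=2000, moves=5, noise=0.1, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        prior = [rng.random() for _ in range(moves)]        # policy scores
        deep = [p + rng.gauss(0, noise) for p in prior]     # "search" result
        hits += (max(range(moves), key=prior.__getitem__)
                 == max(range(moves), key=deep.__getitem__))
    return hits / trials

print(agreement_rate())   # well above the 1/5 chance level, with no search
```

A high agreement rate is what makes the policy network interesting as a source of trainable human intuition: the patterns alone already point at the move the search would pick most of the time.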
As to simple explanations, Occam's Razor could do its work by assuming a human fed in a joseki dictionary. What's simpler than that? They have done that in chess, too.
In this case, there is a public paper written by the AlphaGo team as well as many talks and explanations by them, so Occam's Razor is that the program works the way they say it does, which does not include being fed a joseki dictionary. Using a joseki dictionary is a perfectly good idea, and I'm sure other Go programs have done it, but it does not fit with the AlphaGo approach.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: AI joseki preferences

Post by moha »

dfan wrote:One interesting thing about AlphaGo is that its policy network does an excellent job of predicting the result of its deep search, and the policy network is effectively looking at patterns rather than doing search. So that gives me hope that there is something there that is useful in training human intuition.
This was discussed before, and I believe even AG is much weaker with a single network lookup than with full search (IIRC their paper showed an Elo rating for the policy net with no search that was about half that of the complete version). The kind of accuracy that is its forte is simply very hard to achieve without search.

But I agree that humans (esp. top pros) can use this to get better. Seeing unusual moves played by strong computers can fill in some gaps in their own NNs - which may be the only way for 9Ps to improve since their reading power can hardly be increased. So they will be less likely to overlook useful shoulder hits in the future. :)
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: AI joseki preferences

Post by Kirby »

I'm reminded of this joseki pattern:
[go]$$Bcm1
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . 3 . . 2 , 1 . . |
$$ | . . . . . . . . . . . . . 6 5 . . . . |
$$ | . . . . . . . . . . . . . . 7 . . . . |
$$ | . . . . . . . . . . . . . . . 4 . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ----------------------------------------[/go]
I heard before that this move is old-fashioned:
[go]$$Bcm1
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . 3 . . 2 8 1 . . |
$$ | . . . . . . . . . . . . . 6 5 . . . . |
$$ | . . . . . . . . . . . . . . 7 . . . . |
$$ | . . . . . . . . . . . . . . . 4 . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ----------------------------------------[/go]
and that it's better to do this one:
[go]$$Bcm1
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . 8 . . |
$$ | . . . , . . . . . , . 3 . . 2 , 1 . . |
$$ | . . . . . . . . . . . . . 6 5 . . . . |
$$ | . . . . . . . . . . . . . . 7 . . . . |
$$ | . . . . . . . . . . . . . . . 4 . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ----------------------------------------[/go]
Maybe it's true, and maybe it's not. I don't even know. I could look it up in some pro game database... But is it really going to affect my win rate? I don't think so. So I keep in mind some of the new patterns I see when you jump into the 3-3 point. But I still play the other way sometimes, too.

I think that just KNOWING that the other option exists is good enough. In a board position where I want a shape like the one that happens when you dive into 3-3, I'll go for that option. If I like the "old pattern" better, I'll go for that one.

I think so-called preferences by AlphaGo can be treated in the same way. You see AlphaGo playing a certain way a lot, and you take note - see what kind of shape happens, etc. Then remember it. But it doesn't mean you have to change your way of playing.

Sometimes you can take AlphaGo's preference, sometimes you can go the other way.
be immersed
dfan
Gosei
Posts: 1599
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: AI joseki preferences

Post by dfan »

moha wrote:
dfan wrote:One interesting thing about AlphaGo is that its policy network does an excellent job of predicting the result of its deep search, and the policy network is effectively looking at patterns rather than doing search. So that gives me hope that there is something there that is useful in training human intuition.
This was discussed before, and I believe even AG is much weaker with a single network lookup than with full search (IIRC their paper showed an Elo rating for the policy net with no search that was about half that of the complete version). The kind of accuracy that is its forte is simply very hard to achieve without search.
Yeah, maybe I should have qualified "excellent job", but I don't know what the actual numbers are (e.g., how often the top hit of the policy network ends up being chosen). I didn't mean to imply that search isn't necessary, just that the policy network itself already has a lot of go wisdom.
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: AI joseki preferences

Post by pwaldron »

John Fairbairn wrote:But if there is something in it, it is surely something subtle. Unless it's pure coincidence, how did Master/AlphaGo and, as it happens, FineArt come to the apparent conclusion that the new version is better? It seems unlikely it can simply be down to the evaluation function as it usually occurs far too early in the game (move 12 in this case).
I wouldn't dismiss the possibility out of hand. From Redmond's commentaries on AlphaGo's online games against pros, it seems capable of being ahead after even 30 moves.

I was curious about the idea of Yoda not being aware of trends in China and Korea. The go world isn't that big. I would have thought a top-flight pro in any country would be keeping abreast of the trends in all three major countries.