reviewing SL articles using LZ and criticism

For lessons, as well as threads about specific moves, and anything else worth studying.
Bantari
Gosei
Posts: 1639
Joined: Sun Dec 06, 2009 6:34 pm
GD Posts: 0
Universal go server handle: Bantari
Location: Ponte Vedra
Has thanked: 642 times
Been thanked: 490 times

Re: reviewing SL articles using LZ and criticism

Post by Bantari »

Kirby wrote:If the old idea seems totally wrong (e.g. it's always good to invade 3-3 early), then maybe it's worth getting rid of the page(?).
I would be careful here. My idea is to absolutely keep the page, but maybe provide a disclaimer with a link to the page that refutes the idea.

What if "LZ version 100.0" - or some other AI - decides that the idea is good after all, and beats all the other AIs with it? ;)

If nothing else, it will be interesting for historical reasons to see what the outdated ideas were, what it took to abandon them, and what replaced them and why. This by itself can be a good learning tool... like going through phases.
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: reviewing SL articles using LZ and criticism

Post by Bill Spight »

Knotwilg wrote:Even more, {Kata} gives the ladder a prominent place in its evaluations and instructively so. Who among us would consider the ladder breaker - moyo builder?
Go Seigen, fer shure. :) He showed such light reductions in his 21st century go books, and all pros are aware of ladders. Probably O Meien, Yoda, and others, as well.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Knotwilg
Oza
Posts: 2432
Joined: Fri Jan 14, 2011 6:53 am
Rank: KGS 2d OGS 1d Fox 4d
GD Posts: 0
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
Location: Ghent, Belgium
Has thanked: 360 times
Been thanked: 1021 times

Re: reviewing SL articles using LZ and criticism

Post by Knotwilg »

Bantari wrote:<rant>
I am always suspicious when confronted with these kinds of arguments. LZ says something is 10% better... so what? What does it mean? WHY is it 10% better, what is the IDEA behind making it so? Is it only better because LZ can prune the tree (or whatever LZ does) in a way humans will never be able to do? Or is there an actual reason?
</rant>
Indeed LZ won't tell us. But as I've often said, pros don't talk a lot during analysis either. They show sequences to each other.
And while LZ doesn't tell us, we can still try to understand why. If you look at the original page, you'll see that I'm trying to articulate what LZ shows us. This is of course my fallible low dan interpretation, but it's unfair to claim that all we do there is dismiss pro analyses with mere LZ percentages.

As for your other argument, that maybe one day a superbot will tell us the pro was right after all: OK, or maybe the superbot tells us 1-1 is best. We can't work with such hypotheses. We must trust today's strongest evaluators and try to learn from them, whether by developing our intuition through seeing their moves in certain situations, or by abstracting and articulating concepts.
gowan
Gosei
Posts: 1628
Joined: Thu Apr 29, 2010 4:40 am
Rank: senior player
GD Posts: 1000
Has thanked: 546 times
Been thanked: 450 times

Re: reviewing SL articles using LZ and criticism

Post by gowan »

When bot evaluations are based on high numbers of playouts, presumably we are looking at the results of "reading" deeper than humans can manage. Therefore, it seems to me that we humans might not understand why a move is best, or even how to handle its ramifications.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: reviewing SL articles using LZ and criticism

Post by Bill Spight »

gowan wrote:When bot evaluations are based on high numbers of playouts, presumably we are looking at the results of "reading" deeper than humans can manage. Therefore, it seems to me that we humans might not understand why a move is best, or even how to handle its ramifications.
Except that today's bots read the whole board with a variant of breadth-first search. That's why humans, who tend to use depth-first search, can still read more deeply than the bots can in certain situations, like reading out a ladder better than LZ can.
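Bill's contrast between breadth-first bots and depth-first humans can be made concrete with a toy budget calculation. This is only a sketch under simplifying assumptions: real MCTS (PUCT) allocates playouts adaptively rather than uniformly, but the exponential-versus-linear gap is the point.

```python
def bfs_depth(budget: int, branching: int) -> int:
    """Depth reached if every sibling must be expanded before going deeper
    (a crude stand-in for uniform breadth-first search)."""
    depth, cost = 0, 0
    while cost + branching ** (depth + 1) <= budget:
        depth += 1
        cost += branching ** depth
    return depth

def dfs_line_depth(budget: int) -> int:
    """Depth reached reading a single forced line (e.g. a working ladder):
    one node per ply, so the whole budget converts directly into depth."""
    return budget

# With ~1000 node evaluations and ~5 candidate moves per ply, uniform
# breadth-first only reaches depth 4 (5 + 25 + 125 + 625 = 780 nodes),
# while depth-first on a forced ladder line reaches depth 1000.
print(bfs_depth(1000, 5), dfs_line_depth(1000))  # 4 1000
```

A ladder is the extreme case: branching collapses to one forced reply per ply, which is exactly where a human depth-first reader outruns a breadth-first playout budget.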

I am really quite optimistic about human understanding. Look how easy it has been to play the early 3-3 invasion of the 4-4 and to realize that you want to avoid the second line hane-tsugi in most cases. Why do the bots love the 3-3 invasion, but not so much the 3-3 opening? Quien sabe? But I don't have to know to play better.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: reviewing SL articles using LZ and criticism

Post by Kirby »

In any case, our aim should be to extract a principle. Without a principle, we don’t need an SL page - just look at LZ for analysis.
be immersed
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: reviewing SL articles using LZ and criticism

Post by lightvector »

Bill Spight wrote:I am really quite optimistic about human understanding.
Me too. Even in John Fairbairn's example... or rather, suppose we found a similar case where almost all the bots agreed, yet which was similarly incomprehensible at first sight for humans. What do you do?

You play with the bot in Lizzie or a similar GUI: try different openings, vary the placement of certain stones while repeating the same shape locally, and see when it wants that variation and when not. You play it out with the bot and see how it responds to attempts to refute it (using the bot itself to aid you, so that you don't just get trounced by superhuman-level play unrelated to that situation). You discuss it with other people in a serious effort to understand. Over time you collect a dozen more distinct examples with the same kind of move suggested.

And maybe the move stays incomprehensible still, maybe even most of the time. But I suspect that a nontrivial fraction of the time, the seemingly "worse than useless" play by the bot will actually become understandable. You may learn that "ah, right, this is indeed a probe, and it looks like it takes a local loss, but in this case we can expect to get sente due to XYZ, so we can come back and it is not a loss", or "ah, it looks like this or that formation is strong, but much later in the game the bot suggests a move which suddenly reveals a defect that we had not anticipated", or "ah, it looks like this is a good trade for us, but if we play out a dozen more moves, it's actually kind of hard to see where we have enough territory, so we have to re-think our judgment".

The above takes a lot of work, so maybe it's not practical in most cases. But I don't think it's out of reach of human ability fundamentally.

Or... maybe you demonstrate that the bots are mistaken, as they often still are. ;-)

As far as what to do on an SL page in the meantime - yeah, in that case some practical considerations are needed. I also would lean towards not just deleting pages or "overturning things", but treating bot opinions just like any other part of discussion. Even if you don't fully trust them or think that you would be able to follow up their moves well enough to be useful, what they say is still meaningful data.
Javaness2
Gosei
Posts: 1545
Joined: Tue Jul 19, 2011 10:48 am
GD Posts: 0
Has thanked: 111 times
Been thanked: 322 times

Re: reviewing SL articles using LZ and criticism

Post by Javaness2 »

SL provides a lot of functions. For some it is an encyclopedia, for some it is a tutorial, for others it is a filthy mess.
I think humans need their heuristics for narrowing down the choice of moves; it's the reason we have pages like "Take the last big point in fuseki". When I read analysis on SL, I've never assumed it is 100% accurate, but I tend to trust most of the arguments presented to me. If an argument is presented really unclearly, I probably won't listen to it. Presenting something like "LZ says A is 2% better than B in this position" would probably cause me to engage ignore_mode. SL has to retain a human presentation, but that presentation can certainly be enhanced by LZ and other AIs.
Knotwilg
Oza
Posts: 2432
Joined: Fri Jan 14, 2011 6:53 am
Rank: KGS 2d OGS 1d Fox 4d
GD Posts: 0
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
Location: Ghent, Belgium
Has thanked: 360 times
Been thanked: 1021 times

Re: reviewing SL articles using LZ and criticism

Post by Knotwilg »

lightvector wrote: Even if you don't fully trust them or think that you would be able to follow up their moves well enough to be useful, what they say is still meaningful data.
Not that this is what you imply, but in general I think the power of bot analysis is underestimated. With Bill's series as one of the best learning experiences I have ever had in go, I would say we are well past the stage where bots make incomprehensible or even weird moves that we can only understand by finding that one tesuji 30 moves deep. That would even be a misunderstanding of how bots work: probabilistic evaluations are essentially not about deep reading but about wide distributions, so they lend themselves to building intuition rather than to developing tesuji.

Of course, articulating that intuition is the bigger challenge. We still communicate our ideas through language, rather than through sequences, although I have argued before that at the expert level, at least in my perception, verbal language is abandoned in favor of go language, i.e. pros show sequences to each other without too many spoken words.

Hence, the articulation problem is not exactly a new one. I believe Charles Matthews coined that term long before AI came about. We do have pro wisdom, sometimes written by the pros themselves, often by ghost writers, and we may expect its quality to be better than what we make of AI sequences on an internet forum. Still, further interpretation of that wisdom has proven non-trivial, especially since many of those books were written in Japanese or other East Asian languages. Our endless discussions on thickness bear witness to that.

If traditional human go wisdom scores better on articulation, today's AIs do better on evaluation. If we enhance the latter with our own, albeit fallible, articulation, the resulting learning may be no worse.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: reviewing SL articles using LZ and criticism

Post by Uberdude »

I was not able to reproduce John's observed behaviour of LZ wanting to run out the ladder and expecting black to atari from on top using a recent LZ 247. I was, however, able to see something similar with the year-and-a-half-old LZ 157 that comes with Lizzie 0.6, on low playouts. So I don't see anything mysterious here, just the well-known problem LZ has with ladders on low playouts, which is lessened in stronger, more recent networks and ameliorated by giving more playouts.

To illustrate, here are 157's initial thoughts at ~300 total playouts: it wants to run out, and you can see the mainline variation is for black to atari from on top; white takes a stone in gote, black reinforces the lower right, and then white cuts and there's a fight. But bear in mind the first move only had 105 playouts, so several moves down the variation there are even fewer; these moves are mostly just policy-network instinct rather than much reading.
[Attachment: jf ladder workshop1.PNG]
Let LZ think a bit more (3k playouts) and now running out is not favoured: it wants to invade the 3-3 in the lower right, and it is also looking at the attachment I mentioned before as a promising blue-circle move (fewer playouts than the #1 choice at the 3-3, but a higher winrate).
[Attachment: jf ladder workshop2.PNG]
What does it now think would happen if white runs out? Mouse over that move and we see that, with only 209 playouts, it now expects black to atari underneath and then white to tenuki to the 3-3.
[Attachment: jf ladder workshop2a.PNG]
Does this mean it has realised the ladder works for black? That's easy to test: just play out escape, atari under, escape, and see what it expects black to do. The answer is atari again, so yes, it now has enough playouts to realise the ladder works for black. And we can see that with each extra stone white adds to the captured ladder, black's winrate increases. This all makes sense.
[Attachment: jf ladder workshop2b.PNG]
Give it another thousand or two playouts and now that attachment is the blue move. Note that the more recent LZ engine I'm using here no longer always picks the move with the highest playouts as its first choice; it now uses a combination of playouts and winrate, and this attachment already counts to LZ as the best move despite having fewer playouts than the 3-3.
[Attachment: jf ladder workshop3.PNG]
Is there a point between my first and second diagrams where LZ would want to run out, expecting black to atari on top and white then to take the 3-3, as John saw? Maybe, but my computer is too fast for me to find it easily. Even if it does happen, though, you don't need to just scratch your head and retreat into a fog of confusion. Did LZ think that extending once and then playing tenuki to the 3-3 was actually a good exchange, a probe testing whether black wants the ladder before taking the 3-3? Or was it just LZ not reading enough and deciding "oh well, taking the stone in gote is bad, I'd better 3-3 now"? Play it out to test!

Extend once; if black ataris on top and white then takes the 3-3, with 15k playouts we see 157 thinks black is at 38.7%.
[Attachment: jf ladder workshop4.PNG]
If white plays the 3-3 directly, black is at 45%, i.e. this is better for black than before, so it WAS good for white to run out if black will atari on top. This doesn't seem particularly mysterious to me; it makes sense as a probe: white has, in sente, created the option of saving the two stones, which is a big move for territory, and it also leaves the cut afterwards.
[Attachment: jf ladder workshop5.PNG]
But of course black may not atari on top; he could atari underneath, as the ladder works. The downside of this is that white gets a ladder breaker later. (Or, if black answers it, white can run out the stones, which isn't an immediate disaster for black; but if that's going to happen, he probably preferred to atari on top instead.) We can ask LZ about this too. It says black is at 47.5%, so this is the best line for black, i.e. with enough playouts black would atari from below for the ladder if white pulled out, because black doesn't want white to have the option of taking the one stone on the edge.
[Attachment: jf ladder workshop6.PNG]
I think bookish theory would often not like pulling out the stone once and then playing tenuki, because if black spends the next move in the area he makes a tortoise-shell capture, which is better for him than his one gote move if white hadn't pulled out. However, spending a move in that area is slow, so this is a case of local analysis leading you astray globally. We can see that LZ agrees it's a bad exchange for white if black answers correctly underneath (47.5% for black > 45%), but a good exchange if white tricks black into the wrong answer of the top atari (38.7% for black < 45%). And because LZ 157 is bad at ladders at low playouts, it falls for the trick initially. LZ 247 is better at ladders, so it would not atari on top but underneath (or if it would atari on top, it is in a much smaller window of low playouts, which I didn't catch it in).
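The three evaluations above boil down to a simple comparison, which can be written out explicitly. The winrate numbers come from the diagrams in this post; the continuation labels are my own shorthand:

```python
# Black's winrate (LZ #157, 15k playouts, per the diagrams above) after
# white pulls the ladder stone out once, for each continuation:
lines = {
    "black ataris on top, white takes 3-3": 38.7,   # workshop4
    "white plays 3-3 directly, no pull-out": 45.0,  # workshop5
    "black ataris underneath (ladder works)": 47.5, # workshop6
}

# Black picks the continuation with the highest winrate for black.
best_for_black = max(lines, key=lines.get)
print(best_for_black)  # black ataris underneath (ladder works)

# The pull-out is a good probe for white only if black errs with the top
# atari (38.7 < 45.0); against correct play black gets 47.5 > 45.0.
```

Framed this way, Uberdude's conclusion is just an ordering of three numbers: the pull-out loses points against the underneath atari and gains points against the top atari.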

QED
Knotwilg
Oza
Posts: 2432
Joined: Fri Jan 14, 2011 6:53 am
Rank: KGS 2d OGS 1d Fox 4d
GD Posts: 0
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
Location: Ghent, Belgium
Has thanked: 360 times
Been thanked: 1021 times

Re: reviewing SL articles using LZ and criticism

Post by Knotwilg »

Javaness2 wrote: When I read analysis on SL, I've never assumed that it is 100% accurate, but I would tend to trust most of the arguments which are presented to me. (...) Presenting something like LZ says A is 2% better than B in this position would probably cause me to engage ignore_mode.
The state of affairs is rather that some claims about whole-board positions are only 50% accurate, as shown through bot review: the bot finds a line that is 10% (not 2%) better than both the "good" and the "bad" diagrams, which are themselves roughly equal according to the bot.

I am definitely not going to question moves that are in the 2% range of bot analysis, although if a 3-3 invasion is consistently 2% better than an approach by bot standards, it may deserve mentioning.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: reviewing SL articles using LZ and criticism

Post by Bill Spight »

Great post, Uberdude! :clap: :salute: :bow:

I have linked to it from the topic, How to use AI for review.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
John Fairbairn
Oza
Posts: 3724
Joined: Wed Apr 21, 2010 3:09 am
Has thanked: 20 times
Been thanked: 4672 times

Re: reviewing SL articles using LZ and criticism

Post by John Fairbairn »

I've had some further thoughts, and it seems to me that uberdude's analysis with newer LZ's than mine could be used to show two things:

1. The power of AI analysis is overestimated and the power of human analysis is underestimated.

Arguments:

(a) The pro kakari move is not yet the best move but seems to have moved up significantly (it was not even on the radar in my investigation) and to be now well within the margin of error Bill postulates.

(b) The later bots are telling us the earlier bots are WRONG (even if they are both stronger than humans), so whatever "principles" the earlier bots were using are also WRONG according to the later bots - which in turn may be shown in due course also to be WRONG. Wrong does not mean worse than humans, but still does not mean RIGHT. On top of that, this is not some slow-moving evolution like the demolition of the flat-earth theory. Changes are happening almost daily depending on which bot or which version of a bot you use. It seems almost pointless trying to make sense of such rapid changes. Just chasing shadows. Shin Fuseki likewise showed that, and that was still on a much slower timescale.

2. The difference between bots and human pros is in danger of being much exaggerated.

That follows partly from (a) above, but much of the discussion on AI, here and elsewhere, seems to me to fall into two camps. One is the "AI is the best thing since sliced bread" syndrome, with some members of that set teetering into a cargo cult mindset. The other "whoa, hold your horses" set mostly urges caution, but in some extreme cases that caution can descend into denial and flat-earthism.

Unfortunately, when humans choose to fall into a camp, they often have a tendency to automatically rubbish the other camp's views. That may be happening here.

Yet, looking at the figures, very many pro moves seem to garner approval from bots, and while humans may be struggling to win at 2 stones against them, constant defeats by a soulless machine able to replicate its behaviour every time are not the same as constant defeats by a distractable human. I'd be very happy to say I was only two stones weaker than a human pro, which would make me one of the best amateurs ever. A pro might not be so happy to say he is two stones weaker than a bot, but I think that's because he has skin in the game - otherwise I suspect he'd be quite chuffed.

If we set aside the camp psychology, could we not argue that pros and bots are not really so far apart, and if that's the case, surely the pros' views of the best moves remain valid even for themselves - and a fortiori for amateurs?

Of course, we can try - misleadingly, I think - to switch tack completely and start arguing from the basis that bots sometimes show a human move is not just different but vastly different (e.g. a 30% drop in win rate). I have strong reservations about the use of such examples. I am not disputing that such a move is likely to be bad. But in go one bad apple does not spoil the whole barrel. If a pro makes one such horrendous move against a bot, of course he will lose. But what about all the other 250 moves where he kept pace with the bot? Until someone shows us that a pro is consistently - almost every move - falling behind significantly in win rate, it is unfair to rubbish him and his opinions because of a mistake.

To me, the interesting area is the one where even pros don't have opinions, or have differing opinions among themselves. I think I was the first to consistently point out, when I started doing my Go Seigen books, just how often and radically pros can differ - ranging from "brilliant" to "awful" for the same move, with some pros not even deeming the same move worthy of a comment.

Pros actually do acknowledge that they often don't know whether a move is good or bad. The uncertainty may be couched in euphemisms ("I would have played A instead" or "B is also possible") but it is still a strong undercurrent. That is precisely why they are studying bot play themselves. They clearly hadn't found a principle they can reliably apply in some aspects of the game before, but they did acknowledge that even before bots came along. Therefore, when they do feel confident enough to express an opinion about a move or a principle, I think we have to pay heed to it as, at the very least, an honest opinion that has worked for themselves. And which, to repeat myself, has worked well enough for them to get them within a couple of stones of the bots.
Knotwilg
Oza
Posts: 2432
Joined: Fri Jan 14, 2011 6:53 am
Rank: KGS 2d OGS 1d Fox 4d
GD Posts: 0
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
Location: Ghent, Belgium
Has thanked: 360 times
Been thanked: 1021 times

Re: reviewing SL articles using LZ and criticism

Post by Knotwilg »

John Fairbairn wrote:I've had some further thoughts, and it seems to me that uberdude's analysis with newer LZ's than mine could be used to show two things:

1. The power of AI analysis is overestimated and the power of human analysis is underestimated.
2. The difference between bots and human pros is in danger of being much exaggerated.
I don't want to be in a camp, but if I need to be in one, my camp is "yes, we should try to use AI evaluation to critically assess conventional wisdom, whether it has been spread by pros directly, by their ghost writers, or by amateurs further interpreting it on the Internet"

My original post was not so much "let's replace all pro wisdom with AI wisdom whatever the margin of error" as "I've seen claims about whole-board positions on SL, some quoted from pros but others perhaps attributable to amateurs, which don't pass the test of LZ (>10% difference)".

LZ can misread ladders, more so than KataGo, as I've shown in my reply. While that inspires caution, it shouldn't make us despair and say "whatever bots say, future bots will overrule it, so let's stick to conventional wisdom". Sure, the 3-3 invasion may be a fad, but so was the 4-4 when Go and Kitani came up with it, and so was a centre-oriented strategy (among amateurs) when Takemiya was in his prime.

Anyway, what I take from this discussion is some caution, especially when
- overruling or criticizing conventional wisdom from actual professionals
- the difference in probabilities is low
and a request to keep traces of the original discussion (which is hard for me, as I like synthesis).
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: reviewing SL articles using LZ and criticism

Post by Uberdude »

John Fairbairn wrote:I've had some further thoughts, and it seems to me that uberdude's analysis with newer LZ's than mine could be used to show two things:

...

(b) The later bots are telling us the earlier bots are WRONG (even if they are both stronger than humans), so whatever "principles" the earlier bots were using are also WRONG according to the later bots - which in turn may be shown in due course also to be WRONG. Wrong does not mean worse than humans, but still does not mean RIGHT.
John, I don't know what version of LZ and how many playouts you used on your example, but I think earlier in the thread we worked out you were using #157 and only hundreds of playouts. So I didn't use a newer bot to show the older bot was wrong; I used the same bot but gave it long enough to read the ladder (which was <30 seconds on my PC with a GPU, but may be much more if you have a slow computer). Do not assume LZ #157 with 100 playouts is superhuman; it may well be only mid-dan amateur strength overall (opening positional judgement much stronger, ladder strength much weaker). For example, I (4 dan) beat LZ #145 in an even game with a ladder when it had about 1000 playouts per move, and I wasn't getting massacred before that either. If you want to analyse pro games/opinions with LZ 157, I recommend 10k playouts minimum, and more in situations with ladders and semeais.
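As a practical footnote, the playout budget is just an engine setting. Here is a minimal sketch of assembling a Leela Zero command line with a playout cap; the flag names match the LZ 0.17-era CLI as I remember it, but verify against `leelaz --help` for your build, and the network filename is a placeholder, not a real file:

```python
import shlex

def leelaz_cmd(weights: str, playouts: int) -> list[str]:
    # -g: run in GTP mode, -w: network weights file,
    # -p: cap the number of playouts per move, --noponder: no thinking
    # on the opponent's time (keeps the playout count predictable).
    return ["leelaz", "-g", "--noponder", "-w", weights, "-p", str(playouts)]

# Uberdude's recommendation: at least 10k playouts with #157, and more
# for positions containing ladders or semeais.
cmd = leelaz_cmd("lz-157-network.gz", 10000)
print(shlex.join(cmd))  # leelaz -g --noponder -w lz-157-network.gz -p 10000
```

A GUI such as Lizzie passes an equivalent command line to the engine, so the same caveat applies there: check what playout or visit limit your front end is actually configured with before trusting its evaluations.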