Life In 19x19 http://www.lifein19x19.com/
How LZ reads out ladders http://www.lifein19x19.com/viewtopic.php?f=18&t=17298
Page 1 of 1
Author: xela [ Sun Mar 01, 2020 12:59 am ]
Post subject: How LZ reads out ladders
Carrying on from the other thread, now I'm using my modified LZ version to explore how LZ "understands" ladders. It's really interesting to look at older and newer LZ nets and see how they treat the same position. In theory, there should be three things going on:
I'd expect that smaller networks (5 or 6 blocks) will need to play out pretty much the whole ladder, because they can't "see all the way across the board", while a 20 or 40 block network should be able to "take in the position at a glance" and understand the ladder status without playing out the moves. So, on to some tests. Below are some taisha positions where both sides have made mistakes, and now white has the chance to start a ladder. I want to look at four scenarios:
Test position 1A (above): white must play at a; any other move is a mistake.
Test position 1B (above): after white a, black should tenuki. There are several possible moves, for example c, d or e, but b would be a bad mistake.
Test position 2A: white a is a mistake. Any of the points marked c are not too bad, although d is probably white's best option.
Test position 2B: after white a, black must play b.

I tested with seven different networks (28 combinations of test position and network: 7 networks × 4 positions), with 2,000 playouts each time. The networks were number 45 (5 blocks), 57 (also 5 blocks), 91 (6 blocks), 116 (10 blocks), 157 (15 blocks), 173 (20 blocks) and 258 (40 blocks). Summary of results:
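For anyone who wants to repeat this, the test grid is small enough to write down directly. Here is a minimal Python sketch. The names (`NETWORKS`, `POSITIONS`, `test_matrix`) are made up for illustration. Actually feeding each pair to LZ is left out, because the interface of the modified build isn't described in the thread.

```python
from itertools import product

# Network number -> residual blocks, as listed in the post above.
NETWORKS = {45: 5, 57: 5, 91: 6, 116: 10, 157: 15, 173: 20, 258: 40}
POSITIONS = ["1A", "1B", "2A", "2B"]
PLAYOUTS_PER_RUN = 2000


def test_matrix():
    """Every (network, position) pair that needs one analysis run."""
    return list(product(NETWORKS, POSITIONS))


runs = test_matrix()
print(len(runs))                     # 7 networks x 4 positions = 28
print(len(runs) * PLAYOUTS_PER_RUN)  # 56000 playouts in total
```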
Overall, three things really caught my attention.

First is the interplay between the network (policy and eval) and playouts for the medium-sized nets. They need to read a few steps to evaluate the ladder correctly, but they don't need to read right to the end of the ladder. Of course, "LZ-157 can understand a ladder in 20 playouts" doesn't mean that it never makes ladder mistakes. If the ladder position is a few moves deep in a variation, then that specific position may not get enough playouts, so the ladder can still be "over the horizon", leading to a mistake.

Second, a 20-block network still isn't quite big enough to make an accurate assessment of the full board. I guess the first five or ten blocks are about understanding basic shapes, and the later blocks start to take in bigger chunks.

Third, it looks as though the 40-block network really can see the ladder status without having to read it out at all, at least for this position. We'd need to test on a bunch more positions to be sure. But I recall many of the "LZ can't do ladders" complaints happening around the time of moving from 15 to 20 blocks. Is it possible that the problem is solved simply by moving to a bigger network?

Finally, the attached GTP log includes all 28 tests, for people who like going through lots of data, showing the number of playouts, policy value, winrate and principal variations for each move. In many cases, reading out the ladder isn't the PV; it's buried among the other variations. Over the next few posts I'll show you a few examples.
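As this thread shows, LZ has to work out ladders through its network and search. For comparison, here is what explicit ladder reading looks like, as a minimal Python sketch (not LZ code). The defender extends out of atari and the attacker tries both ataris. The read stops when the group reaches three liberties (escape) or is left with one (capture). The optional `max_depth` is a hypothetical stand-in for a reading horizon: if the ladder hasn't resolved within that many ataris, the answer is "don't know". The 9x9 position is made up. For brevity, the sketch also ignores the defender capturing chasing stones, which some real ladder breakers rely on.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # occupied points -> "B" or "W"; empty points are absent


def neighbors(p: Point, size: int) -> Iterator[Point]:
    x, y = p
    for q in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q


def liberties(board: Board, start: Point, size: int) -> Set[Point]:
    """Flood-fill the chain containing `start` and collect its empty neighbours."""
    color = board[start]
    chain, libs, stack = {start}, set(), [start]
    while stack:
        for q in neighbors(stack.pop(), size):
            if q not in board:
                libs.add(q)
            elif board[q] == color and q not in chain:
                chain.add(q)
                stack.append(q)
    return libs


def read_ladder(board: Board, stone: Point, size: int,
                max_depth: Optional[int] = None, depth: int = 0) -> Optional[bool]:
    """True = captured, False = escapes, None = unresolved within max_depth ataris."""
    defender = board[stone]
    attacker = "W" if defender == "B" else "B"
    libs = liberties(board, stone, size)
    if len(libs) != 1:
        raise ValueError("the defending chain must be in atari")
    if max_depth is not None and depth >= max_depth:
        return None  # beyond the (hypothetical) reading horizon
    extended = dict(board)
    extended[libs.pop()] = defender  # the defender's only move: extend
    libs = liberties(extended, stone, size)
    if len(libs) <= 1:
        return True   # still in atari after extending: captured
    if len(libs) >= 3:
        return False  # too many liberties to keep laddering
    outcomes = []
    for atari in sorted(libs):  # the attacker tries both ataris
        after = dict(extended)
        after[atari] = attacker
        outcome = read_ladder(after, stone, size, max_depth, depth + 1)
        if outcome:
            return True
        outcomes.append(outcome)
    return None if None in outcomes else False


def demo_position(with_breaker: bool = False) -> Board:
    """Hypothetical 9x9 position: black (2,5) is in atari; white ladders it towards (8,0)."""
    board = {(2, 5): "B", (1, 5): "W", (2, 4): "W", (3, 6): "W", (2, 6): "W"}
    if with_breaker:
        board[(6, 2)] = "B"  # a black stone sitting on the ladder's path
    return board


print(read_ladder(demo_position(), (2, 5), 9))                   # True
print(read_ladder(demo_position(with_breaker=True), (2, 5), 9))  # False
print(read_ladder(demo_position(), (2, 5), 9, max_depth=2))      # None
```

The last line plays the part of a reader with a short horizon. The ladder does work, but a reader that can only look two ataris ahead can't tell.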
Author: Uberdude [ Sun Mar 01, 2020 3:15 am ]
Post subject: Re: How LZ reads out ladders
Interesting xela, thanks. Back in the LZ 157 days I remember noticing that LZ would tend to assume ladders work, so it would make mistakes in positions where they didn't, whilst Elf would tend to assume ladders don't work, so it would make mistakes in positions where they did. Also, to test that LZ 40 block can really "read the ladder at a glance", rather than "ladder from lower left to top right is good for black with a black stone in top right" being baked into the policy, I would suggest moving the stone(s) left by one space at a time until they stop being ladder breakers, and seeing whether LZ actually notices and how sharply.
Author: xela [ Sun Mar 01, 2020 3:48 am ]
Post subject: Re: How LZ reads out ladders
Uberdude wrote: Also, to test that LZ 40 block can really "read the ladder at a glance" rather than "ladder from lower left to top right is good for black with a black stone in top right" being baked into the policy I would suggest moving the stone(s) left by one space at a time until they stop being ladder breakers and see if LZ actually notices and how sharply.
Good idea! First I'll post the things I've already looked at (finding time to write things down is a bit of a challenge right now, but I'm gradually getting there). Then I'll try this. And I haven't forgotten the other ladder game you suggested...
Author: ez4u [ Sun Mar 01, 2020 5:29 am ]
Post subject: Re: How LZ reads out ladders
xela wrote: Carrying on from the other thread, now I'm using my modified LZ version to explore how LZ "understands" ladders. It's really interesting to look at older and newer LZ nets and see how they treat the same position. In theory, there should be three things going on:
I'd expect that smaller networks (5 or 6 blocks) will need to play out pretty much the whole ladder, because they can't "see all the way across the board", while a 20 or 40 block network should be able to "take in the position at a glance" and understand the ladder status without playing out the moves. So, on to some tests. Below are some taisha positions where both sides have made mistakes, and now white has the chance to start a ladder. I want to look at four scenarios:
Test position 1A (above): white must play at a, any other move is a mistake. Test position 1B (above): after white a, black should tenuki -- there are several possible moves, for example c, d, e, but b would be a bad mistake. ...
With the current 266 net, White's blue is immediately the ladder play at [image] Below is a screenshot after 1.3 million playouts. Blue is [image] Checking the bottom left after 1.3 million playouts for 3, we can see that only three POs (playouts) test Black pulling out the laddered stone. But when we add [image]
Author: ez4u [ Sun Mar 01, 2020 5:58 am ]
Post subject: Re: How LZ reads out ladders
Following up on my previous post. Here is how KataGo 1.3.3 with the b15 net handled the same situation in 252 playouts! [image]
What to do after [image]
What it calculated for [image]
Author: xela [ Sun Mar 01, 2020 6:13 am ]
Post subject: Re: How LZ reads out ladders
Edit: Dave posted while I was drafting this. Thanks for taking such a close look! Yes, the bigger nets are frighteningly efficient at rejecting some moves, for better or for worse.
--------
Let's look more closely at test 1A, white to play and capture a stone in a ladder. For all networks a was the first-choice move (highest policy value), but this choice gets more clear-cut for bigger networks:
Code:
network  G4 policy
45       28%
57       54%
91       53%
116      50%
157      88%
173      78%
258      97%
Network 45 has trouble reading out the ladder. It comes up with this fantastic sequence as the principal variation, allowing the F4 stone to escape in exchange for the corner: According to LZ-45, this has a winrate of 43% for white, which is better than the 39% it can get from having a go at playing out the ladder and messing up: Actually, it does try 5 at a first, but then comes back and looks at this variation too. With this and similar distractions along the way, the 219 playouts given to G4 aren't enough to read the ladder to the end.
To start with, this position has a neural net evaluation of 37% for white. This number goes up with more playouts, but never goes up enough to beat the principal variation above. Playout number 236 gets to this position: but then LZ doesn't give any more playouts to the ladder, so the positive evaluation doesn't get to filter back up the tree and bump up G4 significantly.
(I decided to let it run for a million playouts. With such a small network this still takes less than ten minutes, and that still wasn't enough to put G4 in first place, although it closed the gap a little. It kept exploring the C3 variation, with 967,909 playouts given to that move, leading to a winrate of 37.5%, while G4 got a mere 13,831 playouts, winrate 35.4%.)
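Why doesn't one good evaluation at playout 236 rescue G4? In an AlphaGo Zero-style search like LZ's, a move's winrate is essentially the average of all the network evaluations backed up through it. A single new number barely shifts an average over 219 playouts. Here is a minimal Python sketch of that arithmetic. The 219 playouts, 39% and 43% come from the post above. The 95% value for a fully read-out ladder is a made-up figure. The sketch also ignores how LZ actually picks its final move, which depends on playout counts and not only on winrate.

```python
def backed_up_mean(visits: int, mean: float, new_values: list) -> float:
    """Node value after extra playouts: a running mean of the evaluations below it."""
    return (visits * mean + sum(new_values)) / (visits + len(new_values))


def evaluations_needed(visits: int, mean: float, target: float, new_value: float) -> int:
    """How many extra playouts returning `new_value` lift the mean to `target`."""
    if new_value <= target:
        raise ValueError("new_value must exceed target, or the mean never reaches it")
    k = 0
    while backed_up_mean(visits, mean, [new_value] * k) < target:
        k += 1
    return k


# G4 had 219 playouts at 39%; the 0.95 per read-out ladder is hypothetical.
print(round(backed_up_mean(219, 0.39, [0.95]), 4))  # 0.3925
print(evaluations_needed(219, 0.39, 0.43, 0.95))    # 17
```

So even under that optimistic assumption, G4 would need 17 more playouts that reach the end of the ladder before its average caught up with the 43% line. A single visit moves it by only about a quarter of a percentage point.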
Networks 57 and 91 behave pretty similarly: they read the ladder to the end with fewer "distractions" along the way, so there are enough playouts for the evaluations to filter up, and G4 turns out to be the best move. But actually the playouts don't matter that much: even on playout number 1, G4 is evaluated as better than any other move, so these networks would get the right answer with far fewer playouts. Here's something interesting with LZ-91: Both [image] Another interesting moment: when LZ-91 reads nearly to the end of the ladder: after playing [image] Here's the summary of all the variations that LZ-91 explores:
LZ-116 and LZ-157 only give a few playouts to the ladder, because pulling out of atari is a low-policy option for black (around 1%), far from the first choice to be explored. LZ-157 gets this far on playout number 1270, but doesn't come back to this position: Here's the summary of variations for LZ-157:
Finally, networks 173 and 258 don't read any ladder variations at all. Here's LZ-258:
|
Author: xela [ Sun Mar 01, 2020 6:17 am ]
Post subject: Re: How LZ reads out ladders
Summary: so far it looks as though the policy net overrides everything else. If there's a blind spot in the policy, then in theory a large enough number of playouts together with accurate evaluations should fix it, but it really does take a massive number of playouts. To be continued...
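Here is a toy model of why the policy dominates. LZ's search is based on PUCT-style selection, the rule from the AlphaGo Zero paper. Under that rule, a move's exploration bonus is proportional to its prior, so a 1%-prior move can go unvisited for thousands of playouts even when it is actually better. The sketch below is a one-level bandit with several simplifying assumptions:

- Fixed, hypothetical move values.
- `c_puct = 1.0`.
- Unvisited moves scored as losses (`fpu = 0.0`).

LZ's real constants and its handling of unvisited moves differ, so only the qualitative picture carries over.

```python
import math


def puct_score(q: float, prior: float, parent_visits: int, child_visits: int,
               c_puct: float = 1.0) -> float:
    """AlphaGo Zero-style PUCT: value estimate Q plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def simulate(priors: dict, values: dict, n_playouts: int,
             c_puct: float = 1.0, fpu: float = 0.0) -> dict:
    """One-level search: each playout picks the child with the best PUCT score
    and backs up that child's fixed (hypothetical) value."""
    visits = {move: 0 for move in priors}
    value_sums = {move: 0.0 for move in priors}
    for _ in range(n_playouts):
        parent_visits = sum(visits.values())

        def score(move):
            q = value_sums[move] / visits[move] if visits[move] else fpu
            return puct_score(q, priors[move], parent_visits, visits[move], c_puct)

        best = max(priors, key=score)  # ties go to the first move listed
        visits[best] += 1
        value_sums[best] += values[best]
    return visits


# Hypothetical numbers: the policy favourite (50%) is worse than a 1%-prior move (60%).
PRIORS = {"policy_favourite": 0.90, "low_prior_move": 0.01}
VALUES = {"policy_favourite": 0.50, "low_prior_move": 0.60}
print(simulate(PRIORS, VALUES, 2000))  # the 1% move gets no playouts at all
print(simulate(PRIORS, VALUES, 5000))  # it is only tried after roughly 2,700 playouts
```

With these numbers the 1% move gets its first playout at around playout 2,700. From then on it takes every playout for the rest of the 5,000-playout run, because its value is higher. Changing `c_puct` or `fpu` moves that point around but doesn't remove the basic effect.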
Author: xela [ Tue Mar 03, 2020 5:00 pm ]
Post subject: Re: How LZ reads out ladders
Now for test 1B, black to play and figure out that trying to escape from the ladder is a bad idea. For small networks, F5 is the "first instinct" move, and they have to read a little bit to figure out that it doesn't work. As per the previous post, LZ-45 just didn't get it. The other networks up to LZ-116 read out the full ladder. LZ-157 and 173 read out a few steps but are able to evaluate the position at an earlier stage. LZ-258 literally doesn't look at pulling out (at least, not within the first 2,000 playouts).
Code:
network  F5 policy  F5 playouts  best move  best policy
45       60%        1977         F5
57       31%        66           Q16        2%
91       59%        189          R15        13%
116      19%        64           R15        8%
157      1%         14           O3         20%
173      5%         29           O3         52%
258      -          0            R15        46%
Examples of the variations explored, if you're interested:
LZ-116 reading out the whole ladder
LZ-173 reading out part of the ladder
LZ-258 operating on a higher level
Something interesting is that even when pulling out of atari the first time is a low-policy move for black, continuing the ladder in the variations is still often the top policy move, and the only move looked at. I guess this reflects a bias in the self-play games: once LZ has reached a certain strength, it generally won't start bad ladders. That would mean that if you have pulled out of atari once, then continuing the ladder is probably the right thing to do?
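The tables in these posts summarise the GTP log: policy and playouts for the ladder move, plus the most-visited move. Here is a minimal Python sketch for pulling the same columns out of a log. It assumes lines in the layout Leela Zero prints in its verbose search summary. xela's modified build may log differently, so treat the regex as an assumption to adjust. The sample lines are made up.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed line layout, modelled on Leela Zero's verbose search output, e.g.
#   "  F5 ->     100 (V: 30.00%) (LCB: 27.00%) (N: 50.00%) PV: F5 G5 F6 G6"
# The LCB field is optional because older builds don't print it.
LINE_RE = re.compile(
    r"^\s*(?P<move>[A-HJ-T]\d{1,2}|pass)\s*->\s*(?P<visits>\d+)\s*"
    r"\(V:\s*(?P<winrate>[\d.]+)%\)\s*"
    r"(?:\(LCB:[^)]*\)\s*)?"
    r"\(N:\s*(?P<policy>[\d.]+)%\)\s*"
    r"PV:\s*(?P<pv>.*)$"
)


@dataclass
class MoveStats:
    move: str
    visits: int
    winrate: float  # percent
    policy: float   # percent
    pv: List[str]


def parse_stats(text: str) -> List[MoveStats]:
    """Keep only the lines that match the assumed per-move summary format."""
    stats = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stats.append(MoveStats(m["move"], int(m["visits"]), float(m["winrate"]),
                                   float(m["policy"]), m["pv"].split()))
    return stats


def summary_row(stats: List[MoveStats], move: str) -> Tuple[Optional[float], int, str, float]:
    """(policy of `move`, playouts of `move`, most-visited move, its policy)."""
    target = next((s for s in stats if s.move == move), None)
    best = max(stats, key=lambda s: s.visits)
    return (target.policy if target else None,
            target.visits if target else 0,
            best.move, best.policy)


sample = """\
 R15 ->    1500 (V: 55.00%) (LCB: 54.00%) (N: 20.00%) PV: R15 Q17
  F5 ->     100 (V: 30.00%) (LCB: 27.00%) (N: 50.00%) PV: F5 G5 F6 G6
"""
print(summary_row(parse_stats(sample), "F5"))  # (50.0, 100, 'R15', 20.0)
```

A move that never shows up in the log gets `None` for its policy and 0 playouts. That corresponds to the "-" / 0 entries in the LZ-258 row.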
|
Author: Bill Spight [ Tue Mar 03, 2020 7:00 pm ]
Post subject: Re: How LZ reads out ladders
FWIW, I found two examples of this position on Waltheri, 1941-00-00e, Sekiyama Riichi, 6 dan (W) vs. Nabeshima Ichiro, 4 dan, and 2001-09-16g, Zhu Songli, 5 dan (W) vs. Zhou Heyang, 9 dan. Play continued as above in both games. In the Elf commentaries, for [image]
Author: xela [ Tue Mar 03, 2020 7:32 pm ]
Post subject: Re: How LZ reads out ladders
Bill Spight wrote: FWIW, I found two examples of this position on Waltheri, 1941-00-00e, Sekiyama Riichi, 6 dan (W) vs. Nabeshima Ichiro, 4 dan, and 2001-09-16g, Zhu Songli, 5 dan (W) vs. Zhou Heyang, 9 dan.
Nice! For "research purposes", I've been adding a white stone at a (and a corresponding black stone at D2), because otherwise LZ keeps thinking about playing a as a forcing move in the middle of reading out the ladder. That doesn't seem to change the overall conclusions, but those forcing moves make the process of tracing the variations much messier.
Author: Bill Spight [ Wed Mar 04, 2020 4:16 pm ]
Post subject: Re: How LZ reads out ladders
Your mission, Mr. Phelps, should you choose to accept it. [image]
From Common Sense in Go (Kubomatsu, 1929, in Japanese).
Author: xela [ Thu Mar 05, 2020 4:58 am ]
Post subject: Re: How LZ reads out ladders
Now on to test 2A, where the ladder is broken. Remember this is the one where the 10-, 15- and 20-block networks have a blind spot. If the ladder doesn't work, then a is a mistake. There are a few reasonable alternatives, with b probably being the best move. LZ-258 finds b, but the weaker networks prefer c or d.
Code:
network  G4 policy  G4 playouts  best move  best policy  best playouts
45       29%        239          C3         27%          1699
57       57%        494          C3         3%           1242
91       54%        494          C6         13%          894
116      47%        1852         G4
157      93%        1983         G4
173      82%        1943         G4
258      3%         4            H3         28%          1780
We see a similar pattern to before: the weaker networks need to read out the ladder (it takes a little over 500 playouts before LZ-57 recognises C3 as being better than G4), but LZ-258 can see the status "at a glance". This time, though, there's a blip in the middle, where LZ-116, 157 and 173 give a lot of playouts to G4 without managing to notice that the ladder is broken! What's going on here?
For LZ-116, here's the principal variation: So LZ-116's opinion is that after [image] It does actually read the ladder as a sub-variation: Here, [image] At this point, LZ-116 still can't see that the ladder is broken, evaluates the position as strongly in white's favour, and stops reading because clearly black isn't going to persist with such a hopeless variation!
My interpretation: LZ-116 is stuck at a local maximum. The policy is sharp enough to quickly eliminate unpromising moves, but not sophisticated enough to notice the ladder breaker. At the risk of personifying these networks too much, we could say that LZ-116 is overconfident, leaping to conclusions and lacking LZ-57's patience in reading out the whole ladder. LZ-258 is also very confident, but now the confidence is backed up by enough experience.
Detailed traces if you care to explore further:
LZ-57
LZ-116
LZ-258
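One way to put a number on the blind spot is the share of each 2,000-playout run spent on the mistaken move G4. Here is a small Python sketch using the G4 playout counts from the test 2A table. The 50% cut-off for calling something a blind spot is an arbitrary choice for illustration.

```python
PLAYOUTS_PER_RUN = 2000

# G4 playouts per network in test 2A, copied from the table above.
G4_PLAYOUTS = {45: 239, 57: 494, 91: 494, 116: 1852, 157: 1983, 173: 1943, 258: 4}


def playout_share(playouts: dict, total: int = PLAYOUTS_PER_RUN) -> dict:
    """Fraction of each run's playouts spent on the mistaken move."""
    return {net: n / total for net, n in playouts.items()}


def blind_spot_networks(playouts: dict, threshold: float = 0.5) -> list:
    """Networks that spent more than `threshold` of their playouts on the mistake."""
    return sorted(net for net, share in playout_share(playouts).items() if share > threshold)


for net, share in playout_share(G4_PLAYOUTS).items():
    print(f"LZ-{net}: {share:.1%} of playouts on G4")
print(blind_spot_networks(G4_PLAYOUTS))  # [116, 157, 173]
```

The small nets spend a quarter or less of their playouts on G4, and LZ-258 spends almost none. The three middle-sized nets put over 90% of their playouts there.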
|
All times are UTC - 8 hours [ DST ]