Life In 19x19 http://www.lifein19x19.com/ |
|
LZ's progression http://www.lifein19x19.com/viewtopic.php?f=18&t=15718 |
Page 18 of 21 |
Author: | Uberdude [ Wed Mar 27, 2019 3:00 am ] |
Post subject: | Re: LZ's progression |
A recent LZ match game with a semeai between 2 huge dragons with ko libs. I didn't analyse it in Lizzie yet; I wonder if it's an oversight in counting the liberties of big chains or just a "losing anyway so doesn't matter" situation. Edit: why not connect the ko at 235? Then white has just 1 eye and black's middle is alive, so there is no semeai. That's what LZ 205 wants with well under 1600 playouts, as in these matches. But at the end 205 wants to capture at a19 and doesn't see that the huge group has no libs. Elfv2 wants a19 initially, but after a few hundred playouts it discovers the big capture, which is #1 by 1k total playouts. |
Author: | nbc44 [ Fri Mar 29, 2019 3:38 pm ] |
Post subject: | Re: LZ's progression |
Time parity match. LZ0.16 XXX and LZ0.16 Elfv2 2x1080ti, 60s per move.
7). #214
Code:
#214 v elfv2 ( 26 games)
          wins           black          white
#214      11 42.31%       5 41.67%       6 42.86%
elfv2     15 57.69%       7 58.33%       8 57.14%
                         12 46.15%      14 53.85%
Long live elfv2.
Quick test #215 l0 v16 vs #215 l0 v16 next.
Code:
The first net is worse than the second
v16 v v16next ( 106 games)
          wins           black          white
v16       41 38.68%      18 37.50%      23 39.66%
v16next   65 61.32%      30 62.50%      35 60.34%
                         48 45.28%      58 54.72%
|
Author: | Uberdude [ Sat Mar 30, 2019 7:32 am ] |
Post subject: | Re: LZ's progression |
nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf? |
Author: | nbc44 [ Sat Mar 30, 2019 11:55 pm ] |
Post subject: | Re: LZ's progression |
Uberdude wrote: nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf? B/W/B/W...
|
Author: | hoa803 [ Tue Apr 02, 2019 7:58 pm ] |
Post subject: | Re: LZ's progression |
nbc44 wrote:
Code:
#214 v elfv2 ( 26 games)
          wins           black          white
#214      11 42.31%       5 41.67%       6 42.86%
elfv2     15 57.69%       7 58.33%       8 57.14%
                         12 46.15%      14 53.85%
Long live elfv2 .
The thing nobody seems to be talking about in this thread is the confidence interval. The fact that Elf won 57% of the games must be viewed from a statistical point of view. For example, if we use the 57.7% proportion for Elfv2, the 95% confidence interval for 26 samples is 38.7% to 76.7%. Basically that means with 95% confidence we conclude that the actual chance Elfv2 has of beating #214 on a randomly sampled game is within that range. So, Elfv2 could actually be substantially weaker than #214 or substantially stronger.
The only way to narrow this down is more samples. If we go to 160 games and Elfv2 ends up with the same overall 57.7% winrate, the interval is now 50% to 65%. In that case we would be reasonably sure that Elfv2 is stronger: only a 1 in 40 (2.5%) chance that Elfv2 is weaker than #214.
That is, in a nutshell, why we run 400 games for each network match and use 55% as the cutoff. We have 95% confidence that the new network is at least as strong as the previous one, and very likely is stronger.
Now, there are speculations about rock, paper, scissors issues going on, but that is a whole different issue. It's still fun to do the matches, though.
tl;dr: Even if #XXX wins a 20 or 50 or 100 game match, it doesn't mean we necessarily know it was stronger than its opponent. Because statistics. |
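For anyone who wants to reproduce those intervals, here is a minimal sketch. It assumes the plain normal-approximation (Wald) interval with z = 1.96; the post doesn't say which formula was used, but this one gives matching figures. The function name `wald_interval` is just for illustration.

```python
import math

def wald_interval(p, games, z=1.96):
    """Normal-approximation (Wald) confidence interval for a win rate.

    p     -- observed win proportion (e.g. 15/26)
    games -- number of games played
    z     -- normal quantile; 1.96 corresponds to a two-sided 95% interval
    """
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

p_elf = 15 / 26  # Elfv2's 57.7% from the 26-game match

for games in (26, 160):
    low, high = wald_interval(p_elf, games)
    print(f"{games:3d} games: {low:.1%} to {high:.1%}")
```

With so few games the Wald interval is only a rough guide (an exact or Wilson interval would be a bit different), but it is enough to show how slowly the range narrows as games are added.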
Author: | nbc44 [ Fri Apr 05, 2019 4:11 pm ] |
Post subject: | Re: LZ's progression |
Yes, you are right. All these tests are just rubbish in terms of mathematical statistics. And poor Lee Sedol still has chances to defeat AlphaGo. And you are right again: hoa803 wrote: It's still fun to do the matches, though.
|
Author: | hoa803 [ Fri Apr 05, 2019 4:55 pm ] |
Post subject: | Re: LZ's progression |
I don't know what they did in the new version of Leela but my Gflops pretty much doubled. I'm looking at an RTX 2060 for gaming and deep learning. Anybody try one and have a benchmark? |
Author: | Vargo [ Sat Apr 06, 2019 1:04 am ] |
Post subject: | Re: LZ's progression |
hoa803 wrote: The thing nobody seems to be talking about in this thread is the confidence interval.
nbc44 wrote: All these tests are just rubbish
All these 20, 30 ... game matches aren't gospel, obviously. Match parameters vary wildly (different GPUs, different time per game, different number of visits, usage of -m, -r, etc.). No one thinks #XXX is definitively stronger than elfv2 just because XXX won a single 20 game match by 11-9 (for example).
But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks. They're not gospel, but they're not rubbish either, even with as few as 20 games.
For example: 20 game match XXX v. YYY, result 15-5. If XXX and YYY were the same strength, there would be only about a 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger (particularly if there are several such matches going the same way). |
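The 2% figure can be checked with an exact binomial tail. This is a small sketch that assumes each game is an independent coin flip between equally strong engines (no draws, no colour effect); `tail_probability` is an illustrative name.

```python
from math import comb

def tail_probability(wins, games, p=0.5):
    """Probability of getting at least `wins` wins in `games` games
    when each game is won independently with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

print(f"15-5 or better: {tail_probability(15, 20):.4f}")  # 0.0207
print(f"11-9 or better: {tail_probability(11, 20):.4f}")
```

So a 15-5 result would be rare between equal engines, while an 11-9 result is entirely ordinary.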
Author: | hoa803 [ Sat Apr 06, 2019 7:25 pm ] |
Post subject: | Re: LZ's progression |
Vargo wrote: But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks. They're not gospel, but they're not rubbish either, even with so few as 20 games. For example 20 game match XXX v. YYY , result 15-5 If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).
Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece. Seems like whatever blip caused some of the LZ nets to be significantly weaker than Elf has gone away.
I wish I had better hardware to run out more visits and/or more games, but if I did that I'd have less statistical significance. Still, the 0.17 version of Leela gets ~2600 Gflops on my gpu, which results in a decent number of visits per move at that time control.
I will say that one thing I disagree with is adding any of the self play randomness params to matches that ostensibly compare engine strength. I feel like the main value there is the games are perhaps more interesting to watch. However, I think any of the programmers on Github would agree it shouldn't be used in a "match" situation. Although, I haven't seen anybody discuss such a thing outside of training, so who knows. |
Author: | splee99 [ Sun Apr 07, 2019 11:34 am ] |
Post subject: | Re: LZ's progression |
hoa803 wrote: Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece. I'm just curious. In those 52 games, was ELFv2 running on LZ 0.16 or 0.17? |
Author: | hoa803 [ Mon Apr 08, 2019 9:20 am ] |
Post subject: | Re: LZ's progression |
Should have been 0.17. I would have more info but I screwed up my command line and didn't save any of the games, which is very frustrating. Validate prints out XX-XX win/loss after each game, so I'm basing it on that alone. I need to rerun to confirm at some point. I'd probably just use the latest network. Right now I'm just helping train the AI rather than running matches.
Edit: I messed with it after work today. Turns out the -k statement to save games must be placed towards the beginning of the command line with 0.17. Or at least, it started saving the games when I moved it from the end to directly after the validation.exe statement.
Edit 2: I'm currently running lz219 vs elfv2 at 30 seconds a move. I think I like that better than absolute time for a comparison, because both engines will be much stronger reading the full 30 seconds each move (and subsequent moves). |
Author: | Vargo [ Tue Apr 09, 2019 5:00 am ] |
Post subject: | Re: LZ's progression |
New network #220.
Even if regular LZ(017) is not really designed for handicap games, it can play nice H games.
H3 game with komi 7.5: Crazy Stone DL (5 Dan) v. LZ017#220 (4s/move for LZ, and -r 1 to avoid resigning too soon, laptop with gtx 965).
CS and LZ(Sabaki) don't agree on the final score (W+7.5 and W+4.5). I suppose the 3 points difference comes from the 3 handicap stones. Maybe my settings are wrong somehow? If someone knows, thx...
Settings : ______________________________________________________ |
Author: | And [ Tue Apr 09, 2019 5:43 am ] |
Post subject: | Re: LZ's progression |
interesting. zen shows w+4.5 |
Author: | jlt [ Tue Apr 09, 2019 5:44 am ] |
Post subject: | Re: LZ's progression |
LeelaZero counted the score as territory + prisoners (+komi for White). Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory). My guess is that the 189.5 points for White correspond to (white living stones)+(white territory)+(komi)+(number of handicap stones). |
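That guess can be sanity-checked with the numbers already in the thread (182 for Black, 189.5 for White, komi 7.5, 3 handicap stones). A rough sketch, assuming a 19x19 board with every point counted for one side; the function name `reconcile` is made up for this example.

```python
BOARD_POINTS = 19 * 19  # 361 intersections

def reconcile(black_cs, white_cs, komi, handicap):
    """Split Crazy Stone's White total under the guess
    white total = white area + komi + handicap stones,
    then recompute the margin with the handicap bonus removed."""
    white_area = white_cs - komi - handicap
    covers_board = (black_cs + white_area == BOARD_POINTS)
    margin_cs = white_cs - black_cs
    margin_no_handicap = margin_cs - handicap
    return white_area, covers_board, margin_cs, margin_no_handicap

white_area, covers_board, margin_cs, margin_no_handicap = reconcile(182.0, 189.5, 7.5, 3)
print(white_area, covers_board, margin_cs, margin_no_handicap)
# 179.0 True 7.5 4.5
```

Under that reading the two areas add up to exactly 361, and taking the 3 handicap points back out of Crazy Stone's W+7.5 gives the W+4.5 that Sabaki reported.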
Author: | And [ Tue Apr 09, 2019 10:09 am ] |
Post subject: | Re: LZ's progression |
H3 game with komi 7.5 : Crazy Stone DL (5 Dan) - zen 7 (5sec), zen win, score CS W+35.5, zen and sabaki W+34.5. Crazy Stone possibly mistaken? ps I figured it out: CS shows area + handicap, and sabaki and zen - territory |
Author: | Vargo [ Thu Apr 11, 2019 9:01 am ] |
Post subject: | Re: LZ's progression |
And wrote: CS shows area + handicap, and sabaki and zen - territory
jlt wrote: LeelaZero counted the score as territory + prisoners (+komi for White)
Thx
Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory)
49 games at H2…H9 played by LZ017#10…#190 v. LZ017#220: time parity, 2 sec/move, 1x 1080, official LZ0.17#220 v. #xxx, komi 7.5, -r 1 for W (to avoid resigning too soon), -r 30 for B (to avoid very long games).
In the interesting zone (bold frames) mini 3 game matches:
According to THIS SITE, and in KGS rankings:
#40 is very approximately around 4K
#70 is around 3D
#100 is around 6D
#130 is around 9D (it seems a lot, and these rankings weren't based on 2sec/move)
You can play handicap go at this excellent site |
Author: | hoa803 [ Thu Apr 11, 2019 7:04 pm ] |
Post subject: | Re: LZ's progression |
If anybody hasn't used their free $300 from Google Cloud and feels like doing some deep learning, I recently set it up and it's quite easy to do. See this github thread for an updated guide. Note that the Microsoft Azure guide currently doesn't work with LeelaZero 0.17, but I'm trying to figure out the solution. On a single Tesla v100 gpu I am finishing a game every 108 seconds, averaged over 900 games! That means I can expect to make something like 12,000 games (selfplay and matches) before the $300 credit runs out. |
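The arithmetic behind that estimate, as a quick sketch. It uses only the figures above (108 seconds per game, about 12,000 games, $300 credit); the hourly cost it prints is just what those numbers imply, not a quoted Google Cloud price.

```python
def credit_budget(seconds_per_game, games, credit_usd):
    """Return total GPU-hours for the run and the hourly cost implied
    by spending the whole credit on it."""
    gpu_hours = seconds_per_game * games / 3600
    implied_rate = credit_usd / gpu_hours
    return gpu_hours, implied_rate

gpu_hours, implied_rate = credit_budget(108, 12_000, 300)
print(f"{gpu_hours:.0f} GPU-hours, about ${implied_rate:.2f} per hour implied")
```

In other words, 12,000 games at that pace is roughly 360 hours of GPU time.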
Author: | nbc44 [ Fri Apr 12, 2019 12:46 am ] |
Post subject: | Re: LZ's progression |
Time parity match with statistically significant result (part I). LZ0v17 #219 vs Elfv2 2x1080ti, 30s per move.
Code:
#219 v elfv2 ( 400 games)
          wins            black           white
#219      175 43.75%       65 41.67%      110 45.08%
elfv2     225 56.25%       91 58.33%      134 54.92%
                          156 39.00%      244 61.00%
|
Author: | Amtiskaw [ Fri Apr 12, 2019 3:13 am ] |
Post subject: | Re: LZ's progression |
Cool. I adjusted the PB and PW properties in the SGF files to make it a bit clearer who was who.
|
Author: | hoa803 [ Sat Apr 13, 2019 6:10 pm ] |
Post subject: | Re: LZ's progression |
It might provoke an interesting discussion: the folks on GitHub don't feel that time parity matches are a good measure of engine strength, and prefer matches at equal visits. I don't claim to totally understand the reasoning but it might be worth looking into. |
All times are UTC - 8 hours [ DST ] |