LZ's progression
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: LZ's progression
There is a simple scientific point here, as well. Suppose that B beats A more often than not, C beats B more often than not, D beats C more often than not, and so on, and we want to know how much more often, say, K beats A. The preferable method is not to work it out from our estimates of how often B beats A, C beats B, etc., but to have A and K play against each other, unless that is prohibitively costly or there are other reasons for not doing so.
One possible reason for not doing so is that both J and K beat A almost 100% of the time, so the answer is uninteresting. But maybe how often K beats D would be interesting. We really should not be arguing about the pluses and minuses of an inferior method.
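To illustrate what the chained estimate involves, here is a minimal sketch. It assumes a logistic (Elo-style) model in which strength differences simply add up along the chain, which is the very assumption a direct A v. K match does not need. The function names and the 52%/55%/58% step rates are made up for illustration.

```python
import math

def elo_diff(win_rate):
    """Elo difference implied by a head-to-head win rate (logistic model)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

def win_rate_from_elo(diff):
    """Win rate implied by an Elo difference (inverse of elo_diff)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def chained_win_rate(step_rates):
    """Estimate of last-v-first win rate, ASSUMING Elo differences are additive."""
    return win_rate_from_elo(sum(elo_diff(p) for p in step_rates))

# Hypothetical chain of ten steps A -> B -> ... -> K, every step won at the same rate.
for step in (0.52, 0.55, 0.58):
    estimate = chained_win_rate([step] * 10)
    print(f"each step {step:.0%} -> chained K v. A estimate {estimate:.1%}")
```

A few points of error in each step estimate move the chained figure a long way, and the additivity assumption may not hold at all, so playing A against K directly is the better measurement.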
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: LZ's progression
40 game match LZ0.16_#213 v. LZ0.16_ELFv2
at time parity (--visits=1601 for #213, --visits=3201 for ELFv2)
twogtp 1.5.0, 3 duplicate games, 37 games used.
Result : ELFv2 wins 19-18
The stats :
The games (#213 is B in the even-numbered games; games n° 12, 22 and 24 are duplicates)
Next time, I'll use -m 20 to avoid duplicates.
-
And
- Gosei
- Posts: 1464
- Joined: Tue Sep 25, 2018 10:28 am
- GD Posts: 0
- Has thanked: 212 times
- Been thanked: 215 times
Re: LZ's progression
Several 25x25 matches: nets obtained with board_resize.py.txt vs
LZ 40x256 #205 by ChangeBoardSizeOfWeight.cpp, 10sec/move, cpuonly, gogui-twogtp:
(https://github.com/leela-zero/leela-zero/issues/2240)
LM 192x15 GX89 - LZ 40x256 #205 13:27
LZ 192x15 f438268e - LZ 40x256 #205 5:35
elf v2 256x20 - LZ 40x256 #205 12:28
converted minigo(25x25) 000990-cormorant works, did not test
and LM 192x15 GX89(by ChangeBoardSizeOfWeight.cpp) - LM 192x15 GX89(by board_resize.py.txt) 37:3 (White 20:0)
Re: LZ's progression
Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
5). #211
6). #213
7). #214
in progress...
#211 v elfv2 (27 games)
        wins         black        white
#211      5 18.52%     2 16.67%     3 20.00%
elfv2    22 81.48%    10 83.33%    12 80.00%
                     12 44.44%    15 55.56%
#213 v elfv2 (26 games)
        wins         black        white
#213     12 46.15%     4 44.44%     8 47.06%
elfv2    14 53.85%     5 55.56%     9 52.94%
                      9 34.62%    17 65.38%
- Attachments
-
- l0-1-elfv2.zip
- (45.99 KiB) Downloaded 555 times
Last edited by nbc44 on Sat Mar 23, 2019 1:48 am, edited 1 time in total.
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: LZ's progression
50 game match at time parity: #214 v. ELFv2
LZ0.16, twogtp 1.5.0
-v 1601 for #214 and -v 3201 for Elf, -m 20 for both.
no duplicate game, no error
ELFv2 wins 28-22 (56%)
The games (#214 is B in the even-numbered games):
Command line and stats:
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: LZ's progression
Vargo wrote: -m 20 for both
This is for selfplay I think, it may be too random for matches. If you just want to avoid duplicates you could look into --randomtemp (and/or check if there are no weird edge moves).
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: LZ's progression
moha wrote: it may be too random for matches
You're right, maybe it's too random.
I've looked at the first 20 games, and there is no obviously weird move that I can see. In one game, Elf is caught in a ladder before resigning.
Anyway, I'll try -m 20 --randomtemp=0.xxx
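For what a temperature below 1 does to the opening randomness, here is a rough sketch. It assumes the common scheme where a move is drawn with probability proportional to visits ** (1 / temperature). I have not checked that this is exactly what the engine does for --randomtemp, and the visit counts and move names are invented.

```python
def move_probabilities(visits, temperature):
    """Selection probabilities proportional to visits ** (1 / temperature).

    ASSUMPTION: this is the usual temperature scheme, not a copy of any
    engine's code. Counts are divided by the maximum first so the power
    stays in a safe numeric range.
    """
    top = max(visits.values())
    weights = {move: (count / top) ** (1.0 / temperature)
               for move, count in visits.items()}
    total = sum(weights.values())
    return {move: weight / total for move, weight in weights.items()}

# Hypothetical visit counts after one search: two sensible moves and one edge move.
visits = {"Q16": 1000, "D4": 400, "A1": 10}

for temperature in (1.0, 0.3):
    probs = move_probabilities(visits, temperature)
    summary = ", ".join(f"{move} {p:.4%}" for move, p in probs.items())
    print(f"temperature {temperature}: {summary}")
```

Under this assumption, lowering the temperature concentrates the choice on the most-visited moves, so rarely visited edge moves become very unlikely while some variety between games is kept.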
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: LZ's progression
I've tried another 50 game match #214 v. ELF v2
Same parameters, but for -m 20 --randomtemp=0.3
Average game length and average times are almost the same as before, no duplicate.
The games look "normal", but in one case (THIS GAME, n°40) , it's #214 (B) which gets caught in a ladder, and the last W moves look weird, but maybe it's because the winrate was near 100% for W.
Command line and stats :
The games (#214 is B in the even-numbered games)
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: LZ's progression
Vargo wrote: The games look "normal", but in one case (THIS GAME, n°40) , it's #214 (B) which gets caught in a ladder, and the last W moves look weird, but maybe it's because the winrate was near 100% for W.
Maybe it has a preference for moves on the first line when the game is nearly over.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
Re: LZ's progression
Vargo wrote: -v 1601 for #214 and -v 3201 for Elf
ELFv2 wins 28-22 (56%)
Full disaster:
The first net is worse than the second
#214 v elfv2 (77 games)
        wins         black        white
#214     26 33.77%    12 33.33%    14 34.15%
elfv2    51 66.23%    24 66.67%    27 65.85%
                     36 46.75%    41 53.25%
- Attachments
-
- 214-elv2.zip
- (62.88 KiB) Downloaded 573 times
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: LZ's progression
A recent LZ match game with a semeai between 2 huge dragons with ko libs. I haven't analysed it in Lizzie yet; I wonder if it's an oversight about the liberties of big chains or just a "losing anyway so doesn't matter" situation:
Edit: why not connect the ko at 235? Then white has just 1 eye and black's middle is alive, so there is no semeai. That's what LZ 205 wants with well under 1600 playouts, as in these matches. But at the end 205 wants to capture at a19 and doesn't see that the huge group has no libs. Elfv2 wants a19 initially, but after a few hundred playouts discovers the big capture, and it's #1 by 1k total playouts.
Re: LZ's progression
Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
7). #214
Long live elfv2.
Quick test #215 l0 v16 vs #215 l0 v16 next.
#214 v elfv2 (26 games)
        wins         black        white
#214     11 42.31%     5 41.67%     6 42.86%
elfv2    15 57.69%     7 58.33%     8 57.14%
                     12 46.15%    14 53.85%
The first net is worse than the second
v16 v v16next (106 games)
        wins         black        white
v16      41 38.68%    18 37.50%    23 39.66%
v16next  65 61.32%    30 62.50%    35 60.34%
                     48 45.28%    58 54.72%
- Attachments
-
- v16-v16next.zip
- (84.15 KiB) Downloaded 612 times
-
- 214-elfv2.zip
- (23.02 KiB) Downloaded 590 times
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: LZ's progression
nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?
Re: LZ's progression
Uberdude wrote: nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?
B/W/B/W...
- Attachments
-
- 214-elfv2.zip
- (241.68 KiB) Downloaded 555 times
Re: LZ's progression
nbc44 wrote: Long live elfv2
#214 v elfv2 (26 games)
        wins         black        white
#214     11 42.31%     5 41.67%     6 42.86%
elfv2    15 57.69%     7 58.33%     8 57.14%
                     12 46.15%    14 53.85%
The thing nobody seems to be talking about in this thread is the confidence interval. The fact that Elf won 57% of the games must be viewed from a statistical point of view.
For example, if we use the 57.7% proportion for Elfv2, the 95% confidence interval for 26 samples is 38.7% to 76.7%.
Basically that means with 95% confidence we conclude that the actual chance Elfv2 has of beating #214 on a randomly sampled game is within that range. So, Elfv2 could actually be substantially weaker than #214 or substantially stronger. The only way to narrow this down is more samples.
If we go to 160 games and Elfv2 ends up with the same overall 57.7% winrate, the interval is now 50% to 65%. In that case we would be reasonably sure that Elfv2 is stronger: there is only a 1 in 40 (2.5%) chance that Elfv2 is weaker than #214.
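The intervals quoted above can be recomputed with a few lines. This is a sketch that assumes the plain normal-approximation (Wald) interval, which matches the figures in this post; other interval methods give slightly different bounds for small samples. The function name is mine.

```python
import math

def wald_interval(win_rate, games, z=1.96):
    """Normal-approximation confidence interval for a win rate.

    z=1.96 corresponds to 95% confidence. ASSUMPTION: the Wald formula is
    used here; it is only a rough guide when the number of games is small.
    """
    half_width = z * math.sqrt(win_rate * (1.0 - win_rate) / games)
    return max(0.0, win_rate - half_width), min(1.0, win_rate + half_width)

# Same observed 57.7% win rate, increasing number of games.
for games in (26, 160, 400):
    low, high = wald_interval(0.577, games)
    print(f"{games:3d} games: {low:.1%} to {high:.1%}")
```

The half-width shrinks with the square root of the number of games, so halving the uncertainty takes roughly four times as many games.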
That is, in a nutshell, why we run 400 games for each network match and use 55% as the cutoff. We have 95% confidence that the new network is at least as strong as the previous one, and very likely is stronger. Now, there are speculations about rock, paper, scissors issues going on, but that is a whole different issue.
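Another way to look at the 400-game, 55% cutoff is to ask how often a net that is exactly equal in strength would reach it by luck. The sketch below uses an exact binomial tail; it assumes independent games with a fixed win probability, and the function name is my own.

```python
from math import comb

def chance_of_at_least(wins, games, p=0.5):
    """Probability of scoring `wins` or more out of `games` when each game
    is won independently with probability p (exact binomial tail)."""
    return sum(comb(games, k) * p**k * (1.0 - p)**(games - k)
               for k in range(wins, games + 1))

# 55% of 400 games is 220 wins; compare with the 15-11 result in the 26-game match.
print(f"equal nets reach 220/400 with probability {chance_of_at_least(220, 400):.3f}")
print(f"equal nets reach  15/26  with probability {chance_of_at_least(15, 26):.3f}")
```

Under these assumptions an equally strong net passes the 400-game cutoff only a few percent of the time, while a 15-11 score in 26 games is something equal nets produce quite often.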
It's still fun to do the matches, though.
tl;dr: Even if #XXX wins a 20 or 50 or 100 game match, it doesn't mean we necessarily know it was stronger than its opponent. Because statistics.