LZ's progression

Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: LZ's progression

Post by Bill Spight »

There is a simple scientific point here as well. Suppose that B beats A more often than not, C beats B more often than not, D beats C more often than not, and so on, and we want to know how much more often, say, K beats A. The preferable method is not to work it out from our estimates of how often B beats A, C beats B, etc., but to have A and K play each other, unless that is prohibitively costly or there are other reasons for not doing so.

One possible reason for not doing so is that both J and K beat A almost 100% of the time, so the answer is uninteresting. But maybe how often K beats D would be interesting. We really should not be arguing about the pluses and minuses of an inferior method.
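
To make the indirect method concrete, here is a minimal sketch of one common way to chain pairwise results, assuming an Elo-style logistic model in which strength is perfectly transitive, so the odds of each step multiply. That model and the 55% figure are assumptions for illustration, not something stated in the thread.

```python
def chained_win_probability(step_win_rates):
    """Chain pairwise win rates under an Elo-style (logistic) model.

    Assumes strength is perfectly transitive, so the odds of each
    step multiply. Real nets need not behave this way.
    """
    odds = 1.0
    for p in step_win_rates:
        odds *= p / (1.0 - p)
    return odds / (1.0 + odds)


# Hypothetical example: ten successive nets (A..K), each beating
# its predecessor 55% of the time.
estimate = chained_win_probability([0.55] * 10)
print(f"Chained estimate that K beats A: {estimate:.3f}")
```

The estimate is only as good as the transitivity assumption and the accuracy of every link in the chain, which is why a direct A v. K match is the better measurement when it is affordable.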
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

Post by Vargo »

40 game match LZ0.16_#213 v. LZ0.16_ELFv2
at time parity (--visits=1601 for #213, --visits=3201 for ELFv2)
twogtp 1.5.0, 3 duplicate games, 37 games used.

Result: ELFv2 wins 19-18

The stats:
[image: 213elf.gif]
The games (#213 is B in the even-numbered games; games n° 12, 22 and 24 are duplicates):
213_elfv2.zip
Next time, I'll use -m 20 to avoid duplicates.
And
Gosei
Posts: 1464
Joined: Tue Sep 25, 2018 10:28 am
GD Posts: 0
Has thanked: 212 times
Been thanked: 215 times

Post by And »

Several matches on 25x25. Nets obtained with board_resize.py.txt played against
LZ 40x256 #205 obtained with ChangeBoardSizeOfWeight.cpp, at 10 sec/move, CPU only, using gogui-twogtp:
(https://github.com/leela-zero/leela-zero/issues/2240)

LM 192x15 GX89 - LZ 40x256 #205 13:27
LZ 192x15 f438268e - LZ 40x256 #205 5:35
elf v2 256x20 - LZ 40x256 #205 12:28
The converted minigo (25x25) 000990-cormorant net works, but I did not test it.
Also, LM 192x15 GX89 (by ChangeBoardSizeOfWeight.cpp) - LM 192x15 GX89 (by board_resize.py.txt) 37:3 (White 20:0)
nbc44
Dies in gote
Posts: 50
Joined: Sat Sep 15, 2018 2:34 am
GD Posts: 0
Been thanked: 3 times

Post by nbc44 »

Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\XXX.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -k XXX-elfv2
5). #211

#211 v elfv2 ( 27 games)
           wins        black       white
#211     5 18.52%    2 16.67%    3 20.00%
elfv2   22 81.48%   10 83.33%   12 80.00%
                    12 44.44%   15 55.56%
6). #213

#213 v elfv2 ( 26 games)
           wins        black       white
#213    12 46.15%    4 44.44%    8 47.06%
elfv2   14 53.85%    5 55.56%    9 52.94%
                     9 34.62%   17 65.38%
7). #214
in progress...
Attachments
l0-1-elfv2.zip
Last edited by nbc44 on Sat Mar 23, 2019 1:48 am, edited 1 time in total.

Post by Vargo »

50-game match at time parity: #214 v. ELFv2
LZ0.16, twogtp 1.5.0
-v 1601 for #214 and -v 3201 for Elf, -m 20 for both.
No duplicate games, no errors.

ELFv2 wins 28-22 (56%)
The games (#214 is B in the even-numbered games):
214_elfv2.zip
Command line and stats:
[image: 214_elfv2.gif]
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Post by moha »

Vargo wrote: -m 20 for both
I think this option is meant for self-play; it may be too random for matches. If you just want to avoid duplicates, you could look into --randomtemp (and/or check that there are no weird edge moves).
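
As a rough illustration of why a lower temperature makes the opening less random, here is a minimal sketch that assumes the usual weighting in which a move is picked with probability proportional to visits^(1/T). It is not taken from Leela Zero's source, and the visit counts are made up.

```python
def selection_probabilities(visits, temperature):
    """Return probabilities proportional to visits ** (1 / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(visits)
    # Normalise by the largest count first to keep the powers well-behaved.
    weights = [(v / top) ** (1.0 / temperature) for v in visits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical root visit counts for three candidate moves.
visits = [800, 500, 300]
for temp in (1.0, 0.3):
    probs = selection_probabilities(visits, temp)
    print(temp, [round(p, 3) for p in probs])
```

At T = 1 the choice is simply proportional to visits; as T shrinks, the most-visited move takes a larger share, so games stay varied enough to avoid duplicates without straying far from the engine's preferred moves.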

Post by Vargo »

moha wrote: it may be too random for matches
You're right, maybe it's too random.
I've looked at the first 20 games, and there is no obviously weird move that I can see. In one game, Elf is caught in a ladder before resigning.
Anyway, I'll try -m 20 --randomtemp=0.xxx

Post by Vargo »

I've tried another 50-game match, #214 v. ELF v2.
Same parameters, except for -m 20 --randomtemp=0.3
Average game length and average times are almost the same as before, with no duplicates.
The games look "normal", but in one case (game n°40) it's #214 (B) which gets caught in a ladder, and the last W moves look weird, though maybe that's because the winrate was near 100% for W.

Command line and stats:
[image: 214_elf2.gif]
The games (#214 is B in the even-numbered games):
214_elfv2.zip

Post by Bill Spight »

Vargo wrote: The games look "normal", but in one case (game n°40) it's #214 (B) which gets caught in a ladder, and the last W moves look weird, but maybe it's because the winrate was near 100% for W.
Maybe it has a preference for moves on the first line when the game is nearly over.

Post by nbc44 »

Vargo wrote: -v 1601 for #214 and -v 3201 for Elf
ELFv2 wins 28-22 (56%)
Full disaster:

The first net is worse than the second
#214 v elfv2 ( 77 games)
           wins        black       white
#214    26 33.77%   12 33.33%   14 34.15%
elfv2   51 66.23%   24 66.67%   27 65.85%
                    36 46.75%   41 53.25%
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\57499cb9.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g -v 3201 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -- C:\APPS\l0gpu16\leelaz -- C:\APPS\l0gpu16\leelaz -k 214-elfv2
I think "-v 1601" is too small for l0.
Attachments
214-elv2.zip
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Post by Uberdude »

A recent LZ match game had a semeai between two huge dragons with ko liberties. I haven't analysed it in Lizzie yet; I wonder whether it's an oversight about the liberties of big chains or just a "losing anyway, so it doesn't matter" situation.
Edit: why not connect the ko at 235? Then white has just one eye and black's middle is alive, so there is no semeai. That's what LZ 205 wants with well under 1600 playouts, as in these matches. But at the end 205 wants to capture at a19 and doesn't see that the huge group has no liberties. Elfv2 wants a19 initially, but after a few hundred playouts it discovers the big capture, which is its #1 move by 1k total playouts.

Post by nbc44 »

Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\57499cb9.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -k 214-elfv2
7). #214

#214 v elfv2 ( 26 games)
           wins        black       white
#214    11 42.31%    5 41.67%    6 42.86%
elfv2   15 57.69%    7 58.33%    8 57.14%
                    12 46.15%   14 53.85%
Long live elfv2 :bow: ;-).

Quick test #215 l0 v16 vs #215 l0 v16 next.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -- C:\APPS\l0gpu16\leelaz -- C:\APPS\l0gpu17beta\leelaz -k 215-215

The first net is worse than the second
v16 v v16next ( 106 games)
             wins        black       white
v16       41 38.68%   18 37.50%   23 39.66%
v16next   65 61.32%   30 62.50%   35 60.34%
                      48 45.28%   58 54.72%
Attachments
v16-v16next.zip
214-elfv2.zip

Post by Uberdude »

nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?

Post by nbc44 »

Uberdude wrote:nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?
B/W/B/W...
250613 visits, 84820075 nodes, 250612 playouts, 2078 n/s
190448 visits, 65159316 nodes, 190447 playouts, 2653 n/s
77923 visits, 26101910 nodes, 77891 playouts, 648 n/s
303442 visits, 103175069 nodes, 277417 playouts, 2306 n/s
112093 visits, 37660799 nodes, 109869 playouts, 914 n/s
365217 visits, 123009648 nodes, 285659 playouts, 2373 n/s
118980 visits, 39563619 nodes, 75817 playouts, 629 n/s
190923 visits, 63611819 nodes, 176277 playouts, 1466 n/s
176104 visits, 58358407 nodes, 72497 playouts, 602 n/s
351184 visits, 116291124 nodes, 160679 playouts, 1335 n/s
246940 visits, 81878279 nodes, 70867 playouts, 589 n/s
501256 visits, 165035122 nodes, 150323 playouts, 1247 n/s
83318 visits, 27847938 nodes, 75966 playouts, 632 n/s
153929 visits, 51113696 nodes, 141326 playouts, 1176 n/s
93749 visits, 31180365 nodes, 76874 playouts, 638 n/s
139757 visits, 45953909 nodes, 139435 playouts, 1159 n/s
75404 visits, 24630095 nodes, 74193 playouts, 617 n/s
216336 visits, 69472594 nodes, 147526 playouts, 1227 n/s
145379 visits, 47116219 nodes, 77450 playouts, 644 n/s
366427 visits, 117271545 nodes, 150348 playouts, 1249 n/s
86769 visits, 28411877 nodes, 77414 playouts, 642 n/s
501487 visits, 160045166 nodes, 151939 playouts, 1261 n/s
114001 visits, 36887397 nodes, 80059 playouts, 665 n/s
434836 visits, 137483406 nodes, 143921 playouts, 1195 n/s
135090 visits, 43525507 nodes, 105729 playouts, 879 n/s
233277 visits, 73629322 nodes, 233096 playouts, 1936 n/s
101721 visits, 32488929 nodes, 87616 playouts, 727 n/s
418878 visits, 131623355 nodes, 188184 playouts, 1563 n/s
171534 visits, 54011108 nodes, 87257 playouts, 726 n/s
606010 visits, 189221269 nodes, 187253 playouts, 1553 n/s
117846 visits, 37483082 nodes, 92986 playouts, 774 n/s
805768 visits, 233047636 nodes, 200978 playouts, 1665 n/s
233223 visits, 73866832 nodes, 123005 playouts, 1023 n/s
982583 visits, 234407618 nodes, 200342 playouts, 1659 n/s
305271 visits, 96251086 nodes, 81473 playouts, 677 n/s
207797 visits, 64699146 nodes, 189889 playouts, 1579 n/s
250755 visits, 78212727 nodes, 81425 playouts, 677 n/s
215883 visits, 66441706 nodes, 214822 playouts, 1784 n/s
277018 visits, 85399205 nodes, 87681 playouts, 729 n/s
144679 visits, 44742881 nodes, 143897 playouts, 1197 n/s
78318 visits, 24415859 nodes, 76512 playouts, 637 n/s
269275 visits, 82240197 nodes, 146149 playouts, 1215 n/s
146883 visits, 45347045 nodes, 79111 playouts, 658 n/s
398934 visits, 120914177 nodes, 155622 playouts, 1293 n/s
175994 visits, 53751891 nodes, 77727 playouts, 646 n/s
552083 visits, 166101310 nodes, 153222 playouts, 1271 n/s
246749 visits, 74962649 nodes, 79211 playouts, 659 n/s
629993 visits, 187980166 nodes, 153944 playouts, 1276 n/s
88040 visits, 26722337 nodes, 76294 playouts, 635 n/s
148788 visits, 44511429 nodes, 143092 playouts, 1191 n/s
127643 visits, 38400718 nodes, 75794 playouts, 629 n/s
265571 visits, 78801002 nodes, 149286 playouts, 1241 n/s
108535 visits, 32642017 nodes, 80518 playouts, 670 n/s
437343 visits, 129209082 nodes, 188574 playouts, 1564 n/s
107997 visits, 32028785 nodes, 93200 playouts, 774 n/s
230308 visits, 66810126 nodes, 220965 playouts, 1838 n/s
169790 visits, 50197172 nodes, 97951 playouts, 815 n/s
237074 visits, 68574635 nodes, 225723 playouts, 1877 n/s
111803 visits, 32939888 nodes, 82566 playouts, 687 n/s
175955 visits, 51000251 nodes, 173955 playouts, 1447 n/s
136662 visits, 39934963 nodes, 79963 playouts, 665 n/s
274117 visits, 78593814 nodes, 239880 playouts, 1994 n/s
189962 visits, 55152771 nodes, 77532 playouts, 645 n/s
430761 visits, 122651692 nodes, 190156 playouts, 1579 n/s
151784 visits, 43429139 nodes, 74553 playouts, 618 n/s
150160 visits, 42951890 nodes, 149849 playouts, 1247 n/s
184539 visits, 52310923 nodes, 89414 playouts, 744 n/s
241240 visits, 68630438 nodes, 153366 playouts, 1275 n/s
188573 visits, 53385445 nodes, 85303 playouts, 709 n/s
300498 visits, 84932290 nodes, 151645 playouts, 1261 n/s
74680 visits, 21176847 nodes, 73977 playouts, 616 n/s
180093 visits, 48929301 nodes, 164014 playouts, 1365 n/s
115861 visits, 31969292 nodes, 101802 playouts, 847 n/s
319387 visits, 85592286 nodes, 162402 playouts, 1350 n/s
164350 visits, 44823562 nodes, 94965 playouts, 790 n/s
487381 visits, 129696375 nodes, 191651 playouts, 1591 n/s
236017 visits, 63356424 nodes, 112047 playouts, 932 n/s
734270 visits, 194816941 nodes, 248320 playouts, 2058 n/s
330659 visits, 88292956 nodes, 98470 playouts, 818 n/s
967196 visits, 232853490 nodes, 240758 playouts, 1992 n/s
390211 visits, 103145716 nodes, 107960 playouts, 897 n/s
1163359 visits, 236127745 nodes, 196489 playouts, 1625 n/s
501097 visits, 131451193 nodes, 124836 playouts, 1036 n/s
1206584 visits, 231229645 nodes, 145417 playouts, 1203 n/s
612815 visits, 160077228 nodes, 118454 playouts, 982 n/s
175267 visits, 46447966 nodes, 167898 playouts, 1397 n/s
239328 visits, 62144395 nodes, 98787 playouts, 819 n/s
189633 visits, 49612063 nodes, 188646 playouts, 1569 n/s
258041 visits, 65922087 nodes, 117900 playouts, 980 n/s
207360 visits, 53834576 nodes, 172708 playouts, 1434 n/s
352491 visits, 90053187 nodes, 95863 playouts, 796 n/s
374772 visits, 96365992 nodes, 171178 playouts, 1422 n/s
119453 visits, 30385030 nodes, 119340 playouts, 993 n/s
221529 visits, 55944427 nodes, 172656 playouts, 1436 n/s
246145 visits, 61739643 nodes, 132279 playouts, 1100 n/s
249721 visits, 63041596 nodes, 192926 playouts, 1605 n/s
248256 visits, 61619633 nodes, 86786 playouts, 719 n/s
406882 visits, 100986133 nodes, 183319 playouts, 1523 n/s
268371 visits, 65930877 nodes, 77822 playouts, 647 n/s
546857 visits, 133591636 nodes, 266471 playouts, 2212 n/s
326711 visits, 79654880 nodes, 92395 playouts, 768 n/s
816033 visits, 197510257 nodes, 312218 playouts, 2587 n/s
408991 visits, 99097457 nodes, 94501 playouts, 785 n/s
511157 visits, 122218143 nodes, 151709 playouts, 1260 n/s
116155 visits, 28312548 nodes, 112099 playouts, 933 n/s
176251 visits, 42227118 nodes, 175579 playouts, 1461 n/s
203875 visits, 49178283 nodes, 125869 playouts, 1047 n/s
240431 visits, 56884272 nodes, 236196 playouts, 1965 n/s
311561 visits, 74337361 nodes, 187422 playouts, 1558 n/s
569816 visits, 132821287 nodes, 345332 playouts, 2866 n/s
185353 visits, 44063495 nodes, 164915 playouts, 1370 n/s
771317 visits, 177794129 nodes, 233376 playouts, 1935 n/s
213961 visits, 49970981 nodes, 107543 playouts, 895 n/s
281769 visits, 64938818 nodes, 263808 playouts, 2194 n/s
275637 visits, 63604672 nodes, 90434 playouts, 752 n/s
472522 visits, 106675451 nodes, 236468 playouts, 1964 n/s
330092 visits, 75322233 nodes, 96025 playouts, 798 n/s
639794 visits, 141835526 nodes, 192012 playouts, 1593 n/s
353392 visits, 79920712 nodes, 117931 playouts, 980 n/s
781693 visits, 170844391 nodes, 164683 playouts, 1366 n/s
448572 visits, 100778753 nodes, 219216 playouts, 1821 n/s
246211 visits, 53838173 nodes, 228742 playouts, 1903 n/s
87496 visits, 19379296 nodes, 80647 playouts, 671 n/s
394670 visits, 81370363 nodes, 270320 playouts, 2247 n/s
179778 visits, 39351493 nodes, 96774 playouts, 805 n/s
756853 visits, 153200823 nodes, 365030 playouts, 3028 n/s
258494 visits, 55715908 nodes, 255301 playouts, 2123 n/s
1148117 visits, 229496303 nodes, 392295 playouts, 3242 n/s
336291 visits, 71877809 nodes, 78169 playouts, 650 n/s
1487251 visits, 233871931 nodes, 340575 playouts, 2817 n/s
417744 visits, 88821582 nodes, 81969 playouts, 681 n/s
1754747 visits, 233238734 nodes, 287685 playouts, 2378 n/s
532208 visits, 112848791 nodes, 114662 playouts, 952 n/s
2227045 visits, 239132718 nodes, 474011 playouts, 3914 n/s
641763 visits, 135646559 nodes, 112719 playouts, 935 n/s
207385 visits, 43194034 nodes, 203313 playouts, 1692 n/s
531609 visits, 110291752 nodes, 106445 playouts, 882 n/s
384754 visits, 78086672 nodes, 198802 playouts, 1650 n/s
480312 visits, 97862558 nodes, 108481 playouts, 901 n/s
508586 visits, 101217246 nodes, 197149 playouts, 1636 n/s
117862 visits, 25229962 nodes, 117765 playouts, 980 n/s
175496 visits, 36145229 nodes, 166979 playouts, 1390 n/s
153196 visits, 32573295 nodes, 108392 playouts, 902 n/s
280472 visits, 56963394 nodes, 271856 playouts, 2261 n/s
172851 visits, 36391861 nodes, 92260 playouts, 767 n/s
238229 visits, 48231715 nodes, 214840 playouts, 1787 n/s
165422 visits, 34405403 nodes, 91866 playouts, 764 n/s
203370 visits, 41042154 nodes, 201058 playouts, 1672 n/s
88966 visits, 18140207 nodes, 88372 playouts, 736 n/s
529503 visits, 104335271 nodes, 404997 playouts, 3364 n/s
91544 visits, 18592232 nodes, 89320 playouts, 744 n/s
199370 visits, 39219823 nodes, 199364 playouts, 1659 n/s
102921 visits, 20458557 nodes, 75458 playouts, 628 n/s
403419 visits, 77388867 nodes, 209354 playouts, 1740 n/s
92393 visits, 18362999 nodes, 92374 playouts, 768 n/s
525443 visits, 98409103 nodes, 244096 playouts, 2027 n/s
123523 visits, 24284162 nodes, 75740 playouts, 630 n/s
170238 visits, 32704453 nodes, 170141 playouts, 1416 n/s
93818 visits, 18000266 nodes, 82404 playouts, 686 n/s
228119 visits, 42622331 nodes, 212055 playouts, 1764 n/s
79806 visits, 15362719 nodes, 79500 playouts, 662 n/s
365085 visits, 65880313 nodes, 183803 playouts, 1528 n/s
142492 visits, 27179332 nodes, 84000 playouts, 699 n/s
464455 visits, 81843561 nodes, 425282 playouts, 3527 n/s
181386 visits, 33812527 nodes, 100199 playouts, 834 n/s
647013 visits, 113150274 nodes, 183833 playouts, 1526 n/s
135091 visits, 25047580 nodes, 134714 playouts, 1121 n/s
188098 visits, 34768708 nodes, 183177 playouts, 1524 n/s
115429 visits, 21335317 nodes, 114063 playouts, 949 n/s
400612 visits, 72057003 nodes, 244407 playouts, 2031 n/s
105013 visits, 19163031 nodes, 105012 playouts, 874 n/s
711468 visits, 126044620 nodes, 386373 playouts, 3203 n/s
144051 visits, 26064356 nodes, 142577 playouts, 1186 n/s
662818 visits, 115790902 nodes, 361212 playouts, 2998 n/s
249160 visits, 44191396 nodes, 182762 playouts, 1520 n/s
322962 visits, 56721305 nodes, 307550 playouts, 2557 n/s
395472 visits, 68386785 nodes, 246858 playouts, 2051 n/s
298981 visits, 51683281 nodes, 298927 playouts, 2486 n/s
218812 visits, 37965019 nodes, 216719 playouts, 1803 n/s
240163 visits, 41166634 nodes, 201448 playouts, 1676 n/s
234788 visits, 40383272 nodes, 119099 playouts, 991 n/s
353334 visits, 58843402 nodes, 309728 playouts, 2575 n/s
191736 visits, 32433966 nodes, 95128 playouts, 791 n/s
204900 visits, 34610202 nodes, 203888 playouts, 1696 n/s
167516 visits, 27614425 nodes, 85731 playouts, 712 n/s
302203 visits, 50115942 nodes, 178920 playouts, 1488 n/s
204149 visits, 33208032 nodes, 83348 playouts, 693 n/s
421426 visits, 69429830 nodes, 200128 playouts, 1661 n/s
212123 visits, 34465936 nodes, 78508 playouts, 653 n/s
345393 visits, 55692175 nodes, 186125 playouts, 1547 n/s
271733 visits, 43407183 nodes, 84145 playouts, 700 n/s
247501 visits, 39118403 nodes, 210608 playouts, 1752 n/s
280434 visits, 43477821 nodes, 107397 playouts, 893 n/s
199734 visits, 31835803 nodes, 199149 playouts, 1657 n/s
93206 visits, 14877012 nodes, 87426 playouts, 728 n/s
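
To answer the playouts-per-move question from a log like the one above, the lines can be split by colour and averaged. This is a minimal sketch: the regular expression and the function name are my own, it assumes strict B/W alternation starting with Black as the "B/W/B/W..." header suggests, and the sample uses only the first four lines of the log.

```python
import re

# Matches lines such as "77923 visits, 26101910 nodes, 77891 playouts, 648 n/s".
LINE_RE = re.compile(r"(\d+) visits, (\d+) nodes, (\d+) playouts, (\d+) n/s")


def mean_playouts_by_colour(log_lines):
    """Average the playouts field separately for alternating B/W lines."""
    playouts = {"B": [], "W": []}
    matched = 0
    for line in log_lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a search-summary line
        colour = "B" if matched % 2 == 0 else "W"
        playouts[colour].append(int(m.group(3)))
        matched += 1
    return {c: sum(v) / len(v) for c, v in playouts.items() if v}


sample = [
    "250613 visits, 84820075 nodes, 250612 playouts, 2078 n/s",
    "190448 visits, 65159316 nodes, 190447 playouts, 2653 n/s",
    "77923 visits, 26101910 nodes, 77891 playouts, 648 n/s",
    "303442 visits, 103175069 nodes, 277417 playouts, 2306 n/s",
]
print(mean_playouts_by_colour(sample))
# prints {'B': 164251.5, 'W': 233932.0}
```

Running it over the full log gives the per-colour averages for the whole game; which engine played which colour has to be taken from the game record.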
Attachments
214-elfv2.zip
hoa803
Beginner
Posts: 19
Joined: Tue Apr 02, 2019 7:12 pm
GD Posts: 0
Been thanked: 2 times

Post by hoa803 »

nbc44 wrote:

#214 v elfv2 ( 26 games)
           wins        black       white
#214    11 42.31%    5 41.67%    6 42.86%
elfv2   15 57.69%    7 58.33%    8 57.14%
                    12 46.15%   14 53.85%
Long live elfv2 :bow: ;-).
The thing nobody seems to be talking about in this thread is the confidence interval. The fact that Elf won 57% of the games must be viewed from a statistical point of view.

For example, if we use the 57.7% proportion for Elfv2, the 95% confidence interval for 26 samples is 38.7% to 76.7%.

Basically that means with 95% confidence we conclude that the actual chance Elfv2 has of beating #214 on a randomly sampled game is within that range. So, Elfv2 could actually be substantially weaker than #214 or substantially stronger. The only way to narrow this down is more samples.

If we go to 160 games and Elfv2 ends up with the same overall 57.7% winrate, the interval is now 50% to 65%. In that case we would be reasonably sure that Elfv2 is stronger - only 1 in 40 (2.5%) chance that Elfv2 is weaker than #214.

That is, in a nutshell, why we run 400 games for each network match and use 55% as the cutoff. We have 95% confidence that the new network is at least as strong as the previous one, and very likely is stronger. Now, there are speculations about rock, paper, scissors issues going on, but that is a whole different issue.
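
The intervals quoted above can be reproduced with the normal (Wald) approximation for a binomial proportion. This is a minimal sketch, assuming that approximation and z = 1.96 for 95%; the function name is my own, and for small samples other intervals (e.g. Wilson) are generally considered more reliable.

```python
import math


def wald_interval(proportion, games, z=1.96):
    """Normal-approximation confidence interval for a win rate."""
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / games)
    return proportion - half_width, proportion + half_width


elf_rate = 15 / 26  # Elfv2's 57.7% from the 26-game match
for rate, games in [(elf_rate, 26), (elf_rate, 160), (0.55, 400)]:
    low, high = wald_interval(rate, games)
    print(f"{games:4d} games at {rate:.1%}: {low:.1%} to {high:.1%}")
```

The half-width shrinks with the square root of the number of games, so roughly four times as many games are needed to halve the uncertainty.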

It's still fun to do the matches, though. :)

tl;dr: Even if #XXX wins a 20 or 50 or 100 game match, it doesn't mean we necessarily know it was stronger than its opponent. Because statistics.