
All times are UTC - 8 hours [ DST ]




Offline
 Post subject: Re: LZ's progression
Post #341 Posted: Fri Mar 22, 2019 7:20 pm 
Dies in gote

Posts: 50
Liked others: 0
Was liked: 3
Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\XXX.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -k XXX-elfv2

5). #211
Code:
#211 v elfv2 ( 27 games)
           wins        black       white
#211     5 18.52%    2 16.67%    3 20.00%
elfv2   22 81.48%   10 83.33%   12 80.00%
                    12 44.44%   15 55.56%

6). #213
Code:
#213 v elfv2 ( 26 games)
           wins        black       white
#213    12 46.15%    4 44.44%    8 47.06%
elfv2   14 53.85%    5 55.56%    9 52.94%
                     9 34.62%   17 65.38%

7). #214
in progress...


Attachments:
l0-1-elfv2.zip [45.99 KiB]
Downloaded 72 times


Last edited by nbc44 on Sat Mar 23, 2019 1:48 am, edited 1 time in total.
 
Offline
 Post subject: Re: LZ's progression
Post #342 Posted: Fri Mar 22, 2019 11:15 pm 
Lives in gote

Posts: 311
Liked others: 15
Was liked: 93
50-game match at time parity: #214 v. ELFv2
LZ0.16, twogtp 1.5.0
-v 1601 for #214 and -v 3201 for Elf, -m 20 for both.
No duplicate games, no errors.

ELFv2 wins 28-22 (56%)
The games : (#214 is B in the even numbered games):
Attachment:
214_elfv2.zip [44.81 KiB]
Downloaded 72 times
Command line and stats:
Attachment:
214_elfv2.gif
214_elfv2.gif [ 50.14 KiB | Viewed 3819 times ]

 
Offline
 Post subject: Re: LZ's progression
Post #343 Posted: Sat Mar 23, 2019 1:51 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 44
Rank: 2d
Vargo wrote:
-m 20 for both
This is for selfplay I think, it may be too random for matches. If you just want to avoid duplicates you could look into --randomtemp (and/or check if there are no weird edge moves).

 
Offline
 Post subject: Re: LZ's progression
Post #344 Posted: Sat Mar 23, 2019 5:09 am 
Lives in gote

Posts: 311
Liked others: 15
Was liked: 93
moha wrote:
it may be too random for matches
You're right, maybe it's too random.
I've looked at the first 20 games, and there is no obviously weird move that I can see. In one game, Elf is caught in a ladder before resigning.
Anyway, I'll try -m 20 --randomtemp=0.xxx

 
Offline
 Post subject: Re: LZ's progression
Post #345 Posted: Sat Mar 23, 2019 8:51 am 
Lives in gote

Posts: 311
Liked others: 15
Was liked: 93
I've tried another 50 game match #214 v. ELF v2
Same parameters, but for -m 20 --randomtemp=0.3
Average game length and average times are almost the same as before, no duplicate.
The games look "normal", but in one case (THIS GAME, n°40) , it's #214 (B) which gets caught in a ladder, and the last W moves look weird, but maybe it's because the winrate was near 100% for W.


Command line and stats :
Attachment:
214_elf2.gif
214_elf2.gif [ 48.85 KiB | Viewed 3727 times ]
The games (#214 is B in the even numbered games)
Attachment:
214_elfv2.zip [45.15 KiB]
Downloaded 79 times

 
Offline
 Post subject: Re: LZ's progression
Post #346 Posted: Sat Mar 23, 2019 9:15 am 
Honinbo

Posts: 10214
Liked others: 3438
Was liked: 3292
Vargo wrote:
The games look "normal", but in one case (THIS GAME, n°40) , it's #214 (B) which gets caught in a ladder, and the last W moves look weird, but maybe it's because the winrate was near 100% for W.


Maybe it has a preference for moves on the first line when the game is nearly over.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

My two main guides in life:
My mother and my wife. :)

Everything with love. Stay safe.

 
Offline
 Post subject: Re: LZ's progression
Post #347 Posted: Sun Mar 24, 2019 3:20 pm 
Dies in gote

Posts: 50
Liked others: 0
Was liked: 3
Vargo wrote:
-v 1601 for #214 and -v 3201 for Elf
ELFv2 wins 28-22 (56%)

Full disaster:
Code:
The first net is worse than the second
#214 v elfv2 ( 77 games)
           wins        black       white
#214    26 33.77%   12 33.33%   14 34.15%
elfv2   51 66.23%   24 66.67%   27 65.85%
                    36 46.75%   41 53.25%

C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\57499cb9.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g -v 3201 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -- C:\APPS\l0gpu16\leelaz -- C:\APPS\l0gpu16\leelaz -k 214-elfv2


I think "-v 1601" is too small for l0.


Attachments:
214-elv2.zip [62.88 KiB]
Downloaded 71 times
 
Offline
 Post subject: Re: LZ's progression
Post #348 Posted: Wed Mar 27, 2019 3:00 am 
Judan

Posts: 6582
Location: Cambridge, UK
Liked others: 405
Was liked: 3625
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
A recent LZ match game with a semeai between two huge dragons with ko liberties. I haven't analysed it in Lizzie yet; I wonder whether it's an oversight in counting the liberties of big chains or just a "losing anyway so it doesn't matter" situation:
Edit: why not connect ko at 235 as then white is just 1 eye and black's middle is alive so no semeai? That's what LZ 205 wants with well under 1600 playouts as in these matches. But at the end 205 wants to capture at a19, doesn't see the huge group has no libs. Elfv2 wants a19 initially, but after a few hundred playouts discovers the big capture and it's #1 by 1k total playouts.


 
Offline
 Post subject: Re: LZ's progression
Post #349 Posted: Fri Mar 29, 2019 3:38 pm 
Dies in gote

Posts: 50
Liked others: 0
Was liked: 3
Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\57499cb9.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -- C:\APPS\l0gpu16\leelaz --gtp-command "time_settings 1 61 1" -k 214-elfv2

7). #214
Code:
#214 v elfv2 ( 26 games)
           wins        black       white
#214    11 42.31%    5 41.67%    6 42.86%
elfv2   15 57.69%    7 58.33%    8 57.14%
                    12 46.15%   14 53.85%


Long live elfv2 :bow: ;-).

Quick test #215 l0 v16 vs #215 l0 v16 next.
C:\APPS\l0gpu16\validation.exe -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -n C:\APPS\net\35824222.gz -o "-g -v 1601 --gpu 0 --gpu 1 --noponder -t 24 -q -d --timemanage off --precision single -w" -- C:\APPS\l0gpu16\leelaz -- C:\APPS\l0gpu17beta\leelaz -k 215-215

Code:
The first net is worse than the second
v16 v v16next ( 106 games)
             wins        black       white
v16       41 38.68%   18 37.50%   23 39.66%
v16next   65 61.32%   30 62.50%   35 60.34%
                      48 45.28%   58 54.72%


Attachments:
v16-v16next.zip [84.15 KiB]
Downloaded 67 times
214-elfv2.zip [23.02 KiB]
Downloaded 69 times
 
Offline
 Post subject: Re: LZ's progression
Post #350 Posted: Sat Mar 30, 2019 7:32 am 
Judan

Posts: 6582
Location: Cambridge, UK
Liked others: 405
Was liked: 3625
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?

 
Offline
 Post subject: Re: LZ's progression
Post #351 Posted: Sat Mar 30, 2019 11:55 pm 
Dies in gote

Posts: 50
Liked others: 0
Was liked: 3
Uberdude wrote:
nbc44: about how many playouts per move are Elf and LZ getting at these time settings on your hardware? I'm guessing at least 60k for LZ and double that for Elf?

B/W/B/W...
250613 visits, 84820075 nodes, 250612 playouts, 2078 n/s
190448 visits, 65159316 nodes, 190447 playouts, 2653 n/s
77923 visits, 26101910 nodes, 77891 playouts, 648 n/s
303442 visits, 103175069 nodes, 277417 playouts, 2306 n/s
112093 visits, 37660799 nodes, 109869 playouts, 914 n/s
365217 visits, 123009648 nodes, 285659 playouts, 2373 n/s
118980 visits, 39563619 nodes, 75817 playouts, 629 n/s
190923 visits, 63611819 nodes, 176277 playouts, 1466 n/s
176104 visits, 58358407 nodes, 72497 playouts, 602 n/s
351184 visits, 116291124 nodes, 160679 playouts, 1335 n/s
246940 visits, 81878279 nodes, 70867 playouts, 589 n/s
501256 visits, 165035122 nodes, 150323 playouts, 1247 n/s
83318 visits, 27847938 nodes, 75966 playouts, 632 n/s
153929 visits, 51113696 nodes, 141326 playouts, 1176 n/s
93749 visits, 31180365 nodes, 76874 playouts, 638 n/s
139757 visits, 45953909 nodes, 139435 playouts, 1159 n/s
75404 visits, 24630095 nodes, 74193 playouts, 617 n/s
216336 visits, 69472594 nodes, 147526 playouts, 1227 n/s
145379 visits, 47116219 nodes, 77450 playouts, 644 n/s
366427 visits, 117271545 nodes, 150348 playouts, 1249 n/s
86769 visits, 28411877 nodes, 77414 playouts, 642 n/s
501487 visits, 160045166 nodes, 151939 playouts, 1261 n/s
114001 visits, 36887397 nodes, 80059 playouts, 665 n/s
434836 visits, 137483406 nodes, 143921 playouts, 1195 n/s
135090 visits, 43525507 nodes, 105729 playouts, 879 n/s
233277 visits, 73629322 nodes, 233096 playouts, 1936 n/s
101721 visits, 32488929 nodes, 87616 playouts, 727 n/s
418878 visits, 131623355 nodes, 188184 playouts, 1563 n/s
171534 visits, 54011108 nodes, 87257 playouts, 726 n/s
606010 visits, 189221269 nodes, 187253 playouts, 1553 n/s
117846 visits, 37483082 nodes, 92986 playouts, 774 n/s
805768 visits, 233047636 nodes, 200978 playouts, 1665 n/s
233223 visits, 73866832 nodes, 123005 playouts, 1023 n/s
982583 visits, 234407618 nodes, 200342 playouts, 1659 n/s
305271 visits, 96251086 nodes, 81473 playouts, 677 n/s
207797 visits, 64699146 nodes, 189889 playouts, 1579 n/s
250755 visits, 78212727 nodes, 81425 playouts, 677 n/s
215883 visits, 66441706 nodes, 214822 playouts, 1784 n/s
277018 visits, 85399205 nodes, 87681 playouts, 729 n/s
144679 visits, 44742881 nodes, 143897 playouts, 1197 n/s
78318 visits, 24415859 nodes, 76512 playouts, 637 n/s
269275 visits, 82240197 nodes, 146149 playouts, 1215 n/s
146883 visits, 45347045 nodes, 79111 playouts, 658 n/s
398934 visits, 120914177 nodes, 155622 playouts, 1293 n/s
175994 visits, 53751891 nodes, 77727 playouts, 646 n/s
552083 visits, 166101310 nodes, 153222 playouts, 1271 n/s
246749 visits, 74962649 nodes, 79211 playouts, 659 n/s
629993 visits, 187980166 nodes, 153944 playouts, 1276 n/s
88040 visits, 26722337 nodes, 76294 playouts, 635 n/s
148788 visits, 44511429 nodes, 143092 playouts, 1191 n/s
127643 visits, 38400718 nodes, 75794 playouts, 629 n/s
265571 visits, 78801002 nodes, 149286 playouts, 1241 n/s
108535 visits, 32642017 nodes, 80518 playouts, 670 n/s
437343 visits, 129209082 nodes, 188574 playouts, 1564 n/s
107997 visits, 32028785 nodes, 93200 playouts, 774 n/s
230308 visits, 66810126 nodes, 220965 playouts, 1838 n/s
169790 visits, 50197172 nodes, 97951 playouts, 815 n/s
237074 visits, 68574635 nodes, 225723 playouts, 1877 n/s
111803 visits, 32939888 nodes, 82566 playouts, 687 n/s
175955 visits, 51000251 nodes, 173955 playouts, 1447 n/s
136662 visits, 39934963 nodes, 79963 playouts, 665 n/s
274117 visits, 78593814 nodes, 239880 playouts, 1994 n/s
189962 visits, 55152771 nodes, 77532 playouts, 645 n/s
430761 visits, 122651692 nodes, 190156 playouts, 1579 n/s
151784 visits, 43429139 nodes, 74553 playouts, 618 n/s
150160 visits, 42951890 nodes, 149849 playouts, 1247 n/s
184539 visits, 52310923 nodes, 89414 playouts, 744 n/s
241240 visits, 68630438 nodes, 153366 playouts, 1275 n/s
188573 visits, 53385445 nodes, 85303 playouts, 709 n/s
300498 visits, 84932290 nodes, 151645 playouts, 1261 n/s
74680 visits, 21176847 nodes, 73977 playouts, 616 n/s
180093 visits, 48929301 nodes, 164014 playouts, 1365 n/s
115861 visits, 31969292 nodes, 101802 playouts, 847 n/s
319387 visits, 85592286 nodes, 162402 playouts, 1350 n/s
164350 visits, 44823562 nodes, 94965 playouts, 790 n/s
487381 visits, 129696375 nodes, 191651 playouts, 1591 n/s
236017 visits, 63356424 nodes, 112047 playouts, 932 n/s
734270 visits, 194816941 nodes, 248320 playouts, 2058 n/s
330659 visits, 88292956 nodes, 98470 playouts, 818 n/s
967196 visits, 232853490 nodes, 240758 playouts, 1992 n/s
390211 visits, 103145716 nodes, 107960 playouts, 897 n/s
1163359 visits, 236127745 nodes, 196489 playouts, 1625 n/s
501097 visits, 131451193 nodes, 124836 playouts, 1036 n/s
1206584 visits, 231229645 nodes, 145417 playouts, 1203 n/s
612815 visits, 160077228 nodes, 118454 playouts, 982 n/s
175267 visits, 46447966 nodes, 167898 playouts, 1397 n/s
239328 visits, 62144395 nodes, 98787 playouts, 819 n/s
189633 visits, 49612063 nodes, 188646 playouts, 1569 n/s
258041 visits, 65922087 nodes, 117900 playouts, 980 n/s
207360 visits, 53834576 nodes, 172708 playouts, 1434 n/s
352491 visits, 90053187 nodes, 95863 playouts, 796 n/s
374772 visits, 96365992 nodes, 171178 playouts, 1422 n/s
119453 visits, 30385030 nodes, 119340 playouts, 993 n/s
221529 visits, 55944427 nodes, 172656 playouts, 1436 n/s
246145 visits, 61739643 nodes, 132279 playouts, 1100 n/s
249721 visits, 63041596 nodes, 192926 playouts, 1605 n/s
248256 visits, 61619633 nodes, 86786 playouts, 719 n/s
406882 visits, 100986133 nodes, 183319 playouts, 1523 n/s
268371 visits, 65930877 nodes, 77822 playouts, 647 n/s
546857 visits, 133591636 nodes, 266471 playouts, 2212 n/s
326711 visits, 79654880 nodes, 92395 playouts, 768 n/s
816033 visits, 197510257 nodes, 312218 playouts, 2587 n/s
408991 visits, 99097457 nodes, 94501 playouts, 785 n/s
511157 visits, 122218143 nodes, 151709 playouts, 1260 n/s
116155 visits, 28312548 nodes, 112099 playouts, 933 n/s
176251 visits, 42227118 nodes, 175579 playouts, 1461 n/s
203875 visits, 49178283 nodes, 125869 playouts, 1047 n/s
240431 visits, 56884272 nodes, 236196 playouts, 1965 n/s
311561 visits, 74337361 nodes, 187422 playouts, 1558 n/s
569816 visits, 132821287 nodes, 345332 playouts, 2866 n/s
185353 visits, 44063495 nodes, 164915 playouts, 1370 n/s
771317 visits, 177794129 nodes, 233376 playouts, 1935 n/s
213961 visits, 49970981 nodes, 107543 playouts, 895 n/s
281769 visits, 64938818 nodes, 263808 playouts, 2194 n/s
275637 visits, 63604672 nodes, 90434 playouts, 752 n/s
472522 visits, 106675451 nodes, 236468 playouts, 1964 n/s
330092 visits, 75322233 nodes, 96025 playouts, 798 n/s
639794 visits, 141835526 nodes, 192012 playouts, 1593 n/s
353392 visits, 79920712 nodes, 117931 playouts, 980 n/s
781693 visits, 170844391 nodes, 164683 playouts, 1366 n/s
448572 visits, 100778753 nodes, 219216 playouts, 1821 n/s
246211 visits, 53838173 nodes, 228742 playouts, 1903 n/s
87496 visits, 19379296 nodes, 80647 playouts, 671 n/s
394670 visits, 81370363 nodes, 270320 playouts, 2247 n/s
179778 visits, 39351493 nodes, 96774 playouts, 805 n/s
756853 visits, 153200823 nodes, 365030 playouts, 3028 n/s
258494 visits, 55715908 nodes, 255301 playouts, 2123 n/s
1148117 visits, 229496303 nodes, 392295 playouts, 3242 n/s
336291 visits, 71877809 nodes, 78169 playouts, 650 n/s
1487251 visits, 233871931 nodes, 340575 playouts, 2817 n/s
417744 visits, 88821582 nodes, 81969 playouts, 681 n/s
1754747 visits, 233238734 nodes, 287685 playouts, 2378 n/s
532208 visits, 112848791 nodes, 114662 playouts, 952 n/s
2227045 visits, 239132718 nodes, 474011 playouts, 3914 n/s
641763 visits, 135646559 nodes, 112719 playouts, 935 n/s
207385 visits, 43194034 nodes, 203313 playouts, 1692 n/s
531609 visits, 110291752 nodes, 106445 playouts, 882 n/s
384754 visits, 78086672 nodes, 198802 playouts, 1650 n/s
480312 visits, 97862558 nodes, 108481 playouts, 901 n/s
508586 visits, 101217246 nodes, 197149 playouts, 1636 n/s
117862 visits, 25229962 nodes, 117765 playouts, 980 n/s
175496 visits, 36145229 nodes, 166979 playouts, 1390 n/s
153196 visits, 32573295 nodes, 108392 playouts, 902 n/s
280472 visits, 56963394 nodes, 271856 playouts, 2261 n/s
172851 visits, 36391861 nodes, 92260 playouts, 767 n/s
238229 visits, 48231715 nodes, 214840 playouts, 1787 n/s
165422 visits, 34405403 nodes, 91866 playouts, 764 n/s
203370 visits, 41042154 nodes, 201058 playouts, 1672 n/s
88966 visits, 18140207 nodes, 88372 playouts, 736 n/s
529503 visits, 104335271 nodes, 404997 playouts, 3364 n/s
91544 visits, 18592232 nodes, 89320 playouts, 744 n/s
199370 visits, 39219823 nodes, 199364 playouts, 1659 n/s
102921 visits, 20458557 nodes, 75458 playouts, 628 n/s
403419 visits, 77388867 nodes, 209354 playouts, 1740 n/s
92393 visits, 18362999 nodes, 92374 playouts, 768 n/s
525443 visits, 98409103 nodes, 244096 playouts, 2027 n/s
123523 visits, 24284162 nodes, 75740 playouts, 630 n/s
170238 visits, 32704453 nodes, 170141 playouts, 1416 n/s
93818 visits, 18000266 nodes, 82404 playouts, 686 n/s
228119 visits, 42622331 nodes, 212055 playouts, 1764 n/s
79806 visits, 15362719 nodes, 79500 playouts, 662 n/s
365085 visits, 65880313 nodes, 183803 playouts, 1528 n/s
142492 visits, 27179332 nodes, 84000 playouts, 699 n/s
464455 visits, 81843561 nodes, 425282 playouts, 3527 n/s
181386 visits, 33812527 nodes, 100199 playouts, 834 n/s
647013 visits, 113150274 nodes, 183833 playouts, 1526 n/s
135091 visits, 25047580 nodes, 134714 playouts, 1121 n/s
188098 visits, 34768708 nodes, 183177 playouts, 1524 n/s
115429 visits, 21335317 nodes, 114063 playouts, 949 n/s
400612 visits, 72057003 nodes, 244407 playouts, 2031 n/s
105013 visits, 19163031 nodes, 105012 playouts, 874 n/s
711468 visits, 126044620 nodes, 386373 playouts, 3203 n/s
144051 visits, 26064356 nodes, 142577 playouts, 1186 n/s
662818 visits, 115790902 nodes, 361212 playouts, 2998 n/s
249160 visits, 44191396 nodes, 182762 playouts, 1520 n/s
322962 visits, 56721305 nodes, 307550 playouts, 2557 n/s
395472 visits, 68386785 nodes, 246858 playouts, 2051 n/s
298981 visits, 51683281 nodes, 298927 playouts, 2486 n/s
218812 visits, 37965019 nodes, 216719 playouts, 1803 n/s
240163 visits, 41166634 nodes, 201448 playouts, 1676 n/s
234788 visits, 40383272 nodes, 119099 playouts, 991 n/s
353334 visits, 58843402 nodes, 309728 playouts, 2575 n/s
191736 visits, 32433966 nodes, 95128 playouts, 791 n/s
204900 visits, 34610202 nodes, 203888 playouts, 1696 n/s
167516 visits, 27614425 nodes, 85731 playouts, 712 n/s
302203 visits, 50115942 nodes, 178920 playouts, 1488 n/s
204149 visits, 33208032 nodes, 83348 playouts, 693 n/s
421426 visits, 69429830 nodes, 200128 playouts, 1661 n/s
212123 visits, 34465936 nodes, 78508 playouts, 653 n/s
345393 visits, 55692175 nodes, 186125 playouts, 1547 n/s
271733 visits, 43407183 nodes, 84145 playouts, 700 n/s
247501 visits, 39118403 nodes, 210608 playouts, 1752 n/s
280434 visits, 43477821 nodes, 107397 playouts, 893 n/s
199734 visits, 31835803 nodes, 199149 playouts, 1657 n/s
93206 visits, 14877012 nodes, 87426 playouts, 728 n/s
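
For anyone who wants to turn a dump like this into per-side averages, here is a minimal Python sketch. It assumes the lines strictly alternate B/W as the "B/W/B/W..." header suggests, with the first line belonging to Black; the function name `mean_playouts_by_colour` and the regular expression are mine, and the sample holds only the first four lines of the log above. Which engine played which colour is not stated, so the sketch reports colours only.

```python
import re
from statistics import mean

# Matches lines such as "250613 visits, 84820075 nodes, 250612 playouts, 2078 n/s".
LINE_RE = re.compile(r"(\d+) visits, (\d+) nodes, (\d+) playouts, (\d+) n/s")


def mean_playouts_by_colour(log_lines):
    """Return (mean playouts for Black, mean playouts for White).

    Assumption: lines alternate B, W, B, W, ... starting with Black.
    Lines that do not match the pattern are skipped.
    """
    playouts = [int(m.group(3)) for line in log_lines if (m := LINE_RE.search(line))]
    return mean(playouts[0::2]), mean(playouts[1::2])


# First four lines of the log above, copied verbatim.
sample = [
    "250613 visits, 84820075 nodes, 250612 playouts, 2078 n/s",
    "190448 visits, 65159316 nodes, 190447 playouts, 2653 n/s",
    "77923 visits, 26101910 nodes, 77891 playouts, 648 n/s",
    "303442 visits, 103175069 nodes, 277417 playouts, 2306 n/s",
]

black_mean, white_mean = mean_playouts_by_colour(sample)
print(f"Black: {black_mean:.1f} playouts/move, White: {white_mean:.1f} playouts/move")
```

Feeding the full log in place of `sample` gives the averages over the whole game; the playouts column is used rather than visits because visits include the part of the search tree reused from the previous move.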


Attachments:
214-elfv2.zip [241.68 KiB]
Downloaded 69 times
 
Offline
 Post subject: Re: LZ's progression
Post #352 Posted: Tue Apr 02, 2019 7:58 pm 
Beginner

Posts: 19
Liked others: 0
Was liked: 2
nbc44 wrote:
Code:
#214 v elfv2 ( 26 games)
           wins        black       white
#214    11 42.31%    5 41.67%    6 42.86%
elfv2   15 57.69%    7 58.33%    8 57.14%
                    12 46.15%   14 53.85%


Long live elfv2 :bow: ;-).



The thing nobody seems to be talking about in this thread is the confidence interval. The fact that Elf won 57% of the games must be viewed from a statistical point of view.

For example, if we use the 57.7% proportion for Elfv2, the 95% confidence interval for 26 samples is 38.7% to 76.7%.

Basically that means with 95% confidence we conclude that the actual chance Elfv2 has of beating #214 on a randomly sampled game is within that range. So, Elfv2 could actually be substantially weaker than #214 or substantially stronger. The only way to narrow this down is more samples.

If we go to 160 games and Elfv2 ends up with the same overall 57.7% winrate, the interval is now 50% to 65%. In that case we would be reasonably sure that Elfv2 is stronger - only 1 in 40 (2.5%) chance that Elfv2 is weaker than #214.

That is, in a nutshell, why we run 400 games for each network match and use 55% as the cutoff. We have 95% confidence that the new network is at least as strong as the previous one, and very likely is stronger. Now, there are speculations about rock, paper, scissors issues going on, but that is a whole different issue.
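
For anyone who wants to reproduce these intervals: the figures above match the simple normal-approximation (Wald) interval, so here is a minimal Python sketch under that assumption. The function name `wald_interval` and the choice of z = 1.96 are mine, not something stated in the post; a Wilson or exact interval would give somewhat different bounds, especially for small samples.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for an observed win rate.

    p_hat: observed win rate (wins / games), n: number of games,
    z: normal quantile (1.96 corresponds to a two-sided 95% interval).
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


elf_rate = 15 / 26  # Elfv2's result in the 26-game match (about 57.7%)

for games in (26, 160):
    low, high = wald_interval(elf_rate, games)
    print(f"{games:3d} games at {elf_rate:.1%}: {low:.1%} to {high:.1%}")

# The 400-game, 55% gating threshold mentioned above.
gate_low, gate_high = wald_interval(0.55, 400)
print(f"400 games at 55.0%: {gate_low:.1%} to {gate_high:.1%}")
```

With 26 games the lower bound sits well below 50%; with 160 games at the same rate it barely clears 50%, and the same holds for 55% over 400 games, which is the point of the gating rule.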

It's still fun to do the matches, though. :)

tl;dr: Even if #XXX wins a 20 or 50 or 100 game match, it doesn't mean we necessarily know it was stronger than its opponent. Because statistics.

 
Offline
 Post subject: Re: LZ's progression
Post #353 Posted: Fri Apr 05, 2019 4:11 pm 
Dies in gote

Posts: 50
Liked others: 0
Was liked: 3
Yes, you are right. All these tests are just rubbish in terms of mathematical statistics. And poor Lee Sedol still has chances to defeat AlphaGo :).
And you are right again:

hoa803 wrote:
It's still fun to do the matches, though. :)

 
Offline
 Post subject: Re: LZ's progression
Post #354 Posted: Fri Apr 05, 2019 4:55 pm 
Beginner

Posts: 19
Liked others: 0
Was liked: 2
I don't know what they did in the new version of Leela but my Gflops pretty much doubled. I'm looking at an RTX 2060 for gaming and deep learning. Anybody try one and have a benchmark?

 
Offline
 Post subject: Re: LZ's progression
Post #355 Posted: Sat Apr 06, 2019 1:04 am 
Lives in gote

Posts: 311
Liked others: 15
Was liked: 93
hoa803 wrote:
The thing nobody seems to be talking about in this thread is the confidence interval.
nbc44 wrote:
All these tests are just rubbish
All these 20, 30 ... game matches aren't gospel, obviously. Match parameters vary wildly (different gpus, different time per game, different number of visits, usage of -m, -r, etc. etc.)
No one thinks #XXX is definitively stronger than elfv2 just because XXX won a single 20 game match by 11-9 (for example)

But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).
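
The 2% figure can be checked with an exact binomial tail. This is a small Python sketch, assuming games are independent and that "same strength" means each side wins any single game with probability 0.5; the function name `prob_at_least` is mine.

```python
from math import comb


def prob_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# Chance that one of two equally strong engines wins at least 15 of 20 games.
print(f"{prob_at_least(15, 20):.2%}")  # prints 2.07%
```

This is a one-sided figure for a side named in advance. The chance that either engine reaches 15-5 or better is twice that, about 4.1%.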

 
Offline
 Post subject: Re: LZ's progression
Post #356 Posted: Sat Apr 06, 2019 7:25 pm 
Beginner

Posts: 19
Liked others: 0
Was liked: 2
Vargo wrote:
But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).


Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.

Seems like whatever blip caused some of the LZ nets to be significantly weaker than Elf has gone away. I wish I had better hardware to run more visits and/or more games; on my current setup, more visits per move would mean fewer games and therefore less statistical significance. Still, the 0.17 version of Leela gets ~2600 Gflops on my gpu, which results in a decent number of visits per move at that time control.

I will say that one thing I disagree with is adding any of the self play randomness params to matches that ostensibly compare engine strength. I feel like the main value there is the games are perhaps more interesting to watch. However, I think any of the programmers on Github would agree it shouldn't be used in a "match" situation. Although, I haven't seen anybody discuss such a thing outside of training, so who knows.

 
Offline
 Post subject: Re: LZ's progression
Post #357 Posted: Sun Apr 07, 2019 11:34 am 
Dies with sente

Posts: 101
Liked others: 2
Was liked: 16
Rank: KGS 2 D
hoa803 wrote:

Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.


I'm just curious. In those 52 games, was ELFv2 running on LZ 0.16 or 0.17?

 
Offline
 Post subject: Re: LZ's progression
Post #358 Posted: Mon Apr 08, 2019 9:20 am 
Beginner

Posts: 19
Liked others: 0
Was liked: 2
Should have been 0.17. I would have more info, but I screwed up my command line and didn't save any of the games, which is very frustrating. Validation prints out the XX-XX win/loss score after each game, so I'm basing it on that alone. I need to rerun to confirm at some point. I'd probably just use the latest network.

Right now I'm just helping train the AI rather than running matches.

Edit: I messed with it after work today. Turns out the -k option to save games must be placed towards the beginning of the command line with 0.17. Or at least, it started saving the games when I moved it from the end to directly after the validation.exe statement.

Edit 2: I'm currently running lz219 vs elfv2 at 30 seconds a move. I think I like that better than absolute time for a comparison, because both engines will be much stronger reading the full 30 seconds each move (and subsequent moves).

 
Offline
 Post subject: Re: LZ's progression
Post #359 Posted: Tue Apr 09, 2019 5:00 am 
Lives in gote

Posts: 311
Liked others: 15
Was liked: 93
New network #220 :bow:
Even if regular LZ(017) is not really designed for handicap games, it can play nice H games.

H3 game with komi 7.5 : Crazy Stone DL (5 Dan) v. LZ017#220
(4s/move for LZ, and -r 1 to avoid resigning too soon, laptop with gtx 965)
CS and LZ (Sabaki) don't agree on the final score (W+7.5 and W+4.5). I suppose the 3-point difference comes from the 3 handicap stones. Maybe my settings are wrong somehow? If someone knows, thanks...

Settings :
Attachment:
settings.jpg
settings.jpg [ 114.83 KiB | Viewed 2425 times ]
counting...
Attachment:
territory.jpg
territory.jpg [ 220.66 KiB | Viewed 2425 times ]
--> the game
______________________________________________________

 
Offline
 Post subject: Re: LZ's progression
Post #360 Posted: Tue Apr 09, 2019 5:43 am 
Lives in gote

Posts: 450
Liked others: 69
Was liked: 47
Interesting. Zen shows W+4.5.

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 425 posts ]  Go to page Previous  1 ... 15, 16, 17, 18, 19, 20, 21, 22  Next

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group