Engine Tournament

q30 · Post by **q30** » Sat Jul 25, 2020 1:23 am

The best "light heavyweight" LeelaZero network of 2019 was 40b_257a_64k_q (details).

q30 · Post by **q30** » Sat Aug 08, 2020 2:03 am

The best LeelaZero "welterweight" network of 2019 was 15b_257_248k_q (details).

q30 · Post by **q30** » Sat Aug 15, 2020 2:26 am

The best LeelaZero network overall in 2019 was 40b_257a_64k_q (details).
So the ranking of the "weight categories" in 2019 was as follows:

| Category | Nominal size | File size range | Network | Rank |
|---|---|---|---|---|
| "bantamweight" | ≤ 2^23 B | < 12 MiB | best_5b.txt | (6) |
| "featherweight" | 2^24 B | 12 - 24 MiB | I don't have one | |
| "lightweight" | 2^25 B | 24 - 48 MiB | LeelaMaster_E08.txt | (5) |
| "welterweight" | 2^26 B | 48 - 96 MiB | 15b_257_248k_q | (2) |
| "middleweight" | 2^27 B | 96 - 192 MiB | 20b_254_784k_q | (4) |
| "light heavyweight" | 2^28 B | 192 - 384 MiB | 40b_257a_64k_q | (1) |
| "heavyweight" | 2^29 B | 384 - 768 MiB | 14a3a5f70aba55e312af52f97fe44b1376f0b9a966639e8f21ca38333886dd3b | (3) |
| "super heavyweight" | ≥ 2^30 B | > 768 MiB | I don't have one | |

q30 · Post by **q30** » Sat Oct 03, 2020 2:17 am

The ranking of Go engines running without a GPU after 2019 was as follows (details):

Top level
1) LeelaZero

High level
2) Leela
3) Rayon
4) Hiratuka
5) Zenith

Middle level
6) Pachi_DCNN
7) Ray
8) Pachi
9) MoGo

q30 · Post by **q30** » Sat Nov 21, 2020 4:26 am

The ranking of KataGo "weight category" winners (details):

| Category | Nominal size | File size range | Network | Rank |
|---|---|---|---|---|
| "bantamweight" | ≤ 2^23 B | < 12 MiB | g170e-b10c128-s1141046784-d204142634.bin | (6) |
| "featherweight" | 2^24 B | 12 - 24 MiB | I don't have one | |
| "lightweight" | 2^25 B | 24 - 48 MiB | g170e-b15c192-s1672170752-d466197061.bin | (5) |
| "welterweight" | 2^26 B | 48 - 96 MiB | g170e-b20c256x2-s5303129600-d1228401921.bin | (4) |
| "middleweight" | 2^27 B | 96 - 192 MiB | g170-b40c256x2-s5095420928-d1229425124.bin | (2) |
| "light heavyweight" | 2^28 B | 192 - 384 MiB | g170-b30c320x2-s4824661760-d1229536699.bin | (1) |
| "heavyweight" | 2^29 B | 384 - 768 MiB | g170e-b40c384x2-s2348692992-d1229892979.bin | (3) |
| "super heavyweight" | ≥ 2^30 B | > 768 MiB | I don't have one | |

q30 · Post by **q30** » Sat Nov 28, 2020 5:35 am

KataGo v1.7.0 isn't stronger than v1.6.1 (details).

lightvector · Post by **lightvector** » Sat Nov 28, 2020 8:38 am

Also posted in reply on GitHub:

You are right, because v1.7.0 does not add any new features that affect strength; it mostly adds things like CUDA 11 support and changes to the interface that make analysis tools much more flexible.

However, if I understand your link correctly, you are right only by accident, because the number of games you've played is massively far too few to be reliable in determining that one version isn't stronger than another. Please be wary of drawing conclusions from such tiny numbers of games. For example, in the past I have absolutely had genuinely stronger versions of a bot losing more than winning even after 50 or 100 games, simply due to getting unlucky with statistical noise, before hundreds and even thousands more games finally showed (almost certainly) that the true winning chance was actually greater than 50% and that the version that was losing more at first was truly the stronger version.

Which of course means that even smaller numbers of games like 8 games, or 4 games, are definitely too few to make any reliable determinations of strength, beyond establishing that neither side is completely outright buggy and disastrously weaker (if the result wasn't entirely lopsided).
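
To put rough numbers on this, here is a minimal Python sketch that computes how likely it is for the genuinely stronger bot to have won fewer than half of the games after `n` games. It assumes independent games with no draws and a fixed true win probability `p`; the value 0.55 in the loop is purely illustrative and is not a measurement from any test mentioned here. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(n, k, p):
    """Probability of exactly k wins in n independent games (requires 0 < p < 1)."""
    # Work in log space so that large n does not overflow.
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def prob_trailing(n, p):
    """Probability that a bot with true win rate p has won fewer than half of n games."""
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if 2 * k < n)


# Illustrative only: an assumed true win rate of 55% at several sample sizes.
for n in (4, 8, 100, 1000):
    print(n, round(prob_trailing(n, 0.55), 4))
```

The probability shrinks only slowly as `n` grows when `p` is close to 0.5, which is why small samples say so little.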

---

Also, if the numbers in parentheses in your weight category list, "(1)", "(3)", etc., are meant to be an ordering of equal-compute strength, then I think you may have it wrong. From early testing, I recall that g170-b40c256x2-s5095420928-d1229425124.bin should be stronger than g170-b30c320x2-s4824661760-d1229536699.bin even at equal playouts. It's faster, smaller, and stronger, so there's no reason to use the 30-block network.

If you found the 30-block network to be stronger, again that might be due to playing too few games to measure accurately. I wouldn't want to give a misleading impression to people picking which network to use based on statistical noise in a tiny sample size.

q30 · Post by **q30** » Sat Nov 28, 2020 10:30 am

I'm using equal time-per-move settings and equal compute resources (details). With an equal number of playouts, a bigger "end-trained" network must be stronger than a smaller one. But because the network with the bigger number of blocks (40) has a smaller value of the "c" parameter (256), the two are almost equal in memory use and in number of visits under the conditions mentioned. And their strength is equal from a practical end-user point of view (which is what my tests are based on): the score is 5-3. So in this ranking (1) and (2) are assigned formally, for definiteness, and the results aren't wrong from that point of view...
But I don't know whether these networks are "end-trained", like AlphaZero's was on a supercomputer, or whether they will be upgraded in the future, as with LeelaZero, whose networks are updated continuously...
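
For reference, the range of true win rates that a 5-3 score is compatible with can be estimated with a Wilson score interval. This is only a sketch in Python, assuming independent games, no draws, and the conventional 95% level (z = 1.96); none of these choices come from the tests themselves, and the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate, given wins out of games."""
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need 0 <= wins <= games and games > 0")
    p_hat = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p_hat + z2 / (2 * games)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / games + z2 / (4 * games * games)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(5, 8)
print(f"5-3 score: true win rate plausibly between {low:.2f} and {high:.2f}")
```

For 5 wins in 8 games the interval runs from roughly 0.31 to 0.86, so it contains 0.5 and is wide in both directions.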

q30 · Post by **q30** » Sat Feb 13, 2021 3:11 am

The ranking of the LeelaZero "weight category" winners for 2020 (details):

| Category | Nominal size | File size range | Network | Rank |
|---|---|---|---|---|
| "bantamweight" | ≤ 2^23 B | < 12 MiB | best_5b.txt | (6) |
| "featherweight" | 2^24 B | 12 - 24 MiB | I don't have one | |
| "lightweight" | 2^25 B | 24 - 48 MiB | LeelaMaster_E08.txt | (5) |
| "welterweight" | 2^26 B | 48 - 96 MiB | 15b_270_856k_q | (2) |
| "middleweight" | 2^27 B | 96 - 192 MiB | 20b_266_168k_q | (4) |
| "light heavyweight" | 2^28 B | 192 - 384 MiB | 40b_273d_256k_q | (1) |
| "heavyweight" | 2^29 B | 384 - 768 MiB | 14a3a5f70aba55e312af52f97fe44b1376f0b9a966639e8f21ca38333886dd3b | (3) |
| "super heavyweight" | ≥ 2^30 B | > 768 MiB | I don't have one | |

as0770 · Post by **as0770** » Tue Feb 16, 2021 11:33 am

lightvector wrote:the number of games you've played is massively far too few to be reliable in determining that one version isn't stronger than another.

Don't do that. He won't get it. He has no idea about statistics, but tries to explain the world. To make it even worse, he is not able to communicate in English.

Bye, I am off again

q30 · Post by **q30** » Sat Feb 27, 2021 4:52 am

Some people here have no idea that, for an engine's end user, it is of no importance which engine (network) is stronger than another only in big statistics, because a human plays only a few games (compared with big statistics)...

q30 · Post by **q30** » Sat Feb 27, 2021 4:54 am

KataGo beats LeelaZero in all corresponding categories (matched both by "weight" and by rank) (details).

q30 · Post by **q30** » Sat Mar 06, 2021 4:38 am

The KataGo v1.8.0 improvements don't make it stronger from the end user's point of view (details).

lightvector · Post by **lightvector** » Sat Mar 06, 2021 7:28 am

q30 wrote: Some people here have no idea that, for an engine's end user, it is of no importance which engine (network) is stronger than another only in big statistics, because a human plays only a few games (compared with big statistics)...

There is an idea here that you might not be taking into account:
The best way to test what will be better for end users who play few games... is NOT to run few games yourself, but instead is to run "big statistics".

Why? Well for example, if you run only a small-game test and mistakenly think the version that in truth wins 47% is better than the version that in truth wins 52% because your test got unlucky, then out of every 100 users who follow your recommendation and run one game or one analysis, on average 5 of them will get unnecessarily worse results due to your bad luck. Even if it only makes a difference for a few users, better to use enough games so that your own luck is not an issue.
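
A rough way to quantify the risk is to compute how often a test of `n` games per version would rank the weaker version ahead. The Python sketch below assumes a setup that is not spelled out above: each version plays `n` independent games, with no draws, against the same reference opponent, and the test recommends whichever version wins strictly more games. The 52% and 47% figures are the ones from the example; the function names are hypothetical.

```python
from itertools import accumulate
from math import exp, lgamma, log


def binom_pmf(n, k, p):
    """Probability of exactly k wins in n independent games (requires 0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def prob_pick_worse(n, p_better, p_worse):
    """Probability that the worse version wins strictly more of its n games."""
    pmf_better = [binom_pmf(n, k, p_better) for k in range(n + 1)]
    pmf_worse = [binom_pmf(n, k, p_worse) for k in range(n + 1)]
    # cdf_better[k] = P(better version wins at most k games)
    cdf_better = list(accumulate(pmf_better))
    # The worse version wins k >= 1 games while the better one wins at most k - 1.
    return sum(pmf_worse[k] * cdf_better[k - 1] for k in range(1, n + 1))


for n in (8, 100, 400):
    print(n, round(prob_pick_worse(n, 0.52, 0.47), 3))
```

Ties are not counted as a wrong pick here, so the printed values understate how often a small test fails to identify the better version.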

q30 · Post by **q30** » Sat Mar 06, 2021 9:13 am

In the case mentioned, the result is "practically equivalent". So the recommendation is: you can download the strongest one in this ranking; it will not be noticeably weaker in practice than the others in this ranking...

More important is that the tests are equivalent to real games in performance*time, because of the U-shaped dependence of the score on it, which was given earlier in this topic.

Life In 19x19
