Engine Tournament

For discussing go computing, software announcements, etc.
q30
Lives with ko
Posts: 145
Joined: Sat Aug 13, 2016 8:23 am
Rank: 30 kyu
GD Posts: 0
Has thanked: 1 time
Been thanked: 1 time

Re: Engine Tournament

Post by q30 »

The best "light heavyweight" LeelaZero network in 2019 was 40b_257a_64k_q (details).

Re: Engine Tournament

Post by q30 »

The best LeelaZero "welterweight" file was 15b_257_248k_q in 2019 (details).

Re: Engine Tournament

Post by q30 »

The best LeelaZero network overall in 2019 was 40b_257a_64k_q (details).
So the 2019 ranking of the "weight category" winners was as follows (overall rank in parentheses):

Category             Nominal size  Range          Rank  Best network
"bantamweight"       <= 2^23 B     < 12 MiB       (6)   best_5b.txt
"featherweight"      2^24 B        12 - 24 MiB    -     I don't have one
"lightweight"        2^25 B        24 - 48 MiB    (5)   LeelaMaster_E08.txt
"welterweight"       2^26 B        48 - 96 MiB    (2)   15b_257_248k_q
"middleweight"       2^27 B        96 - 192 MiB   (4)   20b_254_784k_q
"light heavyweight"  2^28 B        192 - 384 MiB  (1)   40b_257a_64k_q
"heavyweight"        2^29 B        384 - 768 MiB  (3)   14a3a5f70aba55e312af52f97fe44b1376f0b9a966639e8f21ca38333886dd3b
"super heavyweight"  >= 2^30 B     > 768 MiB      -     I don't have one
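
The category boundaries in the list above can be written down as a small lookup. This is only a sketch: the function name `weight_category` is made up for illustration, the treatment of sizes that fall exactly on a boundary (counted in the upper category) is an assumption, and so is the idea that the size meant is the size of the network file on disk.

```python
# Upper bounds in MiB, read off the category list above.
# Assumption: a size exactly on a boundary belongs to the heavier category.
CATEGORIES = [
    (12, "bantamweight"),
    (24, "featherweight"),
    (48, "lightweight"),
    (96, "welterweight"),
    (192, "middleweight"),
    (384, "light heavyweight"),
    (768, "heavyweight"),
]


def weight_category(size_bytes: int) -> str:
    """Return the weight category for a network file of the given size."""
    size_mib = size_bytes / 2**20
    for upper_mib, name in CATEGORIES:
        if size_mib < upper_mib:
            return name
    return "super heavyweight"
```

For a file on disk, something like `weight_category(os.path.getsize(path))` would do, with `path` pointing at the network file.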

Re: Engine Tournament

Post by q30 »

The ranking of Go engines running without a GPU after 2019 was as follows (details):

Top level
1) LeelaZero

High level
2) Leela
3) Rayon
4) Hiratuka
5) Zenith

Middle level
6) Pachi_DCNN
7) Ray
8) Pachi
9) MoGo

Re: Engine Tournament

Post by q30 »

Ranking of the KataGo "weight category" winners (details; overall rank in parentheses):

Category             Nominal size  Range          Rank  Best network
"bantamweight"       <= 2^23 B     < 12 MiB       (6)   g170e-b10c128-s1141046784-d204142634.bin
"featherweight"      2^24 B        12 - 24 MiB    -     I don't have one
"lightweight"        2^25 B        24 - 48 MiB    (5)   g170e-b15c192-s1672170752-d466197061.bin
"welterweight"       2^26 B        48 - 96 MiB    (4)   g170e-b20c256x2-s5303129600-d1228401921.bin
"middleweight"       2^27 B        96 - 192 MiB   (2)   g170-b40c256x2-s5095420928-d1229425124.bin
"light heavyweight"  2^28 B        192 - 384 MiB  (1)   g170-b30c320x2-s4824661760-d1229536699.bin
"heavyweight"        2^29 B        384 - 768 MiB  (3)   g170e-b40c384x2-s2348692992-d1229892979.bin
"super heavyweight"  >= 2^30 B     > 768 MiB      -     I don't have one

Re: Engine Tournament

Post by q30 »

KataGo v1.7.0 isn't stronger than v1.6.1 (details).
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Engine Tournament

Post by lightvector »

Also posted in reply on GitHub:

You are right, because v1.7.0 does not add any new features that affect strength, it mostly only adds things like CUDA 11 support and changes to the interface that make analysis tools much more flexible.

However, if I understand your link correctly, you are right only by accident, because the number of games you've played is massively far too few to be reliable in determining that one version isn't stronger than another. Please be wary of drawing conclusions from such tiny numbers of games. For example, in the past I have absolutely had genuinely stronger versions of a bot be losing more than winning even after 50 or 100 games, simply due to getting unlucky with statistical noise, before hundreds and even thousands more games finally showed (almost certainly) that the true winning chance was actually greater than 50% and that the version that was losing more at first was truly the stronger version.

Which of course means that even smaller numbers of games like 8 games, or 4 games, are definitely too few to make any reliable determinations of strength, beyond establishing that neither side is completely outright buggy and disastrously weaker (if the result wasn't entirely lopsided).
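
To put rough numbers on this, here is a minimal sketch of the exact binomial calculation. The true win rate of 55% is just an assumed example value, draws are ignored, and `prob_not_ahead` is a name made up for this illustration.

```python
from math import comb


def prob_not_ahead(p_win: float, n_games: int) -> float:
    """Chance that a side with true win rate p_win wins at most half of n_games.

    Assumes independent games and no draws. Plain float arithmetic,
    so keep n_games to about 1000 or fewer to avoid overflow.
    """
    return sum(
        comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
        for k in range(n_games // 2 + 1)
    )


# Assumed example: the genuinely stronger version wins 55% of games.
for n in (4, 8, 50, 100, 1000):
    print(n, round(prob_not_ahead(0.55, n), 3))
```

The probability shrinks only slowly as the number of games grows, which is why a handful of games says very little about which side is stronger.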

---

Also, if the parentheses you have in your weight category list "(1)", "(3)", etc. are meant to be an ordering of equal-compute strength, then I think you may have it wrong. From early testing, I recall that g170-b40c256x2-s5095420928-d1229425124.bin should be stronger than g170-b30c320x2-s4824661760-d1229536699.bin even at equal playouts. It's faster, smaller, and stronger, so there's no reason to use the 30-block network.

If you found the 30-block network to be stronger, again that might be due to playing too few games to accurately measure. I wouldn't want to give a misleading impression to people picking which network to use based on statistical noise in a tiny sample size.

Re: Engine Tournament

Post by q30 »

I'm using equal time-per-move settings and equal compute resources (details). With an equal number of playouts, a bigger "end-trained" network must be stronger than a smaller one. But because the network with the bigger number of blocks (40) has a smaller value of whatever "c" is (256), the two are almost equal in memory use and in number of visits under the conditions mentioned. And their strength is equal from a practical end user's point of view (my tests are based on this point of view): the score is 5-3. So in this ranking (1) and (2) are assigned formally, for definiteness, and the results aren't wrong from that point of view...
But I don't know whether these networks are "end-trained", like AlphaZero's one on a supercomputer, or whether they will be upgraded in the future, as with LeelaZero, whose networks are updated continuously...
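
For reference, here is a rough sketch of how much a 5-3 score pins down the true win rate, using a Wilson score interval. The choice of this interval and of z = 1.96 (about 95% confidence) is an assumption made for illustration, not something taken from the linked details.

```python
from math import sqrt


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the true win rate (z = 1.96 is about 95%)."""
    p_hat = wins / games
    denom = 1 + z**2 / games
    centre = (p_hat + z**2 / (2 * games)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half


low, high = wilson_interval(5, 8)
print(f"5-3 score: true win rate plausibly between {low:.2f} and {high:.2f}")
```

The interval for 5 wins out of 8 comfortably contains 50%, so a 5-3 score on its own cannot separate "equal" from "one side clearly stronger".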

Re: Engine Tournament

Post by q30 »

Ranking of the 2020 LeelaZero "weight category" winners (details; overall rank in parentheses):

Category             Nominal size  Range          Rank  Best network
"bantamweight"       <= 2^23 B     < 12 MiB       (6)   best_5b.txt
"featherweight"      2^24 B        12 - 24 MiB    -     I don't have one
"lightweight"        2^25 B        24 - 48 MiB    (5)   LeelaMaster_E08.txt
"welterweight"       2^26 B        48 - 96 MiB    (2)   15b_270_856k_q
"middleweight"       2^27 B        96 - 192 MiB   (4)   20b_266_168k_q
"light heavyweight"  2^28 B        192 - 384 MiB  (1)   40b_273d_256k_q
"heavyweight"        2^29 B        384 - 768 MiB  (3)   14a3a5f70aba55e312af52f97fe44b1376f0b9a966639e8f21ca38333886dd3b
"super heavyweight"  >= 2^30 B     > 768 MiB      -     I don't have one
as0770
Lives with ko
Posts: 180
Joined: Sun Jun 26, 2016 8:07 am
Rank: Beginner
GD Posts: 0
Has thanked: 15 times
Been thanked: 23 times

Re: Engine Tournament

Post by as0770 »

lightvector wrote: the number of games you've played is massively far too few to be reliable in determining that one version isn't stronger than another.
Don't do that. He won't get it. He has no idea of statistics, but tries to explain the world. To make it even worse, he is not able to communicate in English.

Bye, I am off again :lol:

Re: Engine Tournament

Post by q30 »

Some people here have no idea that, for an engine's end user, it is of no importance which engine (network) is stronger than another only in big statistics, because a human plays only a few games (compared with big statistics)...

Re: Engine Tournament

Post by q30 »

KataGo beats LeelaZero in all corresponding categories (matched both by "weight" and by rank) (details).

Re: Engine Tournament

Post by q30 »

The improvements in KataGo v1.8.0 don't make it stronger from an end user's point of view (details).

Re: Engine Tournament

Post by lightvector »

q30 wrote: Some people here have no idea that, for an engine's end user, it is of no importance which engine (network) is stronger than another only in big statistics, because a human plays only a few games (compared with big statistics)...
There is an idea here that you might not be taking into account:
The best way to test what will be better for end users who play few games... is NOT to run few games yourself, but instead is to run "big statistics".

Why? Well for example, if you run only a small-game test and mistakenly think the version that in truth wins 47% is better than the version that in truth wins 52% because your test got unlucky, then out of every 100 users who follow your recommendation and run one game or one analysis, on average 5 of them will get unnecessarily worse results due to your bad luck. Even if it only makes a difference for a few users, better to use enough games so that your own luck is not an issue.
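
The 47% versus 52% example can be sketched numerically. The test setup here is an assumption made for illustration: each version plays the same number of games against a common opponent, games are independent with no draws, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k wins in n games at win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_wrong_pick(p_weak: float, p_strong: float, n_games: int) -> float:
    """Chance the weaker version scores strictly more wins than the stronger.

    Each version is assumed to play n_games against the same opponent.
    Plain float arithmetic, so keep n_games to about 1000 or fewer.
    """
    total = 0.0
    cum_strong = 0.0  # P(stronger version has fewer than k wins)
    for k in range(n_games + 1):
        total += binom_pmf(n_games, k, p_weak) * cum_strong
        cum_strong += binom_pmf(n_games, k, p_strong)
    return total


def expected_extra_losses(p_weak: float, p_strong: float, n_users: int) -> float:
    """Extra losses, on average, if n_users each play one game with the weaker pick."""
    return (p_strong - p_weak) * n_users


for n in (8, 100, 1000):
    print(n, round(prob_wrong_pick(0.47, 0.52, n), 3))
print(expected_extra_losses(0.47, 0.52, 100))
```

The last line reproduces the "5 out of every 100 users" figure; the loop shows how the chance of recommending the wrong version falls as the test gets longer.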

Re: Engine Tournament

Post by q30 »

In the case you mention, the result is "practically equivalent". So the recommendation is: you can download the strongest one in this ranking; it will not be noticeably weaker in practice than the others in this ranking...

More important is that the tests match real games in performance*time, because of the U-shaped dependence of the score on it, which was shown earlier in this topic.