LZ's progression

For discussing go computing, software announcements, etc.
Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

LZ's progression

Post by Vargo »

In 3 weeks, LZ has made great progress between networks e860 (04/18) and b633 (05/09).
The chain of networks (new-previous, then the new network's wins out of the total games) is:
04/22 : 3f6c wins over e860 by 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428

So b633 should win 91.94% of its games against e860, not far from the win percentage of LZ_ELF against LZ_best_normal_network (93-94%).
Maybe it's partly because the ELF weights loaded the dice for the most recent networks, but still, in just 3 weeks, it's an impressive progression :clap:
jlt
Gosei
Posts: 1786
Joined: Wed Dec 14, 2016 3:59 am
GD Posts: 0
Has thanked: 185 times
Been thanked: 495 times

Re: LZ's progression

Post by jlt »

LZ_ELF (62b5417b) won 94.20% of its games against b6337c69, so LeelaZero is still far from ELF.
Vargo
Lives in gote

Re: LZ's progression

Post by Vargo »

You're right.
I was just saying that the gap LZ closed in those 3 weeks, between e860 and b633, is comparable to the gap between b633 and 62b5.
If the progression rate doesn't drop too much, in 3-4 weeks, LZ's "normally promoted network" could surpass 62b5.
jokkebk
Dies in gote
Posts: 44
Joined: Tue Feb 01, 2011 4:47 am
Rank: EGF 1 kyu
GD Posts: 0
KGS: finity
Has thanked: 2 times
Been thanked: 14 times

Re: LZ's progression

Post by jokkebk »

What is also interesting is that Leela with ELF weights won 93% of its games against LZ #132. After that came a skyrocketing series of stronger networks in just a few days, with LZ #136 rated 150 Elo stronger than #132 in self-play (I think it's actually a cumulative Elo, so 136vs135 + 135vs134 + ... + 133vs132).
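The 450 and 490 figures come from converting a win rate into an Elo gap. The sketch below assumes the standard logistic Elo model (whether the LZ and ELF pages use exactly this formula is an assumption), and the function names are hypothetical. Under this model an Elo difference is just 400 times the base-10 log of the odds, so summing pairwise self-play gaps along a chain is the same as multiplying the odds of each step.

```python
import math

def elo_diff(p):
    """Elo difference implied by win probability p, assuming the logistic
    model E = 1 / (1 + 10 ** (-D / 400)), i.e. D = 400 * log10(p / (1 - p))."""
    return 400 * math.log10(p / (1 - p))

def win_prob(d):
    """Inverse of elo_diff: expected score for an Elo advantage d."""
    return 1 / (1 + 10 ** (-d / 400))

def cumulative_elo(step_winrates):
    """Sum of pairwise Elo gaps along a chain of self-play matches
    (133vs132, 134vs133, ...). Each term is a scaled log of the odds,
    so this sum is equivalent to multiplying the odds of every step."""
    return sum(elo_diff(p) for p in step_winrates)

print(round(elo_diff(0.93)))  # ELF vs LZ #132, prints 449
print(round(elo_diff(0.94)))  # ELF vs LZ #136, prints 478
```

Under this model 93% corresponds to about 449 Elo and 94% to about 478; near the top of the scale one percentage point is worth roughly 30 Elo, so small changes in win rate move the figure a lot.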

Now that they have tested the stronger #136 against the ELF network, ELF won 94% of the games. So the Elo difference jumped from 450 to 490! I wonder if this is due to:

1) Statistical variance: LZ's win rate against ELF is small, so random chance plays a role
2) Another change made when ELF was introduced: "t=1" (whatever that means) changed the playing conditions, and the quick Elo leap of the networks comes from them adjusting to "new possibilities"
3) Once LZ's playing style comes closer to ELF's, its wins become rarer

The last option also seems possible. With humans, given a strong moyo-oriented player A and a weaker, territory-oriented player B, B may win more games against A than a slightly stronger player C who also plays moyo. It's a similar effect to the inflated Elo differences in self-play, where a minuscule advantage with the same playing style may mean an 80% win rate against the weaker version.

I'm hoping the next few weeks will show LZ narrowing the gap to the ELF network. I like LZ networks better, not least because they play handicap games. Just yesterday I tried playing against LZ #136 with 1 playout (so it just picks the top move without any search), and got crushed by move 100 even with 4 handicap stones. :D (I'm EGF 1 kyu, so not a strong dan.)
Vargo
Lives in gote

Re: LZ's progression

Post by Vargo »

There's a new best network (90560), and it should win 93.24% of its games against the old e860.
I've run a 100-game TWOGTP match between these two (--visits=3201 --noponder).
I find the result surprising: 90560 won "only" 78-22 (e860 won 8 games as Black and 14 as White).
78 seems rather far from 93...
Maybe 100 games isn't enough, or am I missing something?
Anyway, I'll set up another match, maybe with more games ;-)
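Is 100 games enough to tell 78% from 93%? A minimal sketch, assuming independent games with one fixed win probability (so it ignores the Black/White split), that computes a 95% Wilson score interval for 78/100 and the exact binomial probability of 78 or fewer wins if the true rate were the predicted 93.24%. The helper names are hypothetical.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate of wins/n."""
    phat = wins / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

low, high = wilson_interval(78, 100)
# Probability of 78 or fewer wins out of 100 if the true rate were 93.24%.
tail = binom_cdf(78, 100, 0.9324)
print(f"95% interval for 78/100: {low:.3f} to {high:.3f}; P(X <= 78) = {tail:.1e}")
```

Under these assumptions the interval runs from roughly 69% to 85%, and 78 or fewer wins would happen in well under one match in 100,000, so sample size alone is unlikely to explain the gap.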
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: LZ's progression

Post by dfan »

Why do you think it should win 93.24% of its games against e860?

If it's because of the supposed Elo difference on the web page graph, be aware that cumulative strength increases are smaller than they look there. Leela has not really gained 11000 Elo since it started. I forget what the ratio is, but when you compare two historical networks in a match, their results are significantly closer than you would expect just by looking at their places on that graph.
Vargo
Lives in gote

Re: LZ's progression

Post by Vargo »

04/22 : 3f6c wins over e860 by 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428
05/14 : 9056-b633 232/424

A wins wa games out of a total of t1 games against B
B wins wb games out of t2 against C
C wins wc games out of t3 against D
D wins wd games out of t4 against E
(etc)

z = wa/(t1-wa) * wb/(t2-wb) * wc/(t3-wc) * wd/(t4-wd)

A should then win a fraction z/(z+1) of its games against E (that is, 100 * z/(z+1) %).

Here are the "cumulative" percentages from e860 to 9056:

55.61%
59.78%
66.64%
73.29%
76.92%
82.73%
85.51%
87.90%
90.36%
91.94%
93.24%
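The formula can be checked with a few lines of code. A minimal sketch of the product-of-odds rule, fed with the match results listed above (the function name and data layout are only illustrative). It reproduces 55.61% and, up to rounding, 59.78%, but from the third step on it comes out higher than the list and ends at about 93.6% instead of 93.24%: cfb2-1586 at 254/433 corresponds to odds of about 1.42, whereas going from 59.78% to 66.64% implies about 1.34. One of those two figures may be a typo; every later step in the list matches its match result.

```python
def chained_winrates(matches):
    """For each (wins, games) result of a newer network against the previous
    best, return the predicted win rate of that network against the first one,
    using the product-of-odds rule z = prod wins / (games - wins), p = z / (1 + z)."""
    z = 1.0
    predictions = []
    for wins, games in matches:
        z *= wins / (games - wins)
        predictions.append(z / (1 + z))
    return predictions

# Match results posted above, oldest first: 3f6c-e860, 1586-3f6c, ..., 9056-b633.
MATCHES = [(243, 437), (235, 433), (254, 433), (239, 413), (226, 412), (253, 429),
           (239, 433), (235, 426), (240, 426), (235, 428), (232, 424)]

for p in chained_winrates(MATCHES):
    print(f"{100 * p:.2f}%")
```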

But maybe I'm missing something, because the 78 wins seem low.
dfan
Gosei

Re: LZ's progression

Post by dfan »

Yeah, the Elo model notwithstanding, it turns out that you can't just concatenate a string of self-play rating differences like that; as you observed, it will always be too optimistic. I'm not sure whether this is purely the result of trying to accumulate a bunch of small rating differences, or if it has to do with self-play match results being less generalizable than a dataset with games against multiple opponents. It is well known that the rating graph on http://zero.sjeng.org/ is far too optimistic as far as "actual Elo" goes.
Vargo
Lives in gote

Re: LZ's progression

Post by Vargo »

You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: LZ's progression

Post by Bill Spight »

Vargo wrote: You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".
Instead of percentages, take a look at the log of the odds. IMX, that's more informative. (In the human sense of the term. ;))
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: LZ's progression

Post by Uberdude »

If Andrew beats Bob 60% of the time and Bob beats Charlie 60% of the time, what do you think Andrew's win rate against Charlie is? I don't think you can really say much; it might even be less than 50%, though in general it will be above 60% (how much more I've no idea, but I have a feeling that something more like a geometric than an arithmetic mean is likely to be less wrong).
Vargo
Lives in gote

Re: LZ's progression

Post by Vargo »

Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious:
A wins 50% against B, who wins 50% against C (so A wins 50% against C),
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C).
So, it seems right to me, but I'm looking forward to setting up further matches to verify this :)
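Applied to Uberdude's 60%/60% example and to the second sanity check, the product-of-odds rule works out as follows:

$$
\frac{0.6}{0.4}\cdot\frac{0.6}{0.4}=\frac{3}{2}\cdot\frac{3}{2}=\frac{9}{4},
\qquad
P(\text{Andrew beats Charlie})=\frac{9/4}{1+9/4}=\frac{9}{13}\approx 69.23\%
$$

$$
\frac{1/3}{2/3}\cdot\frac{2/3}{1/3}=\frac{1}{2}\cdot 2=1
\quad\Longrightarrow\quad
P(A\text{ beats }C)=\frac{1}{1+1}=50\%
$$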
dfan
Gosei

Re: LZ's progression

Post by dfan »

There is no particular reason that winning percentages have to be related in this exact mathematical way.

For example, Alice, Bob and Carol all play the classic game "Whose random number is bigger?". Alice is a beginner and picks integers from 1 to 100 uniformly at random. Bob is more experienced and picks integers from 51 to 150 uniformly at random. Carol is an expert and picks integers from 101 to 200 uniformly at random (she's very good at this game, though you can probably imagine even better strategies).

How often does Bob beat Alice? How often does Carol beat Bob? How often does Carol beat Alice?
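The puzzle can be answered by brute force. A minimal sketch that enumerates all 10,000 pairs of draws for each pairing; counting a tie as half a win is an assumption, since the post doesn't say how ties are scored, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

alice = range(1, 101)    # uniform over 1..100
bob = range(51, 151)     # uniform over 51..150
carol = range(101, 201)  # uniform over 101..200

def outcome_probs(xs, ys):
    """Exact (win, tie, loss) probabilities for a uniform draw from xs
    against an independent uniform draw from ys."""
    win = tie = 0
    for x, y in product(xs, ys):
        if x > y:
            win += 1
        elif x == y:
            tie += 1
    total = len(xs) * len(ys)
    return Fraction(win, total), Fraction(tie, total), Fraction(total - win - tie, total)

def score(xs, ys):
    """Expected score, counting a tie as half a win (an assumption)."""
    win, tie, _ = outcome_probs(xs, ys)
    return win + tie / 2

def chained(p, q):
    """Product-of-odds prediction for A vs C, given A vs B = p and B vs C = q."""
    z = (p / (1 - p)) * (q / (1 - q))
    return z / (1 + z)

print("Bob vs Alice:", score(bob, alice))
print("Carol vs Bob:", score(carol, bob))
print("Carol vs Alice:", score(carol, alice))
print("Odds rule predicts:", chained(score(carol, bob), score(bob, alice)))
```

Bob scores exactly 7/8 against Alice and Carol 7/8 against Bob, so the product-of-odds rule predicts odds of 7 × 7 = 49, i.e. 49/50, for Carol against Alice; in fact Carol wins every game. Here the rule underestimates, whereas in the Leela Zero matches it overestimated: win rates need not compose this way.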
Bill Spight
Honinbo

Re: LZ's progression

Post by Bill Spight »

Vargo wrote: Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious :
A wins 50% against B, who wins 50% against C (--> A wins 50% against C)
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C)
So, it seems right to me, but I'm looking forward to setting up further matches to verify this :)
Using odds, (3/2) (3/2) = 9/4. :)

However, in a multi-skill game like go, I would expect the odds to be less than that.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: LZ's progression

Post by moha »

Vargo wrote: Andrew would win 69.23% of his games against Charlie, I think.
This assumes that the observed win rates equal their theoretical values (no sampling error), and also that there are no distorting factors (such as various correlations).

Both assumptions seem wrong here, the first one in particular. Consider the extreme case: a program has a bug and plays randomly with every network. You would still see a climbing Elo graph (in a few percent of matches one side would exceed the promotion threshold by pure luck), but the latest net would not do well against the first one.
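This extreme case is easy to simulate. A minimal sketch under simplifying assumptions: a fixed gate of 55% over 400 games (not necessarily Leela Zero's actual promotion rule) and candidates that are all exactly as strong as the current best, so every game is a fair coin flip. The parameters and function names are hypothetical.

```python
import random

def simulate_gating(candidates=2000, games=400, gate=0.55, seed=1):
    """Hypothetical gating run in which every candidate is exactly as strong as
    the current best, so each game is a fair coin flip. A candidate is promoted
    if its observed score reaches `gate`; returns the promoted candidates' scores."""
    rng = random.Random(seed)
    promoted = []
    for _ in range(candidates):
        wins = sum(rng.random() < 0.5 for _ in range(games))
        if wins / games >= gate:
            promoted.append(wins / games)
    return promoted

def chained_winrate(scores):
    """Product-of-odds prediction of the last promoted network against the first."""
    z = 1.0
    for s in scores:
        z *= s / (1 - s)
    return z / (1 + z)

promoted = simulate_gating()
print(f"{len(promoted)} promotions; chained prediction for last vs first: "
      f"{chained_winrate(promoted):.4f} (true value 0.5)")
```

Every promoted result is at least 55% by construction, so chaining them predicts that the last network almost always beats the first, even though the true figure is 50%. Keeping only the matches that cleared a threshold biases the recorded win rates upward, which is the sampling-error problem described above.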