Life In 19x19

Posted: **Wed May 09, 2018 11:00 pm**

In 3 weeks, LZ has made great progress between networks e860 (04/18) and b633 (05/09).
The chain of networks looks like this:
04/22 : 3f6c beat e860, winning 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428

So, b633 should win 91.94% of its games against e860, not so far from the win percentage of LZ_ELF against LZ_best_normal_network (93-94%)
Maybe it's partly because the ELF weights loaded the dice for the most recent networks, but still, in just 3 weeks, it's an impressive progression

Posted: **Wed May 09, 2018 11:19 pm**

LZ_ELF (62b5417b) won 94.20% of its games against b6337c69, so LeelaZero is still far from ELF.

Posted: **Thu May 10, 2018 12:33 am**

You're right.
I was just saying that the 3 weeks difference between e860 and b633 is comparable to the difference between b633 and 62b5.
If the progression rate doesn't drop too much, in 3-4 weeks, LZ's "normally promoted network" could surpass 62b5.

Posted: **Thu May 10, 2018 1:38 am**

What is also interesting is that Leela with ELF weights won 93 % of its games against LZ #132. After that there was a meteoric rise of stronger networks in just a few days, with LZ #136 being 150 ELO stronger than #132 in self-play (I think it's actually cumulative ELO, so 136vs135 + 135vs134 + ... + 133vs132).

Now that they have tried the stronger #136 against the ELF network, ELF won 94 % of the games. So the ELO difference jumped from 450 to 490! I wonder if this is:

1) Statistical variance -- the winrate of LZ networks is small so random chance plays a role
2) Another change made when ELF was introduced, "t=1" (whatever that means), changed the playing conditions, and the quick ELO leap of the networks is related to the network adjusting to "new possibilities"
3) Once LZ playing style comes closer to ELF, the wins become rarer

The last option also seems possible. With humans, suppose A is a strong moyo-oriented player, B is a weaker, territory-oriented player, and C is slightly stronger than B but also plays moyo. B may win more games against A than C does. It's a similar effect to the inflated ELO differences in self-play, where a minuscule advantage with the same playing style may mean an 80 % win rate against the weaker version.

I'm hoping a few weeks will show that LZ is narrowing the gap against the ELF network. I like LZ networks better, not least because they play handicap games. Just tried yesterday against LZ #136 with 1 playout (so it just picks the top move without any search), and got crushed by move #100 even with 4 handicap stones.

(I'm EGF 1 kyu so not a strong dan).
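
For reference, here is a minimal sketch of how a win rate maps to an Elo gap, assuming the standard logistic Elo model (the function name `elo_diff` is mine, not something from LZ). Under that assumption 93 % comes out close to the 450 quoted above, while 94 % lands a little below 490.

```python
import math

def elo_diff(win_rate: float) -> float:
    """Elo gap implied by a win rate, assuming the standard logistic Elo model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

for p in (0.93, 0.94):
    print(f"{p:.0%} win rate -> about {elo_diff(p):.0f} Elo")
```

Note how steep the curve is at this end: a single percentage point of win rate is worth tens of Elo, so small sampling noise in the match result moves the Elo figure a lot.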

Posted: **Tue May 15, 2018 5:19 am**

There's a new best network (90560), and it should win 93.24% of its games against the old e860.
I've run a 100-game TWOGTP match between these two (--visits=3201 --noponder)
I find the result surprising : 90560 won "only" 78-22 (e860 won 8 games as B, and 14 as W)
78 seems a bit far from 93...
Maybe 100 games isn't enough, or am I missing something ?
Anyway, I'll set up another match, maybe with more games
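
On the "maybe 100 games isn't enough" question, a rough check is possible with plain binomial arithmetic. This is only a sketch: it assumes the games are independent with a fixed win probability, and the helper names `binom_cdf` and `wilson_interval` are mine. It asks how likely 78 or fewer wins out of 100 would be if the true rate really were 93.24%, and what range of true win rates is compatible with 78/100.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for the true win rate."""
    phat = wins / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Chance of seeing 78 wins or fewer if the predicted 93.24% were the true rate
print(f"P(wins <= 78 | p = 0.9324) = {binom_cdf(78, 100, 0.9324):.2e}")

lo, hi = wilson_interval(78, 100)
print(f"78/100 is compatible with a true win rate of roughly {lo:.1%} to {hi:.1%}")
```

If the tail probability is tiny and the interval stays well below 93%, sample size alone is an unlikely explanation and the prediction itself is the more probable culprit.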

Posted: **Tue May 15, 2018 5:50 am**

Why do you think it should win 93.24% of its games against e860?

If it's because of the supposed Elo difference on the web page graph, be aware that cumulative strength increases are smaller than they look there. Leela has not really gained 11000 Elo since it started. I forget what the ratio is, but when you compare two historical networks in a match, their results are significantly closer than you would expect just by looking at their places on that graph.

Posted: **Tue May 15, 2018 7:00 am**

04/22 : 3f6c beat e860, winning 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428
05/14 : 9056-b633 232/424

A wins wa games out of a total of t1 games against B
B wins wb games out of t2 against C
C wins wc games out of t3 against D
D wins wd games out of t4 against E
(etc)

z = wa/(t1-wa) * wb/(t2-wb) * wc/(t3-wc) * wd/(t4-wd)

A should then win a fraction z/(z+1) of its games against E (multiply by 100 for a percentage)

Here, the "cumulative" percentages from e860 to 9056 are

55.61%
59.78%
66.64%
73.29%
76.92%
82.73%
85.51%
87.90%
90.36%
91.94%
93.24%
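
The calculation above can be sketched in a few lines of Python. The match counts are copied from the list in this post; the names `MATCHES` and `chained_win_rate` are just mine. Since each cumulative figure depends on every count before it, a single mistyped count would shift all later percentages, so the printed values need not agree with the list above to the last digit.

```python
# Match results as (winner, loser, wins, games), copied from the list above.
MATCHES = [
    ("3f6c", "e860", 243, 437),
    ("1586", "3f6c", 235, 433),
    ("cfb2", "1586", 254, 433),
    ("18e6", "cfb2", 239, 413),
    ("ecab", "18e6", 226, 412),
    ("3737", "ecab", 253, 429),
    ("4be6", "3737", 239, 433),
    ("2fb0", "4be6", 235, 426),
    ("05b7", "2fb0", 240, 426),
    ("b633", "05b7", 235, 428),
    ("9056", "b633", 232, 424),
]

def chained_win_rate(results):
    """Multiply the per-match odds w/(t-w), then convert back: z/(z+1)."""
    z = 1.0
    for wins, games in results:
        z *= wins / (games - wins)
    return z / (z + 1.0)

# Cumulative prediction for each network in the chain against e860
for i, (winner, _, _, _) in enumerate(MATCHES, start=1):
    counts = [(w, t) for _, _, w, t in MATCHES[:i]]
    print(f"{winner} vs e860: {100 * chained_win_rate(counts):.2f}%")
```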

But maybe I'm missing something, because the 78 wins seem low.

Posted: **Tue May 15, 2018 7:39 am**

Yeah, the Elo model notwithstanding, it turns out that you can't just concatenate a string of self-play rating differences like that; as you observed, it will always be too optimistic. I'm not sure whether this is purely the result of trying to accumulate a bunch of small rating differences, or if it has to do with self-play match results being less generalizable than a dataset with games against multiple opponents. It is well known that the rating graph on http://zero.sjeng.org/ is far too optimistic as far as "actual Elo" goes.

Posted: **Tue May 15, 2018 8:38 am**

You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".

Posted: **Tue May 15, 2018 9:18 am**

Vargo wrote: You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".

Instead of percentages, take a look at the log of the odds. IMX, that's more informative. (In the human sense of the term.)

Posted: **Tue May 15, 2018 9:19 am**

If Andrew beats Bob 60% of the time and Bob beats Charlie 60% of the time, what do you think Andrew's win rate against Charlie is? I don't think you can really say much. It might even be less than 50%, though in general it will be >60% (how much more I've no idea, but I have a feeling something more like a geometric than an arithmetic mean is likely to be less wrong).

Posted: **Tue May 15, 2018 9:46 am**

Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious :
A wins 50% against B, who wins 50% against C (--> A wins 50% against C)
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C)
So, it seems right to me, but I'm looking forward to setting up further matches to verify this
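
The same formula can be written with the log of the odds, as suggested earlier in the thread: multiplying odds is the same as adding log-odds. Here is a small sketch (the helper names `logit`, `inv_logit` and `chain` are my own) that reruns the sanity checks above. It only shows what the formula predicts, not that real results must follow it.

```python
import math

def logit(p: float) -> float:
    """Log of the odds p/(1-p)."""
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def chain(*rates: float) -> float:
    """Predicted A-vs-last win rate if log-odds simply add along the chain."""
    return inv_logit(sum(logit(p) for p in rates))

print(f"50% then 50%: {chain(0.5, 0.5):.2%}")
print(f"1/3 then 2/3: {chain(1 / 3, 2 / 3):.2%}")
print(f"60% then 60%: {chain(0.6, 0.6):.2%}")   # odds (3/2)*(3/2) = 9/4
```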

Posted: **Tue May 15, 2018 11:38 am**

There is no particular reason that winning percentages have to be related in this exact mathematical way.

For example, Alice, Bob and Carol all play the classic game "Whose random number is bigger?". Alice is a beginner and picks integers from 1 to 100 uniformly at random. Bob is more experienced and picks integers from 51 to 150 uniformly at random. Carol is an expert and picks integers from 101 to 200 uniformly at random (she's very good at this game, though you can probably imagine even better strategies).

How often does Bob beat Alice? How often does Carol beat Bob? How often does Carol beat Alice?
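
For anyone who would rather count than guess, the three matchups can be enumerated exactly. A minimal sketch: the function name `win_and_tie` is mine, and since the post doesn't say what happens on a tie, ties are reported separately rather than counted as a win for either side. Note that Carol's and Alice's ranges don't overlap at all, which is what makes the example awkward for any rule that multiplies odds.

```python
from fractions import Fraction
from itertools import product

def win_and_tie(a, b):
    """Exact P(a's number > b's number) and P(tie) for uniform, independent picks."""
    pairs = len(a) * len(b)
    wins = sum(1 for x, y in product(a, b) if x > y)
    ties = sum(1 for x, y in product(a, b) if x == y)
    return Fraction(wins, pairs), Fraction(ties, pairs)

alice = range(1, 101)     # 1..100
bob = range(51, 151)      # 51..150
carol = range(101, 201)   # 101..200

for name, strong, weak in [("Bob beats Alice", bob, alice),
                           ("Carol beats Bob", carol, bob),
                           ("Carol beats Alice", carol, alice)]:
    win, tie = win_and_tie(strong, weak)
    print(f"{name}: {float(win):.2%} (ties {float(tie):.2%})")
```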

Posted: **Tue May 15, 2018 11:54 am**

Vargo wrote: Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious :
A wins 50% against B, who wins 50% against C (--> A wins 50% against C)
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C)
So, it seems right to me, but I'm looking forward to setting up further matches to verify this

Using odds, (3/2) (3/2) = 9/4.

However, in a multi-skill game like go, I would expect the odds to be less than that.

Posted: **Tue May 15, 2018 12:05 pm**

Vargo wrote: Andrew would win 69.23% of his games against Charlie, I think.

This assumes that the observed winrates are equal to their theoretical values (without sampling errors), and also that there are no distorting factors (like various correlations).

Both assumptions seem wrong here, the first one in particular. Consider the extreme: a program has a bug and it plays randomly with all networks. You would still see a climbing elo graph (in a few percent of matches one side would go above the promotion threshold by pure luck), but the latest net would not do well against the first one.
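
To put rough numbers on that extreme case, here is a sketch with two assumptions of mine that are not taken from this thread: matches of 400 games and a promotion threshold of 55%. It computes how often a coin-flip player clears the threshold, and what the chained formula would then claim after ten such lucky promotions, even though every network is equally strong.

```python
import math

GAMES = 400              # assumed match length (hypothetical)
THRESHOLD_PERCENT = 55   # assumed promotion threshold (hypothetical)

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Smallest win count reaching the threshold, in integer arithmetic (ceiling division)
needed = (THRESHOLD_PERCENT * GAMES + 99) // 100

p_lucky = prob_at_least(needed, GAMES)
print(f"A 50% player wins >= {needed}/{GAMES} with probability {p_lucky:.2%}")

# Chain ten promotions that each just scraped past the threshold
z = (needed / (GAMES - needed)) ** 10
print(f"Chained prediction after 10 lucky promotions: {z / (z + 1):.1%} (true value: 50%)")
```

Only the matches that pass the threshold enter the chain, so the chained figure is built from a selected sample. The same selection effect, in a much milder form, would bias the real chain upwards too.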

LZ's progression
