Life In 19x19
http://www.lifein19x19.com/

What is the Elo scale used by Leela Zero?
http://www.lifein19x19.com/viewtopic.php?f=18&t=15912

Author:  chut [ Sat Jul 21, 2018 8:32 pm ]
Post subject:  What is the Elo scale used by Leela Zero?

Hi, can someone post a link on the Elo scale used by Leela Zero? AlphaGo Master is reported at Elo 4700, but LZ is already reaching 12000, so obviously the two are using different Elo scales.

In LZ's system, what number would a top human player have? What Elo difference would give a winning rate of, say, 90%? And how many stones' difference is there now between the current 12000 strength and a top professional?

Many thanks.
chut

Author:  moha [ Sun Jul 22, 2018 4:45 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

IIRC AlphaGo anchored its scale to a known human level, while LZ is anchored at random play = 0. But since this is only a self-play rating (built from observed match diffs), it is not really comparable between different bots anyway, and is not to be taken too seriously (it contains a large luck factor and is inflated at least a few times over). The difference to a top human can only be guessed from actual results and also depends heavily on hardware.

Author:  chut [ Fri Aug 03, 2018 6:47 pm ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Just wondering, where would top professionals be in LZ's scale? Around 7000?

Author:  moha [ Sat Aug 04, 2018 6:14 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Presumably higher. You can try to look up past posts about LZ <> human matches that happened at various scale points (KGS/OGS matches etc.). Also, that scale was established at a limit of 3200 visits per move. While LZ is stronger than most pros under match conditions now, this assumes top hardware and hundreds of thousands of visits.

Author:  Vargo [ Fri Aug 10, 2018 9:20 pm ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

The same network 62b541 (ELF v0) climbs from Elo 11198 (leftmost green circle) to Elo 12080, that's an 882 gain!!!

About the Elo scale, Wikipedia says :
Quote:
The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.

And gives a formula to determine the theoretical expected score of two players.
According to that formula, an 882 Elo difference means the better player will win 99.38% of the games...
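
For anyone who wants to check these figures, here is a minimal sketch of the standard logistic Elo formula from the Wikipedia article quoted above. The function name `expected_score` is just an illustrative choice, and this is the textbook formula, not necessarily the exact code the LZ server runs.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score of the higher-rated player under the standard
    logistic Elo model (400 points = 10:1 odds)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# The two differences from the Wikipedia quote, plus the 882-point climb.
for diff in (100, 200, 882):
    print(f"{diff:4d} Elo ahead -> expected score {expected_score(diff):.2%}")
```

The 100 and 200 point cases come out at roughly 64% and 76%, matching the quote, and 882 points gives the 99.38% mentioned above.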

What does it mean ? ELF v0 is really so much stronger than ELF v0 ???
Attachment:
canvas.jpg

Author:  Uberdude [ Fri Aug 10, 2018 10:26 pm ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Vargo wrote:
What does it mean ? ELF v0 is really so much stronger than ELF v0 ???


No. The test matches against the fixed-strength external benchmark of Elf v0 are an excellent illustration of the inflation in Elo calculated from successive versions of Leela Zero. Elf's Elo comes from a match against, e.g., version 144 of LZ with an "inflated self-play Elo scale" rating of, say, 11300, seeing that it wins e.g. 85%, and then plugging that into the Elo formula, which says you are 300 points higher, so Elf_vs_144 is 11300 + 300 = 11600. But then a few versions of LZ later we estimate LZ 150 at 11530, based on a bunch of successive networks each winning about 55% against the previous one in test matches. But when you play Elf against LZ150 it wins 80%, much more than you'd expect from an 11600 vs an 11530. So then you say an 80% win rate means 240 Elo higher and give it a new estimate of 11530 + 240 = 11770. So the gap is narrowing, but Elf keeps running away as you get closer to it. Two months ago I made an estimate, using a little linear extrapolation on just 3 data points, that Elf's true rating on the LZ self-play Elo scale is around 13000, and just eyeballing the extra test matches we have now, that doesn't look too far wrong.
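
The arithmetic in this post can be sketched as follows. It inverts the standard logistic Elo formula to turn a win rate into a rating gap; the helper name `elo_gap` and the variable names are made up for illustration, and the 11300 / 11530 ratings are the "say" numbers from the post, not real data.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Elo difference implied by a win rate (inverse of the logistic Elo formula)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Illustrative numbers from the post above.
lz144_elo = 11300
elf_estimate_vs_144 = lz144_elo + elo_gap(0.85)   # roughly 11300 + 300

lz150_elo = 11530
elf_estimate_vs_150 = lz150_elo + elo_gap(0.80)   # roughly 11530 + 240

print(f"Elf estimated from the LZ 144 match: {elf_estimate_vs_144:.0f}")
print(f"Elf estimated from the LZ 150 match: {elf_estimate_vs_150:.0f}")
print(f"Apparent Elf 'gain' with no change in strength: "
      f"{elf_estimate_vs_150 - elf_estimate_vs_144:.0f}")
```

The same fixed opponent ends up with two different ratings, purely because the yardstick it is measured against has drifted.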

Author:  moha [ Sat Aug 11, 2018 2:05 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Vargo wrote:
What does it mean ? ELF v0 is really so much stronger than ELF v0 ???
The Elo distance between these tests is roughly the amount of luck accumulated by the promoted networks in test matches (the primary source of inflation).

Author:  Vargo [ Sat Aug 11, 2018 7:07 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

I understand your explanations, Uberdude and moha, thanks.
But I can't help finding it wrong to call "Elo" a scale where one player can climb 900 points without gaining any strength...

Author:  Uberdude [ Sat Aug 11, 2018 8:41 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Perhaps we should call it Leelo not Elo!

Author:  Bill Spight [ Sat Aug 11, 2018 8:50 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Vargo wrote:
I understand your explanations, Uberdude and moha, thanks.
But I can't help finding it wrong to call "Elo" a scale where one player can climb 900 points without gaining any strength...


I confess that I have not studied the Elo system, but I did devise and administer a ratings system myself once upon a time. One assumption that I suspect the Elo system makes is that playing strength is reducible to a single number. We know that, while that is approximately correct, it is not true, because of lack of transitivity. E.g., Player A usually beats Player B, who usually beats Player C, who usually beats Player A.

IMO it is better to think of a player having a number of different strengths which together produce an average strength, depending upon the different strengths of his opponents. If you are not good at making life, for instance, your rating will generally be worse if your opponents with a certain average Elo rating are good at killing than if your opponents with the same average rating are not good at killing. So for Elo ratings (or mine, as I was aware) to generalize, you want players to play opponents with a wide variety of different strengths.

The different versions of Leela do not face opponents with a wide variety of different strengths when they determine their ratings by self play. Obviously. So, IMO, we should not expect their ratings to generalize when they face opponents with different strengths. Such as Elf, apparently. ;)

Edit: In this case, it seems that the different versions of Leela have gotten better and better with regard to certain shared strengths, while retaining relative weaknesses which Elf is able to exploit. If they had trained against Elf or other strong bots, they would very probably have improved against them, as well. This is why it is not a bad idea to develop at least two strong bots at the same time, who play against each other and ferret out each other's weaknesses. :)

Edit 2: It's not too late. If I were training Leela, I would forget self play and train it against Elf for a while. :)

Author:  jlt [ Sat Aug 11, 2018 9:09 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Another factor is that a new network can be promoted without being stronger. If current network A is of equal strength to new network B, and 400 test matches are run, then there is about 1 chance out of 39 that B wins at least 55% of the matches against A, so B will get promoted with 36 Elo points above A, although A and B are of equal strength.
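
The "1 chance out of 39" figure can be checked with an exact binomial sum. This is a small sketch that assumes a plain 400-game match with a 220-win (55%) threshold and independent 50/50 games, as in the paragraph above; the function name `prob_at_least` is hypothetical, and the real LZ gating may differ in detail (for example by stopping matches early).

```python
from math import comb, log10


def prob_at_least(wins: int, games: int) -> float:
    """P(X >= wins) for X ~ Binomial(games, 0.5), summed exactly with integers."""
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    return favourable / 2 ** games


p_lucky = prob_at_least(220, 400)          # equal-strength network reaches 55%
gap_at_55 = 400.0 * log10(0.55 / 0.45)     # Elo gap the standard formula assigns to 55%

print(f"P(lucky promotion) = {p_lucky:.4f}, about 1 in {1 / p_lucky:.0f}")
print(f"Elo credited for a 55% result: about {gap_at_55:.0f}")
```

With the standard formula a bare 55% result is worth about 35 points, close to the 36 quoted above.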

Of course, in theory this could go in the other direction, strength of new networks could be underestimated, but this is rarely the case since new networks are almost never much stronger than current networks.
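
To see how this one-sided selection adds up, here is a toy simulation. It is only a sketch under strong assumptions: every candidate is exactly as strong as the current network, each test match is a fixed 400 games, promotion requires at least 220 wins, and a promoted network is credited with the Elo gap implied by its observed win rate. All names and the seed are arbitrary, and this is not the actual LZ pipeline.

```python
import math
import random

GAMES = 400       # test-match length from the post above
MIN_WINS = 220    # 55% of 400, the promotion threshold


def simulate(candidates: int, seed: int = 0):
    """Return (promotions, nominal rating gained) while true strength never changes."""
    rng = random.Random(seed)
    promotions = 0
    rating = 0.0
    for _ in range(candidates):
        # Every game is a coin flip, because candidate and champion are equally strong.
        wins = sum(rng.random() < 0.5 for _ in range(GAMES))
        if wins >= MIN_WINS:
            rate = wins / GAMES
            promotions += 1
            rating += 400.0 * math.log10(rate / (1.0 - rate))
    return promotions, rating


promotions, rating = simulate(1000)
print(f"{promotions} promotions out of 1000 equal-strength candidates")
print(f"nominal rating gained with zero real improvement: {rating:.0f}")
```

Lucky results get banked as rating and unlucky ones are simply discarded, so the nominal rating can only go up.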

P.S. There is a certain amount of Elf self-play games, see http://zero.sjeng.org/

Author:  Bill Spight [ Sat Aug 11, 2018 9:18 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

jlt wrote:
Another factor is that a new network can be promoted without being stronger. If current network A is of equal strength to new network B, and 400 test matches are run, then there is about 1 chance out of 39 that B wins at least 55% of the matches against A, so B will get promoted with 36 Elo points above A, although A and B are of equal strength.

Of course, in theory this could go in the other direction, strength of new networks could be underestimated, but this is rarely the case since new networks are almost never much stronger than current networks.

P.S. There is a certain amount of Elf self-play games, see http://zero.sjeng.org/


Hasn't this kind of inflation recently been checked by playing new versions against a number of older versions?

Author:  Uberdude [ Sat Aug 11, 2018 2:53 pm ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

Here are details of some of the Elf matches (is there a way to see more than the last 100?):
Code:
Date       LZ version      LZ Elo           Elf win %     Elf Elo
30 July    #159 20b        11844            79.5          12080
29 July    #157 15b        11806            72.5          11974
27 June    #151 or 152     11566 or 11609   ?             11886
20 May     #141            11185            90            11560
5 May      #136            10959            94            11444
4 May      #130.5          10730            93            11198


So in the timespan above LZ's Elo from inflated self-play measurement (henceforth called Leelo) has grown 1114. Elf's has grown 882. But we know it's really the same, so we could say LZ's true Elo gain is more like 1114-882 = 232 which is just 20% of 1114. So, at least at this stage in LZ's growth we could say the Leelo ratings are inflated by a factor of about 5. Also note the recent 20 block networks are still weaker than #157 (Elf won more vs 159) so there's still plenty of catching up to do.
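
The inflation estimate above, written out as a small sketch using the 4 May and 30 July rows of the table. The variable names are illustrative, and the key assumption (stated in the post) is that Elf v0's true strength is constant, so all of its apparent gain is drift in the scale.

```python
# First and last rows of the table above: (LZ Elo, Elf Elo)
lz_4_may, elf_4_may = 10730, 11198
lz_30_july, elf_30_july = 11844, 12080

leelo_gain = lz_30_july - lz_4_may        # LZ's gain on the self-play scale
elf_drift = elf_30_july - elf_4_may       # gain of a fixed-strength opponent, i.e. inflation
real_gain = leelo_gain - elf_drift        # what is left after removing the drift
inflation_factor = leelo_gain / real_gain

print(f"Leelo gain {leelo_gain}, Elf drift {elf_drift}, real gain {real_gain}")
print(f"real share of the gain: {real_gain / leelo_gain:.0%}, "
      f"inflation factor about {inflation_factor:.1f}")
```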

On 4th May Elf was 468 real Elo ahead of LZ. On 30 July it was 236, which is close to half. So I estimate LZ needs to gain about 1100 Leelo to be the same strength as Elf (v0; v1 is stronger). Plugging in the numbers I get 12977, surprisingly close to my 13020 from before given how dodgy extrapolation often is (it would be less if I used #157 though, but 15 block had plateaued, so now we get some extra inflation with the initially weaker 20 block networks, which hopefully have the capacity to improve further). bijyxo's latest 40 block LZ, though, is getting pretty close to Elf, and v1 at that: Elf v1 only scored 58% in a match 2 days ago.
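
One way to reproduce the 12977 figure, as a sketch: take the remaining real gap to Elf on 30 July and scale it by the inflation factor observed between the 4 May and 30 July rows of the table. This assumes the inflation factor stays constant, which is a guess about how the number was obtained rather than something stated in the post; the variable names are illustrative.

```python
lz_4_may, elf_4_may = 10730, 11198
lz_30_july, elf_30_july = 11844, 12080

gap_4_may = elf_4_may - lz_4_may            # real Elo gap at the start of the window
gap_30_july = elf_30_july - lz_30_july      # real Elo gap at the end of the window

leelo_gain = lz_30_july - lz_4_may
real_gain = gap_4_may - gap_30_july         # how much of the gap LZ actually closed
leelo_per_real_elo = leelo_gain / real_gain

leelo_still_needed = gap_30_july * leelo_per_real_elo
projected_parity = lz_30_july + leelo_still_needed

print(f"gap went from {gap_4_may} to {gap_30_july}")
print(f"Leelo still needed: about {leelo_still_needed:.0f}")
print(f"projected Leelo at parity with Elf v0: about {projected_parity:.0f}")
```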

Author:  Uberdude [ Tue Aug 28, 2018 3:17 am ]
Post subject:  Re: What is the Elo scale used by Leela Zero?

LZ's 20b networks have been progressing well in the vs-last-version Elo landscape, going from the first 20 block #158 at 11775 to the latest #172 at 12236. But what does this mean against Elf? Here are a few more results added to the table:

Code:
Date       LZ version      LZ Elo           Elf win %     Elf Elo
25 Aug     #172 20b        12236            69.7          12382
16 Aug     #166 20b        12065            74.0          12247
30 July    #159 20b        11844            79.5          12080
29 July    #157 15b        11806            72.5          11974
27 June    #151 or 152     11566 or 11609   ?             11886
20 May     #141            11185            90            11560
5 May      #136            10959            94            11444
4 May      #130.5          10730            93            11198


So the latest 20b does look to finally be better against Elf v0 than the best 15b, but not much.
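
As a cross-check, the Elf Elo column can be recomputed from the LZ Elo and Elf win % columns. This sketch assumes the column was derived with the standard logistic Elo formula and only uses the four rows whose win % is given to one decimal place; the names `ROWS` and `implied_elf_elo` are made up for the example.

```python
import math

# (date, LZ version, LZ Elo, Elf win %, Elf Elo as listed in the table)
ROWS = [
    ("25 Aug", "#172 20b", 12236, 69.7, 12382),
    ("16 Aug", "#166 20b", 12065, 74.0, 12247),
    ("30 July", "#159 20b", 11844, 79.5, 12080),
    ("29 July", "#157 15b", 11806, 72.5, 11974),
]


def implied_elf_elo(lz_elo: float, elf_win_pct: float) -> float:
    """LZ Elo plus the gap implied by Elf's win rate (standard logistic formula)."""
    p = elf_win_pct / 100.0
    return lz_elo + 400.0 * math.log10(p / (1.0 - p))


for date, version, lz_elo, win_pct, listed in ROWS:
    implied = implied_elf_elo(lz_elo, win_pct)
    print(f"{date:8s} {version:9s} listed {listed}  recomputed {implied:8.1f}  "
          f"Elf lead {implied - lz_elo:6.1f}")
```

The recomputed values land within a couple of points of the listed ones, and Elf's lead over #172 is smaller than its lead over #157, which is the comparison made above.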

All times are UTC - 8 hours [ DST ]