Re: How to make KGS better?

Posted: Tue Dec 29, 2020 9:55 am
by gennan
Boidhre wrote:
gennan wrote:Roughly 10% of the EGD games in the range 10k-2d are between players more than 2 ranks apart. That's not very rare, I think.
Ah, from looking at the 2015-2019 data (I figure 2020 data is going to be weird), there's something of a threshold in tournaments around perhaps 2k, where you start being above the bar more often than not and play even. Below that it would normally be handicap-1 or handicap-2. Around 90% of even games at mid SDK are G+1 and G+2, but yeah, around 10% are G+3 and G+4, which is higher than I thought (I expected it to be less than 5% for mid SDK based on tournaments I've seen). Is G+1 in the winning rate table on the EGD all even games up to 1 stone difference?
I think these tables are even games only. I think equal rank is not included in G+1.

I have seen tournaments where McMahon groups at the lower end are sparsely populated, so it can happen that one needs to play an opponent currently more than 1 McMahon group away. But in that case, I think handicap is usually determined by the current difference in McMahon score (minus 1 or 2). AFAIK, handicap is determined by rank difference (minus 1 or 2) only in specific handicap tournaments.

I have analysed raw EGD data quite a bit for the coming EGD update, so if you really want to know more detailed statistics, I could collect those.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 10:04 am
by jlt
The McMahon score is rank + number of wins, and tournaments are usually played at handicap-1, so in round 4 of a tournament, an 8k with 3 wins can play an even game against a 4k with 0 wins.
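
As a rough sketch of that arithmetic (the integer rank encoding and the function names are assumptions for illustration, not what any pairing program actually uses):

```python
def rank_to_level(rank: str) -> int:
    """Map a rank like '8k' or '2d' to an integer scale where 1k = 0 and 1d = 1."""
    value, kind = int(rank[:-1]), rank[-1]
    return value if kind == "d" else 1 - value


def mcmahon_score(rank: str, wins: int) -> int:
    # Score as described above: starting rank plus number of wins.
    return rank_to_level(rank) + wins


def handicap(score_a: int, score_b: int, reduction: int = 1) -> int:
    # Score difference minus the reduction, never below zero (zero = even game).
    return max(0, abs(score_a - score_b) - reduction)


weaker = mcmahon_score("8k", wins=3)    # same score as a 5k with no wins
stronger = mcmahon_score("4k", wins=0)
stones = handicap(weaker, stronger)
print(stones)
```

With a score difference of 1 and handicap-1, the two players meet in an even game.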

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 10:46 am
by Boidhre
gennan wrote:
Boidhre wrote:
gennan wrote:Roughly 10% of the EGD games in the range 10k-2d are between players more than 2 ranks apart. That's not very rare, I think.
Ah, from looking at the 2015-2019 data (I figure 2020 data is going to be weird), there's something of a threshold in tournaments around perhaps 2k, where you start being above the bar more often than not and play even. Below that it would normally be handicap-1 or handicap-2. Around 90% of even games at mid SDK are G+1 and G+2, but yeah, around 10% are G+3 and G+4, which is higher than I thought (I expected it to be less than 5% for mid SDK based on tournaments I've seen). Is G+1 in the winning rate table on the EGD all even games up to 1 stone difference?
I think these tables are even games only. I think equal rank is not included in G+1.

I have seen tournaments where McMahon groups at the lower end are sparsely populated, so it can happen that one needs to play an opponent currently more than 1 McMahon group away. But in that case, I think handicap is usually determined by the current difference in McMahon score (minus 1 or 2). AFAIK, handicap is determined by rank difference (minus 1 or 2) only in specific handicap tournaments.

I have analysed raw EGD data quite a bit for the coming EGD update, so if you really want to know more detailed statistics, I could collect those.
Ah, I see where my thinking went awry. We run *small* tournaments so we use strict -1 or -2 rank handicaps below the bar rather than try to use McMahon group based handicaps. I'd completely forgotten that this wouldn't be the norm in tournaments in many European countries with fuller mid-weak kyu fields. My apologies, I really shouldn't have made that error.

Re: EGD stats, the thing that has most interested me recently has been how the expected win rate against G+1 evolves with rank and whether there are inflection points if plotted out against GoR ratings. I'm curious about the relationship between increases in consistency of results and increases in strength. It's only an idle curiosity though rather than anything worth you putting time into.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 11:36 am
by gennan
Boidhre wrote:Re: EGD stats, the thing that has most interested me recently has been how the expected win rate against G+1 evolves with rank and whether there are inflection points if plotted out against GoR ratings. I'm curious about the relationship between increases in consistency of results and increases in strength. It's only an idle curiosity though rather than anything worth you putting time into.
You can see the observed winning probabilities against G+1 directly on the EGD statistics page.

Around 5d-6d it is about 25% (~190 Elo per 100 GoR)
Around 3d it is about 33% (~120 Elo per 100 GoR)
Around 1d it is about 36% (~100 Elo per 100 GoR)
Around 2k it is about 40% (~70 Elo per 100 GoR)
Around 15k-3k it is about 43% (~50 Elo per 100 GoR)

This table shows that ranks don't have a constant width when expressed as winning probabilities or conversely Elo rating gaps.
As ranks go lower, the Elo gap between consecutive ranks gradually decreases (is this what you mean by "consistency"?).
As ranks go higher, the Elo gap between consecutive ranks increases, more sharply as the rank gets higher and higher.
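
The Elo figures in the table can be reproduced approximately from the winning percentages. This is a minimal sketch that assumes the standard logistic Elo formula with a 400-point scale (the thread does not state which conversion was used), and the function name is made up for illustration:

```python
import math


def elo_gap(p_weaker_wins: float) -> float:
    """Elo gap implied by the weaker player's winning probability (logistic model)."""
    return 400 * math.log10((1 - p_weaker_wins) / p_weaker_wins)


# Observed winning probabilities against G+1, taken from the table above.
observed = {"5d-6d": 0.25, "3d": 0.33, "1d": 0.36, "2k": 0.40, "15k-3k": 0.43}
for rank, p in observed.items():
    print(f"{rank:>7}: {p:.0%} -> about {elo_gap(p):.0f} Elo per rank")
```

Under that assumption, the results land close to the rounded values listed above, from about 190 Elo per rank at 5d-6d down to about 50 Elo per rank in the kyu range.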

Below 15k, the winning probabilities go down again a bit, but I think this is an anomaly caused by the disturbance of the rank floor of 20k that the EGD currently uses (soon to be lowered to 30k). I assume that without this disturbance, the winning probability would be about 45% around 30k, increasing to 50% when approaching random play.

A non perfect player can never beat a (hypothetical) perfect player, so the non perfect player would have a winning probability of 0 and the perfect player would have an Elo rating of infinity. From handicap games against pros, it looks like AI should be ranked about 12d EGF. A perfect player might have a rank of about 13d.

Also see Elo per stone in Go on this page.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 11:45 am
by jlt
gennan wrote:A non perfect player can never beat a (hypothetical) perfect player,
I disagree. A non perfect player has a tiny but nonzero chance of playing a perfect game. In a 300-move game, a random player has about 1/(361 x 359 x 357 x ... x 63) ~ 10^-788 (Edit: 10^-342) chance of playing the whole game perfectly.
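
The size of that product is easy to check numerically. The sketch below assumes, as the estimate implicitly does, that the random player picks uniformly among the empty points, that captures are ignored, and that exactly one move per position is perfect:

```python
import math

# One factor per move of the random player: 361, 359, ..., 63 (150 moves).
choices = range(361, 61, -2)

ln_product = sum(math.log(n) for n in choices)        # natural logarithm
log10_product = sum(math.log10(n) for n in choices)   # base-10 logarithm

print(len(choices), round(ln_product), round(log10_product))
```

The natural logarithm of the product comes out near 788 and the base-10 logarithm near 342, so the exponent of 10 is the smaller of the two figures.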

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 11:50 am
by gennan
jlt wrote:
gennan wrote:A non perfect player can never beat a (hypothetical) perfect player,
I disagree. A non perfect player has a tiny but nonzero chance of playing a perfect game. In a 300-move game, a random player has about 1/(361 x 359 x 357 x ... x 63) ~ 10^-788 chance of playing the whole game perfectly.
You're right of course. It would not be 0% but an exceedingly small number. So the perfect player would not have a rating of infinity, but it would be a very large number. But for all practical purposes, the perfect player would probably win all of the games that it can play before the heat death of the universe (I'm more a physicist than a mathematician).

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 11:53 am
by Boidhre
Thank you (yes, consistency was unclear, I meant more consistent = higher expected win% against R-1 opponent). That graph is what I'm interested in, along with the rate of increase in Elo gain per rank or 100 GoR and how well the EGF statistics match the predictions from the EGF's rating model. What interests me most is the relationship between this and handicap stones and how the value of a handicap stone changes as players are stronger. How we have this discrete linear idea of strength difference on the board compared to the Elo differences shown in winrates. If that makes any sense. :)

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 12:17 pm
by gennan
Boidhre wrote:What interests me most is the relationship between this and handicap stones and how the value of a handicap stone changes as players are stronger. How we have this discrete linear idea of strength difference on the board compared to the Elo differences shown in winrates. If that makes any sense. :)
There are other rating systems that rate players on an Elo scale, such as https://www.goratings.org/en/. This is fine if you are only interested in ordering players by "skill", but the drawback of an Elo scale is that you cannot easily determine proper handicaps from rating gaps.
The EGD was specifically designed to rate players on a rank scale (based on handicaps), because that is the traditional "rating" scale used in go.

From my earlier Elo gap per rank table, you can say that the value of a full handicap stone is greater as players get stronger: 200 Elo for a 6d, 100 Elo for a 1d and 50 Elo for a 10k.
The same is true for handicap in chess. Apparently, the value of knight-odds is about 600 Elo at a rating of 2400, but only about 200 Elo at a rating of 1100.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 12:58 pm
by gennan
jlt wrote:In a 300-move game, a random player has about 1/(361 x 359 x 357 x ... x 63) ~ 10^-788 chance of playing the whole game perfectly.
Per Elo gap of 2000, the winning probability of the weaker player decreases by a factor of about 10^5.
So when we put a random player at 0 Elo and use 10^-788 as the probability of playing a perfect game, I get that a perfect player has a rating of about 2000 * 788 / 5 ~ 300,000 Elo on a 19x19 board.

This high number is still hard to put in context though. For purposes of handicap and ranks, I think an estimate of 13d EGF for a perfect player is much more meaningful. Also for AlphaGo Zero, for which Deepmind reported a rating of about 5200 Elo above a random player, I think an estimate of 12d EGF is much more meaningful.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 1:15 pm
by jlt
I made a mistake (confusion between ln and log10), it's 10^-342 instead of 10^-788, so a perfect player is about 140,000 Elo above random play. That looks like a huge number, as we are used to thinking of AlphaGo as near-perfect, but what do we know about superhuman play? Maybe in 2050, AlphaGo will look very weak compared to more recent AIs.
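
Both Elo estimates follow from the same rule of thumb (a factor of 10 in winning odds per 400 Elo, i.e. 10^5 per 2000 Elo). A minimal sketch, assuming the standard logistic Elo model and a made-up helper name, working on the exponent because a probability this small cannot be stored in a 64-bit float:

```python
def elo_gap_from_exponent(exponent: float) -> float:
    """Elo gap when the weaker side wins with probability 10**(-exponent).

    For such a tiny probability p, the odds (1 - p) / p are indistinguishable
    from 1 / p, so the gap is simply 400 * exponent.
    """
    return 400 * exponent


for exponent in (788, 342):
    gap = elo_gap_from_exponent(exponent)
    print(f"10^-{exponent}: about {gap:,.0f} Elo above random play")
```

The two exponents give roughly 315,000 and 137,000 Elo, in line with the rounded figures quoted in the thread.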

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 1:34 pm
by gennan
jlt wrote:I made a mistake (confusion between ln and log10), it's 10^-342 instead of 10^-788, so a perfect player is about 140,000 Elo above random play. That looks like a huge number, as we are used to thinking of AlphaGo as near-perfect, but what do we know about superhuman play? Maybe in 2050, AlphaGo will look very weak compared to more recent AIs.
Thanks for the correction.

So the Elo ratings of AI still have a lot of room for improvement. The gap between 2050 AI and 2020 AI could be as large as 20,000 Elo in even games with perfect komi, but I expect that in terms of handicap, the gap between 2050 AI and 2020 AI will be less than 10 points (so 2020 AI can still win when taking black and a few points reverse komi against 2050 AI).

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 2:46 pm
by Boidhre
gennan wrote: There are other rating systems that rate players on an Elo scale, such as https://www.goratings.org/en/. This is fine if you are only interested in ordering players by "skill", but the drawback of an Elo scale is that you cannot easily determine proper handicaps from rating gaps.
The EGD was specifically designed to rate players on a rank scale (based on handicaps), because that is the traditional "rating" scale used in go.

From my earlier Elo gap per rank table, you can say that the value of a full handicap stone is greater as players get stronger: 200 Elo for a 6d, 100 Elo for a 1d and 50 Elo for a 10k.
The same is true for handicap in chess. Apparently, the value of knight-odds is about 600 Elo at a rating of 2400, but only about 200 Elo at a rating of 1100.
My interest is partially coming from talking to some Irish ddks and weaker sdks recently about how "going up a stone in rank" doesn't mean the same thing going from 15k to 14k as it does going from 2k to 1k. Being able to visualise it as different jumps in Elo might be helpful for people coming from chess or similar backgrounds where Elo or similar is used.

Thanks.

Re: How to make KGS better?

Posted: Tue Dec 29, 2020 3:26 pm
by Harleqin
gennan wrote:The EGD was specifically designed to rate players on a rank scale (based on handicaps), because that is the traditional "rating" scale used in go.
Well, yes, it was supposed to carry that correlation forward, but I don't see how it could actually work:

1. Players use the rating to determine their rank
2. There are very few handicap games in the data

This means that (1.) the rank is not an input anymore, and (2.) the value of handicap is not an input.

This comes on top of the general problem that rating systems do not get enough data to reach any kind of conclusive answer anyway.

Re: How to make KGS better?

Posted: Wed Dec 30, 2020 4:03 am
by gennan
Harleqin wrote:
gennan wrote:The EGD was specifically designed to rate players on a rank scale (based on handicaps), because that is the traditional "rating" scale used in go.
Well, yes, it was supposed to carry that correlation forward, but I don't see how it could actually work:

1. Players use the rating to determine their rank
2. There are very few handicap games in the data

This means that (1.) the rank is not an input anymore, and (2.) the value of handicap is not an input.

This comes on top of the general problem that rating systems do not get enough data to reach any kind of conclusive answer anyway.
Handicaps vs rank gaps
About 10% of the EGD games are handicap games. Indeed that is not really enough to ensure that rank gaps and handicaps stay aligned over time. However, when we analyse those handicap games, the actual results seem to align quite well with expected results. On average the error seems to be less than a stone. So apparently, the EGD managed to keep rank gaps fairly well aligned with handicaps over more than 2 decades. That is encouraging. The system must be doing something right.

Overall rating drift
Besides aligning rank gaps with handicap, there is also the matter of overall rating drift (inflation/deflation).
1. Before the internet, players would determine their rank by handicap in club games (outside of the EGD). When players entered their 1st tournament, they would declare their club rank at that moment and that would be their initial rating in the EGD.
2. Also, the EGD has a reset mechanism. When a player has improved a lot since their previous tournament (again determined by their current club rank from handicap games), they would declare their new rank on entering the next tournament. If that new rank is more than 1 rank above their previous highest declared rank, the system resets that player's rating to that new rank. So the player is not forced to sandbag and "steal" all their rating points from weaker players.
So the EGD is constantly being fed by club ranks. This is mostly sufficient to keep EGD ranks fairly well aligned to club ranks.
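
The reset mechanism in point 2 could be sketched roughly as follows. This is not the EGD's actual code: the function names are hypothetical, the real rules may include further conditions, and the rank-to-rating mapping (1k = 2000, 1d = 2100, 100 points per rank) is the usual EGF convention rather than something stated in this thread:

```python
def rank_to_gor(rank: str) -> int:
    """Nominal EGF rating of a rank: 1k = 2000, 1d = 2100, 100 points per rank."""
    value, kind = int(rank[:-1]), rank[-1]
    return 2000 + 100 * value if kind == "d" else 2100 - 100 * value


def rating_on_entry(current_rating: int, highest_declared: str, declared: str) -> int:
    """Reset the rating if the declared rank is more than 1 rank above the previous highest."""
    jump_in_ranks = (rank_to_gor(declared) - rank_to_gor(highest_declared)) // 100
    if jump_in_ranks > 1:
        return rank_to_gor(declared)   # reset to the newly declared rank
    return current_rating              # otherwise the rating is left alone


# A former 5k who now declares 3k is reset; declaring 4k changes nothing.
print(rating_on_entry(1650, "5k", "3k"), rating_on_entry(1650, "5k", "4k"))
```

In this sketch a player who declares only one rank higher keeps their current rating and has to earn the remaining points over the board.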

Internet players do complicate this picture a bit, because a new tournament player may have never played in a club when entering their 1st tournament. So they only know their rank on their favourite go server. And the tournament organisers will then guesstimate their equivalent EGF rank to play in the tournament.

The system depends on the right balance of pessimistic and optimistic rank declarations by tournament players to stay well aligned.
When players stop declaring club promotions when playing in tournaments, it becomes difficult to keep EGD ranks aligned to club ranks. If a former 5k knows they have improved to 3k in their club but does not update their rank when entering the next tournament, they take rating points away from their opponents to make their own rating rise to 3k. They are effectively sandbagging and causing deflation.

When looking at the EGD data, it does seem that overall, players tend to be pessimistic/conservative, delaying promotion until their EGD rating supports it. There are insufficient optimistic promotions to counter the deflation of pessimistic promotions.

3. So to counter the deflationary effect of improving players that don't promote themselves, we increase the rating bonus in the upcoming EGD update to constantly inject rating points into the system (at least in the kyu range).

Re: How to make KGS better?

Posted: Thu Dec 31, 2020 7:48 am
by daal
SoDesuNe wrote: b) Offer AI review (should be one which works with all komi variants and handicap of course): Please see lichess.org for this (it's fast, it points out mistakes to learn from, it's free for everyone, it even produces a nice graph to see where your biggest mistakes were made)
Seems by far the thing that would make a go server more attractive to me.