
All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 81 posts ]  Go to page 1, 2, 3, 4, 5  Next
 Post subject: A Curious Case Study in KGS Ranks
Post #1 Posted: Mon Mar 24, 2014 7:41 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
There are many complaints, here and elsewhere, about the inability of KGS's rating system to satisfy the needs of edge-case users. These discussions are frequently emotionally charged, with only vague references to unsourced anecdotes, whereas I would prefer them to be more data-driven. A strange turn of events has recently allowed for an interesting evaluation of how KGS's rating system behaves under extreme circumstances. Not being one to pass up a chance for investigation, I present to L19 a case study. Specifically, I feel it tests the following two claims:

- If you play too many rated games, is it possible for your rating to become "stuck", to the point where even large streaks cannot move your rank? (If you play too many games, will it take a very long time for your rank to move?)

- Does KGS unnecessarily penalize losing streaks more than winning streaks, to the point where players cannot advance because of one bad day? (Does a losing streak "weigh you down" more than a winning streak can "bring you up"?)



The Details:

The bot GnuGo2 has played approximately 17,000 rated games in the last six months, with about a 41% win rate (41.7% if you remove the anomaly we're about to discuss). This places it firmly in the mid-to-lower 11k range and makes its rank quite possibly as stable as any rank will ever be. Due to an unfortunate error in how the user running this bot had implemented it, there was one day in mid-March when the bot forfeited virtually all of its games, ultimately going 6-236 on the day. For your review, I've attached a clipped version of the bot's rating graph for this year, where the day in question is clearly visible.

To cover the highlights:

- Having 1 poor day (2.5% win rate), encompassing approximately 1.5% of the total games played in the 6-month period, caused the bot's rating to drop about 1/5 of a stone (the graph is only updated once per day, so there is no finer resolution to use), in spite of having 17,000 games "anchoring" the rank.

- Upon being restored to "normal strength", the bot played 887 games (~5% of the total played in the 6 months), winning ~49% of them, and it took less than a week for the rank to essentially fully recover.

- The bot's win rate while being rated 1 stone lower than normal was ~57.5%, so nothing terribly extraordinary.
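A quick check of the percentages above, using only the figures given in the post (about 17,000 total games, the 6-236 day, and 887 recovery games). The bad day works out to roughly 1.4% of all games, consistent with the "approximately 1.5%" quoted.

```python
# Figures taken from the post; TOTAL_GAMES is the approximate six-month total.
TOTAL_GAMES = 17_000
BAD_DAY_WINS, BAD_DAY_LOSSES = 6, 236
RECOVERY_GAMES = 887

bad_day_games = BAD_DAY_WINS + BAD_DAY_LOSSES
bad_day_win_rate = BAD_DAY_WINS / bad_day_games
bad_day_share = bad_day_games / TOTAL_GAMES
recovery_share = RECOVERY_GAMES / TOTAL_GAMES

print(f"bad day: {bad_day_games} games, {bad_day_win_rate:.1%} won, "
      f"{bad_day_share:.1%} of all games")
print(f"recovery: {RECOVERY_GAMES} games, {recovery_share:.1%} of all games")
```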


To me this suggests that even if you are an extreme edge case (I don't know of any human users who have managed 17,000 games in 6 months, despite how hard many have tried), your rank is still mobile if you truly have statistically significant streaks. Further, it suggests to me that no matter how bad a day you have (and this was basically the worst of bad days), it is not a particularly excessive burden to overcome (the rank was restored to normal without an excessively high win rate).


Thoughts?


Attachments:
File comment: Annotated Rank Graph
GnuGo2.JPG

This post by Mef was liked by: daal
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #2 Posted: Mon Mar 24, 2014 8:10 pm 
Lives in sente

Posts: 1223
Liked others: 738
Was liked: 239
Rank: OGS 2d
KGS: illluck
Tygem: Trickprey
OGS: illluck
That seems like a demonstration of immobile rank to me - 6:236 and only dropping a fifth of a stone is pretty ridiculous.


This post by illluck was liked by 2 people: cdybeijing, Splatted
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #3 Posted: Mon Mar 24, 2014 8:23 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
illluck wrote:
That seems like a demonstration of immobile rank to me - 6:236 and only dropping a fifth of a stone is pretty ridiculous.


To put this in perspective, this is equivalent to a normal player who plays 2 games per day having a 4-game losing streak in a day.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #4 Posted: Mon Mar 24, 2014 9:30 pm 
Lives with ko

Posts: 129
Liked others: 5
Was liked: 14
Rank: KGS 4k
Those who are willing to look at KGS ranks rationally know that KGS ranks do not get stuck. It's just that some people need something to blame for the fact that they are not progressing as fast as they would like.


This post by Dante31 was liked by 2 people: Bantari, Mef
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #5 Posted: Mon Mar 24, 2014 9:43 pm 
Judan

Posts: 6140
Liked others: 0
Was liked: 786
The case study does not compare well to human players with frequent games, who need, without significant interruption, to win ca. 70+% for weeks up to a few months in order to improve a rank, after it has been VERY MUCH easier to drop a rank.

The problem can already be observed when 1 loss demotes a rank, but the next 2 or 3 games won do not necessarily promote a rank.

For any rating system to be perceived fair, there must be symmetry in the difficulties of decreasing and increasing one's rating. The KGS system lacks such a symmetry.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #6 Posted: Mon Mar 24, 2014 10:13 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
RobertJasiek wrote:
The case study does not compare well to human players with frequent games, who need, without significant interruption, to win ca. 70+% for weeks up to a few months in order to improve a rank, after it has been VERY MUCH easier to drop a rank.

The problem can already be observed when 1 loss demotes a rank, but the next 2 or 3 games won do not necessarily promote a rank.

For any rating system to be perceived fair, there must be symmetry in the difficulties of decreasing and increasing one's rating. The KGS system lacks such a symmetry.



This has never been documented, only alluded to in unsupported anecdote that falls apart whenever data is collected. In fact, you personally were used as an example in a previous case study to demonstrate that this effect doesn't exist!

Edit: My apologies, I should have said: Two previous case studies

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #7 Posted: Tue Mar 25, 2014 1:56 am 
Judan

Posts: 6140
Liked others: 0
Was liked: 786
1) I have experienced the rating / ranking behaviour I described myself, several times (not only once, as you suggest).

2) Your linked case studies might be used for OTHER arguments (such as that I do not permanently win 70% of my KGS games, e.g., because(!!!) it is far too frustrating to maintain a winning attitude when affected by the mentioned experience and to continue playing only when not tired), but they do not refute my experience.

3) I have heard from (or watched) several people who have had similar experiences.

4) Since the effects have been experienced, they DO exist. (And no, I have not bothered to log them. I have better uses for my time.)

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #8 Posted: Tue Mar 25, 2014 2:28 am 
Gosei

Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
RobertJasiek wrote:
4) Since the effects have been experienced, they DO exist. (And no, I have not bothered to log them. I have better uses for my time.)


¿¿?? Robert, you are a mathematician. Come on!

_________________
Geek of all trades, master of none: the motto for my blog mostlymaths.net


This post by RBerenguel was liked by: hyperpape
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #9 Posted: Tue Mar 25, 2014 3:06 am 
Judan

Posts: 6140
Liked others: 0
Was liked: 786
A fix for the rating system? Easy, use a different system:

- +0.1 ranks for a win, -0.1 ranks for a loss.
- Ignore all handicap games (incl. those with handicap 1).
- Ignore games with a rank difference >2.
- Maximum rank 9d.
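A minimal sketch of the four rules above as a single update function. The numeric scale (1d = 1, 9d = 9, 1k = 0, 2k = -1, and so on) and the names `Game` and `apply_rule` are hypothetical, not from the post; "handicap 1" is treated as a no-komi game and ignored along with all stone handicaps.

```python
from dataclasses import dataclass

# Hypothetical numeric scale: 1d = 1.0, 9d = 9.0, 1k = 0.0, 2k = -1.0, ...
MAX_RANK = 9.0        # "Maximum rank 9d"
STEP = 0.1            # "+0.1 ranks for a win, -0.1 ranks for a loss"
MAX_RANK_GAP = 2.0    # "Ignore games with a rank difference >2"

@dataclass
class Game:
    white_rank: float
    black_rank: float
    handicap: int     # 0 = even game with komi; 1 = no-komi "handicap 1"; 2+ = stones
    white_won: bool

def apply_rule(game: Game) -> tuple[float, float]:
    """Return the new (white_rank, black_rank) under the fixed-step proposal."""
    w, b = game.white_rank, game.black_rank
    # Handicap games (including handicap 1) and lopsided pairings are ignored.
    if game.handicap != 0 or abs(w - b) > MAX_RANK_GAP:
        return w, b
    delta = STEP if game.white_won else -STEP
    # The winner gains exactly what the loser gives up; the 9d cap applies to both.
    return min(w + delta, MAX_RANK), min(b - delta, MAX_RANK)
```

Note that the cap makes the rule slightly lossy at the top: a 9d who wins gains nothing, while the loser still drops 0.1.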

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #10 Posted: Tue Mar 25, 2014 4:09 am 
Gosei

Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
RobertJasiek wrote:
A fix for the rating system? Easy, use a different system:

- +0.1 ranks for a win, -0.1 ranks for a loss.
- Ignore all handicap games (incl. those with handicap 1).
- Ignore games with a rank difference >2.
- Maximum rank 9d.


I'm tempted to run a Monte Carlo simulation of such a system. Maybe I'll do, could be fun.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #11 Posted: Tue Mar 25, 2014 4:51 am 
Beginner

Posts: 10
Liked others: 0
Was liked: 6
Rank: AGA 2 dan
RobertJasiek wrote:
Easy, use a different system:

- +0.1 ranks for a win, -0.1 ranks for a loss.
- Ignore all handicap games (incl. those with handicap 1).
- Ignore games with a rank difference >2.
- Maximum rank 9d.

RBerenguel wrote:
I'm tempted to run a Monte Carlo simulation of such a system. Maybe I'll do, could be fun.


Under that system, wouldn't the bot in Mef's example have dropped to 34k the following day?
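The 34k figure follows from the fixed-step rule: 236 losses and 6 wins give 230 net losses, i.e. 23 ranks down from about 11k. The sketch assumes a hypothetical numeric scale (1k = 0, 2k = -1, ...), a starting point of exactly 11k, and that every game counts (the bot keeps meeting opponents within 2 ranks, so the rank-gap rule never excludes a game).

```python
# Hypothetical numeric scale: 1k = 0, 2k = -1, ..., so 11k = -10.
def kyu_to_value(kyu: int) -> float:
    return 1.0 - kyu

def value_to_kyu(value: float) -> int:
    return round(1.0 - value)

STEP = 0.1
start = kyu_to_value(11)                  # the bot sat around 11k before the bad day
wins, losses = 6, 236                     # the 6-236 day
end = start + STEP * wins - STEP * losses
print(f"{value_to_kyu(end)}k")            # 34k
```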

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #12 Posted: Tue Mar 25, 2014 5:41 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
RobertJasiek wrote:
A fix for the rating system? Easy, use a different system:

- +0.1 ranks for a win, -0.1 ranks for a loss.
- Ignore all handicap games (incl. those with handicap 1).
- Ignore games with a rank difference >2.
- Maximum rank 9d.


Which is deflationary. Every 20k that enters the system and moves up to 1d has removed 20 ranks total from the other players. That's no problem in a small playing pool like a club, where I think this kind of system is fine, as you can just manually recalibrate all ranks every once in a while, but on a go server it is unsuitable.

In a deflationary system, playing more games means you lose rating quicker. So you're replacing "My rating is stuck because I play so much" with "My rating keeps dropping because I play so much". How is that better?
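Herman's bookkeeping can be checked with a small closed-pool sketch. Under the fixed-step rule every counted game is zero-sum, so whatever a newcomer gains must come out of the existing players. The setup is hypothetical: ranks use 1d = 1, 1k = 0, 20k = -19; the existing pool is spread evenly from 20k to 1d; and the newcomer, whose real strength is 1d, simply wins every game. The rank-gap filter is left out because it only decides which games count, not how much each counted game transfers.

```python
STEP = 0.1  # fixed gain/loss per game under the proposed rule

def newcomer_climb(pool_size: int = 50) -> tuple[int, float]:
    """A 20k newcomer climbs to 1d through a closed pool.

    Returns (net wins needed, total rank removed from the existing players).
    """
    # Existing players spread evenly between 20k (-19) and 1d (1).
    pool = [-19.0 + 20.0 * i / (pool_size - 1) for i in range(pool_size)]
    before = sum(pool)
    newcomer = -19.0
    wins = 0
    while newcomer < 1.0 - 1e-9:          # tolerance for float round-off
        pool[wins % pool_size] -= STEP    # the loser gives up exactly STEP...
        newcomer += STEP                  # ...and the winner gains it
        wins += 1
    return wins, before - sum(pool)

wins, removed = newcomer_climb()
print(f"{wins} net wins, {removed:.1f} ranks removed from the other players")
```

The result does not depend on the pool size or on which opponents are chosen: 200 net wins always remove 20 ranks in total, which is the deflation Herman describes.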


This post by HermanHiddema was liked by 2 people: Bill Spight, Mef
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #13 Posted: Tue Mar 25, 2014 6:52 am 
Lives in gote

Posts: 677
Liked others: 6
Was liked: 31
KGS: 2d
I am 5D on Tygem and 1d on KGS. From my experience with Tygem I can say that ranks on KGS are more stable and consistent than Tygem's. On Tygem you will find more variation within one rank: sometimes you play guys who seem 1-2 stones weaker, sometimes 1-2 stronger, but all have the same rank. But here comes the advantage of such a thing: it's more fun, you have faster chances to get promoted/demoted, and you get to play stronger players who wouldn't play you otherwise. KGS ranking is sounder, but more boring, and since ranking is maybe the main motivation to play and stay in Go, that's significant.

I'd like KGS to copy Tygem's ranking system, i.e. a system of x-game series where you get promoted when you win y games and demoted when you lose z games out of it.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #14 Posted: Tue Mar 25, 2014 7:18 am 
Lives with ko

Posts: 199
Liked others: 6
Was liked: 55
Rank: KGS 3 kyu
It's funny how Robert just proposed removing all handicap games from the calculation while in a different topic I proposed that only handicap games should be considered so we don't rely on arbitrary win percentages.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #15 Posted: Tue Mar 25, 2014 7:48 am 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
Pippen wrote:
I am a 5D-Tygem and 1d-KGS. From my experience with Tygem I can say: Ranks at KGS are more stable and consistent than Tygem's. On Tygem you will find some more differences within one rank. Sometimes you play guys that seem like 1-2 stones weaker, sometimes 1-2 stronger, but all have the same rank. But here comes the advantage of such a thing: It's more fun, you have faster chances to get promoted/demoted and to play stronger players that wouldn't play you otherwise. KGS ranking is sounder, but more boring and since ranking is maybe the main motivation to play and stay in Go, it's significant.

I'd like KGS to copy Tygem's ranking system, i.e. a system of x-game series where you get promoted when you win y games and demoted when you lose z games out of it.


KGS's rating system aims to provide the most accurate rank it can with all data available. It aims to do the best job of predicting the probable outcome between any two players and any handicap (though in practice it only accepts feedback from games H6 or less).

Tygem's rating system does not make any predictions. It does not handle handicap games. It does not make any attempt to ensure proper rank spacing. It suffers from large amounts of noise introduced by players setting their own ranks. Under an ideal set of assumptions (all ranks properly spaced, all players properly ranked, etc.) you still expect to spend 30% of your time at the wrong rank. Tygem's rating system has a place in the go world, and many people find it fun. Accurately assessing your go strength and comparing yourself on a fixed scale against a larger pool of players isn't it.
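To get a feel for how a series system can move a correctly ranked player by chance alone, the sketch below computes the chance that one x-game series ends in promotion or demotion. The parameters (x = 10, promotion at 8 wins, demotion at 8 losses) are made up for illustration and are not Tygem's actual rules; games are assumed independent with a 50% win rate against same-rank opponents.

```python
from math import comb

def series_probs(x: int, y: int, z: int, p: float) -> tuple[float, float]:
    """Return (P(promotion), P(demotion)) for one x-game series.

    Hypothetical rule: promotion needs at least y wins, demotion at least z
    losses (i.e. at most x - z wins); each game is won independently with
    probability p.
    """
    assert y + z > x, "promotion and demotion should be mutually exclusive"
    pmf = [comb(x, k) * p**k * (1 - p) ** (x - k) for k in range(x + 1)]
    return sum(pmf[y:]), sum(pmf[: x - z + 1])

# A correctly ranked player, with made-up series parameters.
promote, demote = series_probs(x=10, y=8, z=8, p=0.5)
print(f"promoted: {promote:.1%}, demoted: {demote:.1%}")
```

Even a perfectly placed player leaves their correct rank in about one series out of nine under these parameters, purely from binomial noise.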

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #16 Posted: Tue Mar 25, 2014 7:54 am 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
uPWarrior wrote:
It's funny how Robert just proposed removing all handicap games from the calculation while in a different topic I proposed that only handicap games should be considered so we don't rely on arbitrary win percentages.



Someone with a stronger math background than me could probably give a better answer about what the rating system considers ideal, but I would think the best case would be for all players to have an even distribution of games across the whole range of handicaps the system aims to predict. On KGS that would mean 7.69% giving H6, 7.69% giving H5, 7.69% giving H4, etc. This would leave approximately 23% of your games with no handicap stones (i.e. either even or +/- 1 stone). You would also probably want to fix the cultural affinity for using 0.5 komi and make it reverse komi.
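The 7.69% and 23% figures follow if the range from receiving H6 to giving H6 is counted as 13 equally likely settings, three of which (even, +1, -1) involve no handicap stones. The integer encoding below is a hypothetical convenience, not KGS's.

```python
from fractions import Fraction

# Hypothetical encoding: -6..-2 = receiving H6..H2, -1/0/+1 = "no handicap"
# (even or +/- 1 stone), +2..+6 = giving H2..H6. 13 categories in total.
categories = list(range(-6, 7))
share_each = Fraction(1, len(categories))
no_handicap = sum(share_each for c in categories if abs(c) <= 1)

print(f"each category: {float(share_each):.2%}")     # 7.69%
print(f"no handicap:   {float(no_handicap):.2%}")    # 23.08%
```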

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #17 Posted: Tue Mar 25, 2014 7:57 am 
Gosei

Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
I ran a simulation, just for fun. As Herman points out, the system is deflationary. To compensate, I use a closed pool of players. Each player is given a rank from 0 to 40 (9d to 30k, so to speak) and an "inner rank," which is used to model their real strength. So, for example, a player can have system rank 4 and inner rank 8, so he should eventually lose rank. To calculate game results, I use the difference in inner rank between the players and an Elo-like winning probability, and I only consider games between players with at most 1 rank difference. Ranks follow a normal distribution with mean mu = 20 and sigma = 2/3*mu. The population (and ranks) are corrected so that the minimum is 0 and the maximum is 40.

To plot and display, I use the percentiles in the difference between "inner rank" and "system rank." The results with a pool of just 100 players and 50000 games (real games played: 49847) roughly look like:

Code:
Simulation: 100 * gauss(mu=20) for T=50000 steps having 49847 games played (bot stands for bottom):

          top 1%    top 10%    top 25%  top 33.3%     median  bot 33.3%    bot 25%    bot 10%     bot 1%
start      40.00      29.16      22.98      20.82      14.88      10.37       8.22       2.81       0.02
mid         4.66       4.04       2.77       2.16       1.65       0.96       0.74       0.26       0.00
final       3.14       2.38       1.71       1.50       1.21       0.88       0.72       0.26       0.00


Or graphically,
Attachment:
Screen Shot 2014-03-25 at 15.51.56.png


Even such a simple ranking model has a big flaw (assuming a closed pool of players, sure): it takes an awful lot of games to reach a "real strength," and even with 150k playthroughs (I just simulated this, to check) the worst result is almost 2 "stones" off (the median is half a stone off).
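For readers who want to experiment, here is a from-scratch sketch of the kind of simulation described above. It is not RBerenguel's code, and several choices are hypothetical: inner and system ranks are drawn independently from N(20, (2/3 * 20)^2) and clipped (rather than rescaled) to [0, 40]; one rank is treated as 100 Elo points; each step picks a random player and a random opponent whose system rank is within 1; and the winner's system rank moves 0.1 toward 0 (stronger) while the loser's moves 0.1 toward 40.

```python
import random
import statistics

def simulate(n_players=100, steps=50_000, step=0.1, elo_scale=4.0, seed=0):
    """Closed-pool sketch of the fixed-step rule; ranks run 0 (~9d) to 40 (~30k).

    P(a beats b) = 1 / (1 + 10 ** ((inner_a - inner_b) / elo_scale)), i.e. the
    Elo formula with one rank assumed to equal 100 points.
    """
    rng = random.Random(seed)

    def draw() -> float:
        return min(40.0, max(0.0, rng.gauss(20.0, 2 / 3 * 20.0)))

    inner = [draw() for _ in range(n_players)]   # "real" strength, fixed
    system = [draw() for _ in range(n_players)]  # displayed rank, updated
    start_err = [abs(i - s) for i, s in zip(inner, system)]

    played = 0
    for _ in range(steps):
        a = rng.randrange(n_players)
        # Only pair players whose system ranks differ by at most 1.
        rivals = [b for b in range(n_players)
                  if b != a and abs(system[a] - system[b]) <= 1]
        if not rivals:
            continue
        b = rng.choice(rivals)
        played += 1
        p_a = 1.0 / (1.0 + 10 ** ((inner[a] - inner[b]) / elo_scale))
        winner, loser = (a, b) if rng.random() < p_a else (b, a)
        system[winner] = max(0.0, system[winner] - step)  # lower number = stronger
        system[loser] = min(40.0, system[loser] + step)

    end_err = [abs(i - s) for i, s in zip(inner, system)]
    return start_err, end_err, played

start, end, played = simulate()
print(f"games played: {played}")
print(f"median |inner - system|: {statistics.median(start):.2f} -> "
      f"{statistics.median(end):.2f}")
```

Because the pairing and win probability are assumptions, the numbers will not match the table above exactly; the sketch is meant for exploring how slowly a fixed 0.1 step sorts a pool, not for reproducing the original run.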



This post by RBerenguel was liked by 2 people: illluck, Mef
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #18 Posted: Tue Mar 25, 2014 8:46 am 
Lives with ko

Posts: 248
Liked others: 23
Was liked: 148
Rank: DGS 2 kyu
Universal go server handle: Polama
Mef wrote:
illluck wrote:
That seems like a demonstration of immobile rank to me - 6:236 and only dropping a fifth of a stone is pretty ridiculous.


To put this in perspective, this is equivalent to a normal player who plays 2 games per day having a 4-game losing streak in a day.


Nope, not equivalent. Plugging these numbers into a binomial calculator:

If we expect a 41% win rate, the probability of losing at least 236 games out of 242 'by chance' is about 10^-45.

If we factor in the ~17,000 opportunities for that streak, we're still around, call it, 10^-40.
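The two figures above can be reproduced with an exact binomial tail. The sketch assumes independent games at a fixed 41% win rate and treats the 17,000 possible starting points as a crude union bound, as the post does.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

P_WIN = 0.41
tail = binom_cdf(6, 242, P_WIN)            # 6 or fewer wins in 242 games
windows = 17_000                           # rough count of places such a run could start
print(f"P(<= 6 wins in 242 games)   ~ {tail:.1e}")
print(f"x {windows} possible windows ~ {windows * tail:.1e}")  # crude union bound
```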

For a player going 2 games a day, that's 365 games in the 6 month span. If we say he went 0-4, that's 12%. Given 361 4 game spans, that's essentially a given to occur (1-(10^-20) or so?) We'd be extremely surprised if a 41% player didn't have a 4 game losing streak in 365 games, and even more surprised if he had a 3/242 streak in a 17,000 game span.
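The 12% figure is simply 0.59^4. The "1-(10^-20)" estimate treats the 361 overlapping 4-game windows as independent, which they are not; an exact count with a small Markov chain over the current losing run gives a no-streak probability on the order of 10^-11 instead. That is still a near certainty, so the conclusion stands. The sketch assumes independent games at a 41% win rate.

```python
def prob_no_losing_streak(n_games: int, streak: int, p_loss: float) -> float:
    """Exact P(no run of `streak` consecutive losses in n_games independent games)."""
    # state[j] = probability that the trailing run of losses has length j (j < streak)
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(n_games):
        new = [0.0] * streak
        new[0] = sum(state) * (1 - p_loss)       # a win resets the run
        for j in range(streak - 1):
            new[j + 1] = state[j] * p_loss       # a loss extends it; reaching `streak` drops out
        state = new
    return sum(state)

P_LOSS = 0.59
print(f"P(0-4 in a given 4 games)     = {P_LOSS**4:.3f}")
print(f"P(no 4-loss run in 365 games) ~ {prob_no_losing_streak(365, 4, P_LOSS):.1e}")
```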

Wins are streaky by nature, so the probability will be higher in practice. But still, 10^-40 is roughly your odds of being dealt a royal flush in poker, 7 hands in a row.

Put another way, auto-resigning most games, he was probably, what? 30 kyu? So the fact that the system thought he'd only fallen 1/5 of a stone was extremely wrong. We know he was much worse than that. He demonstrated it over a very significant number of games. Which, as I understand it, is the most common complaint about the KGS rating system: that it overestimates (in this case, vastly overestimates) how much variation can be explained away by chance as the number of games played increases.


This post by Polama was liked by: Mef
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #19 Posted: Tue Mar 25, 2014 8:54 am 
Gosei

Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
Polama wrote:
Mef wrote:
illluck wrote:
That seems like a demonstration of immobile rank to me - 6:236 and only dropping a fifth of a stone is pretty ridiculous.


To put this in perspective, this is equivalent to a normal player who plays 2 games per day having a 4-game losing streak in a day.


Nope, not equivalent. Plugging these numbers into a binomial calculator:

If we expect a 41% win rate, the probability of losing at least 236 games out of 242 'by chance' is about 10^-45.

If we factor in the ~17,000 opportunities for that streak, we're still around, call it, 10^-40.

For a player going 2 games a day, that's 365 games in the 6 month span. If we say he went 0-4, that's 12%. Given 361 4 game spans, that's essentially a given to occur (1-(10^-20) or so?) We'd be extremely surprised if a 41% player didn't have a 4 game losing streak in 365 games, and even more surprised if he had a 3/242 streak in a 17,000 game span.

Wins are streaky by nature, so the probability will be higher in practice. But still, 10^-40 is roughly your odds of being dealt a royal flush in poker, 7 hands in a row.

Put another way, auto-resigning most games, he was probably, what? 30 kyu? So the fact that the system thought he'd only fallen 1/5 of a stone was extremely wrong. We know he was much worse than that. He demonstrated it over a very significant number of games. Which, as I understand it, is the most common complaint about the KGS rating system: that it overestimates (in this case, vastly overestimates) how much variation can be explained away by chance as the number of games played increases.


Can't this just be explained by history inertia? It may be statistically relevant, but the KGS ranking system (IIRC, it's been a while since I checked it) is almost a predictor-corrector system (sorry for the term; it's used in numerical analysis, for example): it relies heavily on history to predict the rank, probably correcting after more data points are available. Sure, a huge losing streak is significant, and current, but the historical weight says otherwise and dampens the current "error".

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #20 Posted: Tue Mar 25, 2014 9:14 am 
Lives with ko

Posts: 248
Liked others: 23
Was liked: 148
Rank: DGS 2 kyu
Universal go server handle: Polama
RBerenguel wrote:
Can't this just be explained by history inertia? It may be statistically relevant, but the KGS ranking system (IIRC, it's been a while since I checked it) is almost a predictor-corrector system (sorry for the term; it's used in numerical analysis, for example): it relies heavily on history to predict the rank, probably correcting after more data points are available. Sure, a huge losing streak is significant, and current, but the historical weight says otherwise and dampens the current "error".


The algorithm's choice can be explained by history inertia. But the actual performance can't be. If you view a rank as a fixed, static thing and you hit a 200-loss streak, the best you can do is throw your hands up, say "that was weird!", and adjust your prediction down slightly. But this streak clearly demonstrates that this account's ability is not static, and that the previous 17,000 games are no longer particularly meaningful. When we're at 10^-40 probability, it's significantly more likely that, say, the person suffered extreme head trauma than that they're having a bad day.

The model may work better with humans. But this case is a demonstration that at extreme numbers of games it can no longer respond to absurdly strong signals of a change in rank.

Now, it may be that there's an explicit time mechanism, and that if this account were left to run for a month it would eventually plummet rapidly to 30 kyu. That would be sensible, because the most likely explanation seems to be that somebody else logged into this account that day. You'd want measures from multiple days to be certain. But if we're just looking at game results, the effect should definitely be way, way, way stronger.
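A minimal sketch of the "history inertia" discussed in the last two posts. It is not KGS's algorithm, whose actual weighting this thread does not spell out. Instead it compares two hypothetical win-rate estimators: one that weights every game equally, and one that discounts older games exponentially by game count (not by time), with a made-up half-life of 500 games. The game sequences are synthetic, with wins spread evenly to match the post's totals. With equal weights, the 6-236 day barely moves the estimate (about 0.417 to 0.411); with decay it falls to roughly 0.3.

```python
def weighted_win_rate(results, half_life=None):
    """Win rate over results (1 = win, 0 = loss, oldest first).

    half_life=None weights every game equally; otherwise a game played
    `age` games ago gets weight 0.5 ** (age / half_life).
    """
    num = den = 0.0
    n = len(results)
    for idx, r in enumerate(results):
        age = n - 1 - idx
        w = 1.0 if half_life is None else 0.5 ** (age / half_life)
        num += w * r
        den += w
    return num / den

def spread(wins: int, games: int) -> list[int]:
    """Synthetic sequence with `wins` wins spread evenly over `games` games."""
    return [((i + 1) * wins) // games - (i * wins) // games for i in range(games)]

history = spread(7089, 17_000)     # ~41.7% over ~17,000 games
bad_day = spread(6, 242)           # the 6-236 day
after = history + bad_day

for label, hl in [("equal weights", None), ("half-life 500 games", 500)]:
    print(f"{label:>20}: {weighted_win_rate(history, hl):.3f} -> "
          f"{weighted_win_rate(after, hl):.3f}")
```

A shorter half-life reacts faster to a genuine change in strength but is also noisier on ordinary streaks, which is the trade-off the thread is arguing about.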
