All times are UTC - 8 hours [ DST ]




 Post subject: Re: A Curious Case Study in KGS Ranks
Post #61 Posted: Wed Mar 26, 2014 2:10 pm 
Gosei
Posts: 1639
Location: Ponte Vedra
Liked others: 642
Was liked: 490
Universal go server handle: Bantari
RobertJasiek wrote:
There are also other reasons why I do not play much on other servers, such as extremely disliking having to use another software for every server.
There are other reasons to like KGS, so I want the worst part of KGS (the rating system) to improve so that I can better enjoy the good features of KGS.

I think what you suggest is not really an "improvement", but only a change which would make it more fun for you, personally, to play there.

But you need to understand that this change, which would make it more fun for you, personally, to play there - this change would ruin the fun for some others to play there, me included, personally. I happen to quite like the KGS rating system as it is.

So basically, what you propose is that because of your personal dislike to play on other servers, you wish the one server which you like playing on to cater to your very personal preferences, even when this means that others are unhappy.

What I would suggest is that instead of remaking the world to be your sweet cozy oyster, you try to figure out how to combat your personal little idiosyncrasy which prevents you from enjoying the oysters already out there. It is easier and infinitely more efficient to change one person than the whole server. And if you really feel strongly about it, why not just switch to Tygem entirely - you will then also have just one server with just one software, and this should make you happy. Or no?

As I said - as long as we don't all agree on exactly the same model, places have to exist which cater to various groups. It seems that the place that caters to you is Tygem, so why worry about KGS?

RobertJasiek wrote:
I have not said that one system must be used on all servers. You have made this up.

I did not "make it up". I have inferred it from what you said. It never occurred to me that you make all this fuss because you are unwilling to play on Tygem or anywhere else but KGS, and thus KGS has to cater to your personal preferences in spite of the fact that places which cater to your personal preferences already exist and thrive elsewhere.

So it is probably my bad, apologies.

However - I think you should also lobby to switch the Tygem rating system to a more sensible one, just as you lobby for a change in the KGS system. After all, balance must be preserved. If you try to take away the place I enjoy, at least try to give me a substitute. Fair is fair, no?

RobertJasiek wrote:
There is room for a server with real world ratings. In fact, there is so much room that such a server does not even exist remotely. Don't even try to pretend KGS would be such a server, ridiculous. On KGS, equally KGS-ranked players can easily be 5 real world ranks apart.

As they can in the real world. I still remember fondly when I was forced to give a "1d" player 9 handi and trashed him badly.

Anyways - I never said what you suggest, you are making it up. I said "real-world-like", there is a difference, especially when we consider the context, which is rank stability vs. rank instability.

_________________
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #62 Posted: Wed Mar 26, 2014 2:50 pm 
Oza
Posts: 2777
Location: Seattle, WA
Liked others: 251
Was liked: 549
KGS: oren
Tygem: oren740, orenl
IGS: oren
Wbaduk: oren
Mef wrote:
Nevertheless even for the corner case KGS has a simple way to solve this problem: Play games handicapped at the rating you think you should be! This will allow you to reach your equilibrium faster and unlike many other rating systems does not penalize the opponents who help you get there.


Or make a new account. :)

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #63 Posted: Wed Mar 26, 2014 3:43 pm 
Gosei
Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
RobertJasiek wrote:

My system (when worked out to have global non-deflationary stability) would have much greater volatility, but I am not at all convinced it would have smaller accuracy. Rather I think that, on average for every particular player, it would have greater accuracy, because it can correct his temporarily wrong ratings much more quickly.


What system? It is clear that it is quite hard to model player's rank real distribution. A higher volatility simulated model is quite wrong, too

_________________
Geek of all trades, master of none: the motto for my blog mostlymaths.net

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #64 Posted: Wed Mar 26, 2014 3:54 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
I really enjoy reading the discussion that's been going on (well at least half of the discussion...Robert defending a rating system he came up with in 10 seconds without thinking of its implications isn't as interesting to me). One of the reasons I wanted to present this corner case was to start some discussion.

I think one thing this has helped point out is one of the sources of confusion (and perhaps frustration) related to the KGS rating system. It can be quickly summarized here:

Polama wrote:
What we strictly, factually know is that over 242 games this account was at least 3 stones weaker, potentially more depending on the exact nature of the bug.


Polama wrote:
I think an advanced statistical model would view this case as a meaningful shift


vs.

RBerenguel wrote:
A student in statistics won't look at the data and say, "hey, this player is a sucker now!"



What we have at the core is two questions: "How strong was the bot performing at a given time?" vs. "How do you expect the bot to perform on its next game?"

Many people are worried about the former and this is related to what Polama is calculating. The performance of the bot on that day was clearly well below 11k. This is very easy to show with very high statistical certainty.

The other question is related, however it is not the same. Likewise, when you calculate the expected result it is also not the same. If we were to look for analogies, the closest we will probably find to something like this is a sports injury. If a player is injured, their performance may suffer a sudden drastic drop, but you would not expect this to be representative of how they will be expected to perform if and when they recover.

The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game). This of course always implies there is a bit of regression to the mean ever-present in all of its calculations.


This post by Mef was liked by: Polama
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #65 Posted: Wed Mar 26, 2014 4:38 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
RobertJasiek wrote:
The case study does not compare well to human players with frequent games, who need, without significant interruption, to win ca. 70+% for weeks up to a few months in order to improve a rank, after it has been VERY MUCH easier to drop a rank.


I let this slide when it was posted because I didn't think it was worth bringing up...but as Robert has continued harping on about how he feels slighted by the flawed KGS rating system and because this claim is so incredibly easy to check (it took me about 10 minutes to make a spreadsheet), I just wanted to point out that Robert has never had consecutive months with >70% win rate in rated games regardless of sample size, playing rate, rating change, handicap, etc. If you count April/May in 2004 he had one set of 2 months where he had 70% and 72%, but that's almost 10 years ago and the KGS rating system has been adjusted several times since then. I have attached a graph which is quite easy for anyone to independently verify with his archives.

Robert's statements are based on assumptions that are divorced from reality.

Edit: Putting image in hide tag:
Attachment: Sum-Monthly-Winrate-small.JPG (File comment: Monthly win rate 2004-2014)


Last edited by Mef on Wed Mar 26, 2014 6:33 pm, edited 1 time in total.
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #66 Posted: Wed Mar 26, 2014 4:41 pm 
Lives in gote

Posts: 553
Liked others: 61
Was liked: 250
Rank: AGA 5 dan
Mef wrote:
The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game).

Hmm, I thought I understood the KGS rating system until I read this. I would have said that the KGS rating system is designed to accurately describe the results of the previous games, with the assumption that this allows it to predict the outcome of the next game.

On the subject of a player whose rank changes drastically and discontinuously, that is an unusual case which violates the assumptions of the rating model, and I don't think it is particularly interesting to see how KGS or any other rating system copes with this anomaly.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #67 Posted: Wed Mar 26, 2014 4:46 pm 
Judan

Posts: 6160
Liked others: 0
Was liked: 788
Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #68 Posted: Wed Mar 26, 2014 4:48 pm 
Lives with ko

Posts: 199
Liked others: 6
Was liked: 55
Rank: KGS 3 kyu
RBerenguel wrote:
What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #69 Posted: Wed Mar 26, 2014 5:06 pm 
Oza

Posts: 2494
Location: DC
Liked others: 157
Was liked: 442
Universal go server handle: skydyr
Online playing schedule: When my wife is out.
uPWarrior wrote:
RBerenguel wrote:
What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.
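One way such data could be tabulated, as a minimal sketch: take game records where the handicap equals the rank gap and see whether the stronger player's win rate stays near 50% as the handicap grows. The record layout `(rank_gap, handicap, stronger_won)` and the function name are assumptions made up for illustration, not the format of any server's export.

```python
from collections import defaultdict

def win_rate_by_handicap(games):
    """Win rate of the stronger player, grouped by handicap.

    games: iterable of (rank_gap, handicap, stronger_won) tuples
           (hypothetical layout). Only games where the handicap
           equals the rank gap are counted, since those are the
           games the rank definition says should be 50/50.
    """
    tally = defaultdict(lambda: [0, 0])  # handicap -> [wins, games]
    for rank_gap, handicap, stronger_won in games:
        if rank_gap == handicap:
            tally[handicap][0] += int(stronger_won)
            tally[handicap][1] += 1
    return {h: wins / n for h, (wins, n) in sorted(tally.items())}
```

If the definition scaled perfectly, every handicap bucket would sit near 0.5; a steady drift away from 0.5 at larger handicaps would suggest stones are not fully additive. Running the same tally separately for dan and kyu games would address the second question.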

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #70 Posted: Wed Mar 26, 2014 5:08 pm 
Judan

Posts: 6160
Liked others: 0
Was liked: 788
Mef wrote:
Robert has never had consecutive months with >70% win rate


"Ca. 70%" (IIRC, I have not said ">70%") has been a simplifying, rounded number, because I rely on memory. A couple of years ago, I actually counted numbers of wins and losses for one or two periods (a couple of weeks) when I played seriously in order to (and mainly for the purpose to) improve a KGS rank. IIRC, it was ca. 68.5%, but I am not sure of the exact number. I posted the figures somewhere, maybe you find them. I calculated the percentage from the start of making my serious attempt to the moment of reaching the next higher KGS rank. Therefore, it does not matter whether it was consecutive months. What matters is that it was EXACTLY the period during which I made the serious attempt.

I have not claimed to have had consecutive months with >70% win rate. You enjoy bringing forward this argument, which I have not made. Please understand the difference between consecutive calendar months and a period of seriously playing until raising a rank.

(As I reported elsewhere, I also had the other experience of playing very little for IIRC months, then winning literally only a few games in order to suddenly improve a rank, i.e. being shown the next higher rank tag.)

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #71 Posted: Wed Mar 26, 2014 5:32 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
RobertJasiek wrote:
Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.


Honestly? Once I have the graph it's virtually 0 effort to check this. Aside from the 1 instance I mentioned previously, you have not had a period of 2 consecutive months with 65%+ win rate on KGS in rated games either.

That aside, KGS assumes that there is a 66% likelihood of a person half a stone stronger winning a rated game (assuming they are 2d or stronger). An infinitely long 65% win streak would not necessarily be enough to promote. It's been a while since I have done the math on them, but I would assume AGA and EGF are similar in how they compute this.
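A minimal sketch of that arithmetic, assuming a logistic win-probability curve in the rank difference that is calibrated only by the "66% at half a stone" figure above. The logistic form and the function names are illustrative assumptions, not the actual KGS code.

```python
import math

def slope_from_half_stone(p_half=0.66):
    """Logistic slope k such that win_prob(0.5, k) == p_half."""
    return 2 * math.log(p_half / (1 - p_half))

def win_prob(rank_diff, k):
    """Assumed model: P(win) for a player rank_diff stones stronger."""
    return 1 / (1 + math.exp(-k * rank_diff))

def implied_rank_diff(win_rate, k):
    """Invert the assumed curve: strength gap implied by a long-run win rate."""
    return math.log(win_rate / (1 - win_rate)) / k

k = slope_from_half_stone()
gap_at_65 = implied_rank_diff(0.65, k)  # a bit under half a stone
```

Under these assumptions, a steady 65% against opponents of your own listed rating corresponds to a gap of slightly less than 0.5 stones, which is why even an endless 65% run need not carry you the whole way from the middle of one rank to the next.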

edit: putting image in hide tag
Attachment: Sum-Monthly-Winrate-small-with-lines.JPG (File comment: Monthly win rate with reference lines)


Last edited by Mef on Wed Mar 26, 2014 6:31 pm, edited 1 time in total.
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #72 Posted: Wed Mar 26, 2014 5:36 pm 
Gosei
Posts: 1639
Location: Ponte Vedra
Liked others: 642
Was liked: 490
Universal go server handle: Bantari
RobertJasiek wrote:
Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.

Ok, fair enough.

One point, though:
  • If it takes less effort to reach a rank, there would be more players with that rank. For example: you are sitting in a pool of 4d players. If you lower the threshold for rank increase to a lower percentage, you might rise to 5d more easily, but so would many of your fellow 4d players. At the same time, many of the 5d players would rise to 6d, since this would be easier now as well. Taking it to the extreme, chances are you will sit in the same pool of the same people just with a different number by your name. To me, this would be absolutely meaningless, it's just a label. As long as the system is uniform, I care not that much if people of my strength are called 4d or 5d or whatever.

And a second point, for good measure:
  • With the situation being as it is, it certainly does not take a "superhuman effort" to reach 5d. There are many players who are 5d on KGS, they reached it fair and square, and I have a hard time believing that they are all X-Men. What you mean, I assume, is that it would take a "superhuman effort" for *you* to reach 5d. But all that this means is that, according to this particular rating system, you are not yet strong enough to reach 5d on KGS, pure and simple. No matter how your ego makes you think of yourself or how much you would love it to be otherwise.

If, for whatever reasons (for example - teaching fees) it is important for you to have a higher number by your name, best to switch to a server on which the system allows somebody of your strength to reach higher ranks. As for KGS... the value of reaching a higher rank is precisely because it is not easy to reach, it means something. Making it easier to reach would make it mean less. Just like Tygem ranks mean squat - certainly I would never consider a Tygem 5d anything near a real-life 5d. While KGS 5d is pretty strong.

This is the best advice I can give you.

_________________
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #73 Posted: Wed Mar 26, 2014 6:04 pm 
Lives with ko

Posts: 199
Liked others: 6
Was liked: 55
Rank: KGS 3 kyu
skydyr wrote:
uPWarrior wrote:
RBerenguel wrote:
What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.


I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..


Last edited by uPWarrior on Wed Mar 26, 2014 6:06 pm, edited 1 time in total.
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #74 Posted: Wed Mar 26, 2014 6:05 pm 
Oza
Posts: 2401
Location: Tokyo, Japan
Liked others: 2339
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
Below is a graph that may (or may not) help. This is from the KGS Analytics download. It graphs the 100-game moving average win rate (i.e. the moving average of column 'L' in the download file) against the average 'Rank' (column 'D' in the download file) at the time of those games. The moving average will give us a different view than monthly results due to the changing volume of games played per month. Notice in the X-axis labels that Aug-07 through Dec-07 shows each month. Sum was busy in those months. Compare that to Nov-12 through Dec-13. Only four months are shown: Nov, Mar, Aug, Dec. Sum was not busy in those months.

The rank was averaged and divided by 10 just to fit it in the same scale as the winning rate. Hence 5d = '50%', 4d = '40%', etc. on this graph. This allows us to look at the relationship between winning rate and promotion/demotion timing.
Attachment: Sum Win Rate - Dan 20140327.jpg
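For anyone who wants to reproduce this kind of graph, here is a minimal sketch of the two transformations described above: a 100-game moving average of results, and the dan rank divided by 10 so it shares an axis with the win rate. The function names and the 1/0 encoding of results are assumptions for illustration; the actual download file layout is not reproduced here.

```python
from collections import deque

def moving_win_rate(results, window=100):
    """Moving-average win rate over the last `window` games.

    results: iterable of 1 (win) or 0 (loss) in chronological order.
    Returns one value per game once a full window is available.
    """
    recent = deque(maxlen=window)
    wins_in_window = 0
    rates = []
    for r in results:
        if len(recent) == window:
            wins_in_window -= recent[0]  # oldest game is about to drop out
        recent.append(r)
        wins_in_window += r
        if len(recent) == window:
            rates.append(wins_in_window / window)
    return rates

def rank_on_win_rate_scale(dan_rank):
    """Scale a dan rank onto the win-rate axis, e.g. 5d -> 0.5 (shown as '50%')."""
    return dan_rank / 10
```

Because the window is counted in games rather than calendar time, busy months contribute many points and quiet months very few, which matches the uneven X-axis labels described above.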

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


This post by ez4u was liked by 3 people: illluck, Mef, RBerenguel
 Post subject: Re: A Curious Case Study in KGS Ranks
Post #75 Posted: Wed Mar 26, 2014 6:25 pm 
Oza

Posts: 2494
Location: DC
Liked others: 157
Was liked: 442
Universal go server handle: skydyr
Online playing schedule: When my wife is out.
uPWarrior wrote:
skydyr wrote:
uPWarrior wrote:
In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.


I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..


If a 7k is winning 50% of 3 stone games against a 4k, and losing 50% of them, why would you assume their rank should be increased? I suspect I've misunderstood your argument.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #76 Posted: Wed Mar 26, 2014 6:40 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
mitsun wrote:
Mef wrote:
The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game).

Hmm, I thought I understood the KGS rating system until I read this. I would have said that the KGS rating system is designed to accurately describe the results of the previous games, with the assumption that this allows it to predict the outcome of the next game.

On the subject of a player whose rank changes drastically and discontinuously, that is an unusual case which violates the assumptions of the rating model, and I don't think it is particularly interesting to see how KGS or any other rating system copes with this anomaly.



This is one of those where perhaps you run into the definitions game, but what I mean is this:

KGS's rating is always how it predicts you will play your next game. This is one of the reasons why things like rank drift, etc. occur. In spite of new knowledge learned, it never goes back to alter a previous prediction made. For instance, in the situation that spawned this thread, if you wanted to have a better descriptive system you would take the set of known data and fit a changing rank to it (probably ending up with a model that has a good fit for a 24 hour step change then reverting).
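A very rough sketch of what "descriptive" fitting could look like, under heavy simplifications: opponent strength is ignored, each period gets a single win probability, and the fit is judged by binomial log-likelihood. The function names and the before/during/after split are assumptions for illustration only.

```python
import math

def log_likelihood(wins, games, p):
    """Binomial log-likelihood (up to a constant) of `wins` in `games` at win prob p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # avoid log(0)
    return wins * math.log(p) + (games - wins) * math.log(1 - p)

def compare_models(segments):
    """Compare a constant-strength fit with a step-change fit.

    segments: list of (wins, games) per period, e.g. before, during and
    after the bad day. Returns (constant_ll, stepwise_ll); higher is better.
    """
    total_wins = sum(w for w, _ in segments)
    total_games = sum(g for _, g in segments)
    pooled = total_wins / total_games
    constant_ll = sum(log_likelihood(w, g, pooled) for w, g in segments)
    stepwise_ll = sum(log_likelihood(w, g, w / g) for w, g in segments)
    return constant_ll, stepwise_ll
```

The step-change fit can never score worse than the constant one, since it has more free parameters; what matters is how large the gap is. A descriptive system is free to use that gap after the fact, while a purely predictive rating has to commit before the next game is played.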

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #77 Posted: Wed Mar 26, 2014 7:36 pm 
Dies with sente

Posts: 72
Liked others: 6
Was liked: 9
KGS: moboy78
IGS: moboy78
I don't really care about all the math and statistics in this thread thus far, but from what I see and have seen on this forum, I think it's safe to say that while the kgs rating system is mathematically sound, it generally makes people annoyed. I think I speak for just about every go player I've met when I say that if I have a winning streak that lasts for an extended period of time, I would want my actual rank, not my rating, to reflect my increase in strength. That holds even if I play a lot of games on kgs a week (I realize that "a lot" is a vague term but it doesn't take a genius to think of a number of games big enough to qualify as "a lot" for that period of time), which as I understand kgs's rating system would make each win count for less than a win for someone who plays fewer games. If I can go around thrashing everyone my rank (let's assume for the sake of argument that I'm not really playing many people outside of my rank during this proposed winning streak), then it seems to me that I'd need to give those who were once considered my peers a handicap to keep the games fair, thereby making me a rank above them.

I can speak from experience that this doesn't really happen on kgs. I remember when I was just a little ways away from becoming a 3 kyu on kgs (assuming my rank graph and the distances between ranks can be believed), I played quite a lot of games. I'd usually play just about every day of the week, and would always try to play enough go games a day so that I'd have won more games than I'd have lost (which, given the fact that I lost far fewer games than I won, meant my record on a "bad day" would look something like 2-1 in my favor). At that particular point in time I ended up getting a 12 or 14 game winning streak (I don't really remember which it was) and got promoted to 3 kyu. But by that time I was barely a 3 kyu, and after my streak ended I was almost a 4 kyu once more. My rank graph had barely changed at all, even though I could tell I'd gotten much stronger than I was before (and I'm not tooting my own horn; I was told this by others). This, understandably, really irked me.

I realize that my example might not be very scientific, but I do think it highlights a feeling many on kgs share.

And I also think that Mef's original example of GNUgo2 is absolutely worthless, because to go 6-263 in a single day clearly shows that the player has gotten weaker at the end of the day than when the day started. I understand that that losing streak was caused by the bot's owner rather than the bot itself, but a human player would have no such excuse. KGS's ranking system would've failed to punish a human player for such a long streak of losses, nor would it have properly rewarded a player for a similarly long streak of wins.
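To put a number on how extreme a 6-263 day is, here is a minimal sketch under assumptions that are not from the thread: the games are independent, and at the listed rank each game would be an even 50% proposition. The function name is made up for illustration.

```python
import math

def binomial_tail_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of at most k wins in n games."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k + 1)
    )

# Chance of 6 or fewer wins in 269 games if every game were truly 50/50.
chance = binomial_tail_at_most(6, 269, 0.5)
```

Under those assumptions the probability is vanishingly small, so the day's performance was certainly far below the listed rank. What the number cannot say is whether the next game will look like that day or like the thousands of games before it.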

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #78 Posted: Wed Mar 26, 2014 7:46 pm 
Lives with ko

Posts: 199
Liked others: 6
Was liked: 55
Rank: KGS 3 kyu
skydyr wrote:
uPWarrior wrote:
I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..


If a 7k is winning 50% of 3 stone games against a 4k, and losing 50% of them, why would you assume their rank should be increased? I suspect I've misunderstood your argument.


I wanted to write "if a 7k wins 50% of the games against a 4k with 2 stones" but forgot the last part.

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #79 Posted: Thu Mar 27, 2014 7:24 am 
Lives with ko

Posts: 248
Liked others: 23
Was liked: 148
Rank: DGS 2 kyu
Universal go server handle: Polama
Mef wrote:
What we have at the core is two questions: "How strong was the bot performing at a given time?" vs. "How do you expect the bot to perform on its next game?"

Many people are worried about the former and this is related to what Polama is calculating. The performance of the bot on that day was clearly well below 11k. This is very easy to show with very high statistical certainty.

The other question is related, however it is not the same. Likewise, when you calculate the expected result it is also not the same.


Well summarized!

Quote:
If we were to look for analogies, the closest we will probably find to something like this is a sports injury. If a player is injured, their performance may suffer a sudden drastic drop, but you would not expect this to be representative of how they will be expected to perform if and when they recover.


I'd also been thinking about that analogy, with baseball season starting up. This general topic is debated endlessly there: if a good player has a terrible year or an injury, what do we expect from him the next year? Sometimes they recover fully, sometimes they don't.

Quote:
The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game). This of course always implies there is a bit of regression to the mean ever-present in all of its calculations.


I mostly just enjoyed reasoning through the math and its implications, but if I came to any conclusion it's that the streak was too extreme to assume a bounce-back was imminent. Modeling explicitly by time, sure, one day is a blip. But modeling by game, 260 isn't, even out of 17K. To switch to the sports metaphor:

If a leadoff hitter usually bats .300 and goes .200 in a month span, I'm going to predict that he'll be right back to .300 next month. That sort of variation occurs. We should trust his track record.

If he instead goes .015 in a month, I don't expect him to immediately jump back to .300. If I'm the manager, I'm not going to bat him leadoff until he demonstrates he can hit again over at least a week or two. The signal that something has fundamentally changed is just too strong to ignore. Maybe he does return to full form tomorrow. But until he demonstrates some change, I think predicting an imminent bounce-back is too aggressive; you'd get more predictions right by saying "ok, he's not very good right now and won't play well next game".

I agree the formula is probably reasonable for normal, human levels of variation. But at these levels of play and variation it looks overly stubborn in its insistence for hundreds of games at a time that the next one will be different. Ok, the next one. Ok the next one...
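
For anyone who wants to put rough numbers on this, here is a small sketch. It treats each game (or at-bat) as an independent trial with a fixed success probability, which is exactly the assumption under dispute. The 50% expected win rate for a correctly ranked bot and the 100 at-bats per month are illustrative guesses, not figures taken from the actual data.

```python
from math import comb


def lower_tail(successes: int, trials: int, p: float) -> float:
    """P(X <= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes + 1)
    )


# Bot: 6 wins in 269 games, assuming (hypothetically) a 50% expected win rate.
bot_streak = lower_tail(6, 269, 0.5)

# Hitter with a .300 track record over an assumed 100 at-bats in a month.
mild_slump = lower_tail(20, 100, 0.3)   # batting .200 or worse
severe_slump = lower_tail(1, 100, 0.3)  # at most 1 hit, no better than the .015 case

print(f"bot going 6-263 or worse:   {bot_streak:.3g}")
print(f"hitter at .200 or worse:    {mild_slump:.3g}")
print(f"hitter with at most 1 hit:  {severe_slump:.3g}")
```

The smaller the tail probability, the harder it is to explain the stretch as ordinary noise around the old level.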

 Post subject: Re: A Curious Case Study in KGS Ranks
Post #80 Posted: Thu Mar 27, 2014 8:46 am 
Gosei
User avatar

Posts: 1585
Location: Barcelona, Spain (GMT+1)
Liked others: 577
Was liked: 298
Rank: KGS 5k
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: I'm usually online on KGS on Saturdays, but I can be online from 20-23 GMT+1 if needed
Polama wrote:
Mef wrote:
What we have at the core is two questions: "How strong was the bot performing at a given time?" vs. "How do you expect the bot to perform on its next game?"

Many people are worried about the former and this is related to what Polama is calculating. The performance of the bot on that day was clearly well below 11k. This is very easy to show with very high statistical certainty.

The other question is related, however it is not the same. Likewise, when you calculate the expected result it is also not the same.


Well summarized!

Quote:
If we were to look for analogies, the closest we will probably find to something like this is a sports injury. If a player is injured, their performance may suffer a sudden drastic drop, but you would not expect this to be representative of how they will be expected to perform if and when they recover.


I'd also been thinking about that analogy, with baseball season starting up. This general topic is debated endlessly there: if a good player has a terrible year or an injury, what do we expect from him the next year? Sometimes they recover fully, sometimes they don't.

Quote:
The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game). This of course always implies there is a bit of regression to the mean ever-present in all of its calculations.


I mostly just enjoyed reasoning through the math and its implications, but if I came to any conclusion it's that the streak was too extreme to assume a bounce-back was imminent. Modeling explicitly by time, sure, one day is a blip. But modeling by game, 260 isn't, even out of 17K. To switch to the sports metaphor:

If a leadoff hitter usually bats .300 and goes .200 in a month span, I'm going to predict that he'll be right back to .300 next month. That sort of variation occurs. We should trust his track record.

If he instead goes .015 in a month, I don't expect him to immediately jump back to .300. If I'm the manager, I'm not going to bat him leadoff until he demonstrates he can hit again over at least a week or two. The signal that something has fundamentally changed is just too strong to ignore. Maybe he does return to full form tomorrow. But until he demonstrates some change, I think predicting an imminent bounce-back is too aggressive; you'd get more predictions right by saying "ok, he's not very good right now and won't play well next game".

I agree the formula is probably reasonable for normal, human levels of variation. But at these levels of play and variation it looks overly stubborn in its insistence for hundreds of games at a time that the next one will be different. Ok, the next one. Ok the next one...


One of my most "staty" friends works on models for insurance. He once told me about large deviations theory (Wikipedia link). It's relatively close to this idea of seeing such numbers and wondering "WTF" while trying to build a relevant model for them.
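
As a rough illustration of the kind of statement that theory makes, here is a sketch of the standard Chernoff bound for a binomial lower tail: P(X <= k) <= exp(-n * D(k/n || p)), where D is the Kullback-Leibler divergence between two coin flips. Applying it to the bot's 6-263 day with an assumed 50% expected win rate is only an example of mine, and it assumes the games are independent.

```python
from math import exp, log


def bernoulli_kl(a: float, p: float) -> float:
    """KL divergence D(a || p) between Bernoulli(a) and Bernoulli(p), in nats."""
    total = 0.0
    if a > 0:
        total += a * log(a / p)
    if a < 1:
        total += (1 - a) * log((1 - a) / (1 - p))
    return total


def chernoff_lower_tail_bound(successes: int, trials: int, p: float) -> float:
    """Upper bound on P(X <= successes) for X ~ Binomial(trials, p).

    Only informative when the observed rate successes/trials is below p.
    """
    observed_rate = successes / trials
    if observed_rate >= p:
        return 1.0  # the bound is trivial on this side of the mean
    return exp(-trials * bernoulli_kl(observed_rate, p))


# 6 wins in 269 games against an assumed (hypothetical) 50% expected win rate.
print(chernoff_lower_tail_bound(6, 269, 0.5))
```

The exponent grows linearly with the number of games, so for a fixed win rate a long streak is exponentially harder to write off as a blip than a short one.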

_________________
Geek of all trades, master of none: the motto for my blog mostlymaths.net
