
All times are UTC - 8 hours [ DST ]




 Post subject: Alternate goals and alternate aims of rating systems
Post #1 Posted: Thu Mar 27, 2014 10:09 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
A thread in the KGS forum has kicked off a couple of long discussions about rating systems. There were a few lines in a few comments that I wanted to respond to but didn't, because my replies would have been tangential and, honestly, overly pedantic. Nevertheless, because I think the conversations are worth having in their own right, I am starting a new thread. I'm not sure which forum is best for "general discussions of go rating systems", so I have chosen the general forum as a default.

To provide a brief background for those who are uninterested in rating system argument minutiae:

- KGS has a sophisticated mathematical algorithm that aims to predict the outcome of a game between two arbitrary players on the server as accurately as possible. The trade-off is that the system can seem intractable at times, much to the frustration of players who wish to make sense of what they must do to change their rank.

- Tygem has a very simple system in which it is easy to understand how and why ratings move, accepting potentially inaccurate ranks and mismatches as the price of that clarity.

- Most other servers fall somewhere between these two ends of the spectrum. (The notable exception is GoShrine, which to my knowledge sits even further out than KGS, but it doesn't have quite the same player base, so you rarely hear complaints about it.)


I feel we can all more or less agree on the points above, and I don't want this to be another thread hashing them out. Instead: during those discussions some alternate possibilities for rating systems came up (sometimes in jest, but I think they are worth considering), so I thought it would be fun to outline various scenarios or goals you might have for a rating system, and then discuss possible ways to achieve them. This will hopefully lead to some fun thought experiments and interesting discussion.

There were quite a few things I thought were interesting, but to avoid running off on too many tangents I'll start slow. The first one that I'll throw out to consider comes from one of Bantari's comments (emphasis mine):

Bantari wrote:
Let's look at your various playing "modes" hypothetically. Let's say that: when you play only casual and fun games you play like 3d, when you play seriously you play like 5d, and when you play a mixture of both modes, you play like 4d. This is how it can hypothetically look when your history is taken into account, and this is pretty much what you are saying as well. Now what you seem to want is a system which lets you generally play in the mixed mode but ranks you as if you were constantly in the serious mode. This is not reasonable, and no system should do that.


Again, it's perhaps a bit pedantic of me... but I do try to err on the side against absolute statements, so this got me thinking: Could there be a time when you do want to do this? And if so, how would you do it?

As a discussion starting point, I will posit a time when I think you may want to do this:

Imagine you are a go teacher with a class of pupils. You are aiming to select the most promising pupils, whom you will then encourage to move on to a more advanced group or perhaps to take dedicated lessons. In this case you would be selecting for those with the highest "peak" potential, so it may be useful to figure out who, when playing at their best, is the strongest (as opposed to who, on average, is strongest).

So, the questions now become:

- What type of rating system would be best for selecting these top "peak potential" candidates?
- What type of challenges might one face when implementing such a system?
- In what other situations might one want to separate a player's "strongest" play from their "average" play?



Aside from this, if anyone has some other interesting scenarios or other interesting goals a rating system may want to have, I would be interested in hearing them!

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #2 Posted: Fri Mar 28, 2014 1:23 am 
Judan

Posts: 6270
Liked others: 0
Was liked: 797
Even if the basic theory of a rating system is sound, it can still be a failure if its parameters are set wrongly. For example:

* In a KGS-like rating system, the parameters are set wrongly if they force some players to play 1000+ games to improve by a rank. 1000 is just an extreme number. The best maximum for any player would be set by a compromise, so that undesired side effects, such as too many players restarting with new accounts to circumvent the rating system, do not occur.

* In a Tygem-like rating system, the parameters are set wrongly if one win or loss moves a player's rank by 7 ranks. Again, this is just an extreme number, but you get the idea: a good parameter must be set so that people do not run away from or circumvent the system.

Parameters are not god-given by the programmer; they must be evaluated and adjusted properly. This is so for every rating system. There is no good rating system without proper calibration of the parameters in their conflict with human preferences.

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #3 Posted: Fri Mar 28, 2014 3:11 am 
Oza

Posts: 2180
Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
Liked others: 237
Was liked: 662
Rank: AGA 5d
GD Posts: 4312
Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
As Bantari says, it is not only unreasonable, it is impractical. A go server, by its very nature, cannot be a source for a reliable rating across all games. Serious competitive games are rarely played online, and when they are, real-world rankings are usually used to determine handicaps or, more likely, the game is just an even game. So expecting an online server to provide reliable rankings in all scenarios is simply impractical in my opinion. All they can be expected to do is provide a fairly accurate assessment of the handicap which should be used to provide an enjoyable game between two players.

I believe that playing online is an excellent way to improve one's skills, but I don't think that using online rankings to absolutely determine one's strength is a good idea. This can only be achieved by serious over-the-board play. If you really want to push for more reliability with them, you need to have an additional parameter in the setting up of an account. A mode of play would need to be selected for each account, and that account could only play games within certain time limits (the only way to judge seriousness online as far as I can see). This would require everyone to have separate accounts for serious and fun games.

_________________
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #4 Posted: Fri Mar 28, 2014 6:35 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
The question of determining peaks is an interesting one, and, as you say, relevant to the question of identifying potential and setting goals for improvement. Most of us underperform.

I am sure that there is a literature on this. When I was a kid, I read about how to walk along the beach and remain close to the incoming waves but not get your feet wet. You could see in the sand where the highest waves had come and could stay close to that line but on the dry side of it. (That worked because the tide did not change very quickly. It is not infallible, however. When I was in Hawai'i there were people who were sitting or lying on dry beach who were swept out to sea by killer waves.)

Economists may be interested in the potential performance of an economy or sector, and thus be interested in the upper envelope of trend data. The upper envelope of climate data is also important, as it is the extremes that cause or accompany natural disasters.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: Mef
 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #5 Posted: Fri Mar 28, 2014 6:57 am 
Lives with ko

Posts: 199
Liked others: 6
Was liked: 55
Rank: KGS 3 kyu
I guess you could design a rating system that provides people with a confidence interval instead of a single number, sort of not hiding the volatility of one's rating. In that scenario you could rate someone as [3d-5d] instead of 4d, or [4d-5d], or even [1k-5d] if they are somehow likely to be playing drunk.
I would say that trying to get any finer granularity (e.g. trying to keep a rating distribution per player) will likely be impossible due to lack of data.
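
A minimal sketch of the interval idea, under loud assumptions: each game has already been turned into a per-game performance estimate on a hypothetical numeric scale (1.0 = 1d, 0.0 = 1k, -1.0 = 2k), and the interval is simply the middle 80% of those estimates. The function names and the scale are invented for illustration; no server is known to work this way.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def rating_interval(performances, coverage=0.8):
    """Return (low, middle, high) for the central `coverage` share of results."""
    vals = sorted(performances)
    tail = (1 - coverage) / 2
    return percentile(vals, tail), mean(vals), percentile(vals, 1 - tail)


def rank_label(x):
    """Turn the hypothetical numeric scale into a dan/kyu label."""
    n = round(x)
    return f"{n}d" if n >= 1 else f"{1 - n}k"


low, mid, high = rating_interval([3.0, 3.5, 4.0, 4.5, 5.0])
print(f"[{rank_label(low)}-{rank_label(high)}], typically {rank_label(mid)}")
```

With only a handful of games per player the interval will be wide and noisy, which is the data problem raised above.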


Last edited by uPWarrior on Fri Mar 28, 2014 9:57 am, edited 1 time in total.
 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #6 Posted: Fri Mar 28, 2014 7:35 am 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
DrStraw wrote:
If you really want to push for more reliability with them...


I don't!

(at least not in this thread)


This was meant to be "What other goals might you want to achieve with a rating system?" It also wasn't meant to only apply to servers. Clubs might want to do something special too.

One simple example: A streak limiting system. You could design a system that goes out of its way to pair people who are starting winning or losing streaks (aiming to keep the size and frequency of streaks at a minimum).
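
A rough sketch of one possible reading of the streak-limiting idea: sort players by their current signed streak and pair neighbours, so two players on winning runs meet each other (one run must end) and likewise for losing runs. The `pair_by_streak` name and the signed-streak representation are assumptions made here, and the sketch deliberately ignores rating differences, which a real pairing system would also have to weigh.

```python
def pair_by_streak(streaks):
    """Pair players with similar current streaks.

    `streaks` maps a player name to a signed streak:
    +3 means three wins in a row, -2 means two losses in a row.
    Returns (pairs, bye), where bye is the unpaired player or None.
    """
    # Longest winning streaks first; names break ties so the order is stable.
    ordered = sorted(streaks, key=lambda name: (-streaks[name], name))
    pairs = [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered) - 1, 2)]
    bye = ordered[-1] if len(ordered) % 2 else None
    return pairs, bye


pairs, bye = pair_by_streak({"a": 4, "b": -3, "c": 3, "d": 0, "e": -2})
print(pairs, bye)
```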

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #7 Posted: Fri Mar 28, 2014 9:35 am 
Lives in sente

Posts: 866
Liked others: 318
Was liked: 345
Mef wrote:
One simple example: A streak limiting system. You could design a system that goes out of its way to pair people who are starting winning or losing streaks (aiming to keep the size and frequency of streaks at a minimum).

IMHO, you are overthinking this. Nobody is asking for streak busters. We just don't see the harm in promoting somebody on a hot streak. If the promotion is valid, they will prove it. If not, they will regress to their old rank naturally. Having a non-mathematically optimal rating for a few games doesn't hurt anybody. Encouraging improvement through positive reinforcement could have lasting value.

Your math ain't wrong. Your incentives are.

_________________
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #8 Posted: Fri Mar 28, 2014 11:10 am 
Lives in gote

Posts: 409
Liked others: 29
Was liked: 182
GD Posts: 1072
Mef wrote:
So, the questions now become:

- What type of rating system would be best for selecting for this top "peak potential" candidates?
- What type of challenges might one face when implementing such a system?
- What other situations might one want to separate out the "strongest" one plays from the "average" one plays?


Several years ago there was a good article in the Nordic Go Journal about the time evolution of people's ratings. It turns out that the data was well fit by a decaying exponential towards a terminal strength. If you're looking to predict peak rating then one way would be to fit the data we do have on a player's rating and extract the terminal strength value.
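
A minimal sketch of such a fit, assuming the curve has the form r(t) = R_inf - gap * exp(-t / tau). That functional form, the parameter names, and the grid search over tau are illustrative choices made here, not details taken from the article. For a fixed tau the model is linear in R_inf and gap, so ordinary least squares gives both in closed form.

```python
import math


def fit_terminal_strength(times, ratings, taus):
    """Fit r(t) = r_inf - gap * exp(-t / tau) by trying each candidate tau.

    Returns (r_inf, gap, tau) for the candidate with the smallest squared error.
    """
    n = len(times)
    if n != len(ratings) or n < 3:
        raise ValueError("need at least three (time, rating) points")
    mean_r = sum(ratings) / n
    best = None
    for tau in taus:
        xs = [math.exp(-t / tau) for t in times]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # all times identical: nothing to fit
        slope = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, ratings)) / sxx
        r_inf = mean_r - slope * mean_x  # intercept = rating as t grows large
        sse = sum((r_inf + slope * x - r) ** 2 for x, r in zip(xs, ratings))
        if best is None or sse < best[0]:
            best = (sse, r_inf, -slope, tau)
    if best is None:
        raise ValueError("no usable tau candidate")
    return best[1], best[2], best[3]


# Synthetic history: years since starting vs. rating points (made-up numbers).
years = list(range(8))
history = [2100 - 600 * math.exp(-t / 3.0) for t in years]
print(fit_terminal_strength(years, history, [0.5 * k for k in range(1, 21)]))
```

On real rating histories the extrapolated terminal value is very sensitive to noise in the early points, so it is probably better treated as a rough ranking signal than as a prediction.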


This post by pwaldron was liked by: Mef
 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #9 Posted: Fri Mar 28, 2014 11:43 am 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
wineandgolover wrote:
IMHO, you are overthinking this. Nobody is asking for...


Just to reiterate, because there still appears to be some confusion: I am not trying to start yet another thread about the same tired arguments about rating systems... There are plenty of other places to talk about those. This was meant as a thread to discuss other questions, issues, etc. that you might want a rating system to address in certain cases, and how you would go about doing this.

Streak busting and peak prediction are just two examples of alternate goals you might have in mind.

Another possible question you might want to answer - "Who was the most valuable player to their AGA city league team last year?"

This question would be fundamentally different from asking "Who performed the strongest?" and different still from "Who would we expect to win the championship if the games were replayed?"

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #10 Posted: Fri Mar 28, 2014 12:10 pm 
Oza

Posts: 2180
Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
Liked others: 237
Was liked: 662
Rank: AGA 5d
GD Posts: 4312
Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
Mef wrote:
This was meant as a thread to discuss other questions/issues/etc you might want a rating system to address in certain cases, and how you would go about doing this.


Okay, since you put it that way, I only want a rating system to do one thing: give a good approximation of the handicap I should give or take against a potential opponent.

A corollary to this, but not part of the need per se, is that by determining handicaps I can also determine a rank relative to some strong player whose ability is stable. It is up to others to determine the stationary point that individual represents, but from that I can assign a dan/kyu rank to myself.

Anything beyond this is unnecessary from a rating system. A ranking system, on the other hand, is usually considered a lifetime achievement award and represents the highest level a person has achieved during their life. The two should not be confused.

_________________
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #11 Posted: Fri Mar 28, 2014 12:22 pm 
Oza

Posts: 2356
Location: Ireland
Liked others: 662
Was liked: 442
Universal go server handle: Boidhre
Mef wrote:
One simple example: A streak limiting system. You could design a system that goes out of its way to pair people who are starting winning or losing streaks (aiming to keep the size and frequency of streaks at a minimum).


Isn't there natural streak limiting in go though? At sudden break points your handicap versus everyone else changes. I mean, you constantly see this on KGS with a player having a good stretch brought to a sudden halt as soon as they dip their toe into the next rank. If the streaks are limited to ratings movements within one stone, is there a problem other than people not understanding statistics?

You could argue for moving to komi changes that reflect rating changes, rather than dividing people into ranks one stone apart, but I'm not convinced this gives you much for the amount of complication it introduces.

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #12 Posted: Fri Mar 28, 2014 1:09 pm 
Beginner

Posts: 4
Liked others: 0
Was liked: 1
Mef wrote:
Just to reiterate, because there still appears to be some confusion I am not trying to start yet another thread about the same tired arguments about rating systems...


Sorry, but it's hard to believe you when you throw in your own personal opinions from other threads:
Mef wrote:
KGS has a sophisticated mathematical algorithm that aims to most accurately predict a game outcome between two arbitrary players on the server.


Mef wrote:
Tygem has a very simple system that is easy to understand how and why ratings move, accepting the tradeoff of clarity for potentially inaccurate ranks and mismatches.


Mef wrote:
These above mentioned bullet points I feel we can all more or less agree on, and I don't want this to be another thread hashing out those points.


You argue that you're not trying to start another argument about rating systems while still ignoring the idea that your opinions may not be universal.

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #13 Posted: Fri Mar 28, 2014 1:17 pm 
Gosei

Posts: 1639
Location: Ponte Vedra
Liked others: 642
Was liked: 490
Universal go server handle: Bantari
DrStraw wrote:
As Bantari says, it is not only unreasonable, it is impractical. A go server

Exactly!! I probably used the wrong word in saying "system" in my post when I meant a generic "server", which was the context of the discussion in which my statement was made.

As Mef said, there might be situations in which you want to look at and evaluate peak plays and periods, but a generic Go server should never do that, imho.

DrStraw wrote:
, by its very nature, cannot be a source for a reliable rating across all games. Serious competitive games are rarely played online and when they are the real world rankings are usually used to determine handicaps or, more likely, the game is just an even game. So expecting an online server to provide reliable rankings in all scenarios is simply impractical in my opinion. All they can be expected to do is provide a fairly accurate assessment of the handicap which should be used to provide an enjoyable game between two players.

I believe that playing online is an excellent way to improve one's skills, but I don't think that using online rankings to absolutely determine one's strength is a good idea. This can only be achieved by serious over-the-board play. If you really want to push for more reliability with them, you need to have an additional parameter in the setting up of an account. A mode of play would need to be selected for each account, and that account could only play games within certain time limits (the only way to judge seriousness online as far as I can see). This would require everyone to have separate accounts for serious and fun games.

This is what I think as well.

What I can also add is that it would help if separate ratings were kept for separate kinds of games, per account. So you don't need multiple accounts, just one account with a switch for the kind of game you wish to play at the moment. Blitz, small boards, and so on: all kinds of games which can introduce noise into a rating value. The ideal situation, imho, would be for each account to have multiple ratings. For example: by time controls, board size, and seriousness of the game. I understand that would complicate matters tremendously in many ways, but it might be worth it. I think some chess servers already have something like that, or at least separate blitz and regular game ratings.
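
A small sketch of what "one account, several ratings" could look like as a data structure. Everything here is hypothetical: the `Account` class, the `(board_size, time_class, mode)` category key, the starting value of 1500, and the use of a plain Elo-style update as a stand-in for whatever formula a server actually uses.

```python
from dataclasses import dataclass, field

DEFAULT_RATING = 1500.0  # hypothetical starting value for every category


@dataclass
class Account:
    name: str
    # Maps a category such as (19, "slow", "serious") to a rating.
    ratings: dict = field(default_factory=dict)

    def rating(self, category):
        return self.ratings.get(category, DEFAULT_RATING)


def expected_score(rating_a, rating_b):
    """Standard Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def record_game(winner, loser, category, k=32.0):
    """Update only the rating that belongs to this kind of game."""
    rw, rl = winner.rating(category), loser.rating(category)
    delta = k * (1.0 - expected_score(rw, rl))
    winner.ratings[category] = rw + delta
    loser.ratings[category] = rl - delta


alice, bob = Account("alice"), Account("bob")
record_game(alice, bob, (19, "slow", "serious"))
print(alice.rating((19, "slow", "serious")), alice.rating((9, "blitz", "casual")))
```

The cost shows up immediately: every extra category splits a player's games into smaller samples, so each individual rating converges more slowly.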

_________________
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #14 Posted: Fri Mar 28, 2014 1:30 pm 
Gosei

Posts: 1639
Location: Ponte Vedra
Liked others: 642
Was liked: 490
Universal go server handle: Bantari
Mef wrote:
wineandgolover wrote:
IMHO, you are overthinking this. Nobody is asking for...


Just to reiterate, because there still appears to be some confusion: I am not trying to start yet another thread about the same tired arguments about rating systems... There are plenty of other places to talk about those. This was meant as a thread to discuss other questions, issues, etc. that you might want a rating system to address in certain cases, and how you would go about doing this.

Streak busting and peak prediction are just two examples of alternate goals you might have in mind.

Another possible question you might want to answer - "Who was the most valuable player to their AGA city league team last year?"

This question would be fundamentally different from asking "Who performed the strongest?" and different still from "Who would we expect to win the championship if the games were replayed?"

Oh, I see what you mean.
I was misled by the thread title. To me a "rating system" in Go is a system which serves mainly to assign people some values to help them figure out handicaps.

What you are talking about is another kind of algorithm - I think to avoid confusion we should call it something else, but it might be that I have to widen my definition.

Be that as it may, any such algorithm will start with the same thing, the bulk data about games played, and then it will evaluate this data. A conventional rating algorithm will try to assign each player a value as stated above. But you are right: there might be many other goals for other algorithms. You mention the MVP goal, which is a good one. I can see other goals: in tournaments, you might want to give rewards based on longest winning streaks. Or in a club, for best improvement over a period of time. Or even for most games played. All of those are pretty trivial to program.

Very interesting.
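
To illustrate how little code the club and tournament awards mentioned above need, here is a sketch that works from a chronological list of `(winner, loser)` pairs plus two rating snapshots. The record format and the function names are assumptions made for this example.

```python
from collections import defaultdict


def club_awards(games):
    """Return (longest_win_streak, games_played) per player.

    `games` is a chronological list of (winner, loser) name pairs.
    """
    current = defaultdict(int)
    longest = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        played[winner] += 1
        played[loser] += 1
        current[winner] += 1
        longest[winner] = max(longest[winner], current[winner])
        current[loser] = 0                        # a loss ends the running streak
        longest[loser] = max(longest[loser], 0)   # make sure the loser is listed
    return dict(longest), dict(played)


def most_improved(before, after):
    """Player with the largest rating gain between two snapshots."""
    return max(after, key=lambda p: after[p] - before.get(p, after[p]))


season = [("ann", "bob"), ("ann", "cat"), ("bob", "ann"), ("ann", "cat")]
print(club_awards(season))
print(most_improved({"ann": 1500, "bob": 1400}, {"ann": 1520, "bob": 1490}))
```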

_________________
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #15 Posted: Fri Mar 28, 2014 2:08 pm 
Lives in sente

Posts: 852
Location: Central Coast
Liked others: 201
Was liked: 333
Rank: KGS [-]
GD Posts: 428
pwaldron wrote:

Several years ago there was a good article in the Nordic Go Journal about the time evolution of people's ratings. It turns out that the data was well fit by a decaying exponential towards a terminal strength. If you're looking to predict peak rating then one way would be to fit the data we do have on a player's rating and extract the terminal strength value.



Interesting... so if I understand this correctly, you're talking about using the slope of a traditional rating graph over time to project an ultimate maximum value? I could see this being useful for the issue as stated (find students who project high early and focus efforts on them). I could also see this being used to rate training programs (how do "ultimate peak" and "time to 95% of peak" compare before, during, and after training?).
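
One way to make "time to 95% of peak" concrete, assuming the rating curve really is a decaying exponential toward a terminal strength, and reading "95% of peak" as "95% of the gap between the starting rating and the terminal strength has been closed". The symbols R_inf, A and tau are labels chosen here for illustration.

```latex
r(t) = R_\infty - A\,e^{-t/\tau}
\quad\Longrightarrow\quad
\frac{r(t) - r(0)}{R_\infty - r(0)} = 1 - e^{-t/\tau}

1 - e^{-t_{95}/\tau} = 0.95
\quad\Longrightarrow\quad
t_{95} = \tau \ln 20 \approx 3.0\,\tau
```

Under that reading, a training program could be summarised by just two fitted numbers: the terminal strength and the time constant, with the 95% point falling at roughly three time constants.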

My first thought was something similar to uPWarrior's idea: use a system with a confidence interval and see whose 75% projection (or whatnot) was highest.
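
A sketch of the "75% projection" comparison, assuming each student has a list of per-game performance estimates on some numeric scale and that those estimates are roughly normally distributed. Both assumptions, and the function names, are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def upper_projection(performances, q=0.75):
    """Estimate the q-th quantile of a student's per-game performance."""
    mu = mean(performances)
    sigma = stdev(performances)  # needs at least two games
    if sigma == 0:
        return mu  # perfectly steady player: every quantile equals the mean
    return NormalDist(mu, sigma).inv_cdf(q)


def rank_by_peak(students, q=0.75):
    """Order student names from highest to lowest upper projection."""
    return sorted(students, key=lambda s: upper_projection(students[s], q), reverse=True)


students = {
    "steady": [4.0, 4.0, 4.0, 4.0],
    "streaky": [2.0, 3.0, 5.0, 6.0],
}
print(rank_by_peak(students))
```

Both students average the same here, but the streaky one ranks first on the upper projection, which is exactly the distinction between "strongest on average" and "strongest at their best".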

 Post subject: Re: Alternate goals and alternate aims of rating systems
Post #16 Posted: Fri Mar 28, 2014 5:33 pm 
Oza

Posts: 2414
Location: Tokyo, Japan
Liked others: 2351
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
pwaldron wrote:

Several years ago there was a good article in the Nordic Go Journal about the time evolution of people's ratings. It turns out that the data was well fit by a decaying exponential towards a terminal strength. If you're looking to predict peak rating then one way would be to fit the data we do have on a player's rating and extract the terminal strength value.

The article 'Search for a Universal Rating Progress Function' in this issue?

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


This post by ez4u was liked by: Mef