RobertJasiek wrote: It defies your dream but insignificance should be taken into account instead of being overlooked.
It also defies mathematical theorems. It is always better to have more information (in the form of game results). Your belief to the contrary is irrelevant.
Whole History Rating
-
pwaldron
- Lives in gote
- Posts: 409
- Joined: Wed May 19, 2010 8:40 am
- GD Posts: 1072
- Has thanked: 29 times
- Been thanked: 182 times
Re: Whole History Rating
- prokofiev
- Lives with ko
- Posts: 223
- Joined: Tue Apr 27, 2010 8:03 pm
- Rank: decent sdk
- GD Posts: 138
- Has thanked: 67 times
- Been thanked: 10 times
Re: Whole History Rating
prokofiev wrote: - I'm confused by the example rating graph for CrazyStone in the paper. It seems to predict the large rise in CrazyStone's rating during one period of inactivity but not during another. That is, is that graph not "the rating this system would give CrazyStone at each point in time if we had it running and updating" but rather something bizarre like "what CrazyStone's rating seems to most likely have been at each point in time given the later data as well"? (Is that what is meant by "a posteriori" in the paper?)
Answering my own question (apologies):
The second "quote" above is in fact correct, but this is not a bug, it's a feature. The model seeks better & better approximations of the whole rating graph because it takes into account the likelihood of the ratings varying (e.g. slowly varying is more likely than quickly). To get a better approximation now, a better approximation in the past is desired too.
(Also, that isn't really what "a posteriori" refers to in the paper.)
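As a rough sketch of why later games reshape the earlier part of the graph, here is the structure of the model as I read the paper. The symbols are my own shorthand and should be checked against the paper: r_i(t) is player i's rating at time t on a natural-log scale, G is the set of all game results, and w is a parameter for how quickly ratings are allowed to drift.

```latex
% Posterior over every player's whole rating history, given all games G:
p(r \mid G) \;\propto\; P(G \mid r)\, p(r)

% Likelihood of one game (Bradley-Terry), played at time t:
P(i \text{ beats } j \text{ at time } t) = \frac{1}{1 + e^{\,r_j(t) - r_i(t)}}

% Prior on how one player's rating varies between two times (Wiener process):
r_i(t_2) - r_i(t_1) \sim \mathcal{N}\!\left(0,\; w^2\,\lvert t_2 - t_1 \rvert\right)
```

Because the prior ties neighbouring points in time together, a result at a later time changes the most likely value at earlier times as well, which is the "given the later data as well" behaviour described above.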
-
pwaldron
- Lives in gote
- Posts: 409
- Joined: Wed May 19, 2010 8:40 am
- GD Posts: 1072
- Has thanked: 29 times
- Been thanked: 182 times
Re: Whole History Rating
prokofiev wrote: Also, that isn't really what "a posteriori" refers to in the paper.
The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).
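To make the prior/posterior relationship concrete, here is a minimal sketch of Bayes' rule on a grid of candidate rating differences. It is not the paper's algorithm; the Elo-style 400-point scale, the Gaussian prior with a 200-point spread, and the 3-1 game record are all assumptions chosen for illustration.

```python
import math

def win_probability(rating_diff, scale=400.0):
    """Elo-style chance of winning for a player who leads by rating_diff."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

def posterior(prior, grid, wins, losses):
    """Bayes' rule on a grid: posterior is proportional to prior * likelihood."""
    unnormalised = [
        p * win_probability(d) ** wins * (1.0 - win_probability(d)) ** losses
        for p, d in zip(prior, grid)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def mean(dist, grid):
    """Expected value of the grid points under the distribution dist."""
    return sum(p * d for p, d in zip(dist, grid))

# Candidate rating differences (player minus opponent), in Elo-like points.
grid = list(range(-800, 801, 10))

# Prior (assumed): Gaussian centred on "equal strength", 200-point spread.
weights = [math.exp(-0.5 * (d / 200.0) ** 2) for d in grid]
weight_sum = sum(weights)
prior = [w / weight_sum for w in weights]

# New information (assumed): 3 wins and 1 loss against this opponent.
post = posterior(prior, grid, wins=3, losses=1)

print("prior mean:", round(mean(prior, grid), 1))
print("posterior mean:", round(mean(post, grid), 1))
```

The prior says nothing about who is stronger; after the game results are folded in, the posterior puts more weight on the player being the stronger one.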
- prokofiev
- Lives with ko
- Posts: 223
- Joined: Tue Apr 27, 2010 8:03 pm
- Rank: decent sdk
- GD Posts: 138
- Has thanked: 67 times
- Been thanked: 10 times
Re: Whole History Rating
prokofiev wrote: Also, that isn't really what "a posteriori" refers to in the paper.
pwaldron wrote: The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).
Thanks. I'd realized the meaning, but still didn't connect the term with prior!
-
RobertJasiek
- Judan
- Posts: 6279
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: Whole History Rating
Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.
prokofiev, I want something stronger than weak confidence parameters, which are a makeshift measure.
pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. Once we have them, one can still come back to the low-confidence noise from sparse data and see whether the systems already explain it well.
- Harleqin
- Lives in sente
- Posts: 921
- Joined: Sat Mar 06, 2010 10:31 am
- Rank: German 2 dan
- GD Posts: 0
- Has thanked: 401 times
- Been thanked: 164 times
Re: Whole History Rating
It seems to me that Robert's fears are quite unspecific. I can see how they apply to the current systems, but I do not see them as an impediment to studying better ones.
Of course, looking at how the new system handles sparse local data is an important subtopic.
Perhaps we could now have a more detailed look at the paper and get an impression of the different parts of the system and how they fit together.
A good system naturally covers all corner cases without further effort.
- Sverre
- Lives with ko
- Posts: 193
- Joined: Thu Apr 22, 2010 1:04 pm
- Rank: 2d EGF and KGS
- GD Posts: 1005
- Universal go server handle: sverre
- Location: Trondheim, Norway
- Has thanked: 77 times
- Been thanked: 29 times
Re: Whole History Rating
RobertJasiek wrote: Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.
OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?
- topazg
- Tengen
- Posts: 4511
- Joined: Wed Apr 21, 2010 3:08 am
- Rank: Nebulous
- GD Posts: 918
- KGS: topazg
- Location: Chatteris, UK
- Has thanked: 1579 times
- Been thanked: 650 times
- Contact:
Re: Whole History Rating
RobertJasiek wrote: Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.
Sverre wrote: OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?
Well, I'm guessing the 5 or 6 games I play a year would probably get me booted anyway...
-
pwaldron
- Lives in gote
- Posts: 409
- Joined: Wed May 19, 2010 8:40 am
- GD Posts: 1072
- Has thanked: 29 times
- Been thanked: 182 times
Re: Whole History Rating
RobertJasiek wrote: pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. Once we have them, one can still come back to the low-confidence noise from sparse data and see whether the systems already explain it well.
Robert, chant ten times: It is always better to have more information.
The very worst you can have is a game prediction algorithm that flips a coin to predict the winner. Every additional game adds more information, and it cannot make a system less accurate. Some game results are more useful than others in pinning down ratings, but they all have value and it is foolish to throw any away. If the information is not useful then it does little to reduce the uncertainty in the resulting estimated parameters (i.e., ratings) but it's never worse to have the information than not to have it.
-
RobertJasiek
- Judan
- Posts: 6279
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: Whole History Rating
Sverre wrote: OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?
No. One would have to think about it to set useful values. I have wanted to encourage such thinking; I have not carried it out in detail myself.
-
RobertJasiek
- Judan
- Posts: 6279
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: Whole History Rating
pwaldron wrote: Robert, chant ten times: It is always better to have more information.
My first statistics book had a nice example: Estimate the distance between two towns. First you take a rough look: "The next town is about 10km away." Then you measure your town's mediaeval wall: "It is 30cm thick." Now you conclude: "The distance is 10km + 30cm = 10.0003km."
Likewise, if you have two isolated players who each claim to be 5k and their total game data consists of exactly 1 game between themselves, you cannot connect that information to a huge pool of 5k players elsewhere.
Chant ten times: Strongly disconnected data should not be compared. :)
pwaldron wrote: Every additional game adds more information, and it cannot make a system less accurate.
The problem lies in the system itself. If it is not good enough, then it does not interpret sparse data correctly. One must not overinterpret such a system by also feeding it the sparse data.
- Harleqin
- Lives in sente
- Posts: 921
- Joined: Sat Mar 06, 2010 10:31 am
- Rank: German 2 dan
- GD Posts: 0
- Has thanked: 401 times
- Been thanked: 164 times
Re: Whole History Rating
Robert, you seem to presume a weakness of the system before you have even looked at it.
In my as yet rough understanding, each game result is a data point. If only a few data points are directly connected to a player, then that player's resulting rating graph will be easily moved by further (even indirect) data. Game results against this player will therefore naturally have little impact on the rating graph of a player with more games.
I understand that you have had bad experiences with Elo-like systems. My impression is that this kind of problem is naturally covered by a WHR-like approach.
We shall keep this potential problem in mind, but I would like to move on to a more detailed look at the algorithm now.
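A toy illustration of the "few data points move easily" intuition, under heavy simplifications that are mine and not the paper's: all opponents are treated as having a known, fixed rating of 0, there is no time dimension, the prior is a Gaussian with a 300-point spread, and the scale is the usual Elo-style 400 points. It compares how far one extra win moves the estimate for a player with 2 games and for a player with 40 games.

```python
import math

def win_probability(rating_diff, scale=400.0):
    """Elo-style chance of winning for a player who leads by rating_diff."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

# Candidate ratings relative to the (assumed fixed) opponents at rating 0.
GRID = list(range(-1200, 1201, 5))

def posterior_mean(wins, losses, prior_sd=300.0):
    """Posterior mean rating on a grid, given a win/loss record."""
    weights = []
    for r in GRID:
        p = win_probability(r)
        prior = math.exp(-0.5 * (r / prior_sd) ** 2)
        weights.append(prior * p ** wins * (1.0 - p) ** losses)
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, GRID)) / total

def shift_from_one_more_win(wins, losses):
    """How far the estimate moves when one extra win is added to the record."""
    return posterior_mean(wins + 1, losses) - posterior_mean(wins, losses)

shift_sparse = shift_from_one_more_win(wins=1, losses=1)
shift_established = shift_from_one_more_win(wins=20, losses=20)

print(f"2-game player moves by {shift_sparse:.1f} points")
print(f"40-game player moves by {shift_established:.1f} points")
```

In this simplified setting the sparse player's estimate moves much further than the established player's. Whether the full system behaves as gracefully with isolated subpopulations is the open question raised earlier in the thread.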
A good system naturally covers all corner cases without further effort.
-
RobertJasiek
- Judan
- Posts: 6279
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: Whole History Rating
I have not referred to only one particular rating system but to rating systems in general.
- Li Kao
- Lives in gote
- Posts: 643
- Joined: Wed Apr 21, 2010 10:37 am
- Rank: KGS 3k
- GD Posts: 0
- KGS: LiKao / Loki
- Location: Munich, Germany
- Has thanked: 115 times
- Been thanked: 102 times
Re: Whole History Rating
One problem with most ranking systems is how to anchor them. On online servers you can anchor on some players who are very active but don't improve much, and on bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?
Sanity is for the weak.
- Liisa
- Lives with ko
- Posts: 129
- Joined: Wed Jun 16, 2010 3:30 am
- Rank: EGF 1989 KGS 2d
- GD Posts: 0
- Location: Turku, Finland
- Has thanked: 12 times
- Been thanked: 21 times
- Contact:
Re: Whole History Rating
Li Kao wrote: One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active and anchoring bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?
Anchoring is not necessary because we can just let the system float freely. A mathematical rating system should not have any direct and fixed relationship with kyuu-dan ranks (which are subjective honorary titles). If we try to force that relationship, it will just decrease the reliability of the mathematical system. (We play handicap games in tournaments only when we are beginner double-digit kyuus!)
And the good thing about plain and simple Elo is that, even though we cannot deduce from Elo the exact probability of beating a specific opponent, we can always put players in a very specific order within a certain subpopulation. (This is the reason why GoR works like magic!) And there is always enough traffic between subpopulations (e.g. via EGC) that we can calibrate them to roughly match each other if that is necessary.
But I agree that the history approach has its merits. The best way is to calculate simultaneously normal Elo and a rating that includes enough history (a year or so into the past) and put both figures on the same graph.
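For reference, a minimal sketch of the textbook Elo update that "plain and simple Elo" refers to. The 400-point scale and K = 20 are conventional chess-style values assumed here for illustration; GoR uses its own modified formula and parameters, which this does not reproduce.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Textbook Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return both ratings after one game.

    score_a is 1.0 if A won and 0.0 if A lost. The same number of points
    that A gains is taken from B, so the pool's total stays constant.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Hypothetical example: a 2100-rated player loses to a 2000-rated player.
new_a, new_b = elo_update(2100.0, 2000.0, score_a=0.0)
print(round(new_a, 1), round(new_b, 1))
```

Because every update only moves points between the two players, the scale floats freely: nothing in the formula ties a given number to a particular kyu or dan rank.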