Whole History Rating

pwaldron · Post by **pwaldron** » Mon Jun 21, 2010 11:47 am

RobertJasiek wrote: It defies your dream but insignificance should be taken into account instead of being overlooked.

It also defies mathematical theorems. It is always better to have more information (in the form of game results). Your belief to the contrary is irrelevant.

prokofiev · Post by **prokofiev** » Mon Jun 21, 2010 11:56 am

prokofiev wrote: I'm confused by the example rating graph for CrazyStone in the paper. It seems to predict the large rise in CrazyStone's rating during one period of inactivity but not during another. That is, is that graph not "the rating this system would give CrazyStone at each point in time if we had it running and updating" but rather something bizarre like "what CrazyStone's rating seems to most likely have been at each point in time given the later data as well"? (Is that what is meant by "a posteriori" in the paper?)

Answering my own question (apologies):

The second "quote" above is in fact correct, but this is not a bug, it's a feature. The model seeks better & better approximations of the whole rating graph because it takes into account the likelihood of the ratings varying (e.g. slowly varying is more likely than quickly). To get a better approximation now, a better approximation in the past is desired too.
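To make the "slowly varying is more likely" point concrete, here is a minimal sketch. It assumes the rating path is given a Wiener-process (random walk) prior, so each rating change is Gaussian with variance proportional to the elapsed time. The function name and the variance-per-time-unit parameter `w2` are hypothetical, chosen only for illustration, and the numbers are made up.

```python
import math

def wiener_log_prior(times, ratings, w2=1.0):
    """Log prior density of a rating path under an assumed Wiener process.

    Each step ratings[i] - ratings[i-1] is treated as Gaussian with
    mean 0 and variance w2 * (times[i] - times[i-1]).
    """
    total = 0.0
    for i in range(1, len(times)):
        var = w2 * (times[i] - times[i - 1])
        step = ratings[i] - ratings[i - 1]
        total += -0.5 * math.log(2.0 * math.pi * var) - step * step / (2.0 * var)
    return total

times = [0, 1, 2, 3]
smooth = [0.0, 0.1, 0.2, 0.3]   # same total gain, spread evenly
jumpy = [0.0, 0.0, 0.0, 0.3]    # same total gain, all in the last step

smooth_lp = wiener_log_prior(times, smooth)
jumpy_lp = wiener_log_prior(times, jumpy)
print(smooth_lp > jumpy_lp)
```

Because this prior ties neighbouring time points together, a game result at one date also shifts the most likely rating at earlier dates, which is why the graph can "know" about a rise before the games that reveal it.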

(Also, that isn't really what "a posteriori" refers to in the paper.)

pwaldron · Post by **pwaldron** » Mon Jun 21, 2010 1:24 pm

prokofiev wrote:Also, that isn't really what "a posteriori" refers to in the paper.

The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).
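As a rough illustration of prior, new information, and posterior, the sketch below updates a belief about one player's rating on a grid of candidate values. It assumes a Bradley-Terry win probability on a natural-log rating scale and a Gaussian prior; the grid, the prior width, and the three game results are invented for the example.

```python
import math

def win_prob(r, opp):
    """Assumed Bradley-Terry probability that rating r beats rating opp."""
    return 1.0 / (1.0 + math.exp(opp - r))

def posterior(grid, prior, games):
    """Multiply the prior by the likelihood of the games, then normalize.

    games is a list of (opponent_rating, won) pairs.
    """
    post = []
    for r, p in zip(grid, prior):
        like = 1.0
        for opp, won in games:
            q = win_prob(r, opp)
            like *= q if won else 1.0 - q
        post.append(p * like)
    total = sum(post)
    return [x / total for x in post]

grid = [i / 10.0 for i in range(-50, 51)]        # candidate ratings
prior = [math.exp(-0.5 * r * r) for r in grid]   # Gaussian prior centred on 0
norm = sum(prior)
prior = [p / norm for p in prior]

games = [(0.0, True), (0.0, True), (0.5, False)]  # two wins, one loss
post = posterior(grid, prior, games)

prior_mean = sum(r * p for r, p in zip(grid, prior))
post_mean = sum(r * p for r, p in zip(grid, post))
print(prior_mean, post_mean)
```

With two wins against opponents at the prior mean and one loss to a stronger opponent, the posterior mean moves above the prior mean.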

prokofiev · Post by **prokofiev** » Mon Jun 21, 2010 2:30 pm

pwaldron wrote:
prokofiev wrote:Also, that isn't really what "a posteriori" refers to in the paper.
The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).

Thanks. I'd realized the meaning, but still didn't connect the term with prior!

RobertJasiek · Post by **RobertJasiek** » Mon Jun 21, 2010 2:43 pm

Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.

prokofiev, I want something stronger than weak confidence parameters, which are a makeshift measure.

pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. When we have them, one can still come back to the low-confidence sparse-data noise and see whether it can already be explained well.

Harleqin · Post by **Harleqin** » Mon Jun 21, 2010 3:14 pm

It seems to me that Robert's fears are quite unspecific. I can see how they apply to the current systems, but I do not see them as an impediment to studying better ones.

Of course, looking at how the new system handles sparse local data is an important subtopic.

Perhaps we could now have a more detailed look at the paper and get an impression of the different parts of the system and how they fit together.

Sverre · Post by **Sverre** » Mon Jun 21, 2010 4:11 pm

RobertJasiek wrote: Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.

OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

topazg · Post by **topazg** » Mon Jun 21, 2010 4:26 pm

Sverre wrote:
RobertJasiek wrote: Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.
OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

Well, I'm guessing the 5 or 6 games I play a year would probably get me booted anyway ..

pwaldron · Post by **pwaldron** » Mon Jun 21, 2010 5:02 pm

RobertJasiek wrote: pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. When we have them, one can still come back to the low-confidence sparse-data noise and see whether it can already be explained well.

Robert, chant ten times: It is always better to have more information.

The very worst you can have is a game prediction algorithm that flips a coin to predict the winner. Every additional game adds more information, and it cannot make a system less accurate. Some game results are more useful than others in pinning down ratings, but they all have value and it is foolish to throw any away. If the information is not useful then it does little to reduce the uncertainty in the resulting estimated parameters (i.e., ratings) but it's never worse to have the information than not to have it.
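One way to put numbers on "some game results are more useful than others" is the sketch below. It is an approximation under stated assumptions, not a description of any particular rating system: it assumes a Bradley-Terry model on a natural-log rating scale, where a game with predicted win probability p contributes p(1 - p) to the precision of the rating estimate (a Laplace-style approximation), on top of a Gaussian prior. The function names and the prior variance are hypothetical.

```python
def game_information(p):
    """Approximate information one game adds, given predicted win probability p."""
    return p * (1.0 - p)

def approx_variance(win_probs, prior_var=1.0):
    """Approximate rating variance after games with the given predicted win probabilities."""
    precision = 1.0 / prior_var + sum(game_information(p) for p in win_probs)
    return 1.0 / precision

no_games = approx_variance([])             # prior only
one_lopsided = approx_variance([0.99])     # near-certain result, little information
one_even = approx_variance([0.5])          # evenly matched game, most information
many_even = approx_variance([0.5] * 100)   # a well-connected, active player

print(no_games, one_lopsided, one_even, many_even)
```

In this approximation every game adds a non-negative amount of precision, so the variance never grows when a game is added; a lopsided game simply shrinks it very little.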

RobertJasiek · Post by **RobertJasiek** » Tue Jun 22, 2010 12:20 am

Sverre wrote: OK, could you give some precise numbers on, for example, the minimal number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

No. One would have to think about it to set useful values. I have wanted to encourage such thinking; I have not carried it out in detail myself.

RobertJasiek · Post by **RobertJasiek** » Tue Jun 22, 2010 12:31 am

pwaldron wrote: Robert, chant ten times: It is always better to have more information.

My first statistics book had a nice example: Estimate the distance between two towns. First you take a rough look: "The next town is about 10km away." Then you measure your town's mediaeval wall: "It is 30cm thick". Now you conclude: "The distance is 10km + 30cm = 10.0003km."

Likewise, if you have two isolated players who each claim to be 5k and their total game data consist of exactly 1 game between themselves, you cannot connect that information to a huge pool of 5k players elsewhere.

Chant ten times: Strongly disconnected data should not be compared. :)

pwaldron wrote: Every additional game adds more information, and it cannot make a system less accurate.

The problem lies in the system itself. If it is not good enough, then it does not interpret sparse data correctly. One must not overinterpret such a system by also feeding it the sparse data.

Harleqin · Post by **Harleqin** » Tue Jun 22, 2010 2:49 am

Robert, you seem to presume a weakness of the system before you have even looked at it.

In my as yet rough understanding, each game result is a data point. If only few data points are directly connected to a player, then that player's resulting rating graph will be easily moved with further (even indirect) data. Game results against this player will therefore naturally have little impact on the rating graph of a player with more games.

I understand that you have had bad experiences with Elo-like systems. My impression is that this kind of problem is naturally covered by a WHR-like approach.

We shall keep this potential problem in mind, but I would like to move on to a more detailed look at the algorithm now.

RobertJasiek · Post by **RobertJasiek** » Tue Jun 22, 2010 3:03 am

I have not referred to only one particular rating system but to rating systems in general.

Li Kao · Post by **Li Kao** » Tue Jun 22, 2010 4:31 am

One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?

Liisa · Post by **Liisa** » Tue Jun 22, 2010 8:39 pm

Li Kao wrote: One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?

Anchoring is not necessary because we can just let the system float freely. A mathematical rating system should not have any direct and fixed relationship with kyuu-dan ranks (which are subjective honorary titles). If we try to force that relationship, it will just decrease the reliability of the mathematical system. (We play handicap games in tournaments only when we are beginner double-digit kyuus!)

And the good thing about plain and simple Elo is that, even though we cannot deduce from Elo the exact probability of beating a specific opponent, we can always put players in a very specific order within a certain subpopulation. (This is the reason why GoR works like magic!) And there is always enough traffic between subpopulations (e.g. via EGC) that we can calibrate them to roughly match each other if that is necessary.

But I agree that the history approach has its merits. The best way is to calculate simultaneously the normal Elo and a rating that includes enough history (a year or so into the past) and put both figures on the same graph.
