Whole History Rating open source implementation.

pete · #1

The core of the rating engine used by GoShrine is now open source. It's an implementation of Rémi Coulom's Whole History Rating method with support for handicaps.

https://github.com/goshrine/whole_history_rating

If anyone ends up using it anywhere, I'd love to hear about it.

-Pete

coffeeimam · #2

I believe this is the repo itself, for those who are looking: https://github.com/goshrine/whole_history_rating

pete · #3

coffeeimam wrote:

I believe this is the repo itself, for those who are looking: https://github.com/goshrine/whole_history_rating

Sheesh. Thanks coffeeimam! I updated the post.

Kaya.gs · #4

pete wrote:

coffeeimam wrote:

I believe this is the repo itself, for those who are looking: https://github.com/goshrine/whole_history_rating

Sheesh. Thanks coffeeimam! I updated the post.

Hey pete. I took a very brief look at the gem, but I'm not entirely sure about the whole solution.

I would consider WHR for Kaya, although I'm quite content with Glicko so far. There is talk of having two separate ratings (blitz and serious games), and for that it's possible that different rating systems are a good solution.

As I understand it, WHR requires a massive recalculation with each result, right? Have you benchmarked your solution?

pete · #5

Kaya.gs wrote:

As I understand it, WHR requires a massive recalculation with each result, right? Have you benchmarked your solution?

I have about 40k games I'm putting in it right now. Updating the two involved players' ratings after a new game takes a tiny fraction of a second, and that cost grows only with the number of games each player has played. Periodically you should also run iterations over the entire set of players and games. A full iteration over all 40,000 games and 3,000 players still takes only a few seconds.

So overall, it's very fast, thanks to Rémi's sophisticated math techniques. And this is running in ruby, not the fastest language on the block.

-Pete

Rémi · #6

pete wrote:

The core of the rating engine used by GoShrine is now open source. It's an implementation of Rémi Coulom's Whole History Rating method with support for handicaps.

https://github.com/goshrine/whole_history_rating

If anyone ends up using it anywhere, I'd love to hear about it.

-Pete

Hi Pete,

I am glad you implemented my algorithm. I wonder how you deal with handicap. In my early experiments, I found that the value of a one-stone handicap is much higher for stronger players.

Rémi

pete · #7

Rémi wrote:

Hi Pete,

I am glad you implemented my algorithm. I wonder how you deal with handicap. In my early experiments, I found that the value of a one-stone handicap is much higher for stronger players.

Rémi

The ruby gem primarily supports a handicap that is a fixed elo amount for a given game. I didn't want to tie the gem to go, so the strength-based handicap system is a bit more complex, and not shown in the example.

To support handicaps that vary with player strength, the library supports passing in a Proc (a ruby callback of sorts) for the handicap attribute, which will be called with the game object as an argument. The Proc can then return a handicap value that is based on the game attributes (such as komi), and the current players' strengths.

GoShrine uses this callback mechanism to implement a handicap that increases the value of each stone relative to the current (pre-iteration) strength of the white player, and also factors in komi.

-Pete
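
A minimal sketch of the strength-dependent handicap callback described above, in plain Ruby without the gem. Everything here is an assumption made for illustration: the `Game` struct and its fields, the linear curve, the rating range, and the komi conversion are hypothetical, and this is not GoShrine's actual weighting function or the gem's game object.

```ruby
# Hypothetical stand-in for a game record; the gem's real game object differs.
Game = Struct.new(:white_elo, :stones, :komi)

MIN_STONE_ELO    = 30.0    # assumed value of one stone for weak players
MAX_STONE_ELO    = 60.0    # assumed value of one stone for strong players
WEAK_ELO         = 0.0     # assumed rating at which the minimum applies
STRONG_ELO       = 3000.0  # assumed rating at which the maximum applies
STANDARD_KOMI    = 6.5     # assumed komi for an even game
POINTS_PER_STONE = 13.0    # assumed point value of one full stone

# Elo value of a single stone, interpolated linearly on white's current rating.
def stone_value(white_elo)
  t = (white_elo - WEAK_ELO) / (STRONG_ELO - WEAK_ELO)
  t = t.clamp(0.0, 1.0)
  MIN_STONE_ELO + t * (MAX_STONE_ELO - MIN_STONE_ELO)
end

# Callback in the style described above: receives the game, returns black's
# advantage in elo.
HANDICAP = lambda do |game|
  per_stone = stone_value(game.white_elo)
  # Komi below the standard value helps black; express it as a fraction of a stone.
  komi_stones = (STANDARD_KOMI - game.komi) / POINTS_PER_STONE
  (game.stones + komi_stones) * per_stone
end

# Two stones at standard komi against a mid-strength white player.
puts HANDICAP.call(Game.new(1500.0, 2, 6.5))
```

Because the callback reads white's rating at call time, the same number of stones is worth more elo in a game between strong players than in a game between weak ones.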

palapiku · #8

I suppose the value of each handicap combination should itself be determined by Bayesian methods...

pete · #9

palapiku wrote:

I suppose the value of each handicap combination should itself be determined by Bayesian methods...

I did do some analysis based on game results to help determine the handicap weighting function in GoShrine. It's also constrained by the fact that ranks (kyu-dan ranks, not elo) are themselves determined by this same function: players one rank apart should play an even game with a one-stone handicap. There is probably another curve to determine the change in elo as you add more handicap stones: is the value of the first handicap stone the same as the 9th? Probably not, but I'm treating them as equal.

In the end I came up with a curve that both maps the handicaps in a way that fits the 1-stone = 1-rank idea, and also optimizes the prediction rates in my training data set.

-Pete

Rémi · #10

I noticed there is a bit of rating drift. With WHR, you can set bots to have a constant rating. That should prevent drift.

Also, I played a few games (account Remi) and still don't have a rating. Do games against bots count?

I like your go server, it makes me feel strong :-)

Rémi

pete · #11

Rémi wrote:

I noticed there is a bit of rating drift. With WHR, you can set bots to have a constant rating. That should prevent drift.

Also, I played a few games (account Remi) and still don't have a rating. Do games against bots count?

RE: rating drift, it seems that on the whole, the mean & spread of all ranks is fairly stable, but yes, bots do go up and down, when in theory they should be at a constant strength. I'm not sure how much of a problem this is.

19x19 games against bots do count, even with a handicap. You still have a ways to go in terms of bringing your rating's confidence above the threshold, which is currently a std deviation of about 120 elo. (Yours is at 168.) Do those numbers seem reasonable based on the games you played? Computing the variance was a tricky part of the algorithm.

Thanks for adding links to GoShrine and the ruby library on the WHR page.

Rémi · #12

pete wrote:

19x19 games against bots do count, even with a handicap. You still have a ways to go in terms of bringing your rating's confidence above the threshold, which is currently a std deviation of about 120 elo. (Yours is at 168.) Do those numbers seem reasonable based on the games you played?

Yes, but I would not use such a threshold, because the games I played already should give a rough indication of my level. If you compute confidence intervals, it would be nice to show/plot them. If you are concerned that some players may reach the top rank if they are lucky in their first games, you could simply sort them according to the lower bound of their confidence interval instead.

BTW, this is what I would do in a server: give the lower bound of the confidence interval as player rating. It has the advantage of giving players an incentive to play more, as the bound would slowly go down when they stop playing. This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi
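
The lower-bound idea in this post can be sketched as follows. This is only an illustration under stated assumptions: the `Rated` struct, the sample players, and the multiplier of 1.96 (roughly a 95% interval if the posterior is treated as Gaussian) are hypothetical and are not taken from GoShrine or the gem.

```ruby
# Hypothetical player record: mean rating and standard deviation, both in elo.
Rated = Struct.new(:name, :elo, :std_dev)

Z = 1.96 # assumed multiplier, about a 95% interval under a Gaussian posterior

# Conservative rating: the lower end of the confidence interval.
def lower_bound(player, z = Z)
  player.elo - z * player.std_dev
end

# Rank players by lower bound, best first, instead of by mean rating.
def leaderboard(players)
  players.sort_by { |pl| -lower_bound(pl) }
end

# Made-up sample: a newcomer with a high but uncertain rating and a regular
# with a slightly lower, well-established rating.
players = [
  Rated.new("newcomer", 2000.0, 168.0),
  Rated.new("regular",  1900.0, 60.0)
]

leaderboard(players).each do |pl|
  puts format("%-10s %8.1f", pl.name, lower_bound(pl))
end
```

With these made-up numbers the regular player is listed first even though the newcomer has the higher mean, which is the effect described above: a lucky start does not reach the top until the uncertainty shrinks.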

pete · #13

Rémi wrote:

BTW, this is what I would do in a server: give the lower bound of the confidence interval as player rating. It has the advantage of giving players an incentive to play more, as the bound would slowly go down when they stop playing. This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi

Interesting idea. Are there any other servers that do this? It may have some weird side effects, though. Your rank would almost always go up after playing a game if it's been a while since your last game, even if you lost. And if you won an improbable game, would it be possible that your rank could go down, by spreading the variance more than increasing the mean?

Rémi · #14

pete wrote:

Rémi wrote:

BTW, this is what I would do in a server: give the lower bound of the confidence interval as player rating. It has the advantage of giving players an incentive to play more, as the bound would slowly go down when they stop playing. This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi

Interesting idea. Are there any other servers that do this? It may have some weird side effects, though. Your rank would almost always go up after playing a game if it's been a while since your last game, even if you lost. And if you won an improbable game, would it be possible that your rank could go down, by spreading the variance more than increasing the mean?

I don't think it is possible. Playing a game will always reduce the variance, so a win will always increase the rank.

In some really extreme cases, when the Gaussian approximation of the posterior is wrong, it may be that a loss will increase the lower confidence bound. But I don't expect that would happen much. And if that happens, the lower bound in question would be much lower than the real level of the player (because the rating would be extremely uncertain), so it should be no problem.

Nobody will complain if their rating increases more than they expect, anyway. What's important, is that it gives an incentive for playing.

Rémi

RobertJasiek · #15

Rémi wrote:

Nobody will complain if their rating increases more than they expect

Wrong. (If you don't remember, I have complained about such earlier.)

hyperpape · #16

Rémi wrote:

This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi

Can you explain/elaborate on this?

Rémi · #17

hyperpape wrote:

Rémi wrote:

This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi

Can you explain/elaborate on this?

With the KGS rating system (not WHR), if you stop playing, your rating improves along with those of your past opponents. So if you want your rating to make fast progress, you can select opponents who are making fast progress, and then stop playing.

Rémi

yoyoma · #18

I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating. For example:

player vs anchor, 20 even games, 50% wins
180 days later:
player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins

This seems to break the code. I ended up using a slower but more stable minorize-majorize algorithm.

Code:

require 'whole_history_rating'

whr = WholeHistoryRating::Base.new

# Day 1: 20 even games (handicap 0), wins split 50/50.
10.times do
  whr.create_game("anchor", "player", "B", 1, 0)
  whr.create_game("anchor", "player", "W", 1, 0)
end

# Day 180: 20 games with a 600 Elo handicap, wins again split 50/50.
10.times do
  whr.create_game("anchor", "player", "B", 180, 600)
  whr.create_game("anchor", "player", "W", 180, 600)
end

# Run 10 rounds of 10 iterations, printing both rating histories after each round.
10.times do
  whr.iterate(10)
  print whr.ratings_for_player("anchor"), "   "
  print whr.ratings_for_player("player"), "\n"
end

/var/lib/gems/1.9.1/gems/whole_history_rating-0.1.2/lib/whole_history_rating/player.rb:149:in `block in update_by_ndim_newton': uninitialized constant WholeHistoryRating::Player::WHR (NameError)
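
The minorize-majorize approach mentioned in this post can be sketched as follows. This is not yoyoma's code and not part of the gem. It is a simplified, static Bradley-Terry version (one rating per player, constant over time, unlike WHR), with the anchor held fixed at 0 Elo. The function names and the format of `games` are hypothetical.

```ruby
ELO_SCALE = 400.0

def elo_to_gamma(elo)
  10.0**(elo / ELO_SCALE)
end

def gamma_to_elo(gamma)
  ELO_SCALE * Math.log10(gamma)
end

# Estimate one player's rating against opponents of fixed strength.
# games: array of [opponent_elo, won], where opponent_elo already includes
# any handicap advantage given to the opponent (assumed convention).
def mm_rating(games, start_elo: 0.0, iterations: 200)
  wins = games.count { |_, won| won }
  # With no wins or no losses the maximum-likelihood rating is unbounded.
  raise ArgumentError, "need at least one win and one loss" if wins.zero? || wins == games.size

  gamma = elo_to_gamma(start_elo)
  iterations.times do
    # MM update: gamma = wins / sum over games of 1 / (gamma + gamma_opponent).
    denom = games.sum { |opp_elo, _| 1.0 / (gamma + elo_to_gamma(opp_elo)) }
    gamma = wins / denom
  end
  gamma_to_elo(gamma)
end

# The scenario from this post, flattened into a single static rating:
# 20 even games and 20 games where the anchor gets 600 Elo, all at 50% wins.
even     = [[0.0, true], [0.0, false]] * 10
handicap = [[600.0, true], [600.0, false]] * 10
puts mm_rating(even + handicap).round(1)
```

Each update keeps the rating positive on the gamma scale and moves it toward the point where expected wins equal actual wins, so there is no step size to overshoot. The price is slower, linear convergence compared with Newton's method. In this static model the combined scenario settles at about 300 Elo above the anchor.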

pete · #19

yoyoma wrote:

player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins

Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.

On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.

I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Rémi.

-Pete

hyperpape · #20

Rémi wrote:

hyperpape wrote:

Rémi wrote:

This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi

Can you explain/elaborate on this?

With the KGS rating system (not WHR), if you stop playing, your rating improves like your past opponents. So if you want your rating to make fast progress you can select opponents who are making fast progress, and then stop playing.

Rémi

So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?
