Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
Can you explain/elaborate on this?
Whole History Rating open source implementation.
-
hyperpape
- Tengen
- Posts: 4382
- Joined: Thu May 06, 2010 3:24 pm
- Rank: AGA 3k
- GD Posts: 65
- OGS: Hyperpape 4k
- Location: Caldas da Rainha, Portugal
- Has thanked: 499 times
- Been thanked: 727 times
Re: Whole History Rating open source implementation.
-
Rémi
- Lives with ko
- Posts: 171
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 120 times
- Contact:
Re: Whole History Rating open source implementation.
hyperpape wrote: Can you explain/elaborate on this?
Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
With the KGS rating system (not WHR), if you stop playing, your rating improves like your past opponents. So if you want your rating to make fast progress you can select opponents who are making fast progress, and then stop playing.
Rémi
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating. For example:
player vs anchor, 20 even games, 50% wins
180 days later:
player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
This seems to break the code. I ended up using a slower but more stable minorize-majorize algorithm.
Code:
require 'whole_history_rating'

@whr = WholeHistoryRating::Base.new

# Day 1: 20 even games (handicap 0), 10 wins for each side.
for game in (1..10) do
  @whr.create_game("anchor", "player", "B", 1, 0)
  @whr.create_game("anchor", "player", "W", 1, 0)
end

# Day 180: 20 games with a 600 Elo handicap, 10 wins for each side.
for game in (1..10) do
  @whr.create_game("anchor", "player", "B", 180, 600)
  @whr.create_game("anchor", "player", "W", 180, 600)
end

# Run 10 iterations at a time and print both rating histories.
for i in (1..10) do
  @whr.iterate(10)
  print @whr.ratings_for_player("anchor"), " "
  print @whr.ratings_for_player("player"), "\n"
end

/var/lib/gems/1.9.1/gems/whole_history_rating-0.1.2/lib/whole_history_rating/player.rb:149:in `block in update_by_ndim_newton': uninitialized constant WholeHistoryRating::Player::WHR (NameError)
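For readers who have not seen a minorize-majorize (MM) update, here is a minimal sketch of the idea. It is not yoyoma's code and not WHR: it fits a static Bradley-Terry model that ignores the time axis, and every name in it (`Game`, `handicap_factor`, `mm_fit`) is made up for illustration. The handicap is assumed to multiply black's strength by 10^(Elo/400).

```ruby
# Sketch: MM updates for a static Bradley-Terry model with a handicap factor.
# Hypothetical names; this ignores time and is not the gem's API.
Game = Struct.new(:black, :white, :winner, :handicap_elo)

# Assumption: a handicap multiplies black's strength (gamma) by 10^(elo / 400).
def handicap_factor(game)
  10.0**(game.handicap_elo / 400.0)
end

def mm_fit(games, anchor, iterations: 200)
  players = games.flat_map { |g| [g.black, g.white] }.uniq
  gamma = players.to_h { |name| [name, 1.0] }
  iterations.times do
    players.each do |name|
      next if name == anchor # the anchor's gamma stays at 1 to fix the scale
      wins = 0
      denom = 0.0
      games.each do |g|
        next unless [g.black, g.white].include?(name)
        f = handicap_factor(g)
        total = f * gamma[g.black] + gamma[g.white]
        wins += 1 if (g.winner == "B" ? g.black : g.white) == name
        denom += (name == g.black ? f : 1.0) / total
      end
      # MM update: gamma = wins / sum(coefficient / total strength in the game).
      # Players without a win are left alone (their estimate would go to zero).
      gamma[name] = wins / denom if wins > 0
    end
  end
  gamma.transform_values { |v| 400.0 * Math.log10(v) } # back to Elo
end

games = []
10.times do
  games << Game.new("anchor", "player", "B", 0) << Game.new("anchor", "player", "W", 0)
  games << Game.new("anchor", "player", "B", 600) << Game.new("anchor", "player", "W", 600)
end
puts mm_fit(games, "anchor").inspect
```

Each update can only raise the likelihood, which is why MM is stable, but it converges linearly rather than quadratically. On the 40 games above the static model puts the player about 300 Elo above the anchor, halfway between the two handicaps.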
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
- Contact:
Re: Whole History Rating open source implementation.
yoyoma wrote: player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.
On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.
I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Remi.
-Pete
Creator of GoShrine
-
hyperpape
- Tengen
- Posts: 4382
- Joined: Thu May 06, 2010 3:24 pm
- Rank: AGA 3k
- GD Posts: 65
- OGS: Hyperpape 4k
- Location: Caldas da Rainha, Portugal
- Has thanked: 499 times
- Been thanked: 727 times
Re: Whole History Rating open source implementation.
Rémi wrote: With the KGS rating system (not WHR), if you stop playing, your rating improves like your past opponents. So if you want your rating to make fast progress you can select opponents who are making fast progress, and then stop playing.
hyperpape wrote: Can you explain/elaborate on this?
Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
Rémi
So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
pete wrote: Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.
yoyoma wrote: player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.
I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Remi.
-Pete
60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+. (The constants are given in a different form here: http://senseis.xmp.net/?KGSRatingMath. To convert to Elo form, log10(e^0.85)*400 ≈ 148.) EGF uses similar numbers. Besides, even using your too-low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
-
quantumf
- Lives in sente
- Posts: 844
- Joined: Tue Apr 20, 2010 11:36 pm
- Rank: 3d
- GD Posts: 422
- KGS: komi
- Has thanked: 180 times
- Been thanked: 151 times
Re: Whole History Rating open source implementation.
So after 5 games (3 wins 2 losses) I still don't have a rank. This is somewhat frustrating and not encouraging me to carry on trying. In general I prefer servers that allow one to self-select a starting rank, and find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.
-
Rémi
- Lives with ko
- Posts: 171
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 120 times
- Contact:
Re: Whole History Rating open source implementation.
yoyoma wrote: I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating.
Newton's method is very efficient but tricky. In order to guarantee it works, it is necessary to check that the Newton iteration brings an improvement in the log-likelihood. If it does not, a fallback method should be used (such as a line search in the gradient direction).
IIRC, in my implementation I add a small negative constant to the diagonal of the Hessian before inversion. This prevents instability very well, at almost no cost in terms of efficiency. Maybe a good fallback method would be to increase this additional diagonal term until the Newton step increases the log-likelihood.
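A rough sketch of that fallback, on a toy version of yoyoma's example: one player with a rating on day 1 and on day 180, the anchor fixed at 0, and a Wiener-process prior linking the two days. This is neither Rémi's implementation nor the gem's code; the helper names, the starting damping of 1e-3, and the factor-of-10 schedule are all assumptions made for illustration.

```ruby
# Toy sketch: damped Newton ascent for one player at two time points.
# Hypothetical helper names; NOT the whole_history_rating gem's code.
ELO_TO_NAT = Math.log(10) / 400.0 # Elo points -> natural (log-gamma) units

def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# r1, r2: player's natural ratings on day 1 and day 180; the anchor stays at 0.
# h: anchor's handicap advantage on day 180; n: wins (= losses) on each day;
# inv_s2: 1 / (w2 * elapsed days), the strength of the Wiener-process prior.
def log_likelihood(r1, r2, h, inv_s2, n)
  p1 = sigmoid(r1)
  p2 = sigmoid(r2 - h)
  n * (Math.log(p1) + Math.log(1.0 - p1)) +
    n * (Math.log(p2) + Math.log(1.0 - p2)) -
    0.5 * inv_s2 * (r2 - r1)**2
end

def gradient(r1, r2, h, inv_s2, n)
  [n * (1.0 - 2.0 * sigmoid(r1)) + inv_s2 * (r2 - r1),
   n * (1.0 - 2.0 * sigmoid(r2 - h)) - inv_s2 * (r2 - r1)]
end

# One Newton step with `lam` subtracted from the Hessian diagonal.
def damped_step(r1, r2, h, inv_s2, n, lam)
  g1, g2 = gradient(r1, r2, h, inv_s2, n)
  p1 = sigmoid(r1)
  p2 = sigmoid(r2 - h)
  a = -2.0 * n * p1 * (1.0 - p1) - inv_s2 - lam
  d = -2.0 * n * p2 * (1.0 - p2) - inv_s2 - lam
  b = inv_s2
  det = a * d - b * b
  # delta = -(H - lam*I)^-1 * g, written out for the 2x2 case
  [r1 - (d * g1 - b * g2) / det, r2 + (b * g1 - a * g2) / det]
end

def fit(h, inv_s2, n, max_iter: 100, tol: 1e-6)
  r1 = r2 = 0.0
  lam = 1e-3
  rejected = 0
  ll = log_likelihood(r1, r2, h, inv_s2, n)
  max_iter.times do
    break if gradient(r1, r2, h, inv_s2, n).map(&:abs).max < tol
    60.times do
      n1, n2 = damped_step(r1, r2, h, inv_s2, n, lam)
      new_ll = log_likelihood(n1, n2, h, inv_s2, n)
      if new_ll >= ll # accept only steps that do not lower the log-likelihood
        r1, r2, ll = n1, n2, new_ll
        lam = [lam / 10.0, 1e-6].max
        break
      end
      rejected += 1
      lam *= 10.0 # fallback: damp harder and retry
    end
  end
  [r1, r2, rejected]
end

h = 600 * ELO_TO_NAT                       # 600 Elo handicap on day 180
inv_s2 = 1.0 / (300 * 179 * ELO_TO_NAT**2) # w2 = 300 Elo^2/day over 179 days
r1, r2, rejected = fit(h, inv_s2, 10)
puts format("day 1: %.1f Elo, day 180: %.1f Elo, rejected steps: %d",
            r1 / ELO_TO_NAT, r2 / ELO_TO_NAT, rejected)
```

Starting from zero, the first nearly undamped step overshoots and lowers the log-likelihood, so it is rejected and retried with more damping. As the damping grows, the step shrinks towards a small move along the gradient, which is what makes the fallback safe.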
Rémi
-
Rémi
- Lives with ko
- Posts: 171
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 120 times
- Contact:
Re: Whole History Rating open source implementation.
hyperpape wrote: So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?
If you don't play on KGS, your rating will improve like your opponents.
Rémi
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
- Contact:
Re: Whole History Rating open source implementation.
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu).
-Pete
Creator of GoShrine
-
Rémi
- Lives with ko
- Posts: 171
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 120 times
- Contact:
Re: Whole History Rating open source implementation.
pete wrote: KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu),
-Pete
How did you select the volatility meta-parameter of WHR? handicap values?
In my experiments, it was very clear that the handicap value changes a lot with player strength, and also volatility. When choosing the volatility in order to optimize prediction quality over the KGS database, it was too low (14 Elo^2/Day) for beginners, so it produced very "compressed" ratings.
For a rating system to properly understand the variations of strength in a pool of players that mixes beginners and experts, it is really necessary to consider that the strengths of beginners change faster than the strengths of experts.
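To put the Elo^2/day unit in perspective, here is a small sketch. It assumes the volatility is the per-day variance of a Wiener process, so the variance of the drift grows linearly with elapsed time. `drift_sigma_elo` is a made-up helper, and 300 is only a larger value for comparison.

```ruby
# Standard deviation (in Elo) of the rating drift the prior expects over
# `days`, for a volatility w2 given in Elo^2/day. Hypothetical helper.
def drift_sigma_elo(w2, days)
  Math.sqrt(w2 * days)
end

# 14 is the value quoted above; 300 is an arbitrary larger value.
[14, 300].each do |w2|
  puts "w2=#{w2}: about #{drift_sigma_elo(w2, 180).round} Elo over 180 days"
end
```

At 14 Elo^2/day the prior expects a drift of roughly 50 Elo in six months, so a beginner who really gains several hundred Elo in that time looks wildly improbable to the model, and the ratings get pulled together.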
Rémi
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
- Contact:
Re: Whole History Rating open source implementation.
quantumf wrote: So after 5 games (3 wins 2 losses) I still don't have a rank. This is somewhat frustrating and not encouraging me to carry on trying. In general I prefer servers that allow one to self-select a starting rank, and find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.
Thanks for the feedback, quantum. I'm leaning towards implementing what Remi suggested about using the lower confidence bound as the rating, which would give you a rank much sooner (though probably lower than your actual rank).
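A minimal sketch of what a lower-confidence-bound rating could look like. The helper name, the default z = 1.96, and the sample numbers are assumptions for illustration, not something GoShrine or the gem is known to use.

```ruby
# Hypothetical helper: a conservative display rating, `z` standard deviations
# below the estimate. Neither the name nor z = 1.96 comes from the gem.
def conservative_rating(elo, sigma_elo, z: 1.96)
  elo - z * sigma_elo
end

puts conservative_rating(1500.0, 200.0) # few games: wide interval, low display rating
puts conservative_rating(1500.0, 40.0)  # many games: close to the estimate
```

A new player gets a number right away, and it rises towards the estimate as the uncertainty shrinks with more games.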
Creator of GoShrine
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
- Contact:
Re: Whole History Rating open source implementation.
Rémi wrote: How did you select the volatility meta-parameter of WHR? handicap values?
I did some optimization runs, and came up with 300 Elo^2/day, somehow. You can configure the library like this:
Code:
@whr = WholeHistoryRating::Base.new(:w2 => 17)
BTW, yoyoma, if you bump :w2 down below 100, your example remains stable.
Rémi wrote: In my experiments, it was very clear that the handicap value changes a lot with player strength, and also volatility. When choosing the volatility in order to optimize prediction quality over the KGS database, it was too low (14 Elo^2/Day) for beginners, so it produced very "compressed" ratings.
For a rating system to properly understand the variations of strength in a pool of players that mixes beginners and experts, it is really necessary to consider that the strengths of beginners changes faster than the strengths of experts.
Rémi
Are you working on a new version of WHR that takes this into consideration?
Creator of GoShrine
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
- Contact:
Re: Whole History Rating open source implementation.
As an aside, I'm glad to finally have some questions and feedback on this code that I struggled to write.
I'm certainly open to the possibility that there may be mistakes in the code, and would love to have someone other than me look it over. That's one of the reasons I open sourced it. If you see anything, or have questions, send a pull request on github, or just send me an email.
-Pete
Creator of GoShrine
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
pete wrote: KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu),
-Pete
I like playing around with rating math, sorry for the tldr text.
I did convert from the KGS scale to the standard Elo scale, and it looks like your WHR code handicap parameter takes a standard Elo scale number.
KGS: P = 1 / ( 1 + e^(k*(RankB-RankA)) ) [k=0.85 for 30k-5k, k=1.3 for 2d+]
Elo: P = 1 / ( 1 + 10^((RankB-RankA)/400) )
So for kyu players and 1 rank difference: RankB-RankA=1 and k=0.85. Then you can solve for what the Elo difference is. EGF has some statistics on even games here: http://gemma.ujf.cas.cz/~cieply/GO/statev.html
Generally for weaker kyu players the chance of upset is around 45%, for stronger players it goes down. I put the expected win rates for KGS and EGF formulas, along with the observed win rates for EGF tournaments here:
Code:
| | KGS | EGF | EGF | KGS | EGF | EGF |
| | exp. | exp. | obs. | exp. | exp. | obs. |
| even game | win % | win % | win % | elo | elo | elo |
|-----------|-------|-------|-------|-------|-------|-------|
| 10k vs 9k | 30.0 | 33.9 | 44.8 | 148 | 116 | 36 |
| 5d vs 6d | 21.4 | 20.1 | 27.8 | 226 | 232 | 166 |
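The conversions behind the Elo columns can be sketched like this. The helper names are made up; the only facts used are the two formulas above, i.e. e^k = 10^(elo/400) for the KGS constant, and P = 1/(1 + 10^(elo/400)) for the weaker player's win probability.

```ruby
# Elo gap per rank implied by a KGS-style constant k: e^k == 10^(elo / 400).
def elo_from_k(k)
  400.0 * k / Math.log(10)
end

# Elo gap implied by the weaker player's win probability in an even game.
def elo_from_win_prob(prob)
  400.0 * Math.log10((1.0 - prob) / prob)
end

puts elo_from_k(0.85).round         # KGS, 30k-5k
puts elo_from_k(1.3).round          # KGS, 2d+
puts elo_from_win_prob(0.448).round # EGF observed, 10k vs 9k
puts elo_from_win_prob(0.278).round # EGF observed, 5d vs 6d
```

These four calls give 148, 226, 36 and 166, matching the KGS expected and EGF observed Elo columns.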
Rémi, do you have numbers like this for observed KGS games, so that Elo per rank can be derived from them?