Re: AlphaZero paper published in journal Science
Posted: Sat Dec 08, 2018 8:54 am
by Bill Spight
dfan wrote:..."I am willing for my chance of winning to go down from 98% to 97% in return for winning by 10.5 points instead of 0.5". Due to the nature of the playing system, there's no good way to say "I have a 100% chance of winning, and now I want to maximize my score while retaining that 100% chance", although of course that statement is logically meaningful.
ez4u wrote:Bill Spight wrote:ez4u wrote:
The statements may be logically meaningful but they are trivial. Isn't the real challenge to make sense of a statement like, "I have a 51% chance of winning by 0.5 points by playing X and a 49% chance of winning by 1.5 points by playing Y. I want to maximize my score; which should I choose?"
The thing is, amateur dans play the late endgame almost perfectly; but even pros do not play the late endgame perfectly. Under those circumstances, if it's a close call in the late endgame between going for a ½ pt. win versus going for a 1½ pt. win, the extra point gives a margin of safety. At least for humans.
But most, if not all, modern top bots do not assume nearly perfect play when they calculate winrates. And they do not estimate the margin of safety by expected scores, but by percentages. As far as I can tell, the endgame, particularly the late endgame, is one of the places where humans play better than bots; life and death, semeai, and ladders being others. In all of these places, local reading can give the right global results. Bots excel at global reading, humans still excel at local reading.
If the discussion is about switching from a winrate strategy to a maximum point strategy, then the starting point is the fuseki not the late endgame.
To take your example, in general, in the opening the difference between estimated winrates of 51% and 49% is more indicative of the chances of winning than the difference between estimated results of 1½ pts. and ½ pt. But in the late endgame I think that the difference between estimated results (by current human pros) of 1½ pts. and ½ pt. is more indicative of the chances of winning than the difference between estimated winrates (by current top bots) of 51% and 49%. The reason lies in the reduction of the uncertainty in estimated point scores as the game goes on. Currently the uncertainty of estimated point scores is so great in the opening that no pros that I know of even attempt to estimate them. (The traditional approach is to estimate locally secure territory and to use that as one factor to consider.)
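To illustrate the point about shrinking uncertainty, here is a rough sketch. It rests on an assumption that is not in the post: the final margin is treated as normally distributed around the estimated margin, with standard deviation `sigma`. The function name and the sigma values are hypothetical, chosen only for illustration.

```python
import math


def win_probability(estimated_margin, sigma):
    """P(final margin > 0), ASSUMING the final margin is normally
    distributed around estimated_margin with standard deviation sigma.
    The normal model is an illustrative assumption, not from the thread."""
    return 0.5 * (1.0 + math.erf(estimated_margin / (sigma * math.sqrt(2.0))))


# Hypothetical uncertainty levels: large (opening-like) to small (late endgame-like).
for sigma in (20.0, 5.0, 1.0):
    p_half = win_probability(0.5, sigma)
    p_one_and_half = win_probability(1.5, sigma)
    print(f"sigma={sigma:>4}: 0.5 pt lead -> {p_half:.2f}, "
          f"1.5 pt lead -> {p_one_and_half:.2f}")
```

Under this toy model, with a large sigma the two estimates give almost the same winning chance, and as sigma shrinks the extra point matters more and more, which is the margin-of-safety argument above.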
Re: AlphaZero paper published in journal Science
Posted: Sat Dec 08, 2018 1:19 pm
by seberle
I just finished reading the AlphaZero paper, which was fascinating. I have a couple of questions, if anyone happens to know more.
On page 2, they explain that each move is selected "either proportionally (for exploration) or greedily (for exploitation) with respect to the visit counts at the root state." What does that mean in layman's terms?
It's interesting that they abandoned symmetry because chess and shogi don't have symmetric boards. I wonder if AlphaZero has any idiosyncrasies, such as preferring a certain joseki in one corner, but a different variation in another corner. Did anyone read the supplemental material? Do they mention anything like this?
Re: AlphaZero paper published in journal Science
Posted: Sat Dec 08, 2018 3:55 pm
by dfan
seberle wrote:
On page 2, they explain that each move is selected "either proportionally (for exploration) or greedily (for exploitation) with respect to the visit counts at the root state." What does that mean in layman's terms?
Say that when deciding on its next move, it has considered 500 variations starting with move A, 300 variations starting with move B, and 200 starting with move C. (In general, it tries to look more at moves that look more promising, for obvious reasons.)
In the proportional case (this is "temperature = 1", if you see it elsewhere), it would pick move A with 50% probability, move B with 30% probability, and move C with 20% probability, proportionally to their visit counts. This emphasizes exploration, and is done early in self-play games to generate a varied data set and make sure it tries lots of ideas and doesn't get stuck in its learning.
In the greedy case (this is "temperature = 0"), it would pick move A all of the time. This emphasizes exploitation, and is what you do in competition when you want to play your best.
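The two cases can be sketched in a few lines of Python. This is only an illustration of the idea, not AlphaZero's actual code: the function names are made up, and raising the counts to the power 1/temperature for values other than 0 and 1 is an assumption that the post does not state.

```python
import random


def selection_probabilities(visit_counts, temperature=1.0):
    """Probability of each move, proportional to count ** (1 / temperature).
    With temperature = 1 this is plain proportional selection."""
    weights = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def select_move(visit_counts, temperature, rng=random):
    """Pick a move from the root visit counts.
    temperature == 0 is the greedy case: always the most-visited move."""
    if temperature == 0:
        return max(visit_counts, key=visit_counts.get)
    probs = selection_probabilities(visit_counts, temperature)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


counts = {"A": 500, "B": 300, "C": 200}
print(selection_probabilities(counts))     # proportional case: 50% / 30% / 20%
print(select_move(counts, temperature=0))  # greedy case: always "A"
print(select_move(counts, temperature=1))  # random, weighted by visit counts
```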
Re: AlphaZero paper published in journal Science
Posted: Sat Dec 08, 2018 10:33 pm
by seberle
Uberdude wrote:I wouldn't try to convert those Elo differences to handicap, it's like converting apples to volts. To take the example of LeelaZero vs Haylee a while ago (a bit weaker than Fan Hui I suppose), it absolutely demolished her on even and 2 stones, in a manner that if a human (e.g. Lee Sedol) did that I'd expect her to lose on 3 stones too, but she won easily on 3 with LZ going silly.
Two questions for anybody:
So what do people say is the proper handicap between top pros and perfect play? I remember before AlphaGo I read that some pros thought that the top players would need no more than a 4 stone handicap against "God". Is that still what some think?
If Elo can't be converted to handicap at these high ranks, how do you determine handicap from Elo? Or can you? At what rank does the rule "100 Elo points = 1 rank" begin to break down?
Re: AlphaZero paper published in journal Science
Posted: Sun Dec 09, 2018 4:42 am
by moha
seberle wrote:So what do people say is the proper handicap between top pros and perfect play? I remember before AlphaGo I read that some pros thought that the top players would need no more than a 4 stone handicap against "God". Is that still what some think?
There may be a stone or two of uncertainty here, but it seems obvious that >3 and <9 stones are necessary. It is just hard to imagine a top pro losing at 9 stones; the board is simply not big enough and the game not long enough.
If Elo can't be converted to handicap at these high ranks, how do you determine handicap from Elo? Or can you? At what rank does the rule "100 Elo points = 1 rank" begin to break down?
You may look at W's avg winrate at each strength level to get an idea. Since we can guess that fair komi is 7, the significance of the half point (slightly more with imperfect play) advantage also hints at the significance of one handicap stone at that level. It is worth a few % at amateur levels, a few % more at top pro levels, even more at top bot levels, and 100% at perfect level.
Re: AlphaZero paper published in journal Science
Posted: Sun Dec 09, 2018 3:36 pm
by lightvector
seberle wrote:
If Elo can't be converted to handicap at these high ranks, how do you determine handicap from Elo? Or can you? At what rank does the rule "100 Elo points = 1 rank" begin to break down?
If you're equating ranks with stones, I'd say it breaks down all over the place, since 100 Elo points = 1 rank is not so great a rule of thumb to begin with. You might be confused by the fact that certain rating models used by various Go organizations or servers alter the very definition of what an "Elo point" is to try to make it so that under those systems 100 "Elo points" = 1 rank by definition. But of course those altered "Elo points" have little to do with the traditional Elo points that presumably you're asking about, i.e. the ones that underlie FIDE chess ratings, goratings.org, BayesElo, WHR, and the ones that academic publications will usually use when reporting ratings differences. With traditional Elo points, a fixed Elo difference corresponds to a fixed modeled winning chance rather than a fixed rank difference, scaled so that 400 Elo ~= 10:1 winning odds.
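For concreteness, the traditional scaling can be written as a small Python sketch using the standard logistic Elo formula. The function name is just for illustration.

```python
def elo_win_probability(elo_diff):
    """Modeled winning chance of the player rated elo_diff points higher,
    using the standard Elo formula with the 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (0, 100, 400):
    p = elo_win_probability(diff)
    print(f"{diff:>3} Elo: win chance {p:.3f}")
```

A 400-point gap gives the stronger player 10/11 (about 91%), i.e. 10:1 odds, while a 100-point gap gives about 64%, leaving the weaker player roughly 36%.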
The correspondence between traditional Elo differences (i.e. winning chance) and rank difference is not simply a fixed ratio and it becomes highly nonlinear once you get into even amateur dan ranks, much less pro level or beyond. If you're interested in some actual data, I know there are some studies on OGS and/or KGS out there that have been done, or if you like, here's some old real data from EGF tournaments:
http://gemma.ujf.cas.cz/~cieply/GO/statev.html
That data is just among humans of course. If you want to add bots into the mix, any computer chess programmer will tell you that Elo differences between bots (particularly ones measured with self-play) don't necessarily translate into the same Elo differences against humans, and the same appears to be true for Go. And in Go it appears that without clever tricks like "dynamic komi" (or even with such tricks), strong Go bots also scale quite differently than humans in handicap games versus even games.
Hope that helps. Basically rating and rankings are a pretty complex mess and you can't really boil it down into any simple rule.

Re: AlphaZero paper published in journal Science
Posted: Sun Dec 09, 2018 9:47 pm
by seberle
lightvector wrote:seberle wrote:
100 Elo points = 1 rank is not so great a rule of thumb to begin with.
Thanks, that was very helpful.
Ok, see if I'm understanding better. The EGF rating system, for example, has modified the Elo system so as to force 100 rating points to be equivalent to one rank. If the table here (https://senseis.xmp.net/?EGFRatingSystem) is any indication, it looks like the EGF system wanders far from the Elo win rate of about 36% for one rank difference in the SDK ranks, but is reasonably close for DDK. Am I interpreting this correctly?
Re: AlphaZero paper published in journal Science
Posted: Mon Dec 10, 2018 3:07 am
by jlt
My understanding is the same, but I think that winning percentages are calculated from a theoretical formula, and not really observed. To get observed winning percentages, go to the website
http://www.europeangodatabase.eu/EGD/winning_stats.php
Between 2003 and 2018, we get the table
where the grades are the "declared grades". Assuming that declared grades reflect real strength accurately, we can see that
At the 15k rank, 100 EGF points = 57 real Elo
At the 5-10k ranks, 100 EGF points = 50 real Elo
At the 2k rank, 100 EGF points = 72 real Elo
At the 1d rank, 100 EGF points = 102 real Elo
At the 3d rank, 100 EGF points = 117 real Elo
At the 6d rank, 100 EGF points = 220 real Elo
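A conversion like the one in the list above can be sketched by inverting the standard Elo formula. This is presumably how such numbers are obtained, but that is an assumption; the function name is hypothetical. The 33.8% input is the even-game figure for a 3d against a 4d that appears in this post.

```python
import math


def elo_diff_from_win_rate(win_rate):
    """Elo gap implied by the stronger player's observed win rate
    (inverse of the standard 400-point logistic Elo formula)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


weaker_side_win_rate = 0.338  # 3d vs 4d in even games, as quoted in this post
gap = elo_diff_from_win_rate(1.0 - weaker_side_win_rate)
print(round(gap))  # 117, matching the 3d line in the list
```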
It seems however difficult to convert Elo points into handicap stones. We can read on the same website
The table says for instance that a 3d wins 19% of his H1 games against a 4d, which is very strange since he wins 33.8% of his even games against a 4d. So maybe there is a bias (people choose to play handicap games when they think that their real strength difference is larger than their official rank difference), so it's not easy to determine how many stones represent a difference of 1 EGF rank.
Re: AlphaZero paper published in journal Science
Posted: Mon Dec 10, 2018 4:48 am
by seberle
jlt wrote:
It seems however difficult to convert Elo points into handicap stones. We can read on the same website
The table says for instance that a 3d wins 19% of his H1 games against a 4d, which is very strange since he wins 33.8% of his even games against a 4d. So maybe there is a bias (people choose to play handicap games when they think that their real strength difference is larger than their official rank difference), so it's not easy to determine how many stones represent a difference of 1 EGF rank.
Are you sure that "The table says for instance that a 3d wins 19% of his H1 games against a 4d"? I'm new to this, but I thought the table was saying the weaker player (any rank) wins 19% of their games against a player 3 ranks stronger when given a handicap of one stone.
I'm not surprised that handicap stones don't even things out smoothly since the first "handicap stone" is just komi, which is only half the value of the first move. Two handicap stones are actually only worth 1 1/2 ranks, and so forth. Or at least, that is what I have understood. Correct me if I've gotten this wrong!
Re: AlphaZero paper published in journal Science
Posted: Mon Dec 10, 2018 5:06 am
by jlt
Yes, I misread the table and you are right.
Anyway the statistics of handicap games are not precise enough, so I don't have enough data to determine how many stones is worth one EGF rank difference.
Re: AlphaZero paper published in journal Science
Posted: Mon Dec 10, 2018 9:46 am
by Bill Spight
seberle wrote:jlt wrote:
It seems however difficult to convert Elo points into handicap stones.
{snip}
The table says for instance that a 3d wins 19% of his H1 games against a 4d, which is very strange since he wins 33.8% of his even games against a 4d. So maybe there is a bias (people choose to play handicap games when they think that their real strength difference is larger than their official rank difference), so it's not easy to determine how many stones represent a difference of 1 EGF rank.
I'm not surprised that handicap stones don't even things out smoothly since the first "handicap stone" is just komi, which is only half the value of the first move. Two handicap stones are actually only worth 1 1/2 ranks, and so forth. Or at least, that is what I have understood. Correct me if I've gotten this wrong!
Traditionally, rank differences were determined by handicap differences. In theory, one stone difference was equivalent to one rank difference. But handicap differences (at least for amateurs) gave an advantage to White, an advantage equivalent to komi (i.e., ½ stone). So a player two ranks stronger gave only a two stone handicap, with no komi, instead of giving three stones with Black giving komi or giving two stones with White giving komi.
Modern tournament ranks and online ranks are based upon even games, and do not necessarily tell us the proper handicap.
Re: AlphaZero paper published in journal Science
Posted: Wed Dec 12, 2018 12:38 am
by seberle
Traditionally, rank differences were determined by handicap differences. In theory, one stone difference was equivalent to one rank difference. But handicap differences (at least for amateurs) gave an advantage to White, an advantage equivalent to komi (i.e., ½ stone). So a player two ranks stronger gave only a two stone handicap, with no komi, instead of giving three stones with Black giving komi or giving two stones with White giving komi.
Modern tournament ranks and online ranks are based upon even games, and do not necessarily tell us the proper handicap.
This is interesting. First of all, how were handicap differences handled "traditionally" (do you mean before komi?). If we don't change komi, then what is the difference between a one-stone handicap and simply letting black go first? Or was going first considered being one rank stronger traditionally?
Secondly, does either system work out precisely (without doing fine adjustments to komi)? I mean, if a two-stone handicap (any system) means a 7k can play an even game against a 5k and a 5k can play an even game against a 3k, does it necessarily mean that a four-stone handicap for the 7k will get an even game against the 3k? I suppose this question is even more important for the one-stone, two-stone question: if one stone means one rank, does two stones really mean two ranks? I know I saw a debate on Sensei's Library about this once, but I didn't understand it very well and I don't remember exactly where I saw it.
Re: AlphaZero paper published in journal Science
Posted: Wed Dec 12, 2018 4:01 am
by John Fairbairn
This is interesting. First of all, how were handicap differences handled "traditionally" (do you mean before komi?). If we don't change komi, then what is the difference between a one-stone handicap and simply letting black go first? Or was going first considered being one rank stronger traditionally?
To get your head round this you need to understand that the ranks we now use are a relatively modern construct. Traditionally (Edo times) ranks were limited to pro-level dans. They didn't even use komi. Honinbo Shuho introduced some lower grades for amateurs but his system was soon abandoned (for political reasons) and players reverted to the old dan-only/pro-only system, essentially until the democratisation after World War II. The amateurs started using their own dan scale then, and the first amateur 6-dan was Hirata Hironori in 1955 (for winning the 1st Amateur Honinbo - the prize nowadays is 8-dan).
Since then amateurs in Japan were able to use kyus - and did, but the lower ranks have been used with much more gusto in the west. Indeed, a number-only system was introduced by amateurs in Germany even before the war, and was either used or copied by other western amateur associations. We have seen western amateurs - very many with a mathematical background like those early Germans - obsessively try to apply rules and numbers to many aspects of go.
But handicaps existed well before ranking systems and so it follows they can have no real correlation. They were used to a very limited extent between pros but mostly were (and still are, in Japan) nothing more than a teaching tool. No doubt for that pedagogic reason, too, the stone placings were fixed - the idea of free handicap placement is another modern idea, inspired first by mathematical amateurs in Japan (and even giving rise to a book on them by a pro!). The use of these for rankings, and of komi, likewise has no tradition (or even theory) behind it.
The use of komi (mainly in trying to determine what an even game means - and that's varied a lot in the last 100 years) is likewise mainly a Japanese amateur idea, from 1751. Pros tried it a few times from the early 19th century, starting at 5 points and gradually reducing over the decades until it even reached 2 points. It only started to rise after World War II.
So you can see that trying to tie ranks to handicaps is like climbing up a greasy pole. Historical grades and handicaps both differed for pros and amateurs from modern ones, modern grades and handicaps differ between amateurs and pros (and by country). Komi has been messed up for 300 years. The philosophical drive behind rankings differs in the west and the Far East. People have different ideas on how to implement handicaps. Etc, etc.
Life is too short to worry about such things. Of course we all would like a way of quantifying how much stronger A is than B, but it seems sensible to accept it's always going to be a wild guesstimate - at least until we get a DeepRatingsZeroPixie algorithm.
Re: AlphaZero paper published in journal Science
Posted: Wed Dec 12, 2018 4:34 am
by jlt
Statistics on handicap games on the EGD website lack precision but seem to show that handicaps are approximately additive (if a rank A is 2 stones stronger than B, which is 3 stones stronger than C, then A is 5 stones stronger than C). However some players are particularly good at playing with or against handicap, and vice-versa, so the proper handicap between two given people cannot be predicted by subtracting their ranks. Bots are an extreme example (very bad at handicap go).
Re: AlphaZero paper published in journal Science
Posted: Wed Dec 12, 2018 7:06 am
by moha
It is true that using the number of (extra) black stones as handicap - without black giving komi - is mathematically incorrect. It also makes it harder to maintain a good "feeling" about who is ahead (since in some games W gets komi which must be factored into the human intuition, but in other games he gets none - same problem as for a value net). So the same board position can be roughly even AND W significantly ahead depending on this.
But there are problems even without the doubtful human systems. If player A wins 50% against player B even after passing his first move, he is clearly one stone stronger. His winrate against B in even games cannot be correctly guessed from this, as that also depends on how close they are to perfect play.
IMO there are two ways to define one rank class: a certain (like 70%) winrate in even games (Elo-like), OR 50% winrate in N-stone games. For the latter approach, the scale ends with perfect play being a few stones above top pro level. For the former, the scale continues to amplify smaller and smaller strength differences and has many more steps after top pros (but is probably still bounded somewhere).