Life In 19x19
http://www.lifein19x19.com/

"Indefinite improvement" for AlphaZero-like engines
http://www.lifein19x19.com/viewtopic.php?f=18&t=17397
Page 1 of 4

Author:  sorin [ Thu Apr 16, 2020 9:30 pm ]
Post subject:  "Indefinite improvement" for AlphaZero-like engines

I read this quote from David Silver (from Deepmind) which caught my attention (https://www.chess.com/news/view/david-silver-alphazero-reinforcement-learning):

Quote:
"If someone in the future was to take AlphaZero as an algorithm and run it with greater computational resources than we have available today, then I will predict that they would be able to beat the previous system 100-0. If they were then to do the same thing a couple of years later, that system would beat the previous system 100-0. That process would continue indefinitely throughout at least my human lifetime."


I was puzzled about the "process would continue indefinitely throughout at least my human lifetime" part. Why would subsequent generations of AlphaZero beat older ones 100-0 - does it imply that older ones hit some sort of "bug" in the learning process that newer ones just learn to exploit?
If the learning process is "smooth", I expect that it would be harder and harder for newer generations to beat older ones, and they wouldn't do that at a 100-0 rate, but more "human-like", say 80-20, etc.

I guess no one knows "the right answer", but I am curious what others think about this statement.
(David Silver is talking about the chess version of the engine in this particular article, but I think the same would apply to go as well.)

Author:  Bill Spight [ Thu Apr 16, 2020 10:00 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

sorin wrote:
I read this quote from David Silver (from Deepmind) which caught my attention (https://www.chess.com/news/view/david-silver-alphazero-reinforcement-learning):

Quote:
"If someone in the future was to take AlphaZero as an algorithm and run it with greater computational resources than we have available today, then I will predict that they would be able to beat the previous system 100-0. If they were then to do the same thing a couple of years later, that system would beat the previous system 100-0. That process would continue indefinitely throughout at least my human lifetime."


I was puzzled about the "process would continue indefinitely throughout at least my human lifetime" part. Why would subsequent generations of AlphaZero beat older ones 100-0 - does it imply that older ones hit some sort of "bug" in the learning process that newer ones just learn to exploit?
If the learning process is "smooth", I expect that it would be harder and harder for newer generations to beat older ones, and they wouldn't do that at a 100-0 rate, but more "human-like", say 80-20, etc.

I guess no one knows "the right answer", but I am curious what others think about this statement.
(David Silver is talking about the chess version of the engine in this particular article, but I think the same would apply to go as well.)


Back before Monte Carlo Tree Search improved go playing programs by leaps and bounds, top programs were advancing at a rate of around one stone per year. I got a laugh out of Martin Mueller by opining that that advancement was the result of the improvement in computer hardware by Moore's Law. ;)

David Silver apparently has cloud computing in mind, and the possibility that it will become both better and cheaper each year for the foreseeable future. I have no opinion about that.

Edit: The Elf team did some experiments with running Elf with more rollouts in the search tree. For the range of rollouts that they tried, doubling the number increased the Elo rating by around 200 pts. That is, they found no leveling off. It is quite possible that near perfection at 19x19 go is decades away.
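The Elf observation can be put in concrete terms. A minimal sketch, assuming the reported figure of roughly 200 Elo per doubling of rollouts holds across the range (the rollout counts below are made-up illustrations, not Elf's actual settings):

```python
import math

def elo_expected_score(d):
    """Expected score for a player rated d Elo points above the opponent
    (standard logistic Elo model)."""
    return 1.0 / (1.0 + 10 ** (-d / 400.0))

def elo_gain_from_rollouts(base_rollouts, new_rollouts, elo_per_doubling=200):
    """Hypothetical Elo gain if each doubling of rollouts adds a fixed
    number of Elo points, as the Elf experiments reportedly observed."""
    doublings = math.log2(new_rollouts / base_rollouts)
    return doublings * elo_per_doubling

# 8x the rollouts = 3 doublings = 600 Elo under this assumption,
# which the logistic Elo model maps to a ~97% expected score.
gain = elo_gain_from_rollouts(800, 6400)
print(gain, round(elo_expected_score(gain), 2))
```

So if the no-leveling-off trend really continues, even modest hardware multipliers translate into lopsided head-to-head scores.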

Author:  sorin [ Thu Apr 16, 2020 10:16 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

What puzzles me is that AlphaZero cannot be many handicap stones away from "perfect play": a 100-0 score suggests more than one stone handicap difference in strength, so hearing David Silver saying that he expects at least 20 or so more generations where each beats the previous one 100-0 seems very counter-intuitive.

Even if 100-0 doesn't mean one handicap stone difference, but only means 1 point difference in final scoring, it doesn't seem to me that we can expect a gap between current top AlphaZero-like engines and "perfect play" to be 20 points or more...

Author:  Uberdude [ Fri Apr 17, 2020 1:11 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

As a player gets stronger they become more consistent, so a given winning percentage corresponds to a smaller and smaller difference in points / handicap stones.
So maybe AG0.1 beats AG0 100-0 and is 2 points stronger aka 1/7 of a handicap stone.
AG0.2 beats AG0.1 100-0 and is 1.5 points stronger.
AG0.3 beats AG0.2 100-0 and is 1.2 points stronger.
AG0.4 beats AG0.3 100-0 and is 1.0 points stronger.
AG0.5 beats AG0.4 100-0 and is 0.8 points stronger.

Depending how these point margins decrease they may have a finite or infinite sum, and even if finite might not reach 20 points or however far we currently are from perfect play in the required timeframe.

And it's all kind of hand-wavy with lots of dubious assumptions: if A is x points stronger than B and B y stronger than C it does not necessarily follow A is x+y stronger than C (I forget what this operator property is called, transitive?). And what does being 0.2 points stronger mean in a game which is ultimately scored in integers?
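The finite-vs-infinite sum point can be made concrete. A sketch with two hypothetical margin sequences (the numbers are illustrative, in the spirit of the 2, 1.5, 1.2, ... series above, not measurements of any real engine):

```python
# If each generation's margin over the previous one shrinks, the total
# gap to generation 0 is the sum of the series. Whether that sum stays
# bounded depends on how fast the margins shrink.
margins_geometric = [2.0 * 0.8 ** n for n in range(1000)]  # ratio 0.8: converges
margins_harmonic = [2.0 / (n + 1) for n in range(1000)]    # harmonic: diverges

print(sum(margins_geometric))  # approaches 2 / (1 - 0.8) = 10 points total
print(sum(margins_harmonic))   # keeps growing without bound
```

A geometric decay means the whole infinite sequence of generations only spans a bounded number of points; a harmonic decay means the total gap grows without limit even though each step shrinks, so the argument really does hinge on how the margins decrease.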

Author:  Bill Spight [ Fri Apr 17, 2020 3:09 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

I just took a look at the chess.com article. My conclusion now is that Silver's quote is hype.

To be fair, the article says that Silver says that his claim is a falsifiable hypothesis. Well, not really. It's too vague.

Not to take anything away from the achievements of the Deep Mind team, the hype has always been strong in Deep Mind.

----

OC, the Alpha Zero algorithm has the ability, given enough computing resources, to solve go in infinite time. And Silver may have the research done by the Elf team and possibly within Deep Mind that has so far failed to observe any slowdown in the increase of strength with exponential increase in search. So, if something like Moore's Law applies to cloud computing in the future, top bots should continue to show steady progress.

Edit: Part of the hype is that, because Silver specifies the AlphaZero algorithm, his claim sounds like it means that the AlphaZero algorithm is so wonderful, when it really means that computer hardware is so wonderful.

Author:  moha [ Fri Apr 17, 2020 3:52 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

In stones, there are unlikely to be more than a few above today's best bots, but strength classes, defined as having ~75% winrate over the previous one, are harder to tell. As Uberdude wrote, one stone is worth more and more classes as you approach perfect play.

100-0 is not a meaningful result but 99-1 is, and needs around 3.5 classes of difference. I don't think the last stone is worth more than 28 classes (komi*4, since a class needs at least 0.5 pts), and even less for each previous stone. So if there are, say, 2-3 stones above FineArt, that maybe means 50-60 classes - around 15 steps of 99-1?
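The "around 3.5 classes for 99-1" figure can be checked under a normal-performance model, with classes taken as performances ~1 standard deviation apart (moha describes the classes this way later in the thread). A sketch:

```python
from statistics import NormalDist

N = NormalDist()

# If both players' performances are N(mu, 1) with means one class
# (1 sd) apart, the difference is N(d, sqrt(2)), so the stronger
# player's winrate at a gap of d classes is Phi(d / sqrt(2)).
one_class = N.cdf(1 / 2 ** 0.5)
classes_for_99 = N.inv_cdf(0.99) * 2 ** 0.5

print(round(one_class, 3))       # ~0.76, close to the 75% class definition
print(round(classes_for_99, 2))  # ~3.29 classes for a 99% winrate
```

So under this model one class is indeed ~75-76%, and a 99-1 result needs a bit over 3 classes, consistent with the ~3.5 estimate.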

Author:  Bill Spight [ Fri Apr 17, 2020 3:59 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

sorin wrote:
What puzzles me is that AlphaZero cannot be many handicap stones away from "perfect play": a 100-0 score suggests more than one stone handicap difference in strength, so hearing David Silver saying that he expects at least 20 or so more generations where each beats the previous one 100-0 seems very counter-intuitive.

Even if 100-0 doesn't mean one handicap stone difference, but only means 1 point difference in final scoring, it doesn't seem to me that we can expect a gap between current top AlphaZero-like engines and "perfect play" to be 20 points or more...


Well, we used to think that humans were at most 4 stones away from perfect play, but current top bots, despite their obvious imperfections, seem to be around 3 stones better than humans, even though they were not trained for handicap play. Now, I think that humans will get at least one stone better in the coming decade, because we can learn from the bots. Even I, as an old dog, have gotten better at predicting their plays. :) What about youngsters who are growing up with bots available?

Back in the 1960s or 70s Sports Illustrated ran an article comparing improvements for both men and women in a variety of sporting events. While both sexes showed steady improvement, the graphs of the men's improvement were starting to bend towards level, while the women's graphs continued straight up. The men's abilities were showing signs of starting to reach their limits, but not the women's. According to the Elf team, Elf's abilities have not shown signs of reaching their limit, given exponential growth in search. At some point, surely the law of diminishing returns will kick in, but we have little evidence that it has started to do so yet. If the gap between top bots and perfect play were only 20 pts., I think that we would see some bending of the curve. (OC, to say top bot without specifying its search or time parameters is not really appropriate, as they matter to its strength.)

Author:  Bill Spight [ Fri Apr 17, 2020 4:33 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

While it may seem that, with increasing strength, the variability of results should decrease, that is not necessarily so. The main reason is that strength at go (or chess) is not one dimensional. There are still some aspects of go in which humans are superior to bots, despite the bots' overall superiority, for instance. This multidimensionality means that there are plays and strategies that are incomparable; you can't say that one is better than the other, except for a given position. Increasing strength does not necessarily reduce this fuzziness, or confusion interval, as CGT calls it.

To take a simple example that is well understood, today's bots are not yet able to solve some of the temperature 1 problems that humans have constructed, where the difference in score is 1 or 2 pts. While the bot may be able to read out a given board, in general this difference is irreducible. Which means that a bot could be very much stronger than its opponent, but if the game comes down to one of these positions, through handicap stones or reverse komi, it is a tossup. The same is true at higher temperatures, OC. But at higher temperatures the weaker player typically has more chances to make a mistake before the game ends. ;)

Author:  jlt [ Fri Apr 17, 2020 5:05 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

I doubt that there exists an infinite sequence of bots alpha0, alpha1,... such that alpha(n+1) beats alpha(n) 99% of the time. The reason is that the score is an integer.

Consider a different game, where each player has to answer to 1000 questions, and gets 1 point for each correct answer.

Suppose player P(0) is a perfect player,
For n=1,..., 1000, P(n) gets 1001-n correct answers with probability 0.5 and 1000-n correct answers with probability 0.5.
And P(1001) gets all answers wrong.

Then P(0) beats P(1) 50% of the time,
P(n) beats P(n+1) 75% of the time (and ties the remaining 25%) for n=1,...,999.
P(1000) beats P(1001) 50% of the time.

The game of go is more complicated, but at each move, a player loses an integral number of points compared to perfect play.
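The win/tie probabilities between adjacent players in this toy model can be checked exactly by enumerating the four equally likely score combinations:

```python
from itertools import product

# In the model above, P(n) scores 1001-n or 1000-n with probability
# 0.5 each, and P(n+1) scores 1000-n or 999-n. Enumerate all four
# equally likely outcomes to get exact win/tie frequencies.
def duel(scores_a, scores_b):
    wins = ties = 0
    for a, b in product(scores_a, scores_b):
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
    total = len(scores_a) * len(scores_b)
    return wins / total, ties / total

n = 5  # any n in 1..999 gives the same answer
wins, ties = duel([1001 - n, 1000 - n], [1000 - n, 999 - n])
print(wins, ties)  # the stronger player wins 75% and ties 25%
```

However the sequence is arranged, no adjacent pair in this model produces a 99-1 result, which is the point: integer scoring caps how lopsided neighboring strength levels can be.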

Author:  mumps [ Fri Apr 17, 2020 6:00 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

Uberdude wrote:
As a player gets stronger they become more consistent, so a given winning percentage corresponds to a smaller and smaller difference in points / handicap stones.


The European Go Database supports this hypothesis with some actual statistics.

Author:  mhlepore [ Fri Apr 17, 2020 7:49 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

To me, Silver's claim seems reasonable (at least to the point of the game being solved).

To use a sports analogy, occasionally weaker players can beat stronger players. Why?
--Human athletes make blunders.
--Balls rattle off the rim in unlucky ways.
--There is a lot of natural noise in certain games (e.g., baseball), such that the best teams lose 1/3 of the time.

The strong Go programs, to me anyway, are not likely to be impacted by this variability. They are not likely to misread a life/death problem, or a ladder, or miscount. And once you knock out a lot of the mistakes, it becomes in essence about backward induction, and the more powerful system should pretty much always win.

That said, I wonder if Silver would still back the statement, however replacing 100-0 with 1,000,000-0.

Author:  jann [ Fri Apr 17, 2020 8:09 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

mhlepore wrote:
The strong Go programs, to me anyway, are not likely to be impacted by this variability.

They are, otherwise W would win 100% of selfplay.

Author:  Bill Spight [ Fri Apr 17, 2020 9:02 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

mhlepore wrote:
To me, Silver's claim seems reasonable (at least to the point of the game being solved).


I'll believe the 100-0 score against AlphaZero when I see it. Not that I think it won't eventually happen; the question is when.

Quote:
They are not likely to misread a life/death problem, or a ladder, or miscount.


These are, in fact, their known areas of weakness.

Author:  mhlepore [ Fri Apr 17, 2020 9:11 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

I guess my point is that there are differences between humans and machines.

Even though Human A is better than Human B, there is still often some nontrivial chance Human B will win. And many of the reasons for this do not apply to machines.

Author:  Bill Spight [ Fri Apr 17, 2020 9:17 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

mhlepore wrote:
I guess my point is that there are differences between humans and machines.

Even though Human A is better than Human B, there is still often some nontrivial chance Human B will win. And many of the reasons for this do not apply to machines.


Perhaps because of self-play, they are too similar. With time we should see a wide variety of strong bots. :)

Author:  gennan [ Fri Apr 17, 2020 11:39 am ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

moha wrote:
In stones there are unlikely more than a few above today's best bots, but strength classes defined as having ~75% winrate above the previous one are harder to tell.

Defining "moha" classes as separated by 75% winrate is more or less the same as saying that these classes are 200 Elo points wide.
But this is not the same as go ranks. The Elo point width of go ranks (in units of handicap stones, or multiples of 14 points loss per game) varies along the rank range. Ranks around 10k are about 50 Elo points wide, ranks around 1d EGF are about 100 points wide, ranks around 7d EGF are about 200 Elo points wide, and in the limit of perfect play, rank width approaches infinity Elo points.

BTW, these widths are not the widths as predicted by the EGF system, but the actual widths from historical data in the EGF database.
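Those rank widths translate directly into winrates between adjacent ranks under the standard logistic Elo model. A small sketch using the widths quoted above:

```python
def expected_score(elo_diff):
    """Standard logistic Elo expectation for the higher-rated player."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

# The same one-rank gap means a very different winrate depending on
# where you are in the rank range (widths from the EGF historical data
# as quoted above).
for rank, width in [("~10k", 50), ("~1d EGF", 100), ("~7d EGF", 200)]:
    print(rank, round(expected_score(width), 2))
```

So one rank means roughly a 57% winrate around 10k but roughly 76% around 7d EGF, which is the "more consistent as you get stronger" effect in numbers.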

Author:  moha [ Fri Apr 17, 2020 12:32 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

gennan wrote:
Defining "moha" classes as separated by 75% winrate is more or less the same as saying that these classes are 200 Elo points wide.
Yes, or normals ~1 sd apart.

Quote:
ranks around 7d EGF are about 200 Elo points wide and in the limit of perfect play, rank width approaches infinity Elo points.
This I find hard to believe. As mentioned, I think 1 stone (14 pts) cannot be worth more than 28 classes, thus quite finite Elo-wise.

Consider the last class difference. A player getting 25% against perfect play is something like dropping a point in every other game (thus 0%*50% + 50%*50%, either as draws with correct komi or W wins with fractional komi). This already means more than half a point of strength difference, and it seems reasonable to assume that preceding classes only get wider pointwise.

Author:  gennan [ Fri Apr 17, 2020 2:27 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

moha wrote:
gennan wrote:
.. in the limit of perfect play, rank width approaches infinity Elo points.
This I find hard to believe. As mentioned, I think 1 stone (14 pts) cannot be worth more than 28 classes, thus quite finite Elo-wise.

Consider the last class difference. A player getting 25% against perfect play is something like dropping a point in every other game (thus 0%*50% + 50%*50%, either as draws with correct komi or W wins with fractional komi). This already means more than half a point of strength difference, and it seems reasonable to assume that preceding classes only get wider pointwise.

Perfect komi between two perfect players would be an integer (probably 6 or 7, possibly depending on the rule set) and they would always get a jigo.

But you are right. I was thinking about a game with fractional scores where a non-perfect player always makes some mistake, however small. But with integer scores and perfect komi, some games between a near perfect player and a perfect player would end up as a jigo (there is a non-zero probability that the near perfect player plays a perfect game), so the perfect player would not get an infinite Elo rating. The perfect player's Elo rating would be very high, but finite.

But I don't understand why 1 point komi cannot exceed 400 Elo points.

Author:  moha [ Fri Apr 17, 2020 2:42 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

gennan wrote:
But I don't understand why 1 point komi cannot exceed 400 Elo points (1 point = 2 classes and 1 class = 200 Elo), or vice versa, why 400 Elo points is always less difference than 1 point komi.
In the above example the player 1 class below perfection is half a point from it (drops slightly more than 0.5 points per game on average), and I assumed that preceding classes mean more and more total error difference (as we know from the practical 1d-9d range).

Author:  gennan [ Fri Apr 17, 2020 3:24 pm ]
Post subject:  Re: "Indefinite improvement" for AlphaZero-like engines

OK, I think I understand now.

In my own words:

Let's take a near perfect player who on average loses 1 point in 2 games (1 class below a perfect player).

To maintain a rating of 200 Elo below the perfect player, he would have to score 25% against the perfect player. To do that, he has to play perfectly in 50% of his games (alternating between jigo and losing gives a 25% score). With his average of losing 1 point in 2 games, he would be alternating between jigo and losing by 1 point. So this is perfectly consistent.
If the perfect player gives the near perfect player a handicap of 0.5 points, the score would become 50% (alternating wins and losses). Again, this is consistent with a perfect handicap between these players.

Very interesting stuff!

From an analysis that I made from the EGD historical data, I estimate that the value of 1 point komi is as low as 1 Elo point around 50k. I thought there was no maximum Elo value for 1 point komi, but the above is convincing me that the maximum value of 1 point komi really is 400 Elo. So the gap between a perfect player and 1 rank below (14 points) should be less than 5,600 Elo.

LeelaZero currently has a selfplay rating of 16,000 Elo (where 0 is random play). If we estimate that LZ is about 1 rank (14 points) below perfect play, the perfect player's rating would not exceed 21,600 Elo.
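The back-of-the-envelope bound above is simple enough to write out. A sketch using the numbers from this post (the 400-Elo-per-point cap and the "LZ is ~1 rank below perfect" figure are this thread's estimates, not established facts):

```python
# Bound on a perfect player's selfplay Elo: if 1 point of komi is
# worth at most 400 Elo, a 14-point rank caps out at 14 * 400 Elo,
# and a bot estimated to be 1 rank below perfect play bounds the
# perfect player's rating at its own rating plus that cap.
MAX_ELO_PER_POINT = 400
POINTS_PER_RANK = 14
LZ_SELFPLAY_ELO = 16000  # LeelaZero's selfplay rating, 0 = random play

rank_width_cap = POINTS_PER_RANK * MAX_ELO_PER_POINT
print(rank_width_cap)                    # 5600
print(LZ_SELFPLAY_ELO + rank_width_cap)  # 21600
```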
