AlphaGo Zero: Learning from scratch

For discussing go computing, software announcements, etc.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: AlphaGo Zero: Learning from scratch

Post by Uberdude »

tartaric wrote:This version is the one which played Ke Jie, so it is not that strong, because Ke matched it equally during the first game, and AlphaGo Master also managed to win some games in the 20-game series which was released.

If by "this" you mean AlphaGo Zero, you are wrong: Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game; the 0.5-point result was a gift from AlphaGo to reduce the winning margin. It was in the 2nd game that he played better and stayed level for some time). But yes, AG Master (which is much stronger than top humans) did beat Zero 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".
tartaric
Dies in gote
Posts: 24
Joined: Tue Aug 29, 2017 11:59 am
GD Posts: 0
KGS: 4 dan
Has thanked: 1 time

Re: AlphaGo Zero: Learning from scratch

Post by tartaric »

Uberdude wrote:
tartaric wrote:This version is the one which played Ke Jie, so it is not that strong, because Ke matched it equally during the first game, and AlphaGo Master also managed to win some games in the 20-game series which was released.

If by "this" you mean AlphaGo Zero, you are wrong: Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game; the 0.5-point result was a gift from AlphaGo to reduce the winning margin. It was in the 2nd game that he played better and stayed level for some time). But yes, AG Master (which is much stronger than top humans) did beat Zero 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".


Thanks for your message :) It's clearer now. Maybe I am confusing it with the AlphaGo vs AlphaGo series recently released, but there was already talk of an AlphaGo trained without human data. I thought that was the one which played Ke Jie, because they said it was stronger than AlphaGo Master.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

Uberdude wrote:AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. However, AG Zero seems to like pincering a lot more now, often the 3-space low, or 2-space high as below.


Yes. AlphaGo Master pincers at a low rate, compared to humans. Perhaps the more frequent pincering by AlphaGo Zero in these games has to do with the longer time limits. My impression with AlphaGo Master was that with longer time limits it tended to play more like humans. :) OC, there is not enough data to draw a conclusion, and "more like humans" is not well defined. (The frequency of pincers is well defined, OC. :))
Last edited by Bill Spight on Thu Oct 19, 2017 12:41 pm, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

RobertJasiek wrote:Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


What do they mean by "learn the rules"? Certainly not the ability to quote the rules, and maybe not even the ability to handle the example positions that are published with the rules, or which are considered rules beasts. Rather, they mean that the program does not make an illegal move in thousands, perhaps millions, of games. The program doesn't even have to know how to score.

Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.
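Bill's sense of "learning the rules" can be sketched with a toy, purely hypothetical reinforcement-learning setup (nothing to do with AlphaGo's actual code): illegal actions receive a negative reward, and a simple epsilon-greedy learner quickly stops choosing them, without ever being told which actions are illegal or how to score.

```python
import random

# Toy sketch: a bandit-style learner "learns the rules" only in the
# sense above, i.e. it stops choosing illegal moves because they are
# penalized. Action 1 is (arbitrarily) the illegal one.
LEGAL = {0: True, 1: False, 2: True}

def reward(action):
    return -1.0 if not LEGAL[action] else 0.0

random.seed(0)
q = {a: 0.0 for a in LEGAL}           # value estimate per action
for _ in range(2000):
    # epsilon-greedy: mostly pick the best-looking action,
    # occasionally explore a random one
    if random.random() < 0.1:
        a = random.choice(list(LEGAL))
    else:
        a = max(q, key=q.get)
    q[a] += 0.1 * (reward(a) - q[a])  # running-average update

# The illegal action's estimated value ends up clearly negative,
# so the greedy policy never selects it.
assert q[1] < q[0] and q[1] < q[2]
```

The penalty alone is enough: the learner never sees the rules, only the consequence of breaking them.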
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: AlphaGo Zero: Learning from scratch

Post by pwaldron »

The AlphaGo blog at the beginning of this thread has an animated graph showing the (Elo) strength of AlphaGo Zero as a function of time. Two things struck me.

First, AlphaGo Zero reaches the strength of the engine that beat Lee Sedol just as the graph starts to roll over. Clearly diminishing returns set in there.

Second, after about 15 days the rate of improvement is quite slow, as we might expect. Nevertheless, at two points, roughly days 33 and 36, there appear to be comparatively sharp jumps upward. We can only speculate about what the neural networks learned to make those jumps, but I'd love to know what it was.
User avatar
fwiffo
Gosei
Posts: 1435
Joined: Tue Apr 20, 2010 6:22 am
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
Location: California
Has thanked: 49 times
Been thanked: 168 times

Re: AlphaGo Zero: Learning from scratch

Post by fwiffo »

A really common technique in ML is to reduce the "learning rate" as a model starts to converge, and it produces bumps in the model performance exactly like that. So it probably didn't learn specific knowledge at that point, or anything more important than other parts of the learning process; it was just a momentary acceleration of learning.

For those confused, learning rate is a parameter used in gradient descent (the standard algorithm used for training machine learning models these days). It's a bit misleadingly named; it's the size of the steps that the model should make as it's progressively walking down the gradient of the loss function. There is usually an empirically discovered "optimal" learning rate for any given model that gets the best performance. Either a higher or lower learning rate results in slower learning. And the ideal learning rate usually decreases as the model learns.

For those who are now more confused, there is a number that you can tweak when training models to make them work better, and it causes bumps in graphs when you tweak it.
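The effect fwiffo describes can be seen in a minimal, made-up example (a 1-D quadratic, not a neural network): the loss plateaus at a shrink rate set by the old step size, then drops much faster the moment the learning rate is reduced, which is exactly what a "bump" in a training curve looks like.

```python
# Toy gradient descent on f(x) = x^2, showing the effect of a
# mid-training learning-rate drop on the loss curve.
def loss(x):
    return x * x

def grad(x):
    return 2.0 * x

x = 10.0
lr = 0.95  # deliberately large: each step overshoots the minimum
history = []
for step in range(200):
    if step == 100:
        lr = 0.1   # decay the learning rate partway through
    x -= lr * grad(x)
    history.append(loss(x))

# With lr = 0.95, each update multiplies x by (1 - 2*0.95) = -0.9,
# so the loss shrinks by a factor 0.81 per step; after the drop to
# lr = 0.1 the factor becomes 0.64, and the curve falls sharply.
```

The per-step shrink factors (0.81 before the drop, 0.64 after) are just properties of this toy quadratic, but the qualitative shape of the curve matches the bumps in the AlphaGo Zero Elo graph.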
Pio2001
Lives in gote
Posts: 418
Joined: Mon Feb 16, 2015 12:13 pm
Rank: kgs 5 kyu
GD Posts: 0
KGS: Pio2001
Has thanked: 9 times
Been thanked: 83 times

Re: AlphaGo Zero: Learning from scratch

Post by Pio2001 »

Bill Spight wrote:Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.


The pictures posted say that the program already had the "basic rules" as input when it started learning.
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: AlphaGo Zero: Learning from scratch

Post by Pippen »

RobertJasiek wrote:Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost. The really interesting thing will be whether they can apply their algorithm to real-life problems, because there you don't have precise rules or definite results, and you have much more complexity (e.g. Go's sample space has about 10^170 elements, but soccer should easily have 10^10^170 even if you model it discretely).
alphaville
Dies with sente
Posts: 101
Joined: Sat Apr 22, 2017 10:28 pm
GD Posts: 0
Has thanked: 24 times
Been thanked: 16 times

Re: AlphaGo Zero: Learning from scratch

Post by alphaville »

Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.


What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a board and stone-location representation).
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: AlphaGo Zero: Learning from scratch

Post by pookpooi »

alphaville wrote:
Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.


What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a board and stone-location representation).


I think the 'rules' were absent from 'IMO it is impossible to learn Go from scratch', i.e. the rules were given, not learned.
User avatar
fwiffo
Gosei
Posts: 1435
Joined: Tue Apr 20, 2010 6:22 am
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
Location: California
Has thanked: 49 times
Been thanked: 168 times

Re: AlphaGo Zero: Learning from scratch

Post by fwiffo »

Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training, the network is given the position on the board, along with a history of recent moves, and an estimated winning probability for various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns which board configurations are likely wins or losses, and how to get there. (This is a simplification.)

There is still a part external to the neural network that has enough of the rules in order to handle capture, to be able to score the game, to handle ko rules, etc. The AI is not asked to reinvent the rules of go (any number of other games could be played on a go board too). It's more precise to say that there was no go strategy hardwired in and no human games to learn from.

So the network learns how to select moves that maximize the probability of arriving at a winning condition. It doesn't itself determine which side won the game. How its go knowledge is represented in the network (and whether it represents something like rules) is probably not interpretable.

How ko and other illegal moves are handled is not in the paper, but there are several ways to do it (e.g. simply masking illegal moves out of the network's predictions, disqualifying the player who plays them and scoring the game as a loss, or imposing a penalty of some kind).
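The first of those options, masking, is easy to sketch. This is a hypothetical toy (the `logits` and `legal` values are made up, and AlphaGo's actual implementation is not public in this detail): the policy's raw scores for illegal moves are simply excluded before the softmax, so they get probability zero and the remaining probabilities are renormalized.

```python
import math

# Hypothetical sketch of masking illegal moves out of a policy
# network's move distribution (toy values, not AlphaGo's API).
def masked_policy(logits, legal):
    """Softmax over legal moves only; illegal moves get probability 0."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 0.5]       # raw network scores for three moves
legal  = [True, False, True]   # move 1 is illegal (e.g. a ko retake)
probs = masked_policy(logits, legal)
# probs[1] is exactly 0; probs[0] and probs[2] sum to 1.
```

The appeal of masking is that the network never needs to learn legality at all; the rules live in the small non-ML wrapper, which fits the division of labor described above.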

This is similar to Deepmind's Atari game demonstrations. The network is given raw pixels, and the score. It's not told the rules of Breakout or whatever, it just learns how to make moves to get to the highest score.
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: AlphaGo Zero: Learning from scratch

Post by Pippen »

Obviously AG had to be given 1) the basic rules, 2) counting & scoring, and 3) the goal of the game in advance. An AI cannot learn that on its own. We as humans can, because we can look outside of the game and learn 1)-3) from that "higher perspective": e.g. when we repeatedly see someone smiling with a trophy after having surrounded more points on the board than his opponent, we can figure out that in this game whoever has more points wins. AG doesn't have this perspective, but soon it might.
User avatar
Recusant
Beginner
Posts: 6
Joined: Sun Apr 18, 2010 3:29 pm
GD Posts: 0
Has thanked: 14 times
Been thanked: 1 time

Re: AlphaGo Zero: Learning from scratch

Post by Recusant »

I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic
"I teach you the overplay. Joseki is something that shall be overcome.
What have you done to overcome it?"

— Nietzsche, via Bill Spight
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: AlphaGo Zero: Learning from scratch

Post by pookpooi »

Recusant wrote:I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic

I have read almost every article on Zero. This article has the advantage that it includes Go history and is very well written, but the Redmond, Shi Yue, and Lockhart comments are actually about the Master version, not Zero. I'd take comments on Zero from any expert in a non-Go-related field over old information any time. But yes, this article was posted on reddit baduk and many people liked it, and many people on lifein19x19 will like it too, that's for sure https://www.reddit.com/r/baduk/comments ... om_humans/
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

I found the following exchange on reddit interesting ( https://www.reddit.com/r/MachineLearnin ... ittwieser/ )

cassandra wrote:Do you think that AlphaGo would be able to solve Igo Hatsuyôron's problem 120, the "most difficult problem ever", i. e. winning a given middle game position, or confirm an existing solution (e.g. http://igohatsuyoron120.de/2015/0039.htm)?


David_SilverDeepMind wrote:We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!


As if Fan Hui would know! ;) Besides, Fan Hui is a cheerleader for AlphaGo. I doubt whether AlphaGo would solve that problem without spending a very long time on it to build a humongous search tree. Chess programs are more search-oriented than AlphaGo, and, despite their superhuman strength, they sometimes miss plays that humans have found in actual play.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.