AlphaGo Zero: Learning from scratch

For discussing go computing, software announcements, etc.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: AlphaGo Zero: Learning from scratch

Post by Uberdude »

tartaric wrote:This version is the one which played Ke Jie, so it is not that strong, because Ke matched it equally during the first game, and AlphaGo Master also managed to win some games in the 20-game series which was released.

If by "this" you mean AlphaGo Zero, you are wrong: Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game; the 0.5-point result was a gift from AlphaGo to reduce the winning margin. It was in the 2nd game that he played better and stayed level for some time). But yes, AG Master (which is much stronger than top humans) did beat Zero 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".
tartaric
Dies in gote
Posts: 24
Joined: Tue Aug 29, 2017 11:59 am
GD Posts: 0
KGS: 4 dan
Has thanked: 1 time

Re: AlphaGo Zero: Learning from scratch

Post by tartaric »

Uberdude wrote:
tartaric wrote:This version is the one which played Ke Jie, so it is not that strong, because Ke matched it equally during the first game, and AlphaGo Master also managed to win some games in the 20-game series which was released.

If by "this" you mean AlphaGo Zero, you are wrong: Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game; the 0.5-point result was a gift from AlphaGo to reduce the winning margin. It was in the 2nd game that he played better and stayed level for some time). But yes, AG Master (which is much stronger than top humans) did beat Zero 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".


Thanks for your message :) It's clearer now. Maybe I am confusing it with the AlphaGo vs AlphaGo series recently released, but there was already talk of an AlphaGo trained without human data. I thought that was the one which played Ke Jie, because they said it was stronger than AlphaGo Master.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

Uberdude wrote:AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. However, AG Zero seems to like pincering a lot more now, often the 3-space low, or 2-space high as below.


Yes. AlphaGo Master pincers at a low rate, compared to humans. Perhaps the more frequent pincering by AlphaGo Zero in these games has to do with the longer time limits. My impression with AlphaGo Master was that with longer time limits it tended to play more like humans. :) OC, there is not enough data to draw a conclusion, and "more like humans" is not well defined. (The frequency of pincers is well defined, OC. :))
Last edited by Bill Spight on Thu Oct 19, 2017 12:41 pm, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

RobertJasiek wrote:Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


What do they mean by "learn the rules"? Certainly not the ability to quote the rules, and maybe not even the ability to handle the example positions that are published with the rules, or which are considered rules beasts. Rather, they mean that the program does not make an illegal move in thousands, perhaps millions, of games. The program doesn't even have to know how to score.

Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.
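Bill's sense of "learning the rules" can be sketched with a toy, purely hypothetical reinforcement-learning setup (nothing to do with AlphaGo's actual code): illegal actions receive a negative reward, and a simple epsilon-greedy learner quickly stops choosing them, without ever being told which actions are illegal or how to score.

```python
import random

# Toy sketch: a bandit-style learner "learns the rules" only in the
# sense above, i.e. it stops choosing illegal moves because they are
# penalized. Action 1 is (arbitrarily) the illegal one.
LEGAL = {0: True, 1: False, 2: True}

def reward(action):
    return -1.0 if not LEGAL[action] else 0.0

random.seed(0)
q = {a: 0.0 for a in LEGAL}           # value estimate per action
for _ in range(2000):
    # epsilon-greedy: mostly pick the best-looking action,
    # occasionally explore a random one
    if random.random() < 0.1:
        a = random.choice(list(LEGAL))
    else:
        a = max(q, key=q.get)
    q[a] += 0.1 * (reward(a) - q[a])  # running-average update

# The illegal action's estimated value ends up clearly negative,
# so the greedy policy never selects it.
assert q[1] < q[0] and q[1] < q[2]
```

The penalty alone is enough: the learner never sees the rules, only the consequence of breaking them.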
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: AlphaGo Zero: Learning from scratch

Post by pwaldron »

The AlphaGo blog at the beginning of this thread has an animated graph showing the (Elo) strength of AlphaGo Zero as a function of time. Two things struck me.

First, AlphaGo Zero reaches the strength of the engine that beat Lee Sedol just as the graph starts to roll over. Clearly diminishing returns set in there.

Second, after about 15 days the rate of improvement is quite slow, as we might expect. Nevertheless, at two points, roughly days 33 and 36, there appear to be comparatively sharp jumps upward. We can only speculate about what the neural networks learned to make those jumps, but I'd love to know what it was.
User avatar
fwiffo
Gosei
Posts: 1435
Joined: Tue Apr 20, 2010 6:22 am
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
Location: California
Has thanked: 49 times
Been thanked: 168 times

Re: AlphaGo Zero: Learning from scratch

Post by fwiffo »

A really common technique in ML is to reduce the "learning rate" as a model starts to converge, and it produces bumps in the model performance exactly like that. So it probably didn't learn specific knowledge at that point, or anything more important than other parts of the learning process; it was just a momentary acceleration of learning.

For those confused, learning rate is a parameter used in gradient descent (the standard algorithm used for training machine learning models these days). It's a bit misleadingly named; it's the size of the steps that the model should make as it's progressively walking down the gradient of the loss function. There is usually an empirically discovered "optimal" learning rate for any given model that gets the best performance. Either a higher or lower learning rate results in slower learning. And the ideal learning rate usually decreases as the model learns.

For those who are now more confused, there is a number that you can tweak when training models to make them work better, and it causes bumps in graphs when you tweak it.
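The effect fwiffo describes can be seen in a minimal, made-up example (a 1-D quadratic, not a neural network): the loss plateaus at a shrink rate set by the old step size, then drops much faster the moment the learning rate is reduced, which is exactly what a "bump" in a training curve looks like.

```python
# Toy gradient descent on f(x) = x^2, showing the effect of a
# mid-training learning-rate drop on the loss curve.
def loss(x):
    return x * x

def grad(x):
    return 2.0 * x

x = 10.0
lr = 0.95  # deliberately large: each step overshoots the minimum
history = []
for step in range(200):
    if step == 100:
        lr = 0.1   # decay the learning rate partway through
    x -= lr * grad(x)
    history.append(loss(x))

# With lr = 0.95, each update multiplies x by (1 - 2*0.95) = -0.9,
# so the loss shrinks by a factor 0.81 per step; after the drop to
# lr = 0.1 the factor becomes 0.64, and the curve falls sharply.
```

The per-step shrink factors (0.81 before the drop, 0.64 after) are just properties of this toy quadratic, but the qualitative shape of the curve matches the bumps in the AlphaGo Zero Elo graph.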
Pio2001
Lives in gote
Posts: 418
Joined: Mon Feb 16, 2015 12:13 pm
Rank: kgs 5 kyu
GD Posts: 0
KGS: Pio2001
Has thanked: 9 times
Been thanked: 83 times

Re: AlphaGo Zero: Learning from scratch

Post by Pio2001 »

Bill Spight wrote:Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.


The pictures posted say that the program already had the "basic rules" as input when it started learning.
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: AlphaGo Zero: Learning from scratch

Post by Pippen »

RobertJasiek wrote:Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost. The really interesting thing will be whether they can apply their algorithm to real-life problems, because there you don't have precise rules or definite results, and you have much more complexity (e.g. Go's sample space has about 10^170 elements, but soccer should easily have 10^10^170 even if you model it discretely).
alphaville
Dies with sente
Posts: 101
Joined: Sat Apr 22, 2017 10:28 pm
GD Posts: 0
Has thanked: 24 times
Been thanked: 16 times

Re: AlphaGo Zero: Learning from scratch

Post by alphaville »

Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.


What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a board and stone-location representation).
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: AlphaGo Zero: Learning from scratch

Post by pookpooi »

alphaville wrote:
Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.


What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a board and stone-location representation).


I think the 'rules' were absent from 'IMO it is impossible to learn Go from scratch', i.e. the rules were given, not learned.
User avatar
fwiffo
Gosei
Posts: 1435
Joined: Tue Apr 20, 2010 6:22 am
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
Location: California
Has thanked: 49 times
Been thanked: 168 times

Re: AlphaGo Zero: Learning from scratch

Post by fwiffo »

Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training, the network is given the position on the board, along with a history of recent moves, and an estimated winning probability for various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns which board configurations are likely wins or losses, and how to get there. (This is a simplification.)

There is still a part external to the neural network that has enough of the rules in order to handle capture, to be able to score the game, to handle ko rules, etc. The AI is not asked to reinvent the rules of go (any number of other games could be played on a go board too). It's more precise to say that there was no go strategy hardwired in and no human games to learn from.

So the network learns how to select moves that maximize the probability of arriving at a winning condition. It doesn't itself determine which side won the game. How its go knowledge is represented in the network (and whether it represents something like rules) is probably not interpretable.

How ko and other illegal moves are handled is not in the paper, but there are several ways to do it (e.g. simply masking illegal moves out of the network's predictions, disqualifying the player who plays them and scoring the game as a loss, or imposing a penalty of some kind).
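The first of those options, masking, is easy to sketch. This is a hypothetical toy (the `logits` and `legal` values are made up, and AlphaGo's actual implementation is not public in this detail): the policy's raw scores for illegal moves are simply excluded before the softmax, so they get probability zero and the remaining probabilities are renormalized.

```python
import math

# Hypothetical sketch of masking illegal moves out of a policy
# network's move distribution (toy values, not AlphaGo's API).
def masked_policy(logits, legal):
    """Softmax over legal moves only; illegal moves get probability 0."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 0.5]       # raw network scores for three moves
legal  = [True, False, True]   # move 1 is illegal (e.g. a ko retake)
probs = masked_policy(logits, legal)
# probs[1] is exactly 0; probs[0] and probs[2] sum to 1.
```

The appeal of masking is that the network never needs to learn legality at all; the rules live in the small non-ML wrapper, which fits the division of labor described above.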

This is similar to Deepmind's Atari game demonstrations. The network is given raw pixels, and the score. It's not told the rules of Breakout or whatever, it just learns how to make moves to get to the highest score.
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: AlphaGo Zero: Learning from scratch

Post by Pippen »

Obviously AG had to be given 1) the basic rules, 2) counting & scoring, and 3) the goal of the game in advance. An AI cannot learn that on its own. We as humans can, because we can look outside of the game and learn 1)-3) from that "higher perspective": e.g. when we repeatedly see someone smiling with a trophy after having surrounded more points on the board than his opponent, we can figure out that in this game whoever has more points wins. AG doesn't have this perspective, but soon it might.
User avatar
Recusant
Beginner
Posts: 6
Joined: Sun Apr 18, 2010 3:29 pm
GD Posts: 0
Has thanked: 14 times
Been thanked: 1 time

Re: AlphaGo Zero: Learning from scratch

Post by Recusant »

I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic
"I teach you the overplay. Joseki is something that shall be overcome.
What have you done to overcome it?"

— Nietzsche, via Bill Spight
pookpooi
Lives in sente
Posts: 727
Joined: Sat Aug 21, 2010 12:26 pm
GD Posts: 10
Has thanked: 44 times
Been thanked: 218 times

Re: AlphaGo Zero: Learning from scratch

Post by pookpooi »

Recusant wrote:I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic

I have read almost every article on Zero. This article has the advantage that it includes Go history and is very well written, but the Redmond, Shi Yue, and Lockhart comments are actually about the Master version, not Zero. I'd take comments on Zero from any expert in a non-Go-related field over old information any time. But yes, this article was posted on reddit baduk and many people liked it, and many people on lifein19x19 will like it too, that's for sure https://www.reddit.com/r/baduk/comments ... om_humans/
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

I found the following exchange on reddit interesting ( https://www.reddit.com/r/MachineLearnin ... ittwieser/ )

cassandra wrote:Do you think that AlphaGo would be able to solve Igo Hatsuyôron's problem 120, the "most difficult problem ever", i. e. winning a given middle game position, or confirm an existing solution (e.g. http://igohatsuyoron120.de/2015/0039.htm)?


David_SilverDeepMind wrote:We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!


As if Fan Hui would know! ;) Besides, Fan Hui is a cheerleader for AlphaGo. I doubt whether AlphaGo would solve that problem without spending a very long time on it to build a humongous search tree. Chess programs are more search-oriented than AlphaGo, and, despite their superhuman strength, they sometimes miss plays that humans have found in actual play.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.