Life In 19x19
http://www.lifein19x19.com/

AlphaGo Zero: Learning from scratch
http://www.lifein19x19.com/viewtopic.php?f=18&t=15042
Page 1 of 2

Author:  Uberdude [ Wed Oct 18, 2017 10:12 am ]
Post subject:  AlphaGo Zero: Learning from scratch

https://deepmind.com/blog/alphago-zero- ... g-scratch/

Holy crap!!

Author:  jeromie [ Wed Oct 18, 2017 10:19 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

I got to this point and felt surprised...

Quote:
AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.


and then I kept going and got to this point:


Quote:
It also differs from previous versions in other notable ways.
  • AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features.
  • It uses one neural network rather than two. Earlier versions of AlphaGo used a “policy network” to select the next move to play and a “value network” to predict the winner of the game from each position. These are combined in AlphaGo Zero, allowing it to be trained and evaluated more efficiently.
  • AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.


:shock:

Wow, this is so cool! I desperately hope they release self-play games from this version.
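
If it helps anyone picture "one network rather than two": below is a minimal PyTorch sketch (my own toy, not DeepMind's code) of a single trunk feeding separate policy and value heads. The real network is a much deeper residual net, and per the paper its input planes include a short board history rather than just the current stones.

Code:
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    def __init__(self, board_size=19, channels=32):
        super().__init__()
        # Shared trunk over two input planes (black stones, white stones).
        self.trunk = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: one logit per board point, plus one for pass.
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size + 1),
        )
        # Value head: a single number in [-1, 1] predicting the winner.
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board_size * board_size, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(x)                     # one pass through the shared trunk...
        return self.policy(h), self.value(h)  # ...feeds both heads

net = PolicyValueNet()
empty_board = torch.zeros(1, 2, 19, 19)
move_logits, winner = net(empty_board)
print(move_logits.shape, winner.shape)        # (1, 362) and (1, 1)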

Author:  jeromie [ Wed Oct 18, 2017 10:24 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Sorry for double posting, but I just glanced at the freely available version of the paper (don't have time for more right now).

A couple interesting points:

AlphaGo Zero learned and used several common joseki sequences during the course of training. They show when in the training process it learned each one, and which ones it preferred at various stages.

The full online version (only available with a Nature subscription) (edit: as Uberdude pointed out, this was wrong) includes the first 100 moves of several games at various points in the learning process.

Author:  Uberdude [ Wed Oct 18, 2017 10:32 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Here is the game they show after 70 hours:


Author:  Uberdude [ Wed Oct 18, 2017 10:35 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

jeromie wrote:
The full online version (only available with a Nature subscription) includes the first 100 moves of several games at various points in the learning process.

https://deepmind.com/documents/119/agz_ ... nature.pdf has many game diagrams in the appendix

EDIT: and Andrew Jackson just posted a zip of the sgfs on reddit:
https://www.nature.com/nature/journal/v ... 270-s2.zip

An example of AG Zero beating AG Master:

Author:  Kirby [ Wed Oct 18, 2017 8:00 pm ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Fun to watch the progression among the self-play games.

One of the earlier games:


And after a bit of learning from self-play:

Author:  Gomoto [ Wed Oct 18, 2017 9:20 pm ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Where can I buy shares?

Thank me later!

(I won't buy any myself)

Author:  alphaville [ Wed Oct 18, 2017 10:00 pm ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Kirby wrote:
Fun to watch the progression among the self-play games.


So the "20 block" self-play games are from various stages of training, while the "40 block" folder come only from the strongest version?
That is confusing, I wish that they labeled the "in-training" games somehow, to be able to tell the strength.

Author:  alphaville [ Wed Oct 18, 2017 10:22 pm ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

alphaville wrote:
Kirby wrote:
Fun to watch the progression among the self-play games.


So the "20 block" self-play games are from various stages of training, while the "40 block" folder come only from the strongest version?
That is confusing, I wish that they labeled the "in-training" games somehow, to be able to tell the strength.


I think I got it now: both groups of self-play games show progression during training, according to Nature.

For the "20 block" folder:
"The 3-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls"

For the 40-block" folder:
"The 40-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls."

If the "20 periods" are divided equally by time, then the weakest game in the 40-bucket folder matches random-playing engines, 2nd game matches engines after 2 days of training, etc.

Author:  Uberdude [ Thu Oct 19, 2017 12:17 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Marcel Grünauer wrote:
A patriotic side note - I just learned that Julian Schrittwieser, one of the main authors of that paper, is from Austria and studied at the Technical University of Vienna. He has worked for Google since 2012 and switched to DeepMind when he heard Demis Hassabis talk about AlphaGo. His background is, naturally, in machine learning.


Yes, AlphaGo is an international effort and shows the remarkable success that comes from assembling the best talents from around the world. I really wonder if it would still be possible post-Brexit. Maybe so, as Google is a big rich name with admin staff to help sponsor people through our kafkaesque visa process, but maybe not...

Author:  RobertJasiek [ Thu Oct 19, 2017 12:30 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Uberdude wrote:
the best talents


It is not as if all the best talents were in one place. Rather call it a "selection of some of the allegedly best talents".

***

Elsewhere, I read about a plan for the program to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.

***

Learning from scratch so fast and successfully is exceptionally impressive. However, it is still possible that AlphaGo Zero fails in expert positions that rarely occur in practical play. IOW, neural nets can err. Self-driving cars can kill. Self-replicating or war-fighting AI-(nano)-robots might cause the extinction of mankind. We must never forget this, no matter how impressive an AI might seem.

Author:  Uberdude [ Thu Oct 19, 2017 4:36 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

There was some welcome clarification about the different versions of AlphaGo in the Methods section, though no mention of the Ke Jie version (this paper was submitted before that match; they were sitting on a big secret!). I think that version, and the one behind the 55 self-play games released afterwards, was basically the same as Master with just minor incremental improvements. (Probably not exactly the same: the 55-game self-play version does seem to like the early 3-3 invasion more than the online 60-game Master version did, and that preference appears so early in the game that it can't be explained away the way the more chaotic style can, i.e. "when it is winning against weak humans it simplifies, but against itself it plays at 100%".)

Quote:
AlphaGo versions. We compare three distinct versions of AlphaGo:

1. AlphaGo Fan is the previously published program that played against Fan Hui in October 2015. This program was distributed over many machines using 176 GPUs.

2. AlphaGo Lee is the program that defeated Lee Sedol 4–1 in March, 2016. It was previously unpublished but is similar in most regards to AlphaGo Fan. However, we highlight several key differences to facilitate a fair comparison. First, the value network was trained from the outcomes of fast games of self-play by AlphaGo, rather than games of self-play by the policy network; this procedure was iterated several times – an initial step towards the tabula rasa algorithm presented in this paper. Second, the policy and value networks were larger than those described in the original paper – using 12 convolutional layers of 256 planes respectively – and were trained for more iterations. This player was also distributed over many machines using 48 TPUs, rather than GPUs, enabling it to evaluate neural networks faster during search.

3. AlphaGo Master is the program that defeated top human players by 60–0 in January, 2017. It was previously unpublished but uses the same neural network architecture, reinforcement learning algorithm, and MCTS algorithm as described in this paper. However, it uses the same handcrafted features and rollouts as AlphaGo Lee and training was initialised by supervised learning from human data.

4. AlphaGo Zero is the program described in this paper. It learns from self-play reinforcement learning, starting from random initial weights, without using rollouts, with no human supervision, and using only the raw board history as input features. It uses just a single machine in the Google Cloud with 4 TPUs (AlphaGo Zero could also be distributed but we chose to use the simplest possible search algorithm).
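
Condensing that excerpt into a quick table (Master's hardware isn't given in it):

Code:
Version        | Hardware               | Handcrafted features | Rollouts | Human data
AlphaGo Fan    | distributed, 176 GPUs  | yes                  | yes      | yes (supervised init)
AlphaGo Lee    | distributed, 48 TPUs   | yes                  | yes      | yes (supervised init)
AlphaGo Master | (not stated above)     | yes                  | yes      | yes (supervised init)
AlphaGo Zero   | 1 machine, 4 TPUs      | no (raw board only)  | no       | none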

Author:  pookpooi [ Thu Oct 19, 2017 5:28 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Aja Huang mentioned that the version that played Ke Jie is the same as Master, just slightly stronger (perhaps due to a longer time setting?).

source: http://sports.sina.com.cn/go/2017-05-24 ... 9285.shtml

Also, on the DeepMind website there's an animated graph indicating that the Master version and the version that played the 3-game match against Ke Jie are the same version.
source: https://storage.googleapis.com/deepmind ... 20Time.gif

Author:  Uberdude [ Thu Oct 19, 2017 7:07 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

So what new moves is AlphaGo Zero playing? One very noticeable pattern in the 20 games between AlphaGo Zero (40 blocks) [the strongest version] and AlphaGo Master is shown below. It arises after a low plus high double approach against a 4-4. This in itself is remarkable, as AlphaGo Master so rarely pincered approaches to its 4-4 points that such opportunities hardly ever arose. AG Zero, however, seems to like pincering a lot more, often with the 3-space low pincer or the 2-space high one as below.

This corner sequence appeared in 8 of the 20 games, with AG Zero always the one capturing the inside stones, and it won 7 of those 8. So it seems AG Zero thinks the result is even to good for it, AG Master likewise thinks it is even to good from the other side, and given the results/strengths AG Zero is probably closer to the truth. According to Waltheri this sequence has never appeared in pro games.

My initial feeling was that it looks like an interesting sacrifice for white compared to the normal entry into the corner after the attachment (maybe with the hane first), and white also gets some nice forcing moves on the outside with the cut aji. Set against that, black is solid and almost 100% alive, which AG tends to value highly (and white isn't: in some games the white group gets into trouble later; though in the one game AG Master did win, the black group actually dies!).

[go]$$W
$$ | . . . , . . . . . ,
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ | . 0 8 6 . . . . . .
$$ | . . 7 1 2 . . . . .
$$ | . 9 . 4 3 5 . . . .
$$ | . . . X . . . . X ,
$$ | . . . . . O . . . .
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ +--------------------[/go]


[go]$$Wm11
$$ | . . . , . . . . . ,
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ | . X X X . . . . . .
$$ | . . O O X . . . . .
$$ | . O . X O O . . . .
$$ | . 4 2 X . . . . X ,
$$ | . . 3 1 . O . . . .
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ +--------------------[/go]

Author:  tartaric [ Thu Oct 19, 2017 7:22 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

This version is the one that played Ke Jie, so it's not that strong: Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the released 20-game series.

Author:  Uberdude [ Thu Oct 19, 2017 7:39 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

tartaric wrote:
This version is the one that played Ke Jie, so it's not that strong: Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the released 20-game series.

If by "this" you mean AlphaGo Zero you are wrong, Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game, the 0.5 score was a gift from AlphaGo to reduce the win margin, it was the 2nd game he played better and kept level for some time). But yes AG Master (which is much stronger than top humans) beat it 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".

Author:  tartaric [ Thu Oct 19, 2017 7:52 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Uberdude wrote:
tartaric wrote:
This version is the one that played Ke Jie, so it's not that strong: Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the released 20-game series.

If by "this" you mean AlphaGo Zero you are wrong, Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game, the 0.5 score was a gift from AlphaGo to reduce the win margin, it was the 2nd game he played better and kept level for some time). But yes AG Master (which is much stronger than top humans) beat it 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".


Thanks for your message :) It's clearer now. Maybe I am mixing it up with the recently released AlphaGo vs AlphaGo series, but there was already talk of an AlphaGo trained without human data. I thought that was the one that played Ke Jie, because they said it was stronger than AlphaGo Master.

Author:  Bill Spight [ Thu Oct 19, 2017 10:00 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

Uberdude wrote:
AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. However, AG Zero seems to like pincering a lot more now, often the 3-space low, or 2-space high as below.


Yes. AlphaGo Master pincers at a low rate, compared to humans. Perhaps the more frequent pincering by AlphaGo Zero in these games has to do with the longer time limits. My impression with AlphaGo Master was that with longer time limits it tended to play more like humans. :) OC, there is not enough data to draw a conclusion, and "more like humans" is not well defined. (The frequency of pincers is well defined, OC. :))

Author:  Bill Spight [ Thu Oct 19, 2017 12:40 pm ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

RobertJasiek wrote:
Elsewhere, I read about a plan for the program to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


What do they mean by "learn the rules"? Certainly not the ability to quote the rules, and maybe not even the ability to handle the example positions that are published with the rules, or which are considered rules beasts. Rather, they mean that the program does not make an illegal move in thousands, perhaps millions, of games. They don't even have to know how to score.

Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.
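
To make that concrete, here's a toy sketch in Python (entirely my own, nothing like DeepMind's actual setup): the agent is never told "you may not play on an occupied point"; it only receives a -1 reward when it tries, and that penalty alone teaches it the rule.

Code:
import random

SIZE = 5            # toy 5x5 board: 1 = occupied, 0 = empty
LR = 0.05           # learning rate
EPS = 0.1           # exploration rate

# Linear "value of trying point m": bias + weight * occupied(m).
w_bias, w_occ = 0.0, 0.0

def score(board, m):
    return w_bias + w_occ * board[m]

def pick_move(board):
    if random.random() < EPS:                       # occasionally explore
        return random.randrange(SIZE * SIZE)
    return max(range(SIZE * SIZE), key=lambda m: score(board, m))

def play_game():
    """One game of 25 move attempts; returns how many were illegal."""
    global w_bias, w_occ
    board = [0] * (SIZE * SIZE)
    illegal = 0
    for _ in range(SIZE * SIZE):
        m = pick_move(board)
        reward = -1.0 if board[m] else 0.0          # the only rule signal
        illegal += board[m]
        delta = LR * (reward - score(board, m))     # nudge estimate toward reward
        w_bias += delta
        w_occ += delta * board[m]
        if not board[m]:
            board[m] = 1                            # legal move: place a stone
    return illegal

history = [play_game() for _ in range(500)]
print("illegal attempts, first 10 games:", history[:10])
print("illegal attempts, last 10 games: ", history[-10:])
print("learned weight on 'occupied':", round(w_occ, 2))

After a few hundred games the only illegal attempts left come from the exploration noise, and the learned weight on "occupied" settles near -1: the rule has been learned without ever being stated.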

Author:  pwaldron [ Fri Oct 20, 2017 5:59 am ]
Post subject:  Re: AlphaGo Zero: Learning from scratch

The AlphaGo blog at the beginning of this thread has an animated graph showing the (Elo) strength of AlphaGo Zero as a function of time. Two things struck me.

First, AlphaGo Zero reaches the strength of the engine that beat Lee Sedol just as the graph starts to roll over. Clearly diminishing returns set in around there.

Second, after about 15 days the rate of improvement is quite slow, as we might expect. Nevertheless, at two points, roughly days 33 and 36, there appear to be comparatively sharp jumps upward. We can only speculate about what the neural networks learned to make those jumps, but I'd love to know what it was.
