Reddit thread with Aja's recent talk:
https://www.reddit.com/r/baduk/comments ... e_29_2016/
Video direct link:
http://www.liacs.leidenuniv.nl/~kosters ... 16/ah1.mov
Copy/paste of my comments from reddit
Contents:
0:00 Introduction/Awards
9:20 Aja Huang's talk starts, history of his involvement in Go programming
15:20 Google DeepMind -- Why write Go programs?
17:30 Operating AlphaGo vs Lee Sedol
18:10 What makes AlphaGo so strong?
21:40 Convolutional neural networks
24:40 How to train the networks
26:20 Policy network, supervised learning -- 30 million positions from KGS 5d+ games, 4 weeks x 50 GPUs (see the toy sketch after this list)
27:35 Policy network, reinforcement learning -- self-play, 1 week x 50 GPUs
29:50 Value network, reinforcement learning -- 30 million games of self-play, 1 week x 50 GPUs
30:40 Graph of Mean Squared error of move guessing
32:00 Tree search algorithm (see the second sketch after this list)
34:55 Mistake/typo in the Nature paper -- accuracy of the raw policy should be 30%, not the 24% printed in the paper
35:15 Graph of AlphaGo Elo (Nature paper version v13 [vs Fan Hui])
35:55 Graph of v18 [vs Lee Sedol] AlphaGo Elo -- v18 is 3~4 stones stronger than v13
37:30 Game 4 vs Lee Sedol -- "Horizon effect? The answer is too deep in the tree. Weakness of the value network? Too few similar positions in the training set. Anyway, the problem is fixed in the latest version!"
39:45 What's next?
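For anyone who hasn't read the Nature paper, the 26:20 step is plain supervised learning: a convolutional network is trained to predict which move the strong human actually played in each of those 30 million KGS positions. Below is a minimal toy sketch of one such training step; the 13-layer convolutional network is replaced by a single linear layer, and the dimensions, names and learning rate are my own illustrative choices, not DeepMind's code.

```python
import numpy as np

BOARD_MOVES = 19 * 19        # one logit per board point (toy model ignores "pass")
FEATURES = 48 * 19 * 19      # the paper uses 48 input feature planes; flattened here

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(BOARD_MOVES, FEATURES))  # stand-in for the conv net

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sl_update(position, expert_move, lr=0.01):
    """One supervised-learning step: nudge the 'network' toward the expert's move.

    position    -- flattened feature planes for one KGS position
    expert_move -- index of the move the 5d+ human actually played
    """
    global W
    probs = softmax(W @ position)          # predicted move distribution
    grad = probs.copy()
    grad[expert_move] -= 1.0               # gradient of cross-entropy w.r.t. the logits
    W -= lr * np.outer(grad, position)     # plain gradient-descent step
    return -np.log(probs[expert_move])     # cross-entropy loss, handy for monitoring

# Toy usage: one random "position" and an arbitrary expert move index.
pos = rng.normal(size=FEATURES)
print(sl_update(pos, expert_move=72))
```

The reinforcement-learning stage at 27:35 reuses the same network but, roughly speaking, replaces the expert move with moves from self-play games, weighted by whether the game was eventually won.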
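The 32:00 tree-search step is where the two networks meet: during selection each candidate move gets an exploration bonus proportional to its policy-network prior, and leaf positions are evaluated by mixing the value network's estimate with the result of a fast rollout (the Nature paper mixes them 50/50). A rough sketch of those two formulas, with the exploration constant being my own illustrative value:

```python
import math

C_PUCT = 5.0     # exploration constant (illustrative value, not from the talk)
LAMBDA = 0.5     # value-network / rollout mixing weight, as in the Nature paper

def select_child(children):
    """Pick the child maximising Q + U, where U is the prior-weighted exploration bonus.

    Each child is a dict with:
      P -- prior probability from the policy network
      N -- visit count
      W -- total value accumulated from evaluations below this node
    """
    total_visits = sum(c["N"] for c in children)
    best, best_score = None, -float("inf")
    for c in children:
        q = c["W"] / c["N"] if c["N"] > 0 else 0.0
        u = C_PUCT * c["P"] * math.sqrt(total_visits) / (1 + c["N"])
        if q + u > best_score:
            best, best_score = c, q + u
    return best

def leaf_value(value_net_v, rollout_z):
    """Mix the value-network estimate with the fast-rollout outcome."""
    return (1 - LAMBDA) * value_net_v + LAMBDA * rollout_z

# Toy usage: three candidate moves with priors from the policy network.
kids = [{"P": 0.5, "N": 10, "W": 6.0},
        {"P": 0.3, "N": 2,  "W": 0.5},
        {"P": 0.2, "N": 0,  "W": 0.0}]
print(select_child(kids))
```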
There isn't much new information in this talk. At 35:55 he shows the Elo graph of v18 (the version that played Lee Sedol), which is 3~4 stones stronger than v13 (the version that played Fan Hui). It's the same graph they have shown in other talks given since the Lee Sedol match.
I'm pretty sure that recent rumors about AlphaGo being 4 stones stronger come from seeing that graph. He didn't say anything about how much stronger current AlphaGo is than v18.
He did say that the current AlphaGo plays correctly after Lee Sedol's 78th move in Game 4. As for how they fixed it, it seems they mostly just did more and more training, and now it plays that position correctly.