Reddit thread with Aja's recent talk:
https://www.reddit.com/r/baduk/comments ... e_29_2016/
Video direct link:
http://www.liacs.leidenuniv.nl/~kosters ... 16/ah1.mov
Copy/paste of my comments from reddit
Contents:
0:00 Introduction/Awards
9:20 Aja Huang's talk starts, history of his involvement in Go programming
15:20 Google DeepMind -- Why write Go programs?
17:30 Operating AlphaGo vs Lee Sedol
18:10 What makes AlphaGo so strong?
21:40 Convolutional neural networks
24:40 How to train the networks
26:20 Policy network, supervised learning -- 30 million positions from KGS 5d+ games, 4 weeks x 50 GPUs (see the toy sketch after this list)
27:35 Policy network, reinforcement learning -- self-play, 1 week x 50 GPUs
29:50 Value network, reinforcement learning -- 30 million games of self-play, 1 week x 50 GPUs
30:40 Graph of Mean Squared error of move guessing
32:00 Tree search algorithm (see the second sketch after this list)
34:55 Mistake/typo in the Nature paper -- accuracy of the raw policy should be 30%, not the 24% printed in the paper
35:15 Graph of AlphaGo Elo (Nature paper version v13 [vs Fan Hui])
35:55 Graph of v18 [vs Lee Sedol] AlphaGo Elo -- v18 is 3~4 stones stronger than v13
37:30 Game 4 vs Lee Sedol -- "Horizon effect? The answer is too deep in the tree. Weakness of the value network? Too few similar positions in the training set. Anyway, the problem is fixed in the latest version!"
39:45 What's next?
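For anyone who hasn't read the Nature paper, the 26:20 step is plain supervised learning: a convolutional network is trained to predict which move the strong human actually played in each of those 30 million KGS positions. Below is a minimal toy sketch of one such training step; the 13-layer convolutional network is replaced by a single linear layer, and the dimensions, names and learning rate are my own illustrative choices, not DeepMind's code.

```python
import numpy as np

BOARD_MOVES = 19 * 19        # one logit per board point (toy model ignores "pass")
FEATURES = 48 * 19 * 19      # the paper uses 48 input feature planes; flattened here

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(BOARD_MOVES, FEATURES))  # stand-in for the conv net

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sl_update(position, expert_move, lr=0.01):
    """One supervised-learning step: nudge the 'network' toward the expert's move.

    position    -- flattened feature planes for one KGS position
    expert_move -- index of the move the 5d+ human actually played
    """
    global W
    probs = softmax(W @ position)          # predicted move distribution
    grad = probs.copy()
    grad[expert_move] -= 1.0               # gradient of cross-entropy w.r.t. the logits
    W -= lr * np.outer(grad, position)     # plain gradient-descent step
    return -np.log(probs[expert_move])     # cross-entropy loss, handy for monitoring

# Toy usage: one random "position" and an arbitrary expert move index.
pos = rng.normal(size=FEATURES)
print(sl_update(pos, expert_move=72))
```

The reinforcement-learning stage at 27:35 reuses the same network but, roughly speaking, replaces the expert move with moves from self-play games, weighted by whether the game was eventually won.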
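The 32:00 tree-search step is where the two networks meet: during selection each candidate move gets an exploration bonus proportional to its policy-network prior, and leaf positions are evaluated by mixing the value network's estimate with the result of a fast rollout (the Nature paper mixes them 50/50). A rough sketch of those two formulas, with the exploration constant being my own illustrative value:

```python
import math

C_PUCT = 5.0     # exploration constant (illustrative value, not from the talk)
LAMBDA = 0.5     # value-network / rollout mixing weight, as in the Nature paper

def select_child(children):
    """Pick the child maximising Q + U, where U is the prior-weighted exploration bonus.

    Each child is a dict with:
      P -- prior probability from the policy network
      N -- visit count
      W -- total value accumulated from evaluations below this node
    """
    total_visits = sum(c["N"] for c in children)
    best, best_score = None, -float("inf")
    for c in children:
        q = c["W"] / c["N"] if c["N"] > 0 else 0.0
        u = C_PUCT * c["P"] * math.sqrt(total_visits) / (1 + c["N"])
        if q + u > best_score:
            best, best_score = c, q + u
    return best

def leaf_value(value_net_v, rollout_z):
    """Mix the value-network estimate with the fast-rollout outcome."""
    return (1 - LAMBDA) * value_net_v + LAMBDA * rollout_z

# Toy usage: three candidate moves with priors from the policy network.
kids = [{"P": 0.5, "N": 10, "W": 6.0},
        {"P": 0.3, "N": 2,  "W": 0.5},
        {"P": 0.2, "N": 0,  "W": 0.0}]
print(select_child(kids))
```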
There isn't much new information in this talk. At 35:55 he shows the Elo graph of v18 (the version that played Lee Sedol), which is 3~4 stones stronger than v13 (the version that played Fan Hui). It's the same graph they have shown in other talks given since the Lee Sedol match.
I'm pretty sure that recent rumors about AlphaGo being 4 stones stronger come from seeing that graph. He didn't say anything about how much stronger current AlphaGo is than v18.
He did say that the current AlphaGo plays correctly after Lee Sedol's 78th move in Game 4. As for how they fixed it, it seems they mostly just did more and more training, and now it plays that position correctly.