John Fairbairn wrote:
Quote:
The interview starts at 15:10 and ends at 24:37. At 18:06, AJ and MR ask about the value network's opinions on various moves for black 1.
Thank you. I would love to know more but even with the volume set at maximum and earphones on I just can't hear this (or many other videos). Your partial summary is therefore a blessing

Oh. My apologies, I didn't know that. I tried to hit the highlights in my previous post, but for what it's worth here's the full interview transcript (I left out some "umm"s and "uh-huh"s etc. to improve readability, since they don't add any content, but it is otherwise complete):
Quote:
15:13 AJ: Hello out there everyone, I'm Andrew Jackson with the American Go Association. I'm joined today with Michael Redmond 9-dan professional--
15:20 MR: Hello.
15:20 AJ: --from the Nihon Ki-in, and David Silver, lead researcher of the AlphaGo team. David, thanks for joining us.
15:25 DS: No problem, nice to be here.
15:27 AJ: I think in the pre-match commentary here we'd like to ask you a few questions about AlphaGo, you know, where it's come in the last year. I guess we should start with if you'd like to summarize for those of us who weren't able to see the talk yesterday, how has AlphaGo changed since last year's version that we saw against Lee Sedol?
15:46 DS: Well, we've worked very hard to improve the algorithms that are inside AlphaGo, and in fact people tend to assume that when you do something like machine learning, it's all about the amount of computation and the amount of data that you actually use. But in fact, often it's the algorithms that really make the difference, and so this is really where we've focused on AlphaGo. And in fact, now the new version, AlphaGo Master, actually uses a lot less computation power. It uses about a tenth of the computation power of the version that played against Lee Sedol last year--
16:14 AJ: Shocking.
16:15 DS: --and it trains in a matter of weeks rather than months. And so why does this happen? Well, it's actually down to the methods that we use. And the main difference inside AlphaGo Master is that it's actually become its own teacher. So the way to understand this is that we really want to train AlphaGo on the best quality data that we could possibly find. And in our case, the best data that we can get our hands on is actually games played by AlphaGo itself. And so what we do is we actually get AlphaGo to play itself millions of times, and we use this extremely high-quality data--where it's kind of searched really deeply in each of these positions and done that in each position in the game and played all the way to the end--we use that data to train its neural networks.
16:54 AJ: To retrain it, in other words. Sort of like distilling its previous understanding.
16:58 DS: That's right.
16:59 AJ: Got it.
16:59 DS: And because of this it's much less reliant on human data and human knowledge than previous versions.
17:04 AJ: Got it. Awesome.
17:05 MR: That's very interesting.
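[Not part of the interview--a rough sketch of what "becoming its own teacher" might look like, for readers who think in code. All function and object names here are mine and purely illustrative, not DeepMind's: the structure is just what David describes, i.e. play full-strength games against yourself, and record every position together with the move the search chose and the final result.]

Code:
# Illustrative sketch of self-play data generation (assumed, simplified APIs).
def generate_self_play_examples(search_engine, new_game, num_games):
    """search_engine.best_move(game) and the game object's interface
    (is_over, position, play, winner) are assumed for illustration."""
    examples = []
    for _ in range(num_games):
        game = new_game()
        history = []
        while not game.is_over():
            move = search_engine.best_move(game)       # deep search in each position
            history.append((game.position(), move))
            game.play(move)
        result = game.winner()                         # played all the way to the end
        for position, move in history:
            examples.append((position, move, result))  # one training example
    return examples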
17:06 AJ: Very interesting. So with the new value network, I think-- [gestures to MR]
17:10 MR: I have some questions about--well, I have zillions of questions, but just to keep it to some safe areas--I'd like to ask you, like with an open board like this, with komi--there's various komis that we play with throughout the world, like 6.5 or 7.5, so in this match it's going to be a 7.5 komi--so how does AlphaGo sort of, it always gives itself a certain winning percentage. So like, for instance, with this open board, with no stones on the board, it would give a winning percentage to black or white. How would that be?
17:43 DS: So AlphaGo interestingly actually thinks that the game is really balanced with a 7.5 point komi, but it thinks there's just a slight advantage to the player taking the white stones.
17:52 MR: Ah yes, ok. So today, let's see, Ke Jie has white so he has a slight advantage at this point.
17:58 AJ: And Ke Jie is famous for playing very well with white.
18:00 MR: Oh yes, it's going to be interesting, yes.
18:02 AJ: A very good winrate with white. I guess a followup question to that would be, of black's possible first moves, is there any that makes a dent in that slight advantage? Is there any move that seems to be-- [pauses]
18:15 MR: Or are there any moves that are maybe not as good as--
18:18 AJ: Or not as good, sure. Does the value network have any opinions on the first black move?
18:22 DS: Well the value network would certainly say if you played something crazy--for example, to play on the 1-1 point--it would certainly, you'd see a very big dent in the evaluation of the position. But amongst the standard opening moves which are played, AlphaGo actually evaluates them quite closely. So we're talking about very slight, just, you know, decimal point advantages to one move over another in the opening.
18:41 AJ: Just really small differences.
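[Again not part of the interview: a tiny illustration, with made-up names, of what asking for "the value network's opinion on the first move" amounts to--query the network for black's winning probability after each candidate opening and compare.]

Code:
# Illustrative only; value_net(position) is assumed to return black's win
# probability, and empty_board.after(move) the position with that move played.
def compare_first_moves(value_net, empty_board, candidate_moves):
    for move in candidate_moves:
        position = empty_board.after(move)   # board after black 1
        print(move, value_net(position))     # standard openings: decimal-point differences;
                                             # something like the 1-1 point: a much bigger drop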
18:43 MR: Could I just ask, well, maybe you won't be able to know about this, but--since AlphaGo plays star points and 3-4 points mostly, and in the 60-game series we were seeing that--but of course, naturally a 1-1 point is obviously bad to a human, too. But we don't really know about moves in the center of the board, like, for instance, or on the sides, even, and very few games of professionals have been played like with tengen--
19:09 AJ: [simultaneously] --start at Tengen--
19:10 MR: --there's just a handful of games played by top pros. So does AlphaGo have any evaluation of that kind of unusual position with a move--
19:16 AJ: [simultaneously] --any of the unorthodox openings?
19:18 MR: --that is more ambiguous?
19:19 DS: At least from what we've seen so far, AlphaGo does tend to prefer moves around the corner area to opening right in the center.
19:26 MR: Oh, yes.
19:28 AJ: That's quite an interesting tidbit there, thank you very much. All right, well that--is there any other, perhaps future work or plans for AlphaGo that you can mention at this time?
19:41 DS: Well, I mean we're really just focused on what's happening this week. We're really excited to see what happens during the remaining matches and I can't wait to see the game played today. You know, as one of the developers, actually I think the second game is where you can start to actually relax and enjoy it a little bit--
19:56 AJ: Oh, good--
19:56 DS: --because the first game we're just watching to make sure everything goes as it should from a technical perspective--
20:02 AJ: Right, right--
20:03 DS: --and now we can all just sit back and--
20:05 AJ: --enjoy the games--
20:05 DS: [simultaneously] --perhaps enjoy the game a little more than the first one.
20:07 AJ: Proud parent, I know the feeling. All right, well that's wonderful, thank you very much. [pauses] I think we have five more minutes here of our pre-game commentary--
20:20 MR: Ok, well--
20:22 DS: Well, perhaps it--maybe it's worth clarifying, because you mentioned the value network--
20:25 AJ: Yes! Yeah--
20:26 DS: --and so would it be worth explaining what the value network is to some of the audience?
20:29 AJ: Sure, by all means.
20:30 MR: Oh yes, certainly.
20:32 DS: So AlphaGo inside it has two different neural networks, and these neural networks are representations of go knowledge. And so this is really what gives AlphaGo its brain, if you like; it's kind of its way of understanding what's happening in the position. And so it takes in something like the board position and it passes it through this brain, through the value network, to come up with some estimate of who's winning in that position. And that's just a number--if that number's high, then it says that AlphaGo thinks it's going to win in this position, and if it's low it thinks the opponent's going to win in this position.
21:04 AJ: Is that number directly translatable to a percentage?
21:07 DS: Yes. That number actually tells you the probability that AlphaGo assigns to winning the game from that point onwards. And then there's this second part of the brain which we call the policy network, and the policy network looks at the position and essentially recommends a move to play in this position. And again you can translate the final output of this part of the brain into its preferences over all of the moves, and, again, these you can think of as probabilities if you like. And what's different in AlphaGo Master is actually the way that these networks are trained. So, in AlphaGo Master these are trained from games that it's played against itself. And this means that it's played using this very high quality data that was played with these very long--the full power of AlphaGo playing searches, you can imagine produces very high quality moves. And those very high quality moves produced by AlphaGo's searches provide the training data which we use to train the policy network. In other words we try and get the policy network to predict the move that AlphaGo itself would have played at its full power of search and look-ahead.
22:07 AJ: Which sort of builds all of that previous knowledge into the new policy network that is being trained on this [unintelligible]
22:12 DS: That's right. And similarly the value network then trains on these very high-quality outcomes we have at the end of the games played by AlphaGo against itself. You can imagine that if you're in a certain position, and you want to know who's ahead in that position, then, you know, a very good form of training data is actually to get AlphaGo to play out the game from that point all the way to the end. And so that's the training data we use to train the value network. We ask it to predict what would have happened in games against itself from that point onwards. And then this process gives us a new policy network and a new value network that we plug back into AlphaGo's search, and we iterate the whole process many many times.
22:46 MR: Yes.
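[One more sketch from me, not DeepMind: the training loop David just described. The policy network is trained to predict the move AlphaGo's full search would play, the value network to predict the self-play game's outcome, and the improved networks are plugged back into the search. The .fit calls and the data format are assumptions for illustration only.]

Code:
# Illustrative training iteration over completed self-play games.
def training_iteration(policy_net, value_net, search_engine, self_play_games):
    # each game record: (positions, search_moves, outcome)
    for positions, search_moves, outcome in self_play_games:
        for position, search_move in zip(positions, search_moves):
            policy_net.fit(position, target_move=search_move)   # imitate the full search
            value_net.fit(position, target_outcome=outcome)     # predict who actually won
    search_engine.update_networks(policy_net, value_net)        # then iterate the whole process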
22:47 AJ: And I know yesterday Demis [Hassabis] and yourself I think mentioned the concern that self-play might lead to gaps in that knowledge, and that Lee Sedol was brought in last year to maybe help find out those gaps, the same way Ke Jie--you know, is a way to see if there is any sort of self-fit. Is there some concern that the self-play would lead to almost an overfitting, where it has mutual blind spots? Or are there steps taken to address that?
23:12 DS: So that's a very interesting question, but in fact what we've seen is the opposite. That in fact, the kind of gaps in the knowledge that we saw previously in the match against Lee Sedol--we actually see that by continuing to play games against itself, and to train further and further on those games, it actually starts to address some of those issues.
23:30 AJ: That's very interesting.
23:30 DS: And so progressing further with the self-play actually appears to correct some of these misconceptions that AlphaGo used to have. Now of course this doesn't guarantee that it doesn't still have some gaps--
23:40 AJ: --that there might still be some out there--
23:42 DS: --and for us it's very hard to evaluate those additional gaps in knowledge without playing a match of this type.
23:48 AJ: Without actually having an evaluation.
23:49 DS: And so that's one of the reasons we're all so excited to be here, because we want to try and explore these kinds of amazing games against a top pro who's at the pinnacle of his game and the pinnacle of the go world. And then we have a chance to really find out if it has gaps. And if it doesn't, well, the beautiful thing is, we, you know--you can think of this as like two sculptors combining to make some kind of work of art together, and I'm just really excited to see what that sculpture ends up looking like today.
24:16 AJ: That's wonderful, that's wonderful.
24:17 MR: Yes.
24:19 AJ: All right, well we'll hope that Ke Jie can be a willing partner.
24:21 MR: Oh yes. [laughs]
24:23 AJ: All right, looking forward to it. Ke Jie with the white stones will be coming right up, and we'll see you soon.
24:29 MR: Yes.
24:29 DS: Thank you.