Teaching a convolutional deep network to play go

RBerenguel · #1

Just saw this on Arxiv, will read one of these days

daal · #2

Here's an article about the research: http://www.technologyreview.com/view/53 ... irst-time/

Watch out, it can already beat GnuGo 90 % of the time... :shock:

RBerenguel · #3

daal wrote:

Here's an article about the research: http://www.technologyreview.com/view/53 ... irst-time/

Watch out, it can already beat GnuGo 90 % of the time... :shock:

The paper is surprisingly readable, too, even if you don't know anything about deep/convo nets. Worth a shot, too.

What's striking IS NOT the 90% win rate, it's that the neural net plays with NO LOOKAHEAD. Had to put this in caps because I find it mind-blowing

Mike Novack · #4

Ah, I have been waiting to see if somebody was going to try to teach go to a neural net. My sense of the difficulties was apparently wrong. I'll have to look at this.

This is actually great news about a possible new way forward beyond MCTS.

The thing to understand is that the program that implements a neural net has nothing to do with the particular task to be mastered, whether predicting the weather, driving a car in city traffic, or in this case, playing the game of go.

The neural net "learns" how to do the task (is "taught") and all the "teachers" need to be able to do is determine "that was a good job" or "that was a bad job". They exhibit some of the properties of "real" brains, notably, if "damaged" relearn the task quicker than from scratch (and hitting them with minor damage, aka "annealing", is how they get beyond being trapped on a local "hill" of the performance "surface).

With a neural net, concepts like "look ahead" aren't relevant. While neural nets can perform tasks, it's not meaningful to ask how they do it. All you end up with is "this arrangement of network connections and set of cell values works" (a different one might also work). While a human player can picture "I am looking ahead" we don't know what that means in terms of the connection between neurons in the brain and that is the level we can see if we look at how a neural net that has learned to play go differs from the same neural net if learned to drive a car.

RBerenguel · #5

Mike Novack wrote:

Ah, I have been waiting to see if somebody was going to try to teach go to a neural net. My sense of the difficulties was apparently wrong. I'll have to look at this.

This is actually great news about a possible new way forward beyond MCTS.

The thing to understand is that the program that implements a neural net has nothing to do with the particular task to be mastered, whether predicting the weather, driving a car in city traffic, or in this case, playing the game of go.

The neural net "learns" how to do the task (is "taught") and all the "teachers" need to be able to do is determine "that was a good job" or "that was a bad job". They exhibit some of the properties of "real" brains, notably, if "damaged" relearn the task quicker than from scratch (and hitting them with minor damage, aka "annealing", is how they get beyond being trapped on a local "hill" of the performance "surface).

With a neural net, concepts like "look ahead" aren't relevant. While neural nets can perform tasks, it's not meaningful to ask how they do it. All you end up with is "this arrangement of network connections and set of cell values works" (a different one might also work). While a human player can picture "I am looking ahead" we don't know what that means in terms of the connection between neurons in the brain and that is the level we can see if we look at how a neural net that has learned to play go differs from the same neural net if learned to drive a car.

Time-delayed neural nets can be considered as "with lookahead" at least when training.

The point of no lookahead, is that the net was beating a program looking ahead for what was going to happen, whereas the net was "playing on instinct", so to say.

Btw, convolutional deep nets can be analysed for activations (the images in the paper or the folks report) and then somehow understand what the net is actually doing in a general sense.

Mike Novack · #6

RBerenguel wrote:

Time-delayed neural nets can be considered as "with lookahead" at least when training.

The point of no lookahead, is that the net was beating a program looking ahead for what was going to happen, whereas the net was "playing on instinct", so to say.

Well, if I understand it, the learning in this case isn't literally "how to play go" so no, I don't think "look ahead" comes into it. The "function" being learned could be thought of something like "given this board position (ignore what came before and what might come after) what would be the next move Go Seigen would make" (the net has learned to predict, for all positions of Go's games, what next move did he make). I haven't yet looked to see if they used one go master or several, and that in itself an interesting question.

Similarly I don't know what computer resources were needed and when they said "much faster than feugo" how small a fraction of the time was required. There are interesting problems and possibilities. It isn't clear whether better results would come from the net learning uisng the games of one master or several (there could be style conflicts) and if very fast (small fraction of what a MCTS evaluator would require) there are some interesting possibilities for the future.

A neural net can be "switched" very quickly. Let's suppose for just a moment that the time to run the net is 10% of the time to do a MCTS from scratch. Let's suppose that we had three nets each trained to predict the next move of anpother master. "What would Go do?", What would Jowa do?" etc. OK, 30% of the time has been used up. Now use the remaining 70% to do an in depth MCTS on those three moves.

I think this is perhaps just the beginning of investigating the possibilities of neural nets for strong play.

RBerenguel · #7

Mike Novack wrote:

RBerenguel wrote:

Time-delayed neural nets can be considered as "with lookahead" at least when training.

The point of no lookahead, is that the net was beating a program looking ahead for what was going to happen, whereas the net was "playing on instinct", so to say.

Well, if I understand it, the learning in this case isn't literally "how to play go" so no, I don't think "look ahead" comes into it. The "function" being learned could be thought of something like "given this board position (ignore what came before and what might come after) what would be the next move Go Seigen would make" (the net has learned to predict, for all positions of Go's games, what next move did he make). I haven't yet looked to see if they used one go master or several, and that in itself an interesting question.

Similarly I don't know what computer resources were needed and when they said "much faster than feugo" how small a fraction of the time was required. There are interesting problems and possibilities. It isn't clear whether better results would come from the net learning uisng the games of one master or several (there could be style conflicts) and if very fast (small fraction of what a MCTS evaluator would require) there are some interesting possibilities for the future.

A neural net can be "switched" very quickly. Let's suppose for just a moment that the time to run the net is 10% of the time to do a MCTS from scratch. Let's suppose that we had three nets each trained to predict the next move of anpother master. "What would Go do?", What would Jowa do?" etc. OK, 30% of the time has been used up. Now use the remaining 70% to do an in depth MCTS on those three moves.

I think this is perhaps just the beginning of investigating the possibilities of neural nets for strong play.

Time delayed nets are different. But yes, a neural net is essentially a non-linear interpolator which autodetects features.

The nets were trained on the whole GoGoD, other test networks on a subset of GoGoD and KGS high dan games. Just the games of a player or a few would be too little (Go played ~900 games, at 300 moves per game that's just 2.7 e 5 moves) to train the network, way too little (hard to tell a minimum number, but when the net could well have 10k nodes, you need a lot more data than nodes)

Training was done on a GPU during 2 or so days, but the cost of getting a result from a net like this is essentially "0". Think about it: it's just computing some easy nonlinear functions and some large (but not huge) matrix/vector products. Peanuts, really. Training is expensive, classifying afterwards is incredibly cheap. A MCTS does a lot of heavy lifting on the other hand.

It has been suggested (reddit, computer go mailing list) to use such a net as a move generator function for the MCTS pruning. This can be huge for computer go, I think, as it is, easily 1 stone stronger. Probably slightly more, just by plugging this in front of MCTS without more thinking/work.

Mike Novack · #8

RBerenguel wrote:

Time delayed nets are different. But yes, a neural net is essentially a non-linear interpolator which autodetects features.

The nets were trained on the whole GoGoD, other test networks on a subset of GoGoD and KGS high dan games. Just the games of a player or a few would be too little (Go played ~900 games, at 300 moves per game that's just 2.7 e 5 moves) to train the network, way too little (hard to tell a minimum number, but when the net could well have 10k nodes, you need a lot more data than nodes)

Training was done on a GPU during 2 or so days, but the cost of getting a result from a net like this is essentially "0". Think about it: it's just computing some easy nonlinear functions and some large (but not huge) matrix/vector products. Peanuts, really. Training is expensive, classifying afterwards is incredibly cheap. A MCTS does a lot of heavy lifting on the other hand.

It has been suggested (reddit, computer go mailing list) to use such a net as a move generator function for the MCTS pruning. This can be huge for computer go, I think, as it is, easily 1 stone stronger. Probably slightly more, just by plugging this in front of MCTS without more thinking/work.

We maybe need to help other folks here understand what we are talking about?

For the moment we can ignore all thought of the internals of the neural net and think of it as a problem* in mathematics. Think of it as a "black box" evaluating the value of a function. This function reduces 3**361 input values (not all can exist but that's a simple way to look at it) to 361 output values. But the box isn't "coded" to be able to do this but "trained" to be able to do this with internal cell values fiddled until for all inputs used in training got the correct result (from each particular training value).

Some properties of this. While training can use a great deal of time/crunch use does not. And once a neural net has been trained, easily "cloned" (you don't have to train another net, just populate with the cell values).

But as I indicated, I think this is early days. Probably we don't yet know the way to train for best results. I suspect that divergence of playing style is going to come into this, and while I agree no one master has anywhere enough data to train to just this master's style, there might be very different results using first one's data, then another, then another, etc. (and when done, repeat the cycle several times) compared to mixing all the data randomly and repeating that same several number of times. And there are perhaps a whole bunch of other possible training strategies**.

Notice that to train a neural net to be able to perform a task the trainer does not have to be able to perform the task, just have an independent method to determine "right output value" or "wrong output value" (in this case, look up in the game record, what was the move made)

* Note that this is the same "problem" being solved by the MCTS programs. The difference is what goes on within the the "black box" to evaluate the function and perhaps what other "side values" besides "next move" might or might not be available as outputs.

** Neural nets tend to have the interesting property that if the trained net is "damaged" somewhat (the cell values randomly disturbed a bit) it can be fairly quickly retrained. Sort of like a person relearning after a stroke. So there might be interesting results taking the net trained "entire database" and then after slight damage retraining using Go's data (for example -- a case where a lot of data even though too little for full training). Like I said, this is early days for this approach.

PS --- I don't know exactly what resources were used for this, but because time and crunch power trade off, this might be well within the capabilities of "amateurs". Two days on a really powerful machine might not be more resources than two months on a much more modest machine.

RBerenguel · #9

Okay, I'll write my own version and then people can do the average

First, the simplest way of creating a really cool and strong game engine (for now, let's say we are talking about a turn-based game with perfect information... so, rule out poker but keep chess, go, shogi, hex, backgammon is almost here, too) is to create a tree of possible moves, while using a minimax approach to prune possible future moves: I want a move that maximises my chances to win, and then I imagine you play a move that minimises my chances of winning. The tricky part is how to identify the "chances to win" part.

Chess has it "easy": assigning the "standard" points for pieces and adding some positional based tweaking for the points (so, a trapped bishop is worth less than 3, a posted knight is worth more than 3...) means you can have incredibly strong engines already. But there's no easy way to do this in go, as we may clearly understand as players. Shape, thickness, "cash", tesuji... Too much information! So, how do we decide that a move is better than another?

Then, go had a breakthrough: Monte Carlo tree search. We don't know how to evaluate a middle game (or even opening!) go position, but we can easily evaluate a final position: with komi, we have a clear winner once you can't add more stones! So, instead of deciding possible next moves using heuristics (like, don't play close to thickness, cut where you can cut...) and then evaluating how good they are "somehow," just throw crunch power at it: pick a set of candidate "move 1" at random and generate many move trees of a certain width and as depth, until the game ends. Then, the best move is that move with more won games among the set of random trees. Easy, isn't it? (in a real MCTS there's a lot of tweaking to make it work fast)

The problem is that (even if go is very uniform in move types, not like chess) there are many moves, so the trees have to be very deep and very wide. This needs (in general, and as a simplification) a lot of processing power to generate them and quite a lot of memory to store the results. But turns out it is feasible and gets a decently strong opponent.

Okay, now to neural nets. Although neural nets sound very cool and are very "en vogue," in a mathematical-philosophical sense they are "just" a non-linear interpolator. Say you have a function f(x) which you don't know exactly. You only know what f(x) is for a set of points x, a lot of them maybe, but f is tricky. You can reconstruct something that is "almost like f" using interpolation techniques (functional analysis FTW!) In the case of neural nets, you have a special type of black box we call "net" and does exactly this procedure: interpolate some "unknown" given a lot of input-output data. So, give inputs x and their outputs f(x) (without knowing f) and after some magic you'll get a black box that gives something similar to f when given x, and gives something out of a point "y" which we can call f(y), why not? More details later.

Convolutional neural nets are a special type of net with a very specific way to arrange how the data gets in and behaves inside (and deep means there are many layers.) Mumbo jumbo, so to say. Just a neat old interpolation technique. More or less.

In the case of go, we have a huge, unknown function K(x) and in an ideal world you input as x a 19x19 go position and then K(x) outputs the absolute best move in the current situation. Sounds cool, doesn't it? The hand of god is here!

Of course, we don't know K, if we did we'd be kicking Iyama's (or Murakawa's?) ass. But we have something somewhat close to values of K(x) for some values of x: games played by pro players. They don't play perfectly, but they play close enough to perfect so we can use it to approximate K "well enough." So, we get our neural net approximation NK which behaves as a "pro player" (actually, the set of all pro players on GoGoD) plays. Given a go position x, just compute NK(x) to get a good next move according to the network. What's the difference with MCTS? Well, the neural net didn't really care about what the next moves were, or what the result of some game was. The suggested move was just suggested because this black box thought it fitted well with the data we gave it during training. That's it. Also, computing with this black box is stupidly cheap: there are no huge trees to evaluate, or moves to generate, nothing. It's almost like putting 5 on your calculator and pressing sqrt. Bam, instant (it's somewhat more expensive than instant, but compared with MCTS or heuristic based evaluators it's nothing).

Getting back to neural nets, to get the neat cool black box you have to actually do something: training the net. Since you have this set of I moved here and the good answer(s) was/were there, you know what you want to get. Then you use some magic (backpropagation is the name of the algorithm, for deep nets you need some tweaking) and find the values to make the black box work. Bam, amateur player out of nothing!

snorri · **#10**

How would such a program handle ladders?

oca · **#11**

snorri wrote:

How would such a program handle ladders?

I'm not sure the program will even identify there is a ladder...
we can even say that at some point the program even don't know that it is playing go when making a proposition for a move... (of course there should be a second stage that reject non valid moves and ask for a new one or something like that...)

oca · **#12**

daal wrote:

Here's an article about the research: http://www.technologyreview.com/view/53 ... irst-time/

Watch out, it can already beat GnuGo 90 % of the time... :shock:

In the article, we can read this :
"(Go rankings range from a beginner with a rank of 30-20 kyu to a professional expert with a ranking of 1 kyu)"

Krama · **#13**

I believe these networks couldn't be used in actual local fighting but could be used to play whole board moves.

Once you finish a local fight you ask the network where to play the biggest move etc.

There is no way (no matter how much games they put in the network) it can be good at complex local fights or even fights that turn into whole board fights.

You would need to somehow program the thing so that it "knows" when it is fighting and when it is doing something else.

Then ask it to give you few good moves and then proceed to analyze them with MTC method.

Polama · **#14**

Krama wrote:

I believe these networks couldn't be used in actual local fighting but could be used to play whole board moves.

Once you finish a local fight you ask the network where to play the biggest move etc.

There is no way (no matter how much games they put in the network) it can be good at complex local fights or even fights that turn into whole board fights.

You would need to somehow program the thing so that it "knows" when it is fighting and when it is doing something else.

Then ask it to give you few good moves and then proceed to analyze them with MTC method.

I disagree. It could fight just like we fight in a blitz game: we don't have time to properly read and understand the fight, we spot key points and play them and hope for the best.

If you give me 5 minutes per move and a professional 5 seconds per move, and somehow keep them from thinking on my turn, my money is definitely still on the professional.

We'd expect a properly trained net to end up with internal nodes representing the aliveness of a group. When looking at a possible direction to play, the status of the surrounding intersections would hook in to encourage that move (there's a tesuji to cut through here) or discourage it (you're running straight into a wall). Rather than reading out a tesuji, the non-linearity of the networks internal connections could encode the existence of the tesuji directly in the shape.

That said, I do expect the ability to play ahead would help.

Mike Novack · **#15**

Looks like we have to try again since there seems to be great misunderstanding about what this thing is doing.

Imagine that there is a function G such that G(position of board) returns the best move. To say that there is no such function is equivalent to saying that there are positions for which there is no best move (more than one move could be equal) so lets modify that to G(position of board) returns a move that is at least as good as any other. Reasonable so far?

But we DON'T know "G". At the very start, we don't know even an approximation of G. All we do know is that if we had G it would play a perfect game of go.

OK, a "neural net" is a very special kind of black box that implements a function. In other words, given an input it returns an output (always defined). Think of this as numbers for a moment, a number in and a number out, and to the extent that F(in) = the correct "out" for the definition of F then we say that the neural net has "learned" F. I want us to stick to numbers for a moment so that we can forget that these numbers have anything to do with playing go.

The inputs are the numbers from 0 to 3**361-1 and the possible outputs from 1 to 361. We put a number in and see what comes out. The right number? The wrong number? Well for some values of the number in (positions of games in our database) we think we do know what was the right number that should have come out, the move that the pro made. So we can in this case tell "right" form "wrong".

The property of a neural net is that whenever this is true (we can distinguish right from wrong)this information can be used to "train" the net. Not a matter of changing the program at all (the program implementing the neural net), just modifying a set of values within the net. And not doing this by anything other than a random process. It's sort of like giving the student a little smack on the head if got it wrong and a little reward if got it right and the student eventually learns to get it right.

The more pairs of input and outputs we know the easier it is to "train" the net to implement the function. If we knew all of them we could train the net to implement the function perfectly, but we don't. We only know a very small subset of pairs so the net will only be an approximation of G. How good an approximation? Well besides how many pairs that also depends on some of the properties of G. How "smooth" is G (or how jumpy). We don't know that (G is unknown) so it's an experimental question. Well some folks have tried it and it seems G is a smooth enough function that the approximation can play at the level being reported.

Note that we actually have a bit more sets of pairs than exist in the database because if right for a pair must also be right for the pairs representing the symmetries of a game of go (rotations and reflections. In other words, we do know that G must have that property so we can use it in training. Or looked an another way, for every pair in the database we can consider (and use in training) the rotations and reflections of those pairs.

It bears repeating so I will. The program implementing the neural net has nothing to do with the function the net ends up being taught. You would have the same program if you wanted the neural net to implement a function that, drove a car through traffic, predicted the weather, etc.

Krama · **#16**

What would happen if a professional played against this network and started playing pointless and stupid moves?

How would the network react since not any kind of this silly move was recorded in pro games ever, nor was it studied by the network?

Couple of silly moves wouldn't really mean anything when you are a pro.

Look at AgainstZen games. The way this unknown pro makes fun of an amateur 6d is really funny to watch.

Polama · **#17**

Mike Novack wrote:

It bears repeating so I will. The program implementing the neural net has nothing to do with the function the net ends up being taught. You would have the same program if you wanted the neural net to implement a function that, drove a car through traffic, predicted the weather, etc.

Perhaps this is nitpicking, but is that true? I was under the impression that we were still in the phase where human hyper-parameter tuning was necessary, that how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans and that for predicting the weather you'd probably choose different algorithms for propogation/whatever because problem size, complexity and time constraints lead to different choices on the algorithm to use.

I certainly understand that the general process is the same, but existing in parallel fields to this stuff, I'm curious whether it's becoming more automated and less arcane algorithmic tuning.

RBerenguel · **#18**

Polama wrote:

Mike Novack wrote:

It bears repeating so I will. The program implementing the neural net has nothing to do with the function the net ends up being taught. You would have the same program if you wanted the neural net to implement a function that, drove a car through traffic, predicted the weather, etc.

Perhaps this is nitpicking, but is that true? I was under the impression that we were still in the phase where human hyper-parameter tuning was necessary, that how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans and that for predicting the weather you'd probably choose different algorithms for propogation/whatever because problem size, complexity and time constraints lead to different choices on the algorithm to use.

I certainly understand that the general process is the same, but existing in parallel fields to this stuff, I'm curious whether it's becoming more automated and less arcane algorithmic tuning.

Net topology (way of connecting, number of nodes...) is indeed set by humans, and definitely affects the quality of the approximation. This is why the "deep" and "convolutional" parts in the title are not only mumbo jumbo, but actually useful.

Mike Novack · **#19**

Polama wrote:

Perhaps this is nitpicking, but is that true? I was under the impression that we were still in the phase where human hyper-parameter tuning was necessary, that how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans and that for predicting the weather you'd probably choose different algorithms for propogation/whatever because problem size, complexity and time constraints lead to different choices on the algorithm to use.

I certainly understand that the general process is the same, but existing in parallel fields to this stuff, I'm curious whether it's becoming more automated and less arcane algorithmic tuning.

OK, I was oversimplifying, but not in the sense you think. Remember when I mentioned rotation and reflection? That suggests that the way the data is INPUT is going to matter (the "topology" of the input). So yes, the function might be "smoother" for a special was of inputting and an "input array" that just happened to be 19x19. In that sense different from "the weather" where we start out not being able to assume symmetries in terms of rotation and reflection and that a certain size might have special import.

But the question of "the opponent made a different move", won't that screw things up?" completely misses what a neural net accomplishes. If making the same move (one that was training data) it will make the same move (that it learned to make). Now that "different move" represents a point in the input space that was not part of training. The net will return some move but is that a good move (the right move as a pro would judge). Well that depends on how good an approximation of G the net has learned.

Let's try a very simple, very smooth function as an example, and the human brain (one used to math problems) as the neural net. Here is some training data (pairs of numbers) representing the function you are to learn:

(3,1) (4,3) (5,5) (8,11) (9,13) (12, 19) OK, you have been "trained"

So I give you (10,?) ---- in other words, what is ?

And no of course, you can't be certain of the right answer. There are other functions that would have fit all six training pairs. But notice that you feel reasonably confident about your answer even though "10" was not part of your training data to learn the function y = 2x - 5

Polama · **#20**

Mike Novack wrote:

So yes, the function might be "smoother" for a special was of inputting and an "input array" that just happened to be 19x19. In that sense different from "the weather" where we start out not being able to assume symmetries in terms of rotation and reflection and that a certain size might have special import.

A deep network by definition has internal nodes that are neither directly the input nor the output. Again, nitpicky, but I agree that changing the input nodes doesn't change the program, but changing all those internal connectors and probably subbing out a different propogation algorithm for going from playing go to predicting the weather to get good results doesn't seem like "the same program". And I was mostly just curious if we were there yet, because there has been a lot of work on automatically making those sort of hyperparameter tuning decisions.

Quote:

Let's try a very simple, very smooth function as an example, and the human brain (one used to math problems) as the neural net. Here is some training data (pairs of numbers) representing the function you are to learn:

(3,1) (4,3) (5,5) (8,11) (9,13) (12, 19) OK, you have been "trained"

So I give you (10,?) ---- in other words, what is ?

I think Krama's concern was not how that would do on 10 (a new value but one in an area of dense coverage) but on how it would do on (-157,?) As you say, it depends on how good of an approximation has been learned. It's likely to interpolate less well far from any previous data point, but on the other hand playing crazy is likely to be weaker than playing normal moves, so it might not matter.

Teaching a convolutional deep network to play go

Who is online