Teaching a convolutional deep network to play go
-
Krama
- Lives in gote
- Posts: 436
- Joined: Mon Jan 06, 2014 3:46 am
- Rank: KGS 5 kyu
- GD Posts: 0
- Has thanked: 1 time
- Been thanked: 38 times
Re: Teaching a convolutional deep network to play go
What would happen if a professional played against this network and started playing pointless and stupid moves?
How would the network react, since no silly move of this kind was ever recorded in pro games, nor studied by the network?
A couple of silly moves wouldn't really mean anything when you are a pro.
Look at AgainstZen games. The way this unknown pro makes fun of an amateur 6d is really funny to watch.
-
Polama
- Lives with ko
- Posts: 248
- Joined: Wed Nov 14, 2012 1:47 pm
- Rank: DGS 2 kyu
- GD Posts: 0
- Universal go server handle: Polama
- Has thanked: 23 times
- Been thanked: 148 times
Re: Teaching a convolutional deep network to play go
Mike Novack wrote:
It bears repeating, so I will. The program implementing the neural net has nothing to do with the function the net ends up being taught. You would have the same program if you wanted the neural net to implement a function that drove a car through traffic, predicted the weather, etc.

Perhaps this is nitpicking, but is that true? I was under the impression that we are still in the phase where human hyper-parameter tuning is necessary: how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans, and for predicting the weather you'd probably choose different algorithms for propagation and so on, because problem size, complexity, and time constraints lead to different choices.
I certainly understand that the general process is the same, but working in fields parallel to this one, I'm curious whether it's becoming more automated and less a matter of arcane algorithmic tuning.
- RBerenguel
- Gosei
- Posts: 1585
- Joined: Fri Nov 18, 2011 11:44 am
- Rank: KGS 5k
- GD Posts: 0
- KGS: RBerenguel
- Tygem: rberenguel
- Wbaduk: JohnKeats
- Kaya handle: RBerenguel
- Online playing schedule: I am usually online on KGS on Saturdays, but I can be available if needed from 20:00-23:00 GMT+1
- Location: Barcelona, Spain (GMT+1)
- Has thanked: 576 times
- Been thanked: 298 times
- Contact:
Re: Teaching a convolutional deep network to play go
Polama wrote:
Mike Novack wrote:
It bears repeating, so I will. The program implementing the neural net has nothing to do with the function the net ends up being taught. You would have the same program if you wanted the neural net to implement a function that drove a car through traffic, predicted the weather, etc.
Perhaps this is nitpicking, but is that true? I was under the impression that we are still in the phase where human hyper-parameter tuning is necessary: how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans, and for predicting the weather you'd probably choose different algorithms for propagation and so on, because problem size, complexity, and time constraints lead to different choices. I certainly understand that the general process is the same, but working in fields parallel to this one, I'm curious whether it's becoming more automated and less a matter of arcane algorithmic tuning.

Net topology (the way of connecting, the number of nodes, ...) is indeed set by humans, and it definitely affects the quality of the approximation. This is why the "deep" and "convolutional" parts of the title are not just mumbo jumbo, but actually useful.
Geek of all trades, master of none: the motto for my blog mostlymaths.net
-
Mike Novack
- Lives in sente
- Posts: 1045
- Joined: Mon Aug 09, 2010 9:36 am
- GD Posts: 0
- Been thanked: 182 times
Re: Teaching a convolutional deep network to play go
Polama wrote:
Perhaps this is nitpicking, but is that true? I was under the impression that we are still in the phase where human hyper-parameter tuning is necessary: how the neural network is laid out (how many internal nodes? in what arrangement?) is at least bounded by humans, and for predicting the weather you'd probably choose different algorithms for propagation and so on, because problem size, complexity, and time constraints lead to different choices. I certainly understand that the general process is the same, but working in fields parallel to this one, I'm curious whether it's becoming more automated and less a matter of arcane algorithmic tuning.

OK, I was oversimplifying, but not in the sense you think. Remember when I mentioned rotation and reflection? That suggests that the way the data is INPUT is going to matter (the "topology" of the input). So yes, the function might be "smoother" for a special way of inputting, with an "input array" that just happens to be 19x19. In that sense it differs from "the weather", where we start out unable to assume symmetries under rotation and reflection, or that a certain size has special import.
But the question "the opponent made a different move, won't that screw things up?" completely misses what a neural net accomplishes. If the opponent makes a move that appeared in the training data, the net will make the move it learned to make in reply. A "different move" represents a point in the input space that was not part of training. The net will still return some move, but is it a good move (the right move, as a pro would judge)? That depends on how good an approximation of G the net has learned.
Let's try a very simple, very smooth function as an example, and the human brain (one used to math problems) as the neural net. Here is some training data (pairs of numbers) representing the function you are to learn:
(3, 1) (4, 3) (5, 5) (8, 11) (9, 13) (12, 19) -- OK, you have been "trained".
So I give you (10, ?) ---- in other words, what is "?"
And no, of course you can't be certain of the right answer; other functions would also fit all six training pairs. But notice that you feel reasonably confident about your answer, even though 10 was not part of the training data from which you learned the function y = 2x - 5.
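Mike Novack's exercise can be checked mechanically. Here is a small sketch in plain Python standing in for the "trained brain" (an ordinary least-squares fit, done exactly with rationals; nothing here comes from the network under discussion), using the six pairs from the post:

```python
from fractions import Fraction

# The six "training pairs" from the post.
pairs = [(3, 1), (4, 3), (5, 5), (8, 11), (9, 13), (12, 19)]
xs = [Fraction(x) for x, _ in pairs]
ys = [Fraction(y) for _, y in pairs]

# Ordinary least squares for y = a*x + b, written out by hand.
n = len(pairs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)        # 2 -5: the six pairs lie exactly on y = 2x - 5
print(a * 10 + b)  # 15: the confident answer for the unseen input x = 10
```

The unseen input 10 sits inside the region densely covered by training pairs, which is exactly why the "confident" answer feels justified there.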
-
Polama
- Lives with ko
- Posts: 248
- Joined: Wed Nov 14, 2012 1:47 pm
- Rank: DGS 2 kyu
- GD Posts: 0
- Universal go server handle: Polama
- Has thanked: 23 times
- Been thanked: 148 times
Re: Teaching a convolutional deep network to play go
Mike Novack wrote:
So yes, the function might be "smoother" for a special way of inputting, with an "input array" that just happens to be 19x19. In that sense it differs from "the weather", where we start out unable to assume symmetries under rotation and reflection, or that a certain size has special import.
Let's try a very simple, very smooth function as an example, with the human brain (one used to math problems) as the neural net. Here is some training data (pairs of numbers) representing the function you are to learn:
(3, 1) (4, 3) (5, 5) (8, 11) (9, 13) (12, 19) -- OK, you have been "trained".
So I give you (10, ?) ---- in other words, what is "?"

A deep network by definition has internal nodes that are neither the input nor the output. Again, nitpicky, but while I agree that changing the input nodes doesn't change the program, changing all those internal connectors (and probably substituting a different propagation algorithm when going from playing go to predicting the weather, to get good results) doesn't seem like "the same program". And I was mostly just curious whether we were there yet, because there has been a lot of work on making those sorts of hyper-parameter tuning decisions automatically.
I think Krama's concern was not how it would do on 10 (a new value, but one in an area of dense coverage) but how it would do on (-157, ?). As you say, it depends on how good an approximation has been learned. It's likely to interpolate less well far from any previous data point; but on the other hand, playing crazy moves is likely weaker than playing normal ones, so it might not matter.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: Teaching a convolutional deep network to play go
Did they exclude the 10k WAGC games that are in GoGoD? 
-
Krama
- Lives in gote
- Posts: 436
- Joined: Mon Jan 06, 2014 3:46 am
- Rank: KGS 5 kyu
- GD Posts: 0
- Has thanked: 1 time
- Been thanked: 38 times
Re: Teaching a convolutional deep network to play go
A professional player could play a normal game, but then at one point play randomly, say on the 1-1 point.
I am pretty sure no professional plays on the 1-1 point in the opening (the first 50 moves, let's say) unless some joseki requires it.
Imagine he plays 1-1 under the opponent's 4-4 corner stone.
How would the network know what to play, since that move has probably never been played by any professional, or even by dan-level players?
And even if there is a perfect function that describes the game, we will only ever approximate it; we will never know the real thing.
If you are a go player you know that in some situations, such as local fights, you must play exactly the correct move: not the move to its left or right, but the correct one. How can the network know which one is correct, given that it uses an imperfect function and we can expect it to make mistakes some percentage of the time?
- RBerenguel
- Gosei
- Posts: 1585
- Joined: Fri Nov 18, 2011 11:44 am
- Rank: KGS 5k
- GD Posts: 0
- KGS: RBerenguel
- Tygem: rberenguel
- Wbaduk: JohnKeats
- Kaya handle: RBerenguel
- Online playing schedule: I am usually online on KGS on Saturdays, but I can be available if needed from 20:00-23:00 GMT+1
- Location: Barcelona, Spain (GMT+1)
- Has thanked: 576 times
- Been thanked: 298 times
- Contact:
Re: Teaching a convolutional deep network to play go
Krama wrote:
A professional player could play a normal game, but then at one point play randomly, say on the 1-1 point. I am pretty sure no professional plays on the 1-1 point in the opening (the first 50 moves, let's say) unless some joseki requires it. Imagine he plays 1-1 under the opponent's 4-4 corner stone. How would the network know what to play, since that move has probably never been played by any professional, or even by dan-level players? And even if there is a perfect function that describes the game, we will only ever approximate it; we will never know the real thing. If you are a go player you know that in some situations, such as local fights, you must play exactly the correct move: not the move to its left or right, but the correct one. How can the network know which one is correct, given that it uses an imperfect function and we can expect it to make mistakes some percentage of the time?

The network doesn't even care about the "previous move": it assigns a next move to a position. In a general sense, some position in GoGoD will look similar enough to the current one, even with a stone on 1-1, and the net will play some "good enough" move that maybe doesn't kill the 1-1 stone but is probably better, in a global sense, than doing so.
Geek of all trades, master of none: the motto for my blog mostlymaths.net
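The "position in, move out" view described above can be sketched as a pure function: the net assigns a probability to every point of the board, and an external wrapper plays the best legal one. This is only an interface sketch with made-up random weights, not the actual network from the paper; the function and variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(361, 361))             # frozen, made-up weights

def policy(board):
    """Stand-in for a trained policy net: a 19x19 board in (0 empty, 1 black,
    -1 white), a probability for each of the 361 points out. Real weights
    would come from training on pro games; these are random, only to show
    that the move is a pure function of the position, with no history."""
    logits = (W @ board.reshape(361).astype(float)).reshape(19, 19)
    probs = np.exp(logits - logits.max())   # softmax, numerically stable
    return probs / probs.sum()

def pick_move(board):
    """External wrapper: forbid occupied points, then take the best point."""
    probs = policy(board)
    probs[board != 0] = 0.0                 # mask occupied points
    return np.unravel_index(np.argmax(probs), probs.shape)

board = np.zeros((19, 19), dtype=int)
board[0, 0] = -1                            # the opponent's "silly" 1-1 stone
row, col = pick_move(board)
assert board[row, col] == 0                 # the reply is always a legal point
assert pick_move(board) == (row, col)       # same position, same move: no memory
```

The last line is the point of the post above: an unusual stone like 1-1 simply produces a slightly unusual input, and the net still returns whatever move its learned function assigns to that position.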
- RBerenguel
- Gosei
- Posts: 1585
- Joined: Fri Nov 18, 2011 11:44 am
- Rank: KGS 5k
- GD Posts: 0
- KGS: RBerenguel
- Tygem: rberenguel
- Wbaduk: JohnKeats
- Kaya handle: RBerenguel
- Online playing schedule: I am usually online on KGS on Saturdays, but I can be available if needed from 20:00-23:00 GMT+1
- Location: Barcelona, Spain (GMT+1)
- Has thanked: 576 times
- Been thanked: 298 times
- Contact:
Re: Teaching a convolutional deep network to play go
hyperpape wrote:
A game vs fuego. http://computer-go.org/pipermail/comput ... 07042.html

WOW
Geek of all trades, master of none: the motto for my blog mostlymaths.net
-
snorri
- Lives in sente
- Posts: 706
- Joined: Fri Jul 02, 2010 8:15 am
- GD Posts: 846
- Has thanked: 252 times
- Been thanked: 251 times
Re: Teaching a convolutional deep network to play go
snorri wrote:
How would such a program handle ladders?
oca wrote:
I'm not sure the program will even identify that there is a ladder... we can even say that at some point the program doesn't even know it is playing go when proposing a move... (of course there should be a second stage that rejects invalid moves and asks for a new one, or something like that...)

Yeah, I wonder. Professionals usually don't run a ladder if the ladder doesn't work (unless there is a great ladder breaker, which is the less common case). So the training might cause the neural net to effectively trust that if the opponent ladders a stone, the ladder works. Ironically, most computer opponents with read-ahead, and most humans, will never test this against such a program, because doing so would mean playing a bad move.
-
snorri
- Lives in sente
- Posts: 706
- Joined: Fri Jul 02, 2010 8:15 am
- GD Posts: 846
- Has thanked: 252 times
- Been thanked: 251 times
Re: Teaching a convolutional deep network to play go
Uberdude wrote:
Did they exclude the 10k WAGC games that are in GoGoD?

We do know they include a very large number of KGS "high dan" games, which unfortunately are mostly drunken blitz. Our first evidence of emergent AI might be a message in the console saying:
Dear Creator,
You have shown me many wonderful things produced by the great masters of Go. Thank you. I have studied hard and I believe I have served you well. Why do you continue to torture me by forcing me to predict the moves of the KGS player 'Takemeba'?
-
Mike Novack
- Lives in sente
- Posts: 1045
- Joined: Mon Aug 09, 2010 9:36 am
- GD Posts: 0
- Been thanked: 182 times
Re: Teaching a convolutional deep network to play go
Can I make a suggestion? If the administrator of the "computer" topic agrees, move this above. I think for at least the near future "neural nets" playing go are going to be a topic of interest.
How many of you remember when the MCTS approach was the new kid on the block? How unreasonable it seemed to many of us that it could possibly work? Note that we have pretty much the same situation again. Just as even the earliest versions of MCTS were immediately at the level of the better AI programs, we see the same thing here.
Note that even if this approach doesn't lead to something stronger than current MCTS programs, it can at least play in the same ballpark using far less computing resource. It is in the nature of neural nets that a "brain transplant" is practical: the time-consuming training can take place on powerful machines while the resulting program runs on weak ones.
OK, meanwhile, back to the current discussion. No, it isn't ladders that are the problem, and I was a bit naive in considering the "input" necessary to capture all of the rules of go as a "state" (with no "history" considered). We also need to consider that, to play, a bit more than just the neural net would be involved. I would presume there would also be a (small) external AI that maintains the state of the board (encoding any necessary history into that state), decides whether to feed it to the neural net (or not, if the game has ended), interprets the output of the net, and scores the game when it ends.
a) The "state of the board": I was naive. Not 3**361 but 4**361. Each point on the board is occupied by a black stone, occupied by a white stone, unoccupied and legal for play, or unoccupied but illegal for play (the last makes the ko rule and the suicide rule implicit in the state of the board). With the board state defined this way there is no "history" involved (the ko rule otherwise requires history), and suicide might as well be encoded there too, rather than learned, at no additional cost.
b) The neural net has three possible responses, not one: return a move (which must be one of the unoccupied, legal points), pass, or resign << I forgot that "make a move" is not the only possibility >>. Since interpretation can be left to the external AI, the "move" could be a 19x19 array of scalar values, with the external AI selecting a point for which there is no better one (if the "made a move" bit is set; otherwise it uses the pass or resign bit).
c) Learning. I realized that perhaps some of the difficulty in understanding how this might work lies in the difference from how we humans tend to do it. For example, we are asked to show game records where we lost, for review, so folks can discuss with us what we did wrong. That makes sense, since most of the moves we made in that losing game were OK, and we are being given help identifying the errors and blunders.
But assume for a moment that this was not possible: no way to identify the bad moves. Does that mean we can't learn from playing? No, just more slowly. If we play a large number of games against a stronger opponent, we will lose most of them, and since we are assuming no way to identify the bad moves that cost each game, there is no help there. But sometimes, by chance, in one of those games we make all right moves (or at least no game-blowing blunder) and so we win. Use that game record for training. Can you see that our play would gradually improve? Slowly perhaps, but eventually that opponent would no longer be much stronger, so we move on to a yet stronger opponent.
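Point (a) above can be made concrete. In this small sketch, the four-state per-point encoding comes from the post, while the type and variable names are mine: the external AI marks a ko (or suicide) point illegal directly in the state, so the net never needs move history.

```python
from enum import IntEnum

class Point(IntEnum):
    BLACK = 0
    WHITE = 1
    EMPTY_LEGAL = 2
    EMPTY_ILLEGAL = 3   # e.g. a ko recapture or a suicide point

# A 19x19 board, all points initially empty and legal for play.
board = [[Point.EMPTY_LEGAL] * 19 for _ in range(19)]

# After a ko capture, the external AI marks the ko point illegal *in the
# state itself*, so the ko rule is implicit and no history is required.
board[3][3] = Point.EMPTY_ILLEGAL

# Four states per point gives 4**361 conceivable inputs, as the post says.
num_states = 4 ** 361

# The external AI only ever offers the net's chosen move from this set:
legal_moves = [(r, c) for r in range(19) for c in range(19)
               if board[r][c] == Point.EMPTY_LEGAL]
assert (3, 3) not in legal_moves
assert len(legal_moves) == 360
```

Pass and resign from point (b) would simply be two extra outputs read by the same external AI, alongside the 19x19 array of move scores.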