Teaching a convolutional deep network to play go
Posted: Mon Dec 15, 2014 1:11 pm
Just saw this on arXiv, will read it one of these days.
Life in 19x19. Go, Weiqi, Baduk... That's the life.
https://www.lifein19x19.com/
The paper is surprisingly readable, too, even if you don't know anything about deep/convolutional nets. Worth a shot.

daal wrote:
Here's an article about the research: http://www.technologyreview.com/view/53 ... irst-time/
Watch out, it can already beat GnuGo 90% of the time...
Time-delayed neural nets can be considered as "with lookahead" at least when training.

Mike Novack wrote:
Ah, I have been waiting to see if somebody was going to try to teach go to a neural net. My sense of the difficulties was apparently wrong. I'll have to look at this.
This is actually great news about a possible new way forward beyond MCTS.
The thing to understand is that the program that implements a neural net has nothing to do with the particular task to be mastered, whether predicting the weather, driving a car in city traffic, or in this case, playing the game of go.
The neural net "learns" how to do the task (is "taught"), and all the "teachers" need to be able to do is determine "that was a good job" or "that was a bad job". Such nets exhibit some of the properties of "real" brains: notably, if "damaged", they relearn the task more quickly than from scratch, and hitting them with minor damage (akin to "annealing") is how they get beyond being trapped on a local "hill" of the performance "surface".
With a neural net, concepts like "look ahead" aren't relevant. While neural nets can perform tasks, it's not meaningful to ask how they do it. All you end up with is "this arrangement of network connections and set of cell values works" (a different one might also work). A human player can picture "I am looking ahead", but we don't know what that means in terms of the connections between neurons in the brain, and that is the only level at which we could compare a neural net that has learned to play go with the same neural net trained to drive a car.
Well, if I understand it, the learning in this case isn't literally "how to play go", so no, I don't think "look ahead" comes into it. The "function" being learned could be thought of as something like "given this board position (ignoring what came before and what might come after), what would be the next move Go Seigen would make" (the net has learned to predict, for all positions of Go's games, what next move he made). I haven't yet looked to see whether they used one go master or several, and that is in itself an interesting question.

RBerenguel wrote:
Time-delayed neural nets can be considered as "with lookahead" at least when training.

The point of no lookahead is that the net was beating a program that was looking ahead at what was going to happen, whereas the net was "playing on instinct", so to say.
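To make the "board position in, predicted next move out" idea concrete, here is a toy sketch of such a function. Everything about it is illustrative — one made-up convolutional layer with random untrained weights — whereas the actual paper uses a much deeper stack of trained layers:

```python
import numpy as np

# Toy sketch of the learned "function": a board position goes in,
# a probability distribution over the 361 intersections comes out.
# Layer shapes and weights are made up for illustration only.

rng = np.random.default_rng(0)

def conv2d(planes, kernels):
    """Naive 'same' convolution: planes (C,19,19), kernels (K,C,3,3)."""
    c, h, w = planes.shape
    k = kernels.shape[0]
    padded = np.pad(planes, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k, h, w))
    for ki in range(k):
        for i in range(h):
            for j in range(w):
                out[ki, i, j] = np.sum(padded[:, i:i+3, j:j+3] * kernels[ki])
    return out

def predict_move(board):
    """board: (2,19,19) planes for black/white stones -> (361,) distribution."""
    w1 = rng.normal(scale=0.1, size=(8, 2, 3, 3))  # random, untrained weights
    hidden = np.maximum(conv2d(board, w1), 0)      # ReLU nonlinearity
    logits = hidden.sum(axis=0).ravel()            # collapse to one 19x19 plane
    exp = np.exp(logits - logits.max())            # softmax over the 361 points
    return exp / exp.sum()

board = np.zeros((2, 19, 19))
board[0, 3, 3] = 1.0   # a black stone on a 4-4 point
probs = predict_move(board)
print(probs.shape)     # (361,)
```

Training would adjust the weights so that the highest-probability point matches the move the master actually played in that position.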
Time delayed nets are different. But yes, a neural net is essentially a non-linear interpolator which autodetects features.

Mike Novack wrote:
Well, if I understand it, the learning in this case isn't literally "how to play go" so no, I don't think "look ahead" comes into it. The "function" being learned could be thought of as something like "given this board position (ignore what came before and what might come after) what would be the next move Go Seigen would make" (the net has learned to predict, for all positions of Go's games, what next move he made). I haven't yet looked to see if they used one go master or several, and that in itself is an interesting question.

RBerenguel wrote:
Time-delayed neural nets can be considered as "with lookahead" at least when training.

The point of no lookahead is that the net was beating a program looking ahead at what was going to happen, whereas the net was "playing on instinct", so to say.
Similarly, I don't know what computer resources were needed, and when they said "much faster than Fuego", how small a fraction of the time was required. There are interesting problems and possibilities. It isn't clear whether better results would come from the net learning using the games of one master or several (there could be style conflicts), and if it is very fast (a small fraction of what an MCTS evaluator would require) there are some interesting possibilities for the future.
A neural net can be "switched" very quickly. Suppose for just a moment that the time to run the net is 10% of the time to do an MCTS from scratch, and suppose that we had three nets, each trained to predict the next move of another master: "What would Go do?", "What would Jowa do?", etc. OK, 30% of the time has been used up. Now use the remaining 70% to do an in-depth MCTS on those three moves.
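The budget arithmetic above can be written out explicitly. All the numbers here are the post's hypothetical 10% figure, not measurements, and the third master's name is an arbitrary example:

```python
# Time-budget sketch: if one net costs 10% of a full move budget,
# three nets leave 70% for the in-depth MCTS on their suggestions.
# The 10% figure is the post's hypothetical, not a measurement.

total_ms = 10_000                         # say, 10 seconds per move
net_cost = 0.10 * total_ms                # one forward pass of one "master" net
nets = ["Go Seigen", "Jowa", "Shusaku"]   # third name is an arbitrary example

spent = net_cost * len(nets)              # 30% of the budget
remaining = total_ms - spent              # 70% left for search
per_candidate = remaining / len(nets)     # deep MCTS time per suggested move
print(f"nets: {spent:.0f} ms, search per candidate: {per_candidate:.0f} ms")
```

Even after paying for three nets, each candidate move still gets several seconds of focused search, instead of spreading the whole budget over all 361 points.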
I think this is perhaps just the beginning of investigating the possibilities of neural nets for strong play.
We maybe need to help other folks here understand what we are talking about?

RBerenguel wrote:
Time delayed nets are different. But yes, a neural net is essentially a non-linear interpolator which autodetects features.
The nets were trained on the whole of GoGoD; other test networks were trained on a subset of GoGoD plus KGS high-dan games. Just the games of one player or a few would be far too little (Go Seigen played ~900 games; at 300 moves per game that's just 2.7e5 moves) to train the network. It's hard to give a minimum number, but when the net could easily have 10k nodes, you need a lot more data than nodes.
Training was done on a GPU over two or so days, but the cost of getting a result from a net like this is essentially zero. Think about it: it's just computing some easy nonlinear functions and some large (but not huge) matrix/vector products. Peanuts, really. Training is expensive; classifying afterwards is incredibly cheap. An MCTS, on the other hand, does a lot of heavy lifting.
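To see why classification is cheap, here is a toy forward pass — just two matrix/vector products and an elementwise nonlinearity, at roughly the "10k nodes" scale mentioned above. The sizes are arbitrary round numbers, not the paper's architecture:

```python
import time
import numpy as np

# "Classifying afterwards is incredibly cheap": a forward pass is a
# couple of matrix/vector products plus cheap nonlinearities.
# Layer sizes are arbitrary, not taken from the paper.

rng = np.random.default_rng(1)
x = rng.normal(size=361)            # flattened 19x19 input
w1 = rng.normal(size=(1000, 361))   # hidden layer of 1000 units
w2 = rng.normal(size=(361, 1000))   # back out to one value per point

start = time.perf_counter()
for _ in range(100):
    h = np.tanh(w1 @ x)             # nonlinearity on a matrix/vector product
    out = w2 @ h                    # second matrix/vector product
elapsed = (time.perf_counter() - start) / 100
print(f"one forward pass: {elapsed * 1000:.3f} ms, output size {out.size}")
```

On any modern machine this runs in a fraction of a millisecond per pass, which is why plugging a trained net into a search loop costs almost nothing compared to the playouts themselves.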
It has been suggested (on reddit and the computer-go mailing list) to use such a net as a move-generator function for MCTS pruning. This could be huge for computer go, I think: easily one stone stronger, probably slightly more, just by plugging this in front of MCTS without more thinking/work.
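A minimal sketch of that suggestion, with a random placeholder standing in for the trained net (the function names and the `top_k` value are my own illustration, not from the paper or the mailing list):

```python
import numpy as np

# Sketch of "net as move generator for MCTS pruning": instead of
# expanding every legal move, keep only the net's top-k suggestions.
# `policy` is a random stand-in for a trained net's output.

rng = np.random.default_rng(2)

def policy(board):
    """Placeholder for the trained net: a distribution over 361 points."""
    logits = rng.normal(size=361)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def candidate_moves(board, top_k=5):
    """Prune MCTS expansion to the net's k most likely points."""
    probs = policy(board)
    return np.argsort(probs)[::-1][:top_k]

board = np.zeros((19, 19))
moves = candidate_moves(board)
print(len(moves))   # MCTS playouts would now start only from these points
```

The search then spends its entire playout budget on a handful of plausible moves, which is exactly where the "one stone stronger for free" estimate comes from.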
I'm not sure the program will even identify there is a ladder...

snorri wrote:
How would such a program handle ladders?
In the article, we can read this:

daal wrote:
Here's an article about the research: http://www.technologyreview.com/view/53 ... irst-time/
Watch out, it can already beat GnuGo 90% of the time...
I disagree. It could fight just like we fight in a blitz game: we don't have time to properly read and understand the fight, so we spot key points, play them, and hope for the best.

Krama wrote:
I believe these networks couldn't be used in actual local fighting but could be used to play whole-board moves.
Once you finish a local fight, you ask the network where to play the biggest move, etc.
There is no way (no matter how many games they put in the network) it can be good at complex local fights or even fights that turn into whole-board fights.
You would need to somehow program the thing so that it "knows" when it is fighting and when it is doing something else.
Then ask it to give you a few good moves and then proceed to analyze them with the MCTS method.