it's not just tenuki
- djhbrown
- Lives in gote
- Posts: 392
- Joined: Tue Sep 15, 2015 5:00 pm
- Rank: NR
- GD Posts: 0
- Has thanked: 23 times
- Been thanked: 43 times
it's not just tenuki
MCTS bots play the percentages - that's what statistical sampling means.
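To make "playing the percentages" concrete, here is a minimal sketch of the statistical-sampling idea: run many playouts per candidate move and keep the move with the best sampled win rate. The `simulate` callback and move labels are placeholders for illustration, not any real engine's (HiraBot's or AlphaGo's) interface.

```python
def estimate_winrates(position, candidate_moves, simulate, n_playouts=1000):
    """Sample n_playouts playouts per move; return each move's win rate.

    `position` and `simulate(position, move) -> 0 or 1` are placeholders
    for a real engine's board state and random-playout function.
    """
    stats = {}
    for move in candidate_moves:
        wins = sum(simulate(position, move) for _ in range(n_playouts))
        stats[move] = wins / n_playouts  # the sampled win percentage
    return stats

def play_the_percentages(position, candidate_moves, simulate, n_playouts=1000):
    # An MCTS-style bot "plays the percentages": the best sampled rate wins,
    # regardless of whether the move fits any human-style plan.
    stats = estimate_winrates(position, candidate_moves, simulate, n_playouts)
    return max(stats, key=stats.get)
```

The bot has no notion of "tenuki during a forced sequence" here; it only compares sampled percentages.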
- Attachments
- a.sgf (1.44 KiB) Downloaded 629 times
Last edited by djhbrown on Tue May 02, 2017 12:35 am, edited 1 time in total.
-
Mike Novack
- Lives in sente
- Posts: 1045
- Joined: Mon Aug 09, 2010 9:36 am
- GD Posts: 0
- Been thanked: 182 times
Re: it's not just tenuki
djhbrown wrote:
MCTS bots play the percentages - that's what statistical sampling means... because no opponent with half a brain would be so dumb as to tenuki during a forced sequence... they don't have emotional reactions like we humans - it's only that it looks like that to us. Whereas, in fact, they are Quixotic all the time, even when playing at their best...

And it's not just tenuki, as i found out the hard way this morning against Hirabot33 (and Lee Sedol found out the hard way in game 2 against AlphaGo).

At move 32 in the above game, Hirabot33 played what looked to me to be an unbelievably stupid move. And followed it up with the even more banal 34. What on earth was going on in its little bot-mind??

It is difficult to understand how you sometimes seem to grasp that the bots aren't "thinking" like we do, while at other times you seem to imagine that human-style thinking is the only way to go.
You shouldn't assume that "statistical sampling" (by including lines a "thinking human" wouldn't try) is necessarily bad. The bots might have different strengths. They don't need a "plan" but will be analyzing afresh each position. That means they might be better at making use of little bits of aji scattered over the board, none of which individually appear to offer very much (and so the human can't plan around them) but that collectively add up to an advantage that will eventually materialize.
Those odd moves and odd tenukis might be good moves, just ones too difficult for a human to see the point of because the benefit is remote. There isn't a SPECIFIC plan that the move affects.
Maybe the way to look at this is to think back to when you would have had a hard time understanding a correct (good) tenuki. Say you are playing out a joseki sequence and all of a sudden the opponent tenukis. At some point you learned to look at that (odd) move and recognize that if you didn't respond you would suffer a disadvantage at that location, but that the move was also a ladder breaker affecting the way you were playing the joseki. As a human player you were able to recognize the "plan" involved. In other words, the human opponent could conceive of that particular tenuki, and you were able to recognize why it would work.
But now suppose it was one of these bots doing that. The reason might not be because of a potential ladder in the area you are now playing but the likelihood of several other ladders in areas not currently being played in (and the collective value of those might be more than the local loss in the area being played in).
- oren
- Oza
- Posts: 2777
- Joined: Sun Apr 18, 2010 5:54 pm
- GD Posts: 0
- KGS: oren
- Tygem: oren740, orenl
- IGS: oren
- Wbaduk: oren
- Location: Seattle, WA
- Has thanked: 251 times
- Been thanked: 549 times
Re: it's not just tenuki
djhbrown wrote:
At move 32 in the above game, Hirabot33 played what looked to me to be an unbelievably stupid move. And followed it up with the even more banal 34. What on earth was going on in its little bot-mind??

32 looked like an obvious move to me. 34 requires a little bit of reading. I'm not sure you can compare this at all to Lee Sedol vs AlphaGo.
-
jeromie
- Lives in sente
- Posts: 902
- Joined: Fri Jan 31, 2014 7:12 pm
- Rank: AGA 3k
- GD Posts: 0
- Universal go server handle: jeromie
- Location: Fort Collins, CO
- Has thanked: 319 times
- Been thanked: 287 times
Re: it's not just tenuki
While I'm about your level and it's difficult to tell for sure, it looks to me like HiraBot gets a good result even if you play correctly because there are forcing moves to build white's outside wall. The bot has "decided" that the solidification of black's territory is worth the outside gain. Of course, if black makes a mistake the program may as well take the profit that is offered.
I think this is similar to many of AlphaGo's moves: the software has calculated that a small local loss is worth the global gain. This is what has made professionals and amateurs alike inspect the games with great detail: AlphaGo evaluates the position differently than most humans, and that means we have an opportunity to learn.
Remember that the neural networks that restrict the moves AlphaGo considers were developed through many, many iterations of self-play. Since white's and black's ability to follow complex lines would be exactly equal under those conditions, trick moves would be unlikely to show favorable results.
I do think that many of the problems you are describing have been a part of existing bots, especially when the outcome of the game is mostly decided. For the most part, the addition of neural networks has limited this problem when the game is still competitive. But we must tread lightly as we begin studying the play of professional level bots. We shouldn't accept every move just because the bot played it (perhaps it is displaying some of the problems you highlight!), but neither should we reject moves because we don't immediately understand them (perhaps the move is right after all). Amateurs who play stronger players are familiar with this tension every time they play!
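The point about neural networks restricting the moves considered can be sketched in a few lines: a policy network assigns each legal move a prior probability, and the search only expands the highest-prior handful, so low-prior trick moves never even enter the tree. The numbers and interface below are invented for illustration; AlphaGo's real search also folds in value estimates and visit counts.

```python
def prune_candidates(policy_priors, k=5, min_prior=0.01):
    """Keep the k moves with the highest policy prior, dropping anything
    below min_prior. `policy_priors` maps move -> probability, as a
    policy network might output. Illustrative only."""
    ranked = sorted(policy_priors.items(), key=lambda kv: kv[1], reverse=True)
    return [move for move, prior in ranked[:k] if prior >= min_prior]

# A "trick move" with a tiny prior is never expanded at all, which is one
# reason nets trained purely on self-play tend not to favor such moves.
```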
- djhbrown
- Lives in gote
- Posts: 392
- Joined: Tue Sep 15, 2015 5:00 pm
- Rank: NR
- GD Posts: 0
- Has thanked: 23 times
- Been thanked: 43 times
Re: it's not just tenuki
jeromie wrote:
it looks to me like HiraBot gets a good result even if you play correctly because there are forcing moves to build white's outside wall.
My interest is this: Where does AI go from here?
- Attachments
- b.sgf (1.1 KiB) Downloaded 584 times
Last edited by djhbrown on Tue May 02, 2017 12:37 am, edited 1 time in total.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: it's not just tenuki
I thought AlphaGo was still improving to this day through its self play.
be immersed
- EdLee
- Honinbo
- Posts: 8859
- Joined: Sat Apr 24, 2010 6:49 pm
- GD Posts: 312
- Location: Santa Barbara, CA
- Has thanked: 349 times
- Been thanked: 2070 times
-
pookpooi
- Lives in sente
- Posts: 727
- Joined: Sat Aug 21, 2010 12:26 pm
- GD Posts: 10
- Has thanked: 44 times
- Been thanked: 218 times
Re: it's not just tenuki
djhbrown wrote:
One obvious way to improve AlphaGo is to add yet more processors to increase the size of the samples, and maybe a few more gerzillion self-play RL exercises (although i feel that RL's hill-climbing levels out pretty quickly). AlphaGo uses about 2000 parallel processors, whereas Zen and others are limited to about 4 or so. That's an increase of 3 orders of magnitude and may be worth 2 or even 3 stones at their level. Or it may not. We won't know until they play each other.

In the commercial versions, Zen is limited to 8 cores (that's the 2013 version; the deep-learning version can parallelize across even more processors) and Crazy Stone is limited to 64 cores (deep-learning version; Remi answered this himself - it was 32 cores in the 2015 version). In their experimental versions, Zen uses two Xeon E5-2623 v3 CPUs and four GeForce GTX TITAN X GPUs, while Crazy Stone uses a Xeon with 18 cores and 36 threads at 2.9 GHz. But I agree that this hardware can't even compare to the single-machine version of AlphaGo (48 CPUs, 8 GPUs).
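As a side note, the "3 orders of magnitude" in the quote can be checked directly from the processor counts given there (about 2000 vs about 4):

```python
import math

processors_alphago = 2000  # figure quoted in the post
processors_zen = 4         # ditto
ratio = processors_alphago / processors_zen          # a 500x increase
orders_of_magnitude = math.log10(ratio)              # log10(500), about 2.7
```

So the gap is 500x - roughly 2.7 orders of magnitude, a bit under the 3 claimed.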
djhbrown wrote:
in http://papers.ssrn.com/sol3/papers.cfm? ... id=2818149 i showed that (2) just a little commonsense would have guided Alphago to finding a workable defence to Lee's magic wedge in game 4.

I think you already know that DeepMind eradicated the game 4 bug by training AlphaGo even more - what do you think about this method? They said the bug was a 'horizontal effect', but did not elaborate on that term. It's like they're not quite sure either.
- djhbrown
- Lives in gote
- Posts: 392
- Joined: Tue Sep 15, 2015 5:00 pm
- Rank: NR
- GD Posts: 0
- Has thanked: 23 times
- Been thanked: 43 times
Re: it's not just tenuki
i would imagine that DM are currently more focussed on producing something useful in image analysis for differential diagnosis of medical conditions.
Last edited by djhbrown on Tue May 02, 2017 12:39 am, edited 1 time in total.
-
pookpooi
- Lives in sente
- Posts: 727
- Joined: Sat Aug 21, 2010 12:26 pm
- GD Posts: 10
- Has thanked: 44 times
- Been thanked: 218 times
Re: it's not just tenuki
djhbrown wrote:
pookpooi wrote:
I think you already know that DeepMind eradicate game 4 bug by training AlphaGo even more, what do you think about this method? They said the bug is 'horizontal effect' but did not elaborate that term. It's like they're not quite sure either.

i didn't know that; if you know of a public statement to that effect, please share it. as to "horizontal effect", i agree with them. as i said before, it's a kind of "horizon effect" - but a horizon width rather than depth. in the case of game 4 black 79, Alphago hadn't looked wide enough.

Here https://www.reddit.com/r/baduk/comments ... _is_fixed/
and here https://www.youtube.com/watch?v=LX8Knl0g0LE - it's the last question of the Q&A section, so nearly at the end of the video.
- djhbrown
- Lives in gote
- Posts: 392
- Joined: Tue Sep 15, 2015 5:00 pm
- Rank: NR
- GD Posts: 0
- Has thanked: 23 times
- Been thanked: 43 times
Re: it's not just tenuki
thanks for the links, pookpooi.
re: fixing the bug
i enjoyed Fan Hui's anecdote about imagining that they wanted to wire him up to probe his brain while he was playing Go.
Last edited by djhbrown on Tue May 02, 2017 12:39 am, edited 1 time in total.
- oren
- Oza
- Posts: 2777
- Joined: Sun Apr 18, 2010 5:54 pm
- GD Posts: 0
- KGS: oren
- Tygem: oren740, orenl
- IGS: oren
- Wbaduk: oren
- Location: Seattle, WA
- Has thanked: 251 times
- Been thanked: 549 times
Re: it's not just tenuki
For fun, I ran the game through to see what Crazy Stone and Zen thought. They also looked at Hirabot's moves 32 and 34 early on and then started moving away from them. So shape-wise they're good first candidates to look at, but the stronger bots decide not to play them.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: it's not just tenuki
djhbrown wrote:
as to "fixing the bug", it is conceivable that more RL trials would improve performance, but there is evidence that RL tails off asymptotically [1], so i guess they found a different way, by simply presenting the position after white 78 to the policy network, telling it that Kim's move of L10 is the correct reply. And telling the value network that the position after black L10 is a win for black.

Actually, I think Aja said that the "bug" was fixed simply by continuing self-play. They didn't explicitly give information tailored to the situation; they just let it keep improving itself. Later, they presented the same board position, and the new version of AlphaGo found the correct answer.
RL may tail off at an asymptote, but I don't think AlphaGo has reached that point yet. So far, it appears to have continued improvement simply through self play.
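The "supervised patch" djhbrown describes (tell the policy network directly that L10 is the correct reply in the position after white 78) can be sketched as a single targeted update. This is purely illustrative, using a toy tabular policy rather than a neural network - and by Aja's account the actual fix was continued self-play, not anything like this.

```python
def supervised_patch(policy, position, correct_move, lr=0.5):
    """Nudge a tabular policy toward a known-correct move in one position.

    `policy` maps position -> {move: probability}; a toy stand-in for a
    policy network's output, updated with a simple step of size `lr`
    toward a one-hot target on `correct_move`.
    """
    probs = dict(policy[position])
    for move in probs:
        target = 1.0 if move == correct_move else 0.0
        probs[move] += lr * (target - probs[move])
    total = sum(probs.values())  # stays 1.0 if it started at 1.0
    policy[position] = {m: p / total for m, p in probs.items()}
    return policy
```

The drawback of such a patch is that it only fixes the one position you show it; continued self-play, by contrast, can (in principle) repair the whole family of positions where the same misjudgment occurs.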
be immersed
- djhbrown
- Lives in gote
- Posts: 392
- Joined: Tue Sep 15, 2015 5:00 pm
- Rank: NR
- GD Posts: 0
- Has thanked: 23 times
- Been thanked: 43 times
Re: it's not just tenuki
If you were in charge of training Alpha, and had new data from 5 games against one of the world's best players, it would be rather remiss of you not to tell Alpha to learn from that experience and instead just hope she would learn enough solely through self-play.
Last edited by djhbrown on Tue May 02, 2017 12:41 am, edited 1 time in total.
-
pookpooi
- Lives in sente
- Posts: 727
- Joined: Sat Aug 21, 2010 12:26 pm
- GD Posts: 10
- Has thanked: 44 times
- Been thanked: 218 times
Re: it's not just tenuki
If I were in charge of AlphaGo, then I'd do what you recommend: directly feed in the correct positions and force AlphaGo to learn them.
But the real question is: is it that easy? Which is more convenient for the programmers - doing that, or letting AlphaGo's self-play correct it? DeepMind knows, I don't.