Life In 19x19
http://www.lifein19x19.com/

The Shodan Go Bet
http://www.lifein19x19.com/viewtopic.php?f=18&t=2646
Page 4 of 5

Author:  John Fairbairn [ Tue Dec 28, 2010 3:57 pm ]
Post subject:  Re: The Shodan Go Bet

My own impression was that the games were not a great advertisement for the computer, but they were not much of an advertisement for human play either. I think we need to remember just how weak a 2-dan amateur really is.

MFOG did a very good job of passing the Turing test early on, but since it appears to have a large store of human knowledge in the form of fuseki and joseki libraries, that's possibly deceptive.

I would also expect John Tromp to learn something about the computer's weaknesses from these two games (in fact, I had the sense that he was testing it even in Game 2), whereas the computer will learn nothing from them. However, I understand that the program did not have all the hardware it expected and that may be changed for the other games.

Author:  tapir [ Tue Dec 28, 2010 4:29 pm ]
Post subject:  Re: The Shodan Go Bet

hyperpape wrote:
I'm not sure what you meant, tapir, but just to be clear, Tromp didn't play J14 and J16 in the latter game. The computer played as White.


I meant J15 and J17 :) John Tromp was clearly testing the bot here - as he was winning that game already. But J18 in the first game was played in the very beginning - and I certainly would not play such a move against a human opponent I consider as being of my strength (maybe it's just that I am weak). What I want to say is that John Tromp plays as if he utterly disrespects the playing strength of the bot, and for now he stands unrefuted.

Author:  Mike Novack [ Tue Dec 28, 2010 6:24 pm ]
Post subject:  Re: The Shodan Go Bet

I think we are seeing about what we should expect to be seeing.

The hardware for the first two games wasn't all that much more powerful than what Fotland would call a "standard machine" and so the program wouldn't be much over 1 dan. We should expect Tromp to be able to beat a 1 dan and as soon as he has a significant advantage in the game MFOG will begin to play wildly "go for broke" since it only considers won or lost, not by what margin.

Ignore that the bot playing this program on KGS has a 2 dan rating. I believe that is running on a machine about six times more powerful.
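The "only considers won or lost" behaviour described above can be sketched with a toy comparison. This is a hypothetical illustration, not MFOG's actual code: the move names and playout margins below are invented, and the only point is that ranking moves by win rate and ranking them by average margin can disagree once the program is behind.

```python
# Hypothetical illustration only; this is not MFOG's actual code.
# Each candidate move maps to invented final score margins from random
# playouts (positive = the bot wins, negative = the bot loses).
playouts = {
    "solid": [-6, -5, -5, -4, -4, -3, -3, -2],        # always loses, but narrowly
    "gamble": [-40, -35, -30, -30, -25, -20, 3, 5],   # usually a disaster, sometimes a win
}


def win_rate(margins):
    """Fraction of playouts won; the size of the margin is ignored."""
    return sum(1 for m in margins if m > 0) / len(margins)


def mean_margin(margins):
    """Average score margin over the playouts."""
    return sum(margins) / len(margins)


# A win/loss evaluator prefers the move with the higher win rate.
best_by_win_rate = max(playouts, key=lambda move: win_rate(playouts[move]))
# A margin-based evaluator prefers the move that loses by the least on average.
best_by_margin = max(playouts, key=lambda move: mean_margin(playouts[move]))

print(best_by_win_rate)
print(best_by_margin)
```

With these made-up numbers the win-rate criterion picks the gamble and the margin criterion picks the solid move, which is the "go for broke" pattern when the program judges itself to be losing.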

Author:  SpongeBob [ Tue Dec 28, 2010 6:50 pm ]
Post subject:  Re: The Shodan Go Bet

John Fairbairn wrote:
My own impression was that the games were not a great advertisement for the computer, but they were not much of an advertisement for human play either. I think we need to remember just how weak a 2-dan amateur really is.

Come on, take it easy ... John Tromp is quite a strong player, from my perspective and I guess also for the majority of the folks watching.

Author:  hyperpape [ Tue Dec 28, 2010 8:17 pm ]
Post subject:  Re: The Shodan Go Bet

Mike Novack wrote:
The hardware for the first two games wasn't all that much more powerful than what Fotland would call a "standard machine" and so the program wouldn't be much over 1 dan.


Fotland's proprietary term, or can I call it a standard machine too?

Author:  John Fairbairn [ Wed Dec 29, 2010 6:13 am ]
Post subject:  Re: The Shodan Go Bet

Quote:
Come on, take it easy ... John Tromp is quite a strong player, from my perspective and I guess also for the majority of the folks watching.


No. This is scientific research not a vanity contest. If this was medicine instead of go, I hardly think you would be tempted to go to a 2-dan amateur "doctor". In fact, when you consider the number of mistakes amateur dan players make (and I'm one of them), if this was driving a car instead of go, there'd be constant pile-ups.

Just think for a moment how small a proportion of his life a 2-dan amateur has spent on studying go.

Author:  lorill [ Wed Dec 29, 2010 6:20 am ]
Post subject:  Re: The Shodan Go Bet

And on driving? I bet he has spent more time on go than on driving a car...

Author:  Mike Novack [ Wed Dec 29, 2010 6:46 am ]
Post subject:  Re: The Shodan Go Bet

Perhaps all creators of go playing software, especially if this is software for sale, should indicate on what power machine any claims of the strength of the software are based. Fotland does tell if you ask. And it is why he calls MFOG 12.021 "1 dan" even though the bot playing on KGS holds a 2 dan rank.

Is his definition of "standard" reasonable?
Well it's about the strength of a good current "consumer grade" laptop or a "professional grade" one from several years ago*. Not unreasonable to expect that his consumers might have available a machine of that power. But we should note that this might be more power than a current "consumer grade" netbook would have. Anybody considering buying any software needs to find out how it can be expected to perform on their available hardware.

Is "standard" universal?
Of course not. A different creator of software might target a different consumer market as primary. But, as I noted, they should tell you. In my opinion it would have been wrong/deceptive for Fotland to have called MFOG 12.021 "2 dan" just because it might be able to play at that level on KGS when running on a workstation many times more powerful than he could expect his customers to possess. But notice that this isn't quite the same as the "system requirements" (minimum machine for some software application). For example, MFOG 12.021 would run on a weaker machine, just slower (for the same time control, weaker) but this would not affect the non-playing aspects of the application (problems, for example).

Is there a problem? (for the consumers)
Yes, I think so. But we can scarcely blame the developers for users' lack of understanding: the fact that an old desktop or a netbook performs perfectly well browsing the internet or doing other non-computationally-intensive tasks doesn't mean it will work well compared to a more powerful machine when a "crunch" is involved. Probably >99% of users never use their computers for a computationally intensive task.

Can you call it a "standard machine"?
Of course. It's up to you to decide what sort of machine you consider adequate for go playing software. Look, this can be important when we compare go software. The programs using MCTS are computationally intensive, but the ones using just an AI are often not and need comparatively less computer power. This affects the relative strength we would expect if the machine power is dropped below what the MCTS program needs but remains adequate for the AI. Thus I would expect the difference in strength between MFOG 12.021 and gnugo to be greater if both were running on my T61 than if both were running on this old desktop (fine for browsing and email but only about 1/4 of a "standard" machine for MFOG 12). That's because gnugo would remain at the same strength but MFOG 12 would play weaker.



* For example, my Lenovo ThinkPad T61s would qualify as "standard" for MFOG 12.021 (bought reconditioned from Lenovo, they are quite a few years old but still a match for most current "consumer grade" machines).

Author:  flOvermind [ Wed Dec 29, 2010 7:06 am ]
Post subject:  Re: The Shodan Go Bet

Tromp won the third game, this time against MFOG running on the Amazon cloud, using 26 ECU.

Seems like the game was closer this time, and the computer lost in the endgame. That's unexpected, I always thought a precise endgame was the strongest part of computers ;).

Author:  John Fairbairn [ Wed Dec 29, 2010 7:10 am ]
Post subject:  Re: The Shodan Go Bet

Quote:
running on the Amazon cloud, using 26 ECU.


What is the significance of this for non-computer people, please? E.g. is it Ferrari-level computer compared to the family-saloon level most of us have on our desks?

Author:  Li Kao [ Wed Dec 29, 2010 7:21 am ]
Post subject:  Re: The Shodan Go Bet

1 ECU corresponds to about a 1 GHz single core. So 26 ECU is about as strong as single CPUs get (something like a hyperthreaded 3 GHz quad-core).
http://aws.amazon.com/about-aws/whats-n ... instances/

Author:  flOvermind [ Wed Dec 29, 2010 7:37 am ]
Post subject:  Re: The Shodan Go Bet

20 ECU is roughly equal to a fast 8-core CPU. Afaik, that's currently the fastest consumer single CPU system that you can buy.

Extrapolating from that, 26 ECU would be slightly faster than a 10-core CPU.
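The arithmetic behind that estimate, as a small sketch. It uses only the figures quoted in this thread and assumes that ECU scales linearly with core count, which is a simplification.

```python
# Figures quoted in the thread; the linear scaling is an assumption.
ECU_PER_8_CORE_CPU = 20   # "20 ECU is roughly equal to a fast 8-core CPU"
GAME3_ECU = 26            # capacity reported for game 3

ecu_per_core = ECU_PER_8_CORE_CPU / 8          # ECU delivered by one such core
equivalent_cores = GAME3_ECU / ecu_per_core    # cores needed to reach 26 ECU

print(f"{ecu_per_core} ECU per core, about {equivalent_cores:.1f} cores")
```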

Author:  yoyoma [ Wed Dec 29, 2010 11:45 am ]
Post subject:  Re: The Shodan Go Bet

Tromp swept the computer 4-0. See below games 3 and 4. Game 3 the computer did several squeezes, and eventually created a large moyo. I didn't count to see if the computer had a chance if it held onto it, but Tromp broke into it anyways and won. Game 4 Tromp killed a group in the lower left corner. The computer made some fights and managed to kill some stones in the upper left area, but really Tromp was in control the entire game.




Author:  shapenaji [ Wed Dec 29, 2010 12:15 pm ]
Post subject:  Re: The Shodan Go Bet

If he went 4-0, the performance result for the bot during THIS test seems to be below 1d. (That's not to say that we can verify this hypothesis with just one test; it's just that a 1d result would be winning 1.3 games.)
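A rough check of the 1.3 figure, as a sketch only. It assumes the number comes from a one-in-three chance per game for a 1d against a 2d, and that the four games are independent; both are assumptions made for illustration.

```python
from fractions import Fraction

GAMES = 4
P_WIN = Fraction(1, 3)  # assumed per-game win chance for a 1d against a 2d

expected_wins = GAMES * P_WIN          # average number of wins over four games
p_zero_wins = (1 - P_WIN) ** GAMES     # chance of losing all four anyway

print(float(expected_wins))
print(p_zero_wins, float(p_zero_wins))
```

Under those assumptions the expected score is 4/3 wins, and a 0-4 result still happens 16 times in 81, so a sweep on its own does not settle the question.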

Author:  Harleqin [ Wed Dec 29, 2010 2:31 pm ]
Post subject:  Re: The Shodan Go Bet

I must say that John Tromp seems to play quite calmly and with very few blunders for a 2 dan. His last tournament in the European Go Database is from 2005, and he has a rating of 2231 from then. I would hazard a guess that he is really 3 dan or even stronger now.

Author:  SpongeBob [ Wed Dec 29, 2010 3:15 pm ]
Post subject:  Re: The Shodan Go Bet

shapenaji wrote:
If he went 4-0, the performance result for the bot during THIS test seems to be below 1d. (That's not to say that we can verify this hypothesis with just one test; it's just that a 1d result would be winning 1.3 games.)

I was going to raise this question, given the name Shodan-Bet and John's strength of 1k at the time when the bet was made, if the bot would qualify at least as a shodan. Your statistical reasoning supports my feeling that the bot is not quite there, yet.

Author:  Joaz Banbeck [ Wed Dec 29, 2010 3:16 pm ]
Post subject:  Re: The Shodan Go Bet

His last move of game 4 looks like a blunder. He plays a 1 point gote move which he cannot defend without converting a 10-point semeai into a seki. But C19 is clearly sente and bigger. A 2k should see that. I'm assuming that he does. He seems openly contemptuous toward it.

And it does not seem like even 1D to me.

Author:  Li Kao [ Wed Dec 29, 2010 3:34 pm ]
Post subject:  Re: The Shodan Go Bet

At the end of game 4 Tromp was under time pressure, which might explain some weak moves. And the program seems to have misjudged the life & death of the lower corners and thus thought for most of the game that it was ahead.

Author:  Harleqin [ Thu Dec 30, 2010 4:28 am ]
Post subject:  Re: The Shodan Go Bet

Joaz Banbeck wrote:
His last move of game 4 looks like a blunder. He plays a 1 point gote move which he cannot defend without converting a 10-point semeai into a seki. But C19 is clearly sente and bigger. A 2k should see that. I'm assuming that he does. He seems openly contemptuous toward it.


If you mean Black's move, you may have missed that White had "thrown in" a stone there just before, so that Black had no other sensible move.

Author:  Mike Novack [ Thu Dec 30, 2010 7:25 am ]
Post subject:  Re: The Shodan Go Bet

SpongeBob wrote:
shapenaji wrote:
If he went 4-0, the performance result for the bot during THIS test seems to be below 1d. (That's not to say that we can verify this hypothesis with just one test; it's just that a 1d result would be winning 1.3 games.)

I was going to raise this question, given the name Shodan-Bet and John's strength of 1k at the time when the bet was made, if the bot would qualify at least as a shodan. Your statistical reasoning supports my feeling that the bot is not quite there, yet.


No. As shapenaji was careful to point out, this is an insufficient number of games from which to draw any* statistical inference, even if the games were between a human 1 dan and a human 2 dan. We also need to remember that the expectation that a 1 dan will win 1/3 of the games against a 2 dan is empirical and based upon human players. It is entirely possible that the variation between games is either greater or lesser for a computer than for a human. I suspect the latter (that is, a human player is more likely to sometimes put together a great game well above his/her average performance than the computer is to manage that feat).

* Consider an honest coin. If flipped four times, the most likely (single) outcome is 2-2. But it is more likely than not that the outcome will be one of the other possibilities (odds 5:3). In other words, looking at this in reverse, we should not express an opinion about the honesty of the coin based upon four trials, because even for an honest coin we do not expect to see the most likely result.
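The coin figures in the footnote can be checked by counting outcomes. A minimal sketch, assuming a fair coin and four independent flips:

```python
from math import comb

FLIPS = 4
outcomes = 2 ** FLIPS   # number of equally likely head/tail sequences

# Number of sequences containing exactly `heads` heads.
ways = {heads: comb(FLIPS, heads) for heads in range(FLIPS + 1)}

p_two_two = ways[2] / outcomes   # probability of a 2-2 split
p_other = 1 - p_two_two          # probability of any other split

print(ways)
print(p_two_two, p_other)
```

A 2-2 split accounts for 6 of the 16 sequences and every other split for the remaining 10, which gives the 5:3 odds against seeing the single most likely result.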

All times are UTC - 8 hours [ DST ]