Crazy Stone Deep Learning first impressions

dfan · Post by **dfan** » Wed Jun 08, 2016 6:23 am

Mike Novack wrote:
dfan wrote:
I've been working my way up through its ranks and have won my last five games (W+R vs 4k, B+8.5 vs 3k, B+R vs 3k, W+14.5 vs 2k, W+R vs 2k). I'm curious how its strength setting is calibrated.
Are you playing with time controls?

No. When I choose a time control I am unable to choose the rank of the engine.

When any of these programs plays with time controls (X amount of time for Y moves), it works out how much analysis it can do from the power of the hardware it finds itself running on, so the strength levels cannot be absolute, just relative.

On the other hand, when playing without time controls, a setting that fixes how much analysis to use or what strength level to play at cannot know how much time will be required per move. It will simply use as much time as that depth of analysis requires on that hardware. For a given level of strength, hardware power and real time per move will be inversely proportional.

Yes. I assumed that playing at "2k" would use a basically fixed amount of compute and just take as long as it needed to do that much work. On my computer, it plays essentially instantaneously.

It just occurred to me that you might have misunderstood what I meant by "I'm curious how its strength setting is calibrated." I meant that I am curious how it has been determined that a particular setting should be called 2k rather than 4k or 1d, not curious about the mechanism by which its strength is regulated (though that is interesting too, of course).
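The two regimes discussed above (a fixed amount of work per move versus a fixed amount of time per move) can be sketched in a few lines of Python. This is only an illustration: the "work units", the function names, and the numbers are assumptions made up for the example, not anything taken from Crazy Stone.

```python
def time_for_fixed_work(work_per_move: float, work_per_second: float) -> float:
    """Fixed-work level: strength stays the same, time per move shrinks on faster hardware."""
    return work_per_move / work_per_second


def work_in_fixed_time(seconds_per_move: float, work_per_second: float) -> float:
    """Fixed-time level: time per move stays the same, work (and so strength) grows with hardware."""
    return seconds_per_move * work_per_second


# Hypothetical numbers: a level that needs 10,000 work units per move.
slow_machine = 1_000.0  # work units per second (made up)
fast_machine = 2_000.0  # twice as fast (made up)

print(time_for_fixed_work(10_000, slow_machine))  # seconds per move on the slow machine
print(time_for_fixed_work(10_000, fast_machine))  # half as long on the fast machine
print(work_in_fixed_time(5.0, slow_machine))      # work done in 5 seconds on the slow machine
print(work_in_fixed_time(5.0, fast_machine))      # twice as much on the fast machine
```

Doubling the hardware speed halves the time per move in the first case and doubles the analysis per move in the second, which is why only a fixed-work level can have a rank that is independent of the machine.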

Kap · Post by **Kap** » Wed Jun 08, 2016 7:33 am

dfan wrote: Yes. I assumed that playing at "2k" would use a basically fixed amount of compute and just take as long as it needed to do that much work. On my computer, it plays essentially instantaneously.

It just occurred to me that you might have misunderstood what I meant by "I'm curious how its strength setting is calibrated." I meant that I am curious how it has been determined that a particular setting should be called 2k rather than 4k or 1d, not curious about the mechanism by which its strength is regulated (though that is interesting too, of course).

Relevant posts by Rémi:

Rémi wrote:
Kirby wrote:How are weaker levels calibrated? Is it purely a function of time spent per move?
No. Weaker levels play instantly.

All the levels up to 1k are pure neural networks. They play almost instantly. I collected game records of weak players from KGS, and produced neural networks that imitate them. So the weak levels are considerably more human-like than before. Strength was calibrated by playing games on KGS.

As you can see on the attached sgf, it produces a very human-like way of being weak. I am really happy with the result.

It is interesting that the 13k neural network was produced by learning from game records of 15k players. I suppose it may have become stronger because it plays instantly, and because it can read ladders better.

Rémi wrote:
Kirby wrote:Thanks, Rémi. That's very fascinating. I think making the weaker levels "naturally weak" is a great idea.

How do levels between 1d and 7d work?
1d and beyond is MCTS as usual, except that the neural network makes it a lot stronger.

Rémi wrote:
Marcel Grünauer wrote:This sounds very exciting!

On faster hardware, does "Crazy Stone Deep Learning" play better or just faster?

I'm asking because the iPad version has been said to have the same strength on all models.
The top levels are in fact defined as time per move, so they will be stronger with more powerful hardware.

from http://www.lifein19x19.com/forum/viewto ... 18&t=12342.

- Kap

dfan · Post by **dfan** » Wed Jun 08, 2016 8:00 am

Interesting, thank you! I am surprised that it is doing no MCTS at all, though I know that the neural-network-only version of AlphaGo was already dan level, so it makes sense. It does perhaps explain a little more how it missed some tactical tricks.

I am pleased to see that the strengths were calibrated by playing games on KGS. One possible reason that it seems to be stronger than the level of player it is imitating (at least in Rémi's example of the 13k setting being trained on 15k games) is a sort of "mob intelligence" principle; a thousand 15k players voting on moves will probably play better than a single 15k player playing by himself, and the training of a neural network effectively leads it to imitate a ton of players voting.
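The "mob intelligence" idea can be checked with a toy simulation. The sketch below assumes each player independently picks the best move with some probability and otherwise picks one of the other candidate moves at random; the probability, the number of candidate moves, and all function names are invented for illustration. Real players of the same rank share systematic mistakes, so their errors are not independent and the real effect is likely smaller than this model suggests.

```python
import random
from collections import Counter


def pick_move(rng: random.Random, p_best: float, n_moves: int) -> int:
    """One player: choose move 0 (the best) with probability p_best, else a random other move."""
    if rng.random() < p_best:
        return 0
    return rng.randrange(1, n_moves)


def crowd_move(rng: random.Random, n_players: int, p_best: float, n_moves: int) -> int:
    """Plurality vote: return the move chosen by the most players."""
    votes = Counter(pick_move(rng, p_best, n_moves) for _ in range(n_players))
    return votes.most_common(1)[0][0]


def best_move_rate(rng: random.Random, n_players: int, p_best: float,
                   n_moves: int, trials: int) -> float:
    """Fraction of positions in which the vote lands on the best move."""
    hits = sum(crowd_move(rng, n_players, p_best, n_moves) == 0 for _ in range(trials))
    return hits / trials


rng = random.Random(0)
# Hypothetical numbers: 5 candidate moves, each player finds the best one 30% of the time.
single = best_move_rate(rng, n_players=1, p_best=0.3, n_moves=5, trials=200)
crowd = best_move_rate(rng, n_players=1000, p_best=0.3, n_moves=5, trials=200)
print(f"single player: {single:.2f}, crowd of 1000: {crowd:.2f}")
```

Under these assumptions the crowd finds the best move far more often than any single member does, because the best move only needs to be the most popular choice, not the majority choice.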

I look forward to seeing how it plays when I get to the 1d setting. I wonder whether its style will abruptly shift.

Bill Spight · Post by **Bill Spight** » Wed Jun 08, 2016 9:52 am

dfan wrote: One possible reason that it seems to be stronger than the level of player it is imitating (at least in Rémi's example of the 13k setting being trained on 15k games) is a sort of "mob intelligence" principle; a thousand 15k players voting on moves will probably play better than a single 15k player playing by himself, and the training of a neural network effectively leads it to imitate a ton of players voting.

Good point!

CnP · Post by **CnP** » Thu Jun 09, 2016 1:51 am

dfan wrote: I look forward to seeing how it plays when I get to the 1d setting. I wonder whether its style will abruptly shift.

Just a minor example: here's an even game I played against CS on 3k. I'm Black. White's move 16 seems dodgy, and running the full-strength analysis mode suggests P3 as a better move. However, I sort of lost all respect for CS 3k when it kept pushing from below with the sequence 22-34 instead of capturing the black stones earlier. After that I got sloppy and played some stupid moves in addition to my usual stupid moves, but ended up winning by 84.5. CS 1d doesn't seem to play like that (and the analysis mode would have captured the stones), so that's what I'm running it with (no point in reinforcing bad play).

Bill Spight · Post by **Bill Spight** » Thu Jun 09, 2016 2:42 am

CnP wrote: I sort of lost all respect for CS 3k when it kept pushing from below with the sequence 22-34 instead of capturing the black stones earlier.

BTW, White at J-03 captures the Black stones in a loose ladder.

Or is that what you are alluding to?

bloosqr · Post by **bloosqr** » Thu Jun 09, 2016 7:43 pm

Out of curiosity: I have been playing Crazy Stone Deep Learning on my computer, and while I do enjoy playing on a proper screen rather than an iPad/iPhone, I have to admit I miss the medals and ranking progress that the iPad/phone apps use to give a sense of accomplishment and improvement (the Many Faces of Go app also gave a sense of one's own rank as one played games). The Deep Learning app feels quite raw, in the sense that all one seems to be able to do is play the game with certain settings. Is there any way of getting a sense of where one's own skill level is compared to the software? To be honest, I just miss having some sense of my own improvement while playing against the app.

CnP · Post by **CnP** » Fri Jun 10, 2016 5:21 am

Bill Spight wrote:
CnP wrote: I sort of lost all respect for CS 3k when it kept pushing from below with the sequence 22-34 instead of capturing the black stones earlier.
BTW, White at J-03 captures the Black stones in a loose ladder.

Or is that what you are alluding to?

Yes (I think). I was expecting W26 at J-03. I didn't really expect those black stones to live, but White kept pushing and I wanted to see what it would do.

Krama · Post by **Krama** » Fri Jun 10, 2016 2:13 pm

How good is the engine in long semeai battles?

If I recall, the 2013 version couldn't read semeai, so if the two groups had, let's say, 15 and 16 liberties, it would play wrongly. The estimate would show CS winning at 99.9%, but in the endgame, when I started filling those liberties, it would jump from 99% won to 99% lost.
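For the simplest kind of semeai the outcome is pure counting, which is what makes a misread at 15 versus 16 liberties so striking. The sketch below simulates such a race under strong assumptions: only outside liberties, no eyes, no shared liberties, and every move fills exactly one opponent liberty. The function name is made up for this example, and it says nothing about how Crazy Stone evaluates these positions.

```python
def race_winner(libs_first: int, libs_second: int) -> str:
    """Simulate a capturing race with outside liberties only.

    The players alternate, starting with "first". Each move removes one
    liberty from the opponent's group; the group that reaches zero
    liberties is captured. Returns the name of the winning side.
    """
    if libs_first < 1 or libs_second < 1:
        raise ValueError("both groups need at least one liberty")
    while True:
        libs_second -= 1          # "first" fills a liberty of the other group
        if libs_second == 0:
            return "first"
        libs_first -= 1           # "second" answers in kind
        if libs_first == 0:
            return "second"


# The 15 vs 16 case from the post: the 16-liberty group wins no matter who starts.
print(race_winner(15, 16))  # the 15-liberty side moves first
print(race_winner(16, 15))  # the 16-liberty side moves first
print(race_winner(15, 15))  # equal liberties: whoever moves first wins
```

In this simplified model the side to move wins exactly when it has at least as many liberties as the opponent, so the result is fixed long before the liberties are actually filled.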

d7urban · Post by **d7urban** » Sun Jun 12, 2016 1:22 am

Krama wrote: How good is the engine in long semeai battles?

If I recall, the 2013 version couldn't read semeai, so if the two groups had, let's say, 15 and 16 liberties, it would play wrongly. The estimate would show CS winning at 99.9%, but in the endgame, when I started filling those liberties, it would jump from 99% won to 99% lost.

It seemed to misjudge my semeai at around 7 liberties or more, from move 209 onward. Really weird.

(SGF created from an 8-hour CSDL analysis using Amtiskaw's script from post #20. Triangles mark the preferred move.)

Attachment: 6d-9s-g1.oxps_analysis.sgf (14.43 KiB)

Krama · Post by **Krama** » Sun Jun 12, 2016 2:04 am

d7urban wrote:
Krama wrote: How good is the engine in long semeai battles?

If I recall, the 2013 version couldn't read semeai, so if the two groups had, let's say, 15 and 16 liberties, it would play wrongly. The estimate would show CS winning at 99.9%, but in the endgame, when I started filling those liberties, it would jump from 99% won to 99% lost.
It seemed to misjudge my semeai at around 7 liberties or more, from move 209 onward. Really weird.

(SGF created from an 8-hour CSDL analysis using Amtiskaw's script from post #20. Triangles mark the preferred move.)
6d-9s-g1.oxps_analysis.sgf

A thread on Reddit says the 6d version can't even read out ladders.

I guess that if you play correctly, CS can play at 7-dan level, but if you try semeai battles or ladder variations it becomes a 15 kyu.
