KataGo self-play learning game samples

Post by **lightvector** » Tue Aug 06, 2019 8:57 pm

Bill Spight wrote: One thing that struck me in the write-up by the Elf team is that they pointed out that Elf started off by getting good at the endgame. That makes sense, because with self-play it did not have any opponent who played the opening or middle game well to imitate, unlike humans.

Motivated by this quote from Bill, here are some sample games for KataGo from the early learning stages, if you're interested in getting a sense of what order it learned things in.

Gating games
These are games played between different versions during gating matches, where a new neural net version plays the older one for the right to be the new official net. You'll notice the newer net in each match is a heavy favorite to win, since early on each new net has learned much more than the previous one.

4K games: http://eidogo.com/#wCgSU471
8K games: http://eidogo.com/#C2vYR7HK
12K games: http://eidogo.com/#2aN6xVYCI
20K games: http://eidogo.com/#AfsltiIt
30K games: http://eidogo.com/#3mbo5xdZl
40K games: http://eidogo.com/#4eYtPBeyV
50K games: http://eidogo.com/#wXBdZszo
60K games: http://eidogo.com/#xTulGX7K
80K games: http://eidogo.com/#BQfsuXEe
100K games: http://eidogo.com/#3IOTU4Ngh
130K games: http://eidogo.com/#CgBCniU7
160K games: http://eidogo.com/#vkVp9pMB
200K games: http://eidogo.com/#1afrLiFPX
280K games: http://eidogo.com/#27KfN2avC
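As a rough illustration of the gating idea described above (a new net plays the current official net and is promoted only if it wins), here is a minimal sketch. The function name `run_gating_match`, the `play_game` callback, and the 50% promotion threshold are all hypothetical assumptions for illustration, not KataGo's actual gating code or parameters.

```python
import random
from typing import Callable, Tuple


def run_gating_match(
    play_game: Callable[[bool], bool],
    num_games: int,
    threshold: float = 0.5,  # hypothetical promotion threshold
) -> Tuple[float, bool]:
    """Play a hypothetical gating match and decide whether to promote the candidate.

    play_game(candidate_is_black) plays one game between the candidate net and
    the current official net and returns True if the candidate won.
    """
    if num_games <= 0:
        raise ValueError("num_games must be positive")
    wins = 0
    for i in range(num_games):
        # Alternate colors so neither net always gets the first move.
        candidate_is_black = i % 2 == 0
        if play_game(candidate_is_black):
            wins += 1
    win_rate = wins / num_games
    # Promote only if the candidate's win rate reaches the (assumed) threshold.
    return win_rate, win_rate >= threshold


# Toy stand-in for real games: early on, the newer net wins most of them.
rng = random.Random(0)
rate, promoted = run_gating_match(lambda black: rng.random() < 0.8, num_games=100)
```

Since early nets improve so quickly, a candidate in this toy setup would clear almost any reasonable threshold, which matches the lopsided matches in the games above.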

Self-play games
These are games played in self-play. Self-play is significantly more randomized than the gating games. It often plays a bunch of much more random moves at the start of the game to vary the opening (based on what the net thinks is still okay, even if it wouldn't be the top choice), and occasionally a total garbage move just to add variety to the kinds of positions. Also, resignation is not enabled in self-play games, so the game goes all the way to the end.

Random: http://eidogo.com/#rjDJTWIZ
10K games: http://eidogo.com/#3EAlt99K
20K games: http://eidogo.com/#1HyivmRu
35K games: http://eidogo.com/#21hkQ1M64
50K games: http://eidogo.com/#4g7svdUMZ
65K games: http://eidogo.com/#CEBY9FR3
80K games: http://eidogo.com/#2ZBZM3jr
95K games: http://eidogo.com/#ABs9otk8
115K games: http://eidogo.com/#2qEyEmOeF
135K games: http://eidogo.com/#12AkPD1XB
160K games: http://eidogo.com/#Bhg5EOy2
200K games: http://eidogo.com/#3pJnZFRHv
280K games: http://eidogo.com/#18ouhNaJz
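The self-play randomization described above can be sketched roughly as follows. This is an illustrative assumption, not KataGo's implementation: `policy` is a hypothetical dict mapping moves to the net's prior probabilities, and `opening_moves`, `temperature`, `min_rel_prob`, and `garbage_prob` are made-up parameters. A real engine would choose moves from search results rather than the raw policy.

```python
import random
from typing import Dict


def choose_selfplay_move(
    policy: Dict[str, float],
    move_number: int,
    rng: random.Random,
    opening_moves: int = 20,     # hypothetical length of the randomized opening
    temperature: float = 1.5,    # hypothetical; > 1 flattens the distribution
    min_rel_prob: float = 0.05,  # hypothetical cutoff for "still okay" moves
    garbage_prob: float = 0.01,  # hypothetical chance of a deliberately bad move
) -> str:
    """Pick a move for a randomized self-play game (illustrative sketch only)."""
    # Occasionally play an arbitrary move from the list to diversify positions.
    if rng.random() < garbage_prob:
        return rng.choice(sorted(policy))
    # After the opening, just play the net's top choice.
    if move_number >= opening_moves:
        return max(policy, key=policy.get)
    # In the opening, keep only moves the net still rates as reasonable...
    top = max(policy.values())
    candidates = {m: p for m, p in policy.items() if p >= min_rel_prob * top}
    # ...and sample among them with temperature instead of always taking the best.
    weights = [p ** (1.0 / temperature) for p in candidates.values()]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


# Toy usage with a made-up policy for three candidate points.
policy = {"D4": 0.6, "Q16": 0.3, "A1": 0.001}
rng = random.Random(0)
opening = [choose_selfplay_move(policy, n, rng) for n in range(5)]
```

The cutoff relative to the top move is what keeps the randomized opening within moves the net considers "still okay". The separate `garbage_prob` branch is what occasionally produces a truly bad move.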

Notes:
"XK games" means, very approximately, that the game was sampled around the time when X real self-play games had been played for learning, following an initial seed of some tens of thousands of basically pure-random games. Also, only about 40% of those X games are on 19x19 boards; the rest were shorter games played on smaller boards, from 9x9 to 18x18.

This was with a 6-block network, before KataGo switched to larger sizes. At 280K games, the network is probably mid-amateur dan or so overall. It is probably strong-amateur dan in certain kinds of whole-board judgement, but much weaker in life and death, and probably outright blind to various kinds of capturing races and large-scale fights, because 6 blocks is too few layers for the neural net to properly "perceive" large groups and dragons.

In these early stages, latency matters a lot in the training process. The number of games is quite a bit larger than is actually "needed" to train up to that strength of play. This is because, to some degree, you're limited more by the turnaround time of the loop (new neural net -> play games -> train on games -> make new neural net) than by the actual number of new games. Your GPUs are going to generate some thousands of games in parallel anyway, so you might as well have thousands of games in each loop even if you don't need them all. Later in training, as you go to bigger net sizes and learning slows down, you become more game-data-constrained than latency-constrained, and you actually need all those thousands of games to make progress.
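The latency-versus-data trade-off above can be illustrated with a toy back-of-the-envelope model. The function names, the assumption that every game takes the same time, and the example numbers are all hypothetical. They are not measurements from KataGo's run.

```python
from typing import Tuple


def games_per_loop(parallel_games: int, minutes_per_game: float,
                   turnaround_minutes: float) -> float:
    """Games finished during one train-and-replace turnaround (toy model)."""
    # Each parallel game slot completes turnaround / minutes_per_game games.
    return parallel_games * turnaround_minutes / minutes_per_game


def training_regime(parallel_games: int, minutes_per_game: float,
                    turnaround_minutes: float, games_needed: float) -> Tuple[float, str]:
    """Classify a hypothetical training loop as latency- or data-constrained."""
    produced = games_per_loop(parallel_games, minutes_per_game, turnaround_minutes)
    if produced >= games_needed:
        # More games arrive than the next net needs: the loop turnaround dominates.
        return produced, "latency-constrained"
    # The next net can't be trained well until enough games exist.
    return produced, "data-constrained"


# Made-up numbers: small early nets need few games per step, bigger later nets need many.
early = training_regime(1000, 10, 30, games_needed=500)
late = training_regime(1000, 10, 30, games_needed=20000)
```

In this toy model, the same hardware produces a surplus of games when each improvement step needs few games, and a shortfall once each step needs many.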

Edit: fixed a few corrupted sgfs

Post by **Tryss** » Wed Aug 07, 2019 5:22 am

Interesting: it started playing in the corners before the 30K-game mark (gating matches).

Post by **lightvector** » Wed Aug 07, 2019 4:56 pm

So, do self-play bots first learn the endgame?

It seems to me at least for KataGo (which admittedly is a bit different than other zeroish bots) the very earliest progress is primarily determined by being better at crazy all-out contact fights, and by learning basic instinct and more efficient shape. For example, it feels to me like in the 20K games gating match the newer net (white) has figured out that it's good to let black squeeze the toothpaste and the older net (black) hasn't really figured out that this is bad. Some flavor of an opening emerges soon after that point.

Anyone else have any observations or thoughts?

Post by **mhlepore** » Wed Aug 07, 2019 5:35 pm

Super interesting games - thank you for posting them. My thoughts:

1) Watching the bot improve is strange compared with watching a human improve. For example, my wife plays Go (she's maybe 7 kyu), and she squeezes the toothpaste all the time, yet she's pretty good at life and death. That seems to be the opposite of this bot, which knows squeezing the toothpaste is bad at 20K games but still plays many extra moves to kill an obviously dead corner at 30K games.

2) I wonder how board size would impact the order in which the bot learns the various tactics/strategies.

Post by **Bill Spight** » Wed Aug 07, 2019 8:03 pm

lightvector wrote: So, do self-play bots first learn the endgame?

Given your game records, my own experience, and the fact that, as far as I can tell, the bots are weakest at the endgame among the phases of the game, I doubt it. Of course, if the Elf team's claim comes from a pro, then OK.

Question: Is the early play in the corner in the 4K game an artifact, for instance of starting on smaller boards?

In one of my experiments with go learning programs I did not give the programs the ability to query the board, and ran an evolution program while I went to a movie. After 100 generations the programs had learned to make diagonal rows of one space eyes. That was an artifact caused by the fact that if a play was made inside an opponent's one space eye, it became a pass.

Curiously, when I gave the programs the ability to query any point on the board, they never used that function.

lightvector wrote: It seems to me at least for KataGo (which admittedly is a bit different than other zeroish bots) the very earliest progress is primarily determined by being better at crazy all-out contact fights, and by learning basic instinct and more efficient shape.

Interesting.

That may explain making long strings at one point, which is a good defense against capture -- if you haven't learned to make one space eyes.

Later on, programs placed stones in proximity instead of simply connecting them. Such groups could connect or capture stones if threatened. The ability to connect is certainly important in the endgame, but is not particularly an endgame skill.

I got the impression that KataGo likes to make strong groups. Does that translate into a thick style of play at the highest levels?

Life In 19x19