I find the given summary of results a little hard to read because different rows switch which side the value is quoted for. So here's one where all values are from White's perspective:
Code:
Japanese rules 6.5 komi   Seki: -0,1 lead, 47,5% winrate   Corner variation:  1,0 lead, 60,2% winrate
Chinese rules  7.5 komi   Seki:  1,0 lead, 57,2% winrate   Corner variation:  1,3 lead, 64,4% winrate
Chinese rules  6.5 komi   Seki:  1,2 lead, 54,1% winrate   Corner variation:  0,4 lead, 49,7% winrate
Chinese rules  5.5 komi   Seki: -0,9 lead, 43,3% winrate   Corner variation: -0,5 lead, 50,1% winrate
Firstly, you should be aware that KataGo's judgments on complex sekis have a lot more noise/bias than usual. The problem is that there are a lot of weird seki shapes, and they're not all that common, so you don't get a lot of data. In the case of sekis, KataGo's score estimation may easily be "off" by one third to one half of a point, and sometimes by around a full point. The winrate judgment can also be off by anywhere from a few percent to, rarely, as much as 10%.
I think KataGo might be mildly worse in certain sekis than other bots due to being faced with a much harder problem - understanding in general how sekis interact with variable komi and variable board sizes and variable rules, instead of being able to specialize to just 7.5 komi 19x19 Tromp-Taylor. As has become apparent here, seki can interact with the parity of the board in a confusing way and the rules in a nontrivial way, and the neural net finds this hard too! And sekis are specifically a point of major difference between Japanese and Chinese or Tromp-Taylor rules. However, it's a one-time bias for any board position with an unusual seki - you should be able to continue to "trust" (or not) KataGo's evaluations for everything else on the board subsequently as you would normally, since the seki position will be shared in all the variations elsewhere. And also the error will probably diminish as you get close to the end of the game.
Okay, having said all that, let's talk about the evaluations in this specific position.
Corner variation

The corner variation evaluations look good and consistent to me:
Code:
Japanese rules 6.5 komi 1,0 lead 60,2% winrate (ASSUME we take this as baseline, then...)
Chinese rules 7.5 komi 1,3 lead 64,4% winrate (vs baseline, we expect: 1,5 lead, noticeably > 60% winrate)
Chinese rules 6.5 komi 0,4 lead 49,7% winrate (vs baseline, we expect: 0,5 lead, about 50% winrate)
Chinese rules 5.5 komi -0,5 lead 50,1% winrate (vs baseline, we expect: -0,5 lead, about 50% winrate)
Assuming we take Japanese rules 6.5 komi as a baseline, then by standard parity arguments Chinese 7.5 komi should be about half a point better for White, so we'd expect a "1,5" lead instead, and a corresponding bump in winrate. We see "1,3" instead, with the bump in winrate as expected. So a possible error of perhaps 0.2 points. And the last two results fall right in line with expectations: the score/lead should drop by 1 point each time, but the winrate should only drop during 7.5 -> 6.5. The winrate should basically not drop from 6.5 -> 5.5 because of parity. And since the baseline expected leads are symmetric ("0,5" vs "-0,5"), 50% winrate is actually what we expect for both. Which is what we get!
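To make the consistency check concrete, here's a small sketch of the parity arithmetic (the leads are the values quoted in this post, and the half-point parity adjustment is the assumption stated above; nothing here is queried from KataGo itself):

```python
# Expected corner-variation leads (White's perspective), taking the
# Japanese 6.5 komi result as the baseline. By the parity argument,
# Chinese 7.5 komi is ~0.5 points better for White than Japanese 6.5
# komi, and each 1-point komi drop lowers the lead by exactly 1 point.
baseline_lead = 1.0  # Japanese rules, 6.5 komi

expected = {
    "Chinese 7.5": baseline_lead + 0.5,        # 1.5
    "Chinese 6.5": baseline_lead + 0.5 - 1.0,  # 0.5
    "Chinese 5.5": baseline_lead + 0.5 - 2.0,  # -0.5
}
observed = {"Chinese 7.5": 1.3, "Chinese 6.5": 0.4, "Chinese 5.5": -0.5}

for rules, exp_lead in expected.items():
    err = observed[rules] - exp_lead
    print(f"{rules}: expected {exp_lead:+.1f}, "
          f"observed {observed[rules]:+.1f}, error {err:+.1f}")
```

The errors come out at a couple tenths of a point at most, which is the "0.1 or 0.2" variability described below.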
So what we see is a variability in score estimation by about 0.1 or 0.2, and several tenths of a percent in winrate, but KataGo clearly understands the parity and is overall pretty consistent. And it's considering the different positions with their rules and komis entirely independently, without "knowing" that it's going to be compared against itself for consistency.
Seki variation

Code:
Japanese rules 6.5 komi -0,1 lead 47,5% winrate
Chinese rules 7.5 komi 1,0 lead 57,2% winrate
Chinese rules 6.5 komi 1,2 lead 54,1% winrate
Chinese rules 5.5 komi -0,9 lead 43,3% winrate
For the score/lead estimates, this is obviously much worse. With the seki on the board, we'd again expect Chinese 7.5 komi to be 0.5 points better than Japanese due to parity, and a further 2 points better due to this seki, for a total of 2.5 points. So if we take -0,1 for Japanese 6.5 komi as the baseline, we'd naively expect "2,4" for Chinese 7.5 komi instead of "1,0".
However, if you look at the ownership map, KataGo guesses a substantial chance that the seki collapses - somewhere from a 10% to a 20% chance (the net isn't precisely sure). This would be due either to a ko fight, or to one of the bordering groups getting traded away later. Also, given the frequency with which bots force each other into difficult ko fights when the game is on the line, it's quite possible that black plays the point-losing threats in some of the games where the seki doesn't collapse, which would also eliminate the 2-point gap - the threats may be point-losing, but they're also decently large, making them useful in some kos.
If we blindly suppose that, say, 25% of the time either the seki is eliminated or the gap is closed by point-losing threats, then we'd expect something more like "1,9" instead of "2,4" for Chinese 7.5 komi, and the remainder of the difference is KataGo's score judgment being off by a bit. I'd guess the Japanese rules case is responsible for part of the error in score judgment (i.e. properly not counting the seki points). But if we simply assume the Japanese result is the true baseline, then we'd expect "1,9", "0,9", and "-0,1" as the three results, while the actual values observed from KataGo were "1,0", "1,2", and "-0,9". So consistency errors of less than 1 point, but close to it.
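The same arithmetic for the seki variation, including the blended collapse guess, can be sketched like this (the 2-point seki value, half-point parity shift, and 25% probability are all rough assumptions from the discussion above, not outputs from KataGo):

```python
# Expected seki-variation lead for Chinese 7.5 komi (White's
# perspective), assuming the Japanese 6.5 komi result is the true baseline.
baseline_lead = -0.1  # Japanese rules, 6.5 komi
parity_bonus = 0.5    # Chinese 7.5 vs Japanese 6.5, by parity
seki_bonus = 2.0      # extra points this seki is worth under Chinese rules
p_gap_closed = 0.25   # blind guess: seki collapses or threats get played

# Naive expectation, if the seki always stands and the gap is never closed:
naive_chinese_75 = baseline_lead + parity_bonus + seki_bonus
print(f"{naive_chinese_75:.1f}")  # 2.4

# Blended expectation: 25% of the time the 2-point gap disappears.
blended_chinese_75 = naive_chinese_75 - p_gap_closed * seki_bonus
print(f"{blended_chinese_75:.1f}")  # 1.9
```

Dropping the lead by 1 point per point of komi from there gives the "1,9", "0,9", "-0,1" sequence compared against KataGo's observed "1,0", "1,2", "-0,9".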
The winrate values actually look reasonable to me - they're probably overall biased a little in a way that's hard to know, but relatively speaking they seem better than the lead estimates here. The total winrate drop from 7.5 komi to 5.5 komi is about 14%, and the total drop in the earlier corner variation case was also 14%, so that matches well. And KataGo seems to understand that the parity has changed! (This is really cool; I wasn't sure KataGo would get this right.) Due to the odd dame in the seki, it's now 7.5 and 6.5 komi that are equivalent, not 6.5 and 5.5, so this time the large winrate drop should happen from 6.5 -> 5.5 instead of 7.5 -> 6.5, and that's what it shows.
There actually is still a small winrate drop from 7.5 -> 6.5, but I think this is also reasonable! Whereas odd-dame sekis are uncommon in general, once one does occur, as discussed earlier, the chance that it collapses is not trivial - so some of the time the game will in fact end up back on the original parity, where 7.5 and 6.5 komi differ instead of being equivalent.
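Putting the winrate numbers quoted above side by side makes the parity flip easy to see:

```python
# Winrates for White (%), as quoted earlier in this post.
corner = {7.5: 64.4, 6.5: 49.7, 5.5: 50.1}
seki   = {7.5: 57.2, 6.5: 54.1, 5.5: 43.3}

for name, wr in (("corner", corner), ("seki", seki)):
    drop_hi = wr[7.5] - wr[6.5]  # drop from 7.5 -> 6.5 komi
    drop_lo = wr[6.5] - wr[5.5]  # drop from 6.5 -> 5.5 komi
    total   = wr[7.5] - wr[5.5]
    print(f"{name}: 7.5->6.5 {drop_hi:+.1f}, "
          f"6.5->5.5 {drop_lo:+.1f}, total {total:+.1f}")
```

Both variations lose about 14% in total, but the large per-step drop moves from 7.5 -> 6.5 (corner) to 6.5 -> 5.5 (seki), exactly as the parity argument predicts.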
Okay, that's my analysis.
