Good territory scoring rules for training computers?
Posted: Wed Dec 13, 2017 1:24 pm
As far as I know, all major attempts to train neural nets for Go have done so under area-scoring rules. For example, AlphaGo used a variation on Tromp-Taylor rules. The critical factor is that area-scoring rules allow one to play on to capture all dead stones and fully resolve the board position without affecting the final result. This makes it easy to completely automate the end of the game and determine the status of all groups and territories for computer training. But of course, this causes problems when applying the neural net to play under territory-scoring rules. For example, I seem to recall Deep Zen had problems multiple times in close games under Japanese rules because of this: its value net was only trained under area-scoring rules, causing Deep Zen to screw up in what would have been a 0.5-point-margin territory game. On DeepMind's side, they completely sidestepped the problem by just never trying to use AlphaGo with any territory-scoring rules.
Are there good rules that still allow playout to resolve all statuses on the board but where correct play prior to that cleanup and the final result match what correct play and the final result would be under most territory scoring rules?
Something like Tromp-Taylor rules with a button ("Button Go") seems promising. But there are still simple or common cases where correct play in Button Go diverges from what it would be under true territory-scoring rules, often involving a final ko, right? That seems to make it not ideal for this purpose, since presumably that divergence would still cause problems.
How about instead of a button, one used Tromp-Taylor rules with the following modifications?
* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).
* After two consecutive passes, the game does not end but enters a cleanup phase, where moves no longer cause players to lose points.
* After two consecutive passes in the cleanup phase, the game ends immediately. A player's final score is the Tromp-Taylor score minus all points lost by making moves, plus any komi set for that player.
* (optionally also disallow suicide)
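To make the bookkeeping concrete, here is a rough Python sketch of just the phase tracking and scoring in the modifications above. It does not implement the board or captures; the Tromp-Taylor area score of the final position is assumed to be computed elsewhere and passed in. The class name `MoveTaxGame`, its fields, and the "B"/"W" keys are all made up for illustration, and komi defaults to zero since no value is specified here.

```python
from dataclasses import dataclass, field


@dataclass
class MoveTaxGame:
    """Hypothetical tracker for the proposed rules: phases and per-move penalties only."""
    # Komi per player; the values are an assumption and should be set by the caller.
    komi: dict = field(default_factory=lambda: {"B": 0.0, "W": 0.0})
    # Points lost so far by making board moves in the main phase.
    penalty: dict = field(default_factory=lambda: {"B": 0, "W": 0})
    phase: str = "main"  # "main" -> "cleanup" -> "over"
    consecutive_passes: int = 0

    def play(self, player: str, is_pass: bool) -> None:
        """Record one turn by `player`: either a pass or a move on the board."""
        if self.phase == "over":
            raise ValueError("game is already over")
        if is_pass:
            self.consecutive_passes += 1
            if self.consecutive_passes == 2:
                # Two passes end the main phase, or end the game in cleanup.
                self.phase = "cleanup" if self.phase == "main" else "over"
                self.consecutive_passes = 0
            return
        self.consecutive_passes = 0
        if self.phase == "main":
            # Board moves cost 1 point, but only before the cleanup phase.
            self.penalty[player] += 1

    def final_score(self, area: dict) -> dict:
        """Combine a Tromp-Taylor area score per player with penalties and komi."""
        return {p: area[p] - self.penalty[p] + self.komi[p] for p in ("B", "W")}
```

For example, if Black makes two main-phase moves and White makes one before both pass, then any captures played in cleanup cost nothing, and `final_score({"B": 45, "W": 36})` with a White komi of 6.5 would subtract 2 from Black's area and 1 from White's before adding komi.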
Are there any simple pathologies with these rules that could make correct play or the final result differ significantly from what it would be under most territory-scoring rules?