Good territory scoring rules for training computers?

lightvector · Post by **lightvector** » Wed Dec 13, 2017 1:24 pm

As far as I know, all major attempts to train neural nets for Go have done so under area-scoring rules. For example, AlphaGo used a variation on Tromp-Taylor rules. The critical factor is that area-scoring rules allow one to play out to capture all dead stones and fully resolve the board position without affecting the final result. This makes it easy to completely automate the end of the game and determine the status of all groups and territories for computer training. But of course, this causes problems when applying the neural net to play under territory-scoring rules. For example, Deep Zen, I seem to recall, had problems multiple times in close games under Japanese rules due to this: its value net was only trained under area-scoring rules, causing Deep Zen to screw up in what would have been a 0.5-margin territory game. On DeepMind's side, they completely ignored the problem by just never trying to use AlphaGo for any territory-scoring rules.

Are there good rules that still allow playout to resolve all statuses on the board but where correct play prior to that cleanup and the final result match what correct play and the final result would be under most territory scoring rules?

Something like Tromp-Taylor rules except with a button ("Button Go") seems promising. Except that there are still simple or common cases where correct play in Button Go diverges from what it would be in true territory scoring rules, often involving a final ko, right? This seems to make it not ideal for this purpose, since presumably that divergence would still cause problems.

How about instead of a button, one used Tromp-Taylor rules with the following modifications?
* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).
* After two consecutive passes, the game does not end but enters a cleanup phase, where moves no longer cause players to lose points.
* After two consecutive passes in the cleanup phase, the game ends immediately. A player's final score is the Tromp-Taylor score minus all points lost by making moves, plus any komi set for that player.
* (optionally also disallow suicide)
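
The bookkeeping these modified rules imply can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `MovePenaltyGame` and its methods are hypothetical, the Tromp-Taylor area count is taken as an input rather than computed, and komi is assumed to be credited to White.

```python
class MovePenaltyGame:
    """Tracks phases and per-move penalties for the proposed rules (hypothetical sketch)."""

    def __init__(self, komi_white=6.5):
        self.phase = "main"              # "main" -> "cleanup" -> "over"
        self.consecutive_passes = 0
        self.penalty = {"B": 0, "W": 0}  # points lost by board moves in the main phase
        self.komi = {"B": 0.0, "W": komi_white}

    def board_move(self, color):
        if self.phase == "over":
            raise ValueError("game is over")
        self.consecutive_passes = 0
        if self.phase == "main":
            # Board moves cost 1 point only before the cleanup phase.
            self.penalty[color] += 1

    def pass_move(self, color):
        # Passes never cost points, whichever colour makes them.
        if self.phase == "over":
            raise ValueError("game is over")
        self.consecutive_passes += 1
        if self.consecutive_passes == 2:
            self.consecutive_passes = 0
            self.phase = "cleanup" if self.phase == "main" else "over"

    def final_score(self, color, tromp_taylor_area):
        # Tromp-Taylor score minus points lost by moving, plus komi.
        return tromp_taylor_area - self.penalty[color] + self.komi[color]


game = MovePenaltyGame(komi_white=6.5)
game.board_move("B")
game.board_move("W")
game.board_move("B")
game.pass_move("W")
game.pass_move("B")      # two passes: cleanup phase begins
game.board_move("W")     # cleanup move, no penalty
game.pass_move("B")
game.pass_move("W")      # two passes in cleanup: game ends
```

The area values passed to `final_score` would come from an ordinary Tromp-Taylor counter run on the final, fully cleaned-up board.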

Are there any simple pathologies with these rules that could make correct play or the final result differ significantly from what it would be under most territory-scoring rules?

Bill Spight · Post by **Bill Spight** » Wed Dec 13, 2017 4:39 pm

lightvector wrote: As far as I know, all major attempts to train neural nets for Go have done so under area-scoring rules. For example, AlphaGo used a variation on Tromp-Taylor rules. The critical factor is that area-scoring rules allow one to play out to capture all dead stones and fully resolve the board position without affecting the final result. This makes it easy to completely automate the end of the game and determine the status of all groups and territories for computer training. But of course, this causes problems when applying the neural net to play under territory-scoring rules. For example, Deep Zen, I seem to recall, had problems multiple times in close games under Japanese rules due to this: its value net was only trained under area-scoring rules, causing Deep Zen to screw up in what would have been a 0.5-margin territory game. On DeepMind's side, they completely ignored the problem by just never trying to use AlphaGo for any territory-scoring rules.

I have not looked anything up, but as I recall, the problem for Zen was the 6.5 komi instead of the 7.5 komi, even though it used the 6.5 komi to score its rollouts. (I hope that I misunderstood.)

lightvector wrote: Are there good rules that still allow playout to resolve all statuses on the board but where correct play prior to that cleanup and the final result match what correct play and the final result would be under most territory scoring rules?

Back in the '90s I anticipated this problem, and proposed some rules on a mailing list. They did not catch on.

lightvector wrote: Something like Tromp-Taylor rules except with a button ("Button Go") seems promising. Except that there are still simple or common cases where correct play in Button Go diverges from what it would be in true territory scoring rules, often involving a final ko, right? This seems to make it not ideal for this purpose, since presumably that divergence would still cause problems.

Yeah, button go is a hybrid of area and territory scoring.

lightvector wrote: How about instead of a button, one used Tromp-Taylor rules with the following modifications?
* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).

If a board play loses 1 point you have chilled go, which is fine in theory, but is worse as regards ko than territory scoring.

Edit: Oh, I see what you are doing. You are chilling area scoring to get territory scoring.

I may still have my suggestion around somewhere, but here is the thing. For "true" territory scoring you want ko fights to end at temperature zero. For that to happen you want, not a button that loses ½ pt., but a large number of buttons that do not affect the score; i.e., virtual dame. Like dame, they should lift ko bans, so that kos are resolved at temperature zero. That still leaves questions like scoring Bent Four in the Corner and Three Points without Capturing, but these can be resolved in an encore such as Lasker-Maas or Spight rules have. That is not exactly like Japanese or Korean scoring, since you may be able to score some points in seki or have a ko fight at temperature -1. But you should be able to program such rules fairly easily.

And they would be "true" territory rules.

RobertJasiek · Post by **RobertJasiek** » Thu Dec 14, 2017 1:31 am

Simply speaking, good territory scoring rules do not and cannot exist, because the regular and playout phases need different rules, and pass stones, as an attempt to use the same rules for both, fail by leading to pass-fights. Button go changes go so much that proponents of territory scoring reject it. The Simplified Japanese Rules are IMO the best candidate, but they do need different rules for the two phases, and hardcore proponents would still object to the remaining changes. They want at least 99.99% of the exceptional cases to behave as in the illogical official Japanese or Korean rules (whose behaviours differ). This conflicts with the need of programs for logical rules that can be implemented within reasonable time and applied with reasonably low computational complexity, and the conflict can never be solved. From a rules POV, things are solved by the approximations above. However, it is a political question whether to force programs to lose because their programmers do not waste a great deal of time on implementing an approximation to illogical rules, about as much time as is needed for creating a strong program for the regular phase.

Bill Spight · Post by **Bill Spight** » Thu Dec 14, 2017 11:36 am

As we know, it is not easy to program Japanese and Korean rules. The Japanese '89 rules are an attempt at a rationalized rule set, but they seem ambiguous to me. I have not seen the latest Korean rules, but the earlier rules that I saw seemed to me to require judgement to implement. Humans can resolve ambiguity and exercise judgement, but computer programs, neural nets aside, are more like the Good Soldier Schweik. Clarity and precision are important. Simplicity is also important, if you are going to play thousands of games per second.

lightvector wrote: Are there good rules that still allow playout to resolve all statuses on the board but where correct play prior to that cleanup and the final result match what correct play and the final result would be under most territory scoring rules?

Since the main territory rules are Japanese and Korean, my guess is no.

However, there are other territory rules without the special cases of Japanese and Korean rules. (BTW, by referring to special cases I do not mean to criticize those rules, just to say that they are not the most general or simple territory rules.)

lightvector wrote: Something like Tromp-Taylor rules except with a button ("Button Go") seems promising. Except that there are still simple or common cases where correct play in Button Go diverges from what it would be in true territory scoring rules, often involving a final ko, right? This seems to make it not ideal for this purpose, since presumably that divergence would still cause problems.

I agree. Under "true" territory rules kos should be resolved at temperature 0. With Button Go they may be resolved at lower temperatures.

As I said earlier, to have kos resolved at temperature zero you need some temperature zero plays if and when you run out of dame. I called these plays virtual dame. Like actual dame, they must lift any ko and superko bans. Tromp-Taylor rules are area rules, and so simply providing virtual dame is not enough, because you also have to make it so that it does not matter who gets the last dame. The ½ pt. button solves that problem.

Both the virtual dame and the button may be implemented as passes. Below is how that might be done with area counting.

With area counting the button gains ½ pt., and the virtual dame each gain 1 pt. There are two phases to the game; in phase one the players play regular territory go; in phase two they play regular area go to eliminate any dead stones, and they perhaps take one way dame. Playing area go to eliminate dead stones is consistent with territory scoring. Doing so will not alter the territory score. Taking one way dame will affect the territory score, but there you go.

Taking the button separates the two phases.

What does it mean to take the button? Since early passes, as virtual dame, lift ko bans, how do you end phase one? One way would be with three consecutive passes, as the first pass lifts any ko ban, the second and third passes show that neither player "wants" to play in a ko or superko, or make any other board play, so three consecutive passes could end phase one. However, one player might play something like Sending Two Returning One every time the opponent passed, so that there is no sequence of consecutive passes. Hence my rule: Phase one ends when the same player passes a second time in the same board position. This phase ending pass is equivalent to the button, and so gains only ½ pt. After this, the second phase continues with passes gaining nothing. My preference is for passes to lift ko and superko bans, and to end this phase the same way, but most people seem to prefer ending play with two consecutive passes. The difference matters only in very rare cases. You can also count no territory in seki, to approach Japanese and Korean rules.
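
The phase-ending rule described above ("the same player passes a second time in the same board position") can be sketched as a small tracker. This is a hedged illustration only: `PhaseOneTracker` and `position_key` are hypothetical names, and `position_key` is assumed to be any hashable encoding of the board position (for example a Zobrist hash) supplied by the caller.

```python
class PhaseOneTracker:
    """Detects the 'button' pass that ends phase one (hypothetical sketch)."""

    def __init__(self):
        self.seen = set()                         # (player, position_key) pairs already passed in
        self.pass_points = {"B": 0.0, "W": 0.0}   # area-counting points gained by passing
        self.phase_one_over = False

    def record_pass(self, player, position_key):
        """Record a pass; return True once phase one has ended."""
        if self.phase_one_over:
            return True                           # phase-two passes gain nothing
        key = (player, position_key)
        if key in self.seen:
            # Second pass by the same player in the same position: this is the button.
            self.pass_points[player] += 0.5
            self.phase_one_over = True
        else:
            # An ordinary phase-one pass acts as a virtual dame worth 1 point.
            self.seen.add(key)
            self.pass_points[player] += 1.0
        return self.phase_one_over


tracker = PhaseOneTracker()
first = tracker.record_pass("B", "P")    # Black passes in position P
second = tracker.record_pass("W", "P")   # White passes in the same position
third = tracker.record_pass("B", "P")    # Black passes again in P: the button
```

Keying on the pair of player and position, rather than on a count of consecutive passes, is what keeps a player from dodging the end of the phase by interleaving board plays that restore the same position.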

Edit: It may be possible, with these rules, for the players to collaborate to produce a never-ending game. You can alter the rules to take care of that, but for training purposes why bother?

RobertJasiek · Post by **RobertJasiek** » Thu Dec 14, 2017 11:49 am

I have a copy of Korean Rules a few years old. They are still very illogical.

moha · Post by **moha** » Thu Dec 14, 2017 4:17 pm

lightvector wrote:* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).

Nice and interesting idea, kind of reverse AGA. Besides minor things like onesided dame it may actually get close to what you want in theory. Could it ever be advantageous to play first in the 2nd phase?

OTOH, in practice I wonder if this would appeal to bot authors. The two phases need different strategies, so either two NNs or at least an extra phase bit or feature plane (and even worse, some play/prisoner count). And that identical board positions need different policy distributions (or value estimates) may even reduce bot strength. But who knows, Zen authors may still prefer this to their hacks.

For the same purpose, real Japanese-style rules would obviously be much less practical but theoretically may still be possible. For example use two phases, and define the score as the territory score of the board position after the first two passes, with dead stones defined as strings with all stones on what is the opponent's pass-alive area after the second two passes. (So the bot would play a mandatory cleanup/dispute phase internally.)
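
The dead-stone definition in the previous paragraph can be sketched as below. This is a rough illustration, not a full scorer: the function names are hypothetical, the board is assumed to be a dict from `(x, y)` to `"B"` or `"W"` for occupied points only, and the pass-alive areas after the second double pass are assumed to be computed elsewhere (for instance with Benson's algorithm) and passed in as sets of points.

```python
def strings(board):
    """Yield (colour, set_of_points) for each maximal connected string of stones."""
    seen = set()
    for start, colour in board.items():
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            point = stack.pop()
            if point in group:
                continue
            group.add(point)
            x, y = point
            for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if board.get(neighbour) == colour and neighbour not in group:
                    stack.append(neighbour)
        seen |= group
        yield colour, group


def dead_stones(board_after_phase_one, pass_alive_area):
    """Stones of phase-one strings lying wholly inside the opponent's pass-alive area."""
    dead = set()
    for colour, group in strings(board_after_phase_one):
        opponent = "W" if colour == "B" else "B"
        if group <= pass_alive_area[opponent]:
            dead |= group
    return dead


# Tiny hypothetical corner: one white stone surrounded by a black string.
board = {(0, 0): "W", (1, 0): "B", (0, 1): "B", (1, 1): "B"}
areas = {"B": {(0, 0), (1, 0), (0, 1), (1, 1)}, "W": set()}
result = dead_stones(board, areas)
```

The territory score would then be counted on the phase-one position with these stones removed as prisoners.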

RobertJasiek · Post by **RobertJasiek** » Thu Dec 14, 2017 9:11 pm

moha wrote:
lightvector wrote:* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).
Nice and interesting idea, kind of reverse AGA. Besides minor things like onesided dame it may actually get close to what you want in theory.

I think such a rule (without also using the area scoring defining rule to let White make the last move) leads to pass-fights. If so, it is NOT reverse AGA and is NOT what one wants and is NOT just minor things being different.

moha wrote: real Japanese-style rules would obviously be much less practical but theoretically may still be possible.

"Real" Japanese-style rules are computationally arbitrarily complex because of demanding perfect playout play. Sampling approximations are not good enough. Programs need (mathematical) proof play or complete checking of all variations to verify statuses. There are my Japanese 2003 Rules, which can be worked out to an algorithm, so theoretically possible - yes. Computationally possible for the general position? No. Proof play (not to mention complete checking of all variations) is too complex in most positions. Note that ANY position can be a scoring position and it must be possible in practice to score it WITHOUT APPROXIMATION, because that is what "real" Japanese-style rules demand.

pookpooi · Post by **pookpooi** » Thu Dec 14, 2017 9:27 pm

It might be easier to train the AI to be super good at playing White with 0.5 komi under Chinese rules and use that for even games under Japanese rules when playing White, and to train it to play Black at 13.5 komi.

Bill Spight · Post by **Bill Spight** » Fri Dec 15, 2017 1:50 am

I thought it might be interesting to play out the following position according to the rules I have suggested here. It is moha's example against button go in this post: viewtopic.php?p=224689#p224689

[go]$$cm37 Straightforward play
$$ - - - - - - - - - -
$$ | 4 O 2 . . . . . . |
$$ | W X O O O O O . . |
$$ | 3 1 X X X X O . . |
$$ | X X X . X O O . . |
$$ | . . . X X O . O . |
$$ | . . . X O O . . . |
$$ | . . . . X O O O O |
$$ | . . . X . X X X X |
$$ | . . . . . . . . . |
$$ - - - - - - - - - -[/go]

[move], [move] = pass. [move] is the button pass. There is no need for an encore, as there are no dead stones to capture. Further plays or passes would not alter the score.

White gets 37 pts. on the board, plus 1 pt. for the pass, plus 7 pts. for komi, for a total of 45 pts.
Black gets 44 pts. on the board, plus 1 pt. for the pass, plus ½ pt. for the "button" pass, for a total of 45½ pts. Black wins by ½ pt.

Under Japanese and Korean rules with 6½ komi Black gets 24 pts. and White gets 23½ pts. Black wins by ½ pt.

Now let's look at possible play when White prolongs the ko fight.

[go]$$cm37 Ko fight
$$ - - - - - - - - - -
$$ | 5 O 4 . . . . . . |
$$ | W X O O O O O . . |
$$ | 3 1 X X X X O . . |
$$ | X X X . X O O . . |
$$ | . . . X X O . O . |
$$ | . . . X O O . . . |
$$ | . . 7 6 X O O O O |
$$ | . . . X . X X X X |
$$ | . . . . . . . . . |
$$ - - - - - - - - - -[/go]

[move] = pass, [move] = ko, [move], [move] = pass

[go]$$cm47 Ko fight
$$ - - - - - - - - - -
$$ | 1 O O . . . . . . |
$$ | O X O O O O O . . |
$$ | X X X X X X O . . |
$$ | X X X . X O O . . |
$$ | . . . X X O . O . |
$$ | . . . X O O . . . |
$$ | . . X . X O O O O |
$$ | . . . X 2 X X X X |
$$ | . . . . 3 . . . . |
$$ - - - - - - - - - -[/go]

[move] = ko, [move] = pass, [move] = ko, [move], [move] = pass

White gets 37 pts. on the board plus 3 pts. for passes, plus 7 pts. komi, for a total of 47 pts.
Black gets 44 pts. on the board plus 3½ pts. for passes, for a total of 47½ pts. The ko is resolved at temperature zero by territory scoring (temperature one by area scoring), and then Black takes the button for ½ pt.
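
The two tallies in this post can be checked with a few lines of arithmetic. The sketch below is only a restatement of the numbers given above under area counting (1 pt. per ordinary phase-one pass, ½ pt. for the button pass, 7 pts. komi for White); the function name `area_total` is hypothetical.

```python
from fractions import Fraction


def area_total(board_points, full_passes, took_button, komi=0):
    """Board area + 1 per ordinary pass + 1/2 for the button pass + komi."""
    total = Fraction(board_points) + full_passes
    if took_button:
        total += Fraction(1, 2)
    return total + Fraction(komi)


# Straightforward play: one ordinary pass each, Black takes the button.
white_straight = area_total(37, 1, False, komi=7)   # 45
black_straight = area_total(44, 1, True)            # 45 1/2

# Prolonged ko fight: three ordinary passes each, Black takes the button.
white_ko = area_total(37, 3, False, komi=7)         # 47
black_ko = area_total(44, 3, True)                  # 47 1/2
```

In both lines of play the margin is the same ½ pt. for Black, which is the point of the example: the extra passes in the ko fight cancel out.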

lightvector · Post by **lightvector** » Fri Dec 15, 2017 7:21 am

RobertJasiek wrote:
moha wrote:
lightvector wrote:* Every time a player makes a move on the board, that player also loses 1 point (passes do not lose points).
Nice and interesting idea, kind of reverse AGA. Besides minor things like onesided dame it may actually get close to what you want in theory.
I think such a rule (without also using the area scoring defining rule to let White make the last move) leads to pass-fights. If so, it is NOT reverse AGA and is NOT what one wants and is NOT just minor things being different.

I'm curious now what you had in mind. Robert, could you give an example of such a pass fight using these rules?

moha wrote: OTOH, in practice I wonder if this would appeal to bot authors. The two phases need different strategies, so either two NNs or at least an extra phase bit or feature plane (and even worse, some play/prisoner count). And that identical board positions need different policy distributions (or value estimates) may even reduce bot strength. But who knows, Zen authors may still prefer this to their hacks.

Personally, I'd want my value net to be adaptable to a reasonable range of komi, and would randomly pick various komi in a certain range when generating training data (such as via self-play) if I were trying to make a strong Go bot. The komi would need to be an input to the neural net, so it's no trouble to also merge the play/prisoner count difference into that. But yeah, it does make things a bit more complex.
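
As a rough sketch of that idea: sample a komi per self-play game, then fold the difference in board-move counts into a single "effective komi" input. Everything here is an assumption made for illustration: the function names, the komi range and step, and the sign convention (komi credited to White, each main-phase board move costing its player 1 point).

```python
import random


def sample_komi(rng, low=5.5, high=8.5, step=1.0):
    """Pick a komi uniformly from low, low + step, ..., high (hypothetical range)."""
    steps = int(round((high - low) / step))
    return low + step * rng.randint(0, steps)


def effective_komi(komi, black_moves, white_moves):
    """Komi as seen by the value net once per-move penalties are merged in.

    Each board move in the main phase costs its player 1 point, so every extra
    Black move is equivalent to 1 more point of komi for White.
    """
    return komi + (black_moves - white_moves)


rng = random.Random(0)
game_komi = sample_komi(rng)
net_input = effective_komi(6.5, black_moves=30, white_moves=29)
```

With this encoding the net still sees a single scalar where it previously saw komi, though it would also need some indication of which phase the game is in.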

RobertJasiek · Post by **RobertJasiek** » Fri Dec 15, 2017 7:38 am

I am too busy to check pass-fight examples, but you can check the standard examples on Sensei's.

Bill Spight · Post by **Bill Spight** » Fri Dec 15, 2017 9:13 am

lightvector wrote: Personally, I'd want my value net to be adaptable to a reasonable range of komi, and would randomly pick various komi in a certain range when generating training data (such as via self-play) if I were trying to make a strong Go bot.

Yes, and some handicap games, too, against earlier, weaker versions of itself.

Also, I came up with my "same player passes twice in the same position" rule in the '90s for straight area scoring. With the rules I propose here, Sending Two Returning One produces the same area position, but the subsequent pass gains one point, so the three consecutive pass rule to end phase one works.

Phase two can follow Tromp-Taylor rules.

Edit: It has been a while. Sending Two Returning One is not the only way one player might defeat the three pass rule. So pass by the same player in the same position seems to be the way to go.

moha · Post by **moha** » Sat Dec 16, 2017 10:05 am

If this is just about rules for bot training (in light of Zen's problems with komi hacks), what would be an acceptable minimum? Would eliminating the differences with dame parity AND endgame ko temperature be enough for a bot to be safe? (one-sided dame aside)

For example, it may be possible to trick a bot playing under such emulation in close games (if the game is actually Japanese). By getting it to play all dame in the 2nd phase, it can be made to think it now wins by several points. Then it may answer some tricky threat in its territory in a point-losing way (a sure win is better than a big win, right?)

lightvector wrote:
moha wrote:And that identical board positions need different policy distributions (or value estimates) may even reduce bot strength. But who knows, Zen authors may still prefer this to their hacks.
Personally, I'd want my value net to be adaptable to a reasonable range of komi, and would randomly pick various komi in a certain range when generating training data (such as via self-play) if I were trying to make a strong Go bot. The komi would need to be an input to the neural net, so it's no trouble to also merge the play/prisoner count difference into that.

Beyond komi, the correct moves (policy net) are different in the two phases / temperatures, this would be the bigger hurdle I think.

Btw this also has some connection to my earlier idea of button variant ("less stones played (before first two passes) win ties, B wins if still tie"). For ties (no adjustment of bigger differences), rewarding less stones played is like penalizing more stones played.

I also think this goes past bots. Since Japanese-style rules with a single phase CAN easily be used for most human games (with nice properties like no dame fill), two-phase rules that leave this first phase intact would be of real value (most players will be unaware of extra rules anyway, so they are best only applied to additional phases / problem cases). For example, could something (like either of your or Bill's idea) be worked out in a way that it would only apply IF either player resumes the game after the first double pass (normal Japanese scoring on agreement otherwise), AND correct play at the end of the 1st phase remains unchanged?

RobertJasiek wrote:
moha wrote:real Japanese-style rules would obviously be much less practical but theoretically may still be possible.
"Real" Japanese-style rules are computationally arbitrarily complex because of demanding perfect playout play.

I think they also work reasonably well with using the players' playouts. Like my idea above: "use two phases, and define the score as the territory score of the board position after the first two passes, with dead stones defined as strings with all stones on what is the opponent's pass-alive area after the second two passes". Not perfect OC, but the bigger problem is the same dual strategy issue as above.

Bill Spight · Post by **Bill Spight** » Sat Dec 16, 2017 12:57 pm

moha wrote: If this is just about rules for bot training (in light of Zen's problems with komi hacks), what would be an acceptable minimum? Would eliminating the differences with dame parity AND endgame ko temperature be enough for a bot to be safe? (one-sided dame aside)

I suspect that for bot training, button go would be good enough. The questions about ko, one way dame, etc., occur infrequently enough that I suspect that they would have little effect on training. OC, that is an empirical question.

moha wrote: For example, it may be possible to trick a bot playing under such emulation in close games (if the game is actually Japanese). By getting it to play all dame in the 2nd phase, it can be made to think it now wins by several points. Then it may answer some tricky threat in its territory in a point-losing way (a sure win is better than a big win, right?)

Playing neutral dame in the second phase is an error, as a rule. But, as we know, bots make larger endgame errors. So, sure, a bot could be fooled. Since nobody knows how to eliminate the larger errors, I don't think that minor changes to the rules would make much difference.

moha wrote:For ties (no adjustment of bigger differences), rewarding less stones played is like penalizing more stones played.

Good point.

moha wrote: I also think this goes past bots. Since Japanese-style rules with a single phase CAN easily be used for most human games (with nice properties like no dame fill), two-phase rules that leave this first phase intact would be of real value (most players will be unaware of extra rules anyway, so they are best only applied to additional phases / problem cases). For example, could something (like either of your or Bill's idea) be worked out in a way that it would only apply IF either player resumes the game after the first double pass (normal Japanese scoring on agreement otherwise), AND correct play at the end of the 1st phase remains unchanged?

Back in the '90s, Lasker-Maas rules, Berlekamp's rules, and my rules, all of which have an encore (second phase, possibly optional) were devised to be played by humans. Back in the '70s I also wrote some rules to be used by humans. In the '60s Ikeda devised a number of territory rule sets with encores, also for human use.

moha · Post by **moha** » Sat Dec 16, 2017 2:25 pm

moha wrote: For example, it may be possible to trick a bot playing under such emulation in close games (if the game is actually Japanese). By getting it to play all dame in the 2nd phase, it can be made to think it now wins by several points. Then it may answer some tricky threat in its territory in a point-losing way (a sure win is better than a big win, right?)
Bill Spight wrote: Playing neutral dame in the second phase is an error, as a rule. But, as we know, bots make larger endgame errors.

The bot may pass on odd dame in the 1st phase, which can be seen as correct since it has higher winrate (allows an opponent to err by passing as well). Then playing dame in 2nd is also correct.

moha wrote: I also think this goes past bots. Since Japanese-style rules with a single phase CAN easily be used for most human games (with nice properties like no dame fill), two-phase rules that leave this first phase intact would be of real value (most players will be unaware of extra rules anyway, so they are best only applied to additional phases / problem cases). For example, could something (like either of your or Bill's idea) be worked out in a way that it would only apply IF either player resumes the game after the first double pass (normal Japanese scoring on agreement otherwise), AND correct play at the end of the 1st phase remains unchanged?
Bill Spight wrote: Back in the '90s, Lasker-Maas rules, Berlekamp's rules, and my rules, all of which have an encore (second phase, possibly optional) were devised to be played by humans. Back in the '70s I also wrote some rules to be used by humans. In the '60s Ikeda devised a number of territory rule sets with encores, also for human use.

Sure, but I wonder if any of these fit? Most of them change correct play (dame need to be played, some even have further artifacts), and yours changes the 1st phase (it doesn't stop on two passes).

Actually I'm not sure if it's even possible to meet those conditions.

Life In 19x19
