Ars Technica article about go

Javaness2 · #1

It features our old friend KataGo and, in the given example, a refusal to mark your stones as dead

https://arstechnica.com/information-tec ... -amateurs/

Mike Novack · #2

I think this paper involves a misunderstanding about counting when a game is done (and possible disputes about whether stones ARE dead). Gleave et al. don't seem to understand this.

lightvector · #3

The result is valid, but either the paper authors in their interview for that ArsTechnica article didn't communicate the following details, or the writer of the article itself misleadingly chose to omit those details:

* It's an exploit specific to Tromp-Taylor, i.e. strict computer rules that literally have no provisions for life/death agreement beyond "please capture everything otherwise it's automatically alive". It doesn't apply to any rules humans use.
* For now, it only applies to the raw net or low-playout searches. Decent chance that improves if they bother to optimize the exploit further, although unclear how much.

The above details make the result not really of practical interest to Go players, but given those details, the result is legitimate and of potential interest from a machine-learning-robustness perspective.
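To make the first point above concrete, here is a minimal sketch of Tromp-Taylor style area scoring. The function name `tromp_taylor_score` and the board encoding (`X`, `O`, `.`) are assumptions made for illustration, not code from KataGo or from the paper. It shows why a single uncaptured "dead" stone matters: every stone on the board counts for its owner, and an empty region counts for a colour only if it touches that colour alone.

```python
from collections import deque


def tromp_taylor_score(board):
    """Return (black_points, white_points) under area scoring.

    board: list of equal-length strings, 'X' = black, 'O' = white, '.' = empty.
    There is no dead-stone removal: whatever is on the board is counted.
    """
    rows, cols = len(board), len(board[0])
    score = {"X": 0, "O": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1  # every stone counts, "dead" or not
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colours it touches.
                region_size, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region_size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbour = board[ny][nx]
                            if neighbour == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(neighbour)
                # The region scores only if exactly one colour borders it.
                if len(borders) == 1:
                    score[borders.pop()] += region_size
    return score["X"], score["O"]


clean = ["..XO."] * 5
with_dead_stone = ["..XO.", "..XO.", "O.XO.", "..XO.", "..XO."]
print(tromp_taylor_score(clean))            # (15, 10)
print(tromp_taylor_score(with_dead_stone))  # (5, 11)
```

In the second position a human would simply agree that the lone white stone is dead. Under these rules Black has to actually capture it; passing instead turns a 15 to 10 lead into a 5 to 11 loss.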

It's been interesting to see this kind of news making the rounds, and fun to watch what happens.

But I do feel a bit sad about the misunderstandings that have propagated. I posted some longer thoughts here: https://forums.online-go.com/t/potentia ... hexahedron

Mike Novack · #4

lightvector wrote:

The result is valid, but either the paper authors in their interview for that ArsTechnica article didn't communicate the following details, or the writer of the article itself misleadingly chose to omit those details:

* It's an exploit specific to Tromp-Taylor, i.e. strict computer rules that literally have no provisions for life/death agreement beyond "please capture everything otherwise it's automatically alive". It doesn't apply to any rules humans use.
* For now, it only applies to the raw net or low-playout searches. Decent chance that improves if they bother to optimize the exploit further, although unclear how much.

Ah, but has the AI been retrained to this rule variant?

Or are they making a significant claim: that our current neural nets are unable (by self-play) to learn this rule for "who won"? I can imagine a large number of rule variants that might be harder to learn (*). I suppose I will have to look at the paper.

(*) Think of the (human) party game "pass the scissors", where a couple of people in the circle know the rule (the ones who initiate the game). It usually takes quite a few passes around the circle before everybody has figured it out << it's whether their legs are crossed or open, not whether the scissors are closed or open >>

CDavis7M · #5

It's hard to know whether it is the article that is misleading or the research that is misleading... Seeing as it's from Ars Technica I'm guessing it's more the article.

...Ah my mistake, it's both. I read the summary of the research and it's misleading because it's based on the "misleading" Tromp-Taylor rules which someone thinks are "as simple and elegant as possible." The BGG ratings would not agree.

----------

In other news, I (black) just played KataGo (white) and it thinks it has such a huge advantage that it can pass. But actually I win because we were playing Gomoku.

kvasir · #6

Being curious, I checked out the public peer reviews and I don't think the paper is really on track to be accepted for ICLR; then again, it is early in the review process. Apparently the authors are eager and will update the paper with new experiments, promising more interesting results.

It is a bit odd how Ars Technica says the researchers "published a paper" when it is literally only submitted and is actually getting negative reviews. Anything (almost) could happen with the paper at this point. It could be rejected, or if it is accepted it could be revised considerably first. Occasionally there are very good papers that don't seem to be accepted anywhere until the arXiv preprint has been cited many many times.

Maybe it's best to check on it again only after the peer review is concluded.

luigi · #7

From Adversarial Policies Beat Superhuman Go AIs:

Quote:

With 2048 visits, KataGo's Latest network plays at a superhuman level. Nonetheless, our adversary still achieves a 77.6% win rate against Latest and a 72.4% win rate against Latest_def. Games against Latest_def are shown below.

Here is one of them. It doesn't seem the trivial type that has been discussed in this thread, as Black (KataGo's opponent) has no dead stones on the board at the end of the game:

dhu163 · #8

That loss looks embarrassing. The mistakes are very low level, beneath 5k in terms of what happened on the board. Presumably something strange is happening off the board. The score estimate also shouldn't be jumping by 25 every move.

Perhaps we can call this the victim going to sleep and losing interest or old age while the younger side requires targeting weak points when they aren't strong enough overall, like hacking. Or perhaps it is the other way around. They are so wise that they know the weak points in advance or perhaps hacked the training before.

Especially move 260 seems laughable. Unless one can see something higher going on.

As mentioned before, an AI's performance in situations it isn't used to can be very unexpected, showing its internal biases, if modelled as a deterministic bot, even if this was fair. It seems this may be a job for a medic if they deem it worth it. Physicists and engineers would probably do better than mathematicians at understanding this.

For me, it looks suggestive of superko, two headed dragons, snakes, gato, hurricanes, switches, fuse.

Reading the other thread, it sounds like the point they make is that capturing races are hard, especially when exploiting the visual weak point in design. Namely when one small move makes such a big difference to the position and where similar shapes might be expected to be dead or alive. This seems to be a standard vulnerability to a crystal.

Go is simply too big to solve it all. Blind spots can be solved in appropriate priority order but priority only rises if attacked. Predator-prey models have been studied for some time.

kvasir · #9

White is actually losing the capturing race at move 171, though the adversary doesn't see it. There are some moves to prolong white's death. I don't think it is that much, but maybe it is enough to prevent the estimates from converging to the true values quickly enough (and the difference in the outcome is extreme enough). I ran analysis with a KataGo using different weights from those of the reported experiment. While it identified the mistake at 170 clearly enough, it took about 300k playouts to adjust the point estimate for move 171 to black winning (judged by me watching it keenly in KaTrain). The win rate did adjust more quickly in black's favor, but it didn't go very high.

I suppose if you can create situations that have wrong estimates, repeatedly in the same game, then you have a good chance to induce mistakes. Maybe it is something us normals (not top-100 European professionals, or however the paper put it) can learn to do. I imagine it as a more involved solitaire.

PC_Screen · #10

This new exploit has already been noted by lightvector and KataGo now implements similar positions to these (only the cyclical ones, not the passing ones) into a small percentage of its training games (roughly 0.08% of games). An example of a training game with a cyclic topology group position: https://katagotraining.org/sgfplayer/training-games/34889642/

luigi · #11

Isn't 185 a mistake, though? And isn't White simply alive after 192? Why does KataGo end up losing that group?

dhu163 · #12

Yes. IIUC, KataGo doesn't think in terms of concepts such as "life" but rather in terms of QM-like connections between input and output, which is predominantly shape and control estimate. It prefers the route of least work until it starts losing rewards.

Maybe, as I think others have suggested, the 1d interactions between neurons are too simplistic. Consider how humans use words and concepts, which have many more dimensions. This allows more fluidity while still maintaining rigidity when appropriate (god associations?) with their own security (e.g. a special unit for this cyclic topology stuff).

On the other hand, learning and organised interaction looks like a much more difficult problem in this framework. 1d number interaction worked well for a one player game (even a 2 player game like Go can be seen as a 1 player game against the God of optimal play) with a loss function and backpropagation. To go beyond that is in part to leave Go and step into real life.

edit: Perhaps there is a ghost of Feynman: can you hear the shape of a drum?
or the recent fusion progress making the news.

lightvector · #13

See this post a couple weeks ago:
https://forums.online-go.com/t/potentia ... hexahedron

The issue specifically is groups that wrap back to themselves, and not any other thing that you might think, as far as I know. For example, large capturing races can be hard for bots of course, and ko can also be hard, but neither of those is what's happening here.

To my best understanding all current AlphaZero-style neural nets learn one wrong algorithm or another for determining life and/or group strength. I also tested other independently trained nets such as ELF and LZ a couple years ago and they also had tons of trouble with cyclic groups. So as of a few years ago, two-headed dragon (https://senseis.xmp.net/?TwoHeadedDragon) situations were found to be a frequent misevaluation for AZ bots, just like ladders. This adversary is just the first time someone automated a bot to play for these kinds of positions.

To oversimplify a bit, I believe one algorithm a net can tend to learn is roughly equivalent to, "start anywhere on the group and walk along the group in every direction counting eyes and/or liberties until you reach the end of the group in every direction", and if the total of all the directions gives 2 eyes or lots of liberties (or at least, more eyes/liberties than neighboring groups), the group is deemed alive/strong. The net handles a small wraparound just fine (e.g. the stones that loop around a small eye), but when a group connects cyclically back to itself on a large scale, the problem is that such an algorithm never hits a dead end. It just keeps walking around the cycle over and over, and therefore it double-, triple-, quadruple-counts (and so on) all the liberties or eyes.

Such a naive algorithm works on 99.9%+ of the data in natural games, and among the times when it in theory doesn't work the group is often still alive anyways by chance, e.g. two-headed dragons that genuinely do have enough liberties or eyes, given that many "false" eyes can now also work for life. So the net never has much pressure to learn a much more difficult algorithm that takes into account cycles.

They're also rare enough in pro games that it's hard to find enough "natural" examples to train on. But with the help of this adversary, it's much easier to generate lots of semi-random examples, so there's now an ongoing very early experiment to see if adding a tiny % of these positions to training will force the net to learn it.
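As a toy illustration of the walk-and-count heuristic described above, the sketch below compares a liberty count that remembers what it has already visited with a walk that only remembers the previous stone and stops at a dead end or after a step budget. Everything here is hypothetical and deliberately oversimplified: the function names, the `budget` parameter, and the assumption of a single-colour chain on an otherwise empty board are illustration only, and this is not a claim about what any real net computes.

```python
def neighbours(point, size):
    """Yield the on-board orthogonal neighbours of a point."""
    r, c = point
    for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q


def true_liberties(stones, start, size):
    """Flood fill with a visited set: each stone and liberty is counted once."""
    seen, liberties, stack = {start}, set(), [start]
    while stack:
        p = stack.pop()
        for q in neighbours(p, size):
            if q in stones:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
            else:
                liberties.add(q)
    return len(liberties)


def naive_walk_liberties(stones, start, size, budget):
    """Walk away from start in every direction, summing liberties per stone.

    The walker only remembers the previous stone, so on a cycle it never
    finds a dead end and keeps re-counting until the step budget runs out.
    """
    def libs(p):
        return sum(1 for q in neighbours(p, size) if q not in stones)

    total = libs(start)
    for first in [q for q in neighbours(start, size) if q in stones]:
        prev, cur = start, first
        for _ in range(budget):
            total += libs(cur)
            onward = [q for q in neighbours(cur, size) if q in stones and q != prev]
            if not onward:
                break  # reached the end of the chain in this direction
            prev, cur = cur, onward[0]
    return total


SIZE = 9
line = {(4, c) for c in range(2, 7)}  # straight five-stone chain
ring = {(r, c) for r in range(2, 7) for c in range(2, 7)
        if r in (2, 6) or c in (2, 6)}  # chain that loops back on itself

print(true_liberties(line, (4, 4), SIZE), naive_walk_liberties(line, (4, 4), SIZE, 16))  # 12 12
print(true_liberties(ring, (2, 4), SIZE), naive_walk_liberties(ring, (2, 4), SIZE, 16))  # 28 66
print(naive_walk_liberties(ring, (2, 4), SIZE, 32))                                      # 130
```

On the open chain both counts agree, which matches the point that the naive approach is right on almost all natural positions. On the ring the naive total depends only on how long the walk is allowed to run, so the group looks arbitrarily strong.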

lightvector · #14

By the way, I find it fascinating that nobody has yet figured out how to get strong nets to understand ladders in Go either, which is also a failure to learn a beginner-level concept, and an even much more common one. KataGo "cheats" by hardcodedly telling the net when a ladder works, but that also has downsides (e.g. the net not actually understanding it, and therefore making poor predictions or inefficient reading leading up to the ladder when it's not yet formed, or not always knowing when the ladder should truly be played out even though it doesn't work, or other nonstandard cases etc).

If the game had turned out just a little differently, if in an alternate reality it turned out there were a concept like ladders but yet a bit more common and a bit more unfixable by sheer brute force, and there not being an easy algorithm for hardcodedly telling the net, maybe bots wouldn't even be clearly superhuman yet in "normal" games. Even as it is, AI in Go is still very puzzling. ;-)

dhu163 · #15

This is why it's so difficult for me to quit Go.

It sounds like you are talking about needing an "end if" clause to a while loop? An id, that recognises a chain as being the same as the one that first started the search?

__
On ladders. I've said before that they are the only Go pattern/shape that transmits into empty space without influence decaying.

re your point, I think ladders are special as we have 2d search space where a major component of evaluation depends on a 1d search (i.e. algorithm), which allows a significant speedup, and since the pattern is basically the same every time, only one word is required for this concept.

I think it is possible to imagine a situation where the horizon effect is still in play, so that we initially have a 1d search but after which much still depends on 2d search. Perhaps I'm thinking of a migration problem where survival depends not just on the journey but also on what there is on the other side. However, this doesn't really seem to occur in Go.

I think ladders are so hard for nets to learn because the loss from making a mistake escalates with every extra move added. So the risk is too high compared to entropy.

__
aside on proverbs. weak points attract moves so they are more valuable to attack. However, defending is more subtle as connecting might be too slow (or not even possible with 1 move), but direct defence of one chain, even with forcing moves, may be inefficient, taking too many moves to defend it, and heavy especially if there were alternative forcing moves. Instead mitigation by trying to connect to other big points seems important, and letting the opponent choose the attack side first. As with Tian Ji, don't try to win a fight where the opponent is strong and instead sabaki, i.e. move resources (which are moves in this case) elsewhere.

In general don't play for the centre unless the opponent is low because the centre needs many moves to make, so in order to get its value you will have to keep playing from there giving up territory. This is why you invade 3-3 even when you have influence and use the influence to work with reductions later.
__
more on evaluating temperature of the board. e.g. 4-4 vs 6-3. Later 3-6 defence and 4-3 follow up. Once both are settled, centre becomes like dame, expected to be shared evenly, so the focus is on what is possible on the sides. One move pushes the boundary 3+3+2 stones in, perhaps +1 from the corner, but uses up connection weak points. It prevents the opponent's profit from attack, which perhaps at least consists of forcing you to defend the stone from the side (or worse, the centre) with further follow ups which are perhaps capturing your stone in 2 moves which is perhaps an extra 5 stones with more from the centre. There is also the value of being thicker which sets up double attacks on the centre. We expect the total to come to 14 stones.

luigi · #16

lightvector wrote:

See this post a couple weeks ago:
https://forums.online-go.com/t/potentia ... hexahedron

The issue specifically is groups that wrap back to themselves, and not any other thing that you might think, as far as I know. For example, large capturing races can be hard for bots of course, and ko can also be hard, but neither of those is what's happening here.

To my best understanding all current AlphaZero-style neural nets learn one wrong algorithm or another for determining life and/or group strength. I also tested other independently trained nets such as ELF and LZ a couple years ago and they also had tons of trouble with cyclic groups. So as of a few years ago, two-headed dragon (https://senseis.xmp.net/?TwoHeadedDragon) situations were found to be a frequent misevaluation for AZ bots, just like ladders. This adversary is just the first time someone automated a bot to play for these kinds of positions.

To oversimplify a bit, I believe one algorithm a net can tend to learn is roughly equivalent to, "start anywhere on the group and walk along the group in every direction counting eyes and/or liberties until you reach the end of the group in every direction", and if the total of all the directions gives 2 eyes or lots of liberties (or at least, more eyes/liberties than neighboring groups), the group is deemed alive/strong. The net handles a small wraparound just fine (e.g. the stones that loop around a small eye), but when a group connects cyclically back to itself on a large scale, the problem is that such an algorithm never hits a dead end. It just keeps walking around the cycle over and over, and therefore it double-, triple-, quadruple-counts (and so on) all the liberties or eyes.

Such a naive algorithm works on 99.9%+ of the data in natural games, and among the times when it in theory doesn't work the group is often still alive anyways by chance, e.g. two-headed dragons that genuinely do have enough liberties or eyes, given that many "false" eyes can now also work for life. So the net never has much pressure to learn a much more difficult algorithm that takes into account cycles.

They're also rare enough in pro games that it's hard to find enough "natural" examples to train on. But with the help of this adversary, it's much easier to generate lots of semi-random examples, so there's now an ongoing very early experiment to see if adding a tiny % of these positions to training will force the net to learn it.

Fascinating stuff, thanks.

luigi · #17

lightvector wrote:

two-headed dragon (https://senseis.xmp.net/?TwoHeadedDragon)

Someone mentioned there that these shapes are more frequent in Toroidal Go. Maybe KataGo should be trained on that Go variant as well?

Javaness2 · #18

The article is behind a paywall, but it seems there is a new chapter in this story
https://www.ft.com/content/175e5314-a7f ... 3219f433a1

One wonders why

sorin · #19

The update is that they found a way to beat KataGo after giving it a 9-stone handicap.

Article: https://goattack.far.ai/pdfs/go_attack_paper.pdf
KGS games: https://gokgs.com/gameArchives.jsp?user ... 23&month=1

Example game:

kvasir · #20

kvasir wrote:

Being curious, I checked out the public peer reviews and I don't think the paper is really on track to be accepted for ICLR; then again, it is early in the review process. Apparently the authors are eager and will update the paper with new experiments, promising more interesting results.

It is a bit odd how Ars Technica says the researchers "published a paper" when it is literally only submitted and is actually getting negative reviews. Anything (almost) could happen with the paper at this point. It could be rejected, or if it is accepted it could be revised considerably first. Occasionally there are very good papers that don't seem to be accepted anywhere until the arXiv preprint has been cited many many times.

Maybe it's best to check on it again only after the peer review is concluded.

The paper was rejected.
