Bot strength

wineandgolover · #1

Has anyone found a convincing way to put AlphaGo Lee, AlphaGo Master, and AlphaGo Zero in a hierarchy with KataGo, Golaxy, and FineArt?

I remember reading last year that Golaxy's developers were very confident that they and FineArt had surpassed AlphaGo, though I’m not sure how they’d know. I guess they could review AlphaGo's self-play games and try to find positive surprises and mistakes.

How confident are we that KataGo has or has not surpassed AlphaGo?

gennan · #2

KataGo may have surpassed AlphaGo under equal conditions (millions of playouts per move). But the vast majority of KataGo users don't have the hardware to support such a high number of playouts.
If we say that KataGo is stronger than AlphaGo, many may assume that KataGo on their mediocre laptop with only 1000 playouts per move is stronger than AlphaGo with millions of playouts per move and this may not be true.

Uberdude · #3

Even with only tens of thousands of playouts, I think LeelaZero and KataGo are stronger than AlphaGo Lee was during its match. I say this without solid proof; my evidence is reviewing those games, and where the bots differ, the sequences they suggest seem convincing. Also, stronger versions of AlphaGo identified AG Lee making mistakes, e.g. the joseki shock peep in game 2 was an overplay and bad if Lee resisted, which both the AG teaching tool and LZ agree on. There are also similarities in the bots' preferences as they evolved: LZ used to like the hanging connection in the high approach to the 3-4 point but doesn't anymore (because it's not sente, while the solid connection is), and AG Zero has the same preference. So the fact that AG Lee plays it is further evidence that it's weaker and not as far along the evolution path.

RobertJasiek · #4

gennan wrote:

KataGo may have surpassed AlphaGo under equal conditions (millions of playouts per move).

Roughly what hardware and how much thinking time per move allow millions of playouts per move, IYO?

And · #5

gennan wrote:

KataGo may have surpassed AlphaGo under equal conditions (millions of playouts per move). But the vast majority of KataGo users don't have the hardware to support such a high number of playouts.
If we say that KataGo is stronger than AlphaGo, many may assume that KataGo on their mediocre laptop with only 1000 playouts per move is stronger than AlphaGo with millions of playouts per move and this may not be true.

Also, the "vast majority" of users don't have the opportunity to test AlphaGo at all! What are you comparing with what?

jann · #6

According to DeepMind the strongest version of AlphaGo was AlphaGo Zero 40b.

It is very likely that even KataGo has surpassed its strength by now (at hardware parity), since AGZ worked without liberty and ladder inputs, which should amount to a noticeable bonus (an effective increase in net size) when present. Not being score-blind should also increase strength (training with auxiliary inputs and outputs gives stronger results even when the auxiliary part is not used later).

FineArt and Golaxy are likely even further ahead at the moment. Of course, the practical question is hardware.

gennan · #7

RobertJasiek wrote:

gennan wrote:

KataGo may have surpassed AlphaGo under equal conditions (millions of playouts per move).

Roughly what hardware and how much thinking time per move allow millions of playouts per move, IYO?

I'm no expert. I only know some anecdotes:

In August 2020 @goame reported he got roughly 100k playouts per minute with KataGo 40-block 384 channel network running on 2x RTX2080 Ti and 64 GB RAM.

In 2017 DeepMind made their AlphaGo teaching tool (an opening database) and it seems they got roughly 1M playouts per minute with AlphaGo Master running on their hardware. I don't know what that was, perhaps 4 TPUs? It must have been pretty powerful.
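For a rough answer to the hardware question, the two anecdotes above can be turned into thinking time per move. This is only a back-of-the-envelope sketch that assumes a constant search rate (real engines vary with the position, batch size and tree reuse); the rates are the ones quoted above, and the helper names are made up for illustration.

```python
# Rough throughput anecdotes quoted above (not benchmarks).
PLAYOUTS_PER_MINUTE = {
    "KataGo 40b on 2x RTX 2080 Ti (Aug 2020)": 100_000,
    "AlphaGo Master, teaching tool hardware (2017, estimate)": 1_000_000,
}


def minutes_per_move(target_playouts: float, playouts_per_minute: float) -> float:
    """Thinking time needed to reach target_playouts, assuming a constant rate."""
    return target_playouts / playouts_per_minute


def playouts_in(minutes: float, playouts_per_minute: float) -> float:
    """Playouts searched in a given thinking time, assuming a constant rate."""
    return minutes * playouts_per_minute


for setup, rate in PLAYOUTS_PER_MINUTE.items():
    print(
        f"{setup}: {minutes_per_move(1_000_000, rate):.0f} min for 1M playouts, "
        f"{playouts_in(1, rate):,.0f} playouts in 1 min"
    )
```

On these figures, 1M playouts would take roughly 10 minutes per move on the 2x RTX 2080 Ti setup versus about 1 minute on DeepMind's hardware, so "equal playouts" and "equal thinking time" would be very different comparisons.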

wineandgolover · #8

[Attachment: image of an Elo table of Go bots]

I see that somebody on reddit tried to answer this (not rigorously) a couple of months ago.

https://www.reddit.com/r/baduk/comments ... g_for_ais/

gennan · #9

I saw that post too, but it looks like the absolute Elo ratings used there have no relation to other go rating systems. Only the relative Elo ratings may have some meaning, but the meaning is not much more than a simple ranking IMO (ordering the list by strength).
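To illustrate why only the rating differences carry meaning, here is a sketch of the standard logistic Elo expectation. It assumes the table uses the usual 400-point Elo scale, which the post doesn't confirm; the ratings in the example are arbitrary.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# A 200-point gap predicts the same result wherever the scale is anchored:
# shifting every rating by the same constant changes nothing.
low = expected_score(3500, 3300)
high = expected_score(13500, 13300)
print(f"{low:.3f} {high:.3f}")  # prints "0.760 0.760"
```

Since the formula only sees the difference, the absolute numbers can't be mapped onto another rating system unless some entry is also anchored in that system.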

Mike Novack · #10

The table fails to report not only the "number of visits" but also the "real time" (time control).

It is not just equality of visits that matters (if that measure is used), because the number chosen might be before a "knee" for one program but not for another.

And of course "real time" is the true/correct measure, since it can change the number of visits, and not necessarily equally for each program. I consider "equal real time" to be the correct measure, since go is played with time controls. If we want to compare to human players, it must be a speed used in human go. If we are asking whether a program is up to the strength of a top 9p, the time control should be what might be used for a top pro title challenge game. Say, a minute per move.

wineandgolover · #11

Regarding the table above, the poster did make clear it was just for fun. He did the best he could, given the lack of direct comparisons. I was shocked when lightvector published his final Elo rating comparisons to prior versions and to LZ and ELF. The table incorporates these real-world comparisons. Where do you think AG fits in?

I get the equal-time argument, but it would be easier to defend equal time and equivalent hardware. After all, AG Lee used tons of TPUs (playouts) to overcome its relative weakness. Even after AG Lee, DeepMind used 4 TPUs, which is super fast. FineArt supposedly uses hundreds of GPUs.

IMHO, to understand the true strength of a bot, you shouldn’t handicap it with time. Let FineArt use all their GPUs, but let a fat KataGo do so too. Or give KataGo all the time it wants.

Finally and separately, the closed bots have a big advantage: they have access to the best open bots. I remember rumors that within weeks of the recent g170 KataGo release, at least one closed bot started playing a sequence it hadn’t played before, one that KataGo favored. It makes sense that they would be strongest. Please note that I haven’t confirmed these rumors. I’d love to see some evidence!
