5-year retrospective of AI

John Fairbairn · #1

I have seen a report from Korea which looks back at five years of AI in go and suggests the following ratings. I haven't bothered to look at the methodology (it is based on GoRatings, I gather), but the headlines look interesting enough to share.

AlphaGo Zero is top at 5185 (but based only on beating AlphaGo Lee 100-0 and AlphaGo Master 89-11).

KataGo and Fine Art are second/third, but both above 5000.

Then (in the list - maybe not in real life) comes Sin Chin-seo at 3831. He is predicted to be the first human to breach 4000.

AlphaGo Lee is next at 3739 and ahead of AlphaGo Master at 3665 (couldn't see any reason for this reversal compared to AlphaGo Zero's results). Ke Jie is sandwiched in-between at 3728.

Of past masters, Yi Se-tol at his peak was 3583, Yi Ch'ang-ho 3569, Cho Hun-hyeon 3462 and Cho Chikun 3402.

Among current masters (as of 6 December 2021), Sin and Ke are top. Pak Cheong-hwan is 3rd (3723), and Gu Zihao, who just won the Agon Cupwinners Match, is 4th at 3665. The first Japanese player is Iyama at 3588 (9th).

No doubt there will be arguments about individual data points, and how much of a gap is a gap, but what seems indisputable is that one human rating has already surpassed AlphaGo Lee's, and others are on the verge.

jlt · #2

John Fairbairn wrote:

Then (in the list - maybe not in real life) comes Sin Chin-seo at 3831. He is predicted to be the first human to breach 4000.

AlphaGo Lee is next at 3739 and ahead of AlphaGo Master at 3665

This order Sin Chin-seo > AlphaGo Lee > AlphaGo Master is hard to believe.

pajaro · #3

AlphaGo Lee is the version that played Lee Sedol?

It sounds right to think that it has already been matched or surpassed by several humans. In that match, Lee Sedol lost 1-4. It was a shock, but now any pro would love to beat the top AI out there at least once.

If Lee Sedol had known his opponent, perhaps he could have done a little better.

jlt · #4

Maybe Alphago Lee's rating was based on the assumption that it wins 80% of its games against a player with the same rating as Lee Sedol in 2016?

However Lee Sedol may just have been lucky to come up with a position that Alphago didn't know how to handle.

Anyway if someone estimated that Shin Jinseo is about Alphago Lee's level, it wouldn't sound absurd. But Shin = Alphago Lee + 100 = Alphago master + 160 sounds doubtful to me.
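To put numbers on that doubt, here is a minimal sketch that turns the quoted rating gaps into expected win rates. It assumes the ratings behave like standard logistic Elo on a 400-point scale, which may not be exactly how the report's ratings were fitted; the function name `expected_score` is just illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: a 400-point scale, as in ordinary Elo.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings as quoted in post #1.
SHIN, AG_LEE, AG_MASTER = 3831, 3739, 3665

print(f"Shin vs AlphaGo Lee:    {expected_score(SHIN, AG_LEE):.2f}")
print(f"Shin vs AlphaGo Master: {expected_score(SHIN, AG_MASTER):.2f}")
```

Under that assumption the list would have Shin scoring roughly 63% against AlphaGo Lee and roughly 72% against AlphaGo Master, which is the part that is hard to square with Master's record against top pros.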

Uberdude · #5

jlt wrote:

Maybe Alphago Lee's rating was based on the assumption that it wins 80% of its games against a player with the same rating as Lee Sedol in 2016?

However Lee Sedol may just have been lucky to come up with a position that Alphago didn't know how to handle.

Both privately from DeepMind employees and in the AlphaGo documentary it is mentioned that the AlphaGo Lee version did sometimes suffer the kind of delusions seen in game 4, around 1 game in 5 or so IIRC, and rooting out the causes and fixing them was one of the key services Fan Hui provided with his Go skills. So that 4-1 result is a fair representation of its skill. Indeed if Lee Sedol had had the opportunity to practice against AG beforehand, I daresay somebody of his calibre could even have raised the delusion rate. He also elicited a blunder out of Handol.

boucif · #6

jlt wrote:

Maybe Alphago Lee's rating was based on the assumption that it wins 80% of its games against a player with the same rating as Lee Sedol in 2016?

However Lee Sedol may just have been lucky to come up with a position that Alphago didn't know how to handle.

Anyway if someone estimated that Shin Jinseo is about Alphago Lee's level, it wouldn't sound absurd. But Shin = Alphago Lee + 100 = Alphago master + 160 sounds doubtful to me.

Very doubtful indeed. No way would any current pro have any hope of winning against Master. Or am I missing something?

jlt · #7

In that case, a rating of Alphago Lee at 3739 looks a reasonable estimate. On the other hand, Alphago Master beat top human players 60-0. Even though the games were fast (30 seconds per move), I doubt that Shin Jinseo is stronger than that.

lightvector · #8

I am confident that AlphaGoMaster being rated so low is a typo or other mistake. It's unambiguously wrong given the (very limited) data that is published.

Google's own publications consistently put AlphaGoMaster by their internal measurements as anywhere from 300-500 Elo weaker than AlphaGoZero... but easily 1000 Elo stronger than AlphaGoLee. And by the very same numbers that John quotes above, AlphaGoMaster did better against AlphaGoZero than AlphaGoLee did against AlphaGoZero (winning 11 instead of 0). That and the game series vs humans (all of which it won) are all the data we have to go on since no version of AlphaGo was ever released. Every data point in that collection puts AlphaGoMaster far above AlphaGoLee, so unless they somehow have some nonpublic info plus all of Deepmind's publications are massively misleading...
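As a rough cross-check of those head-to-head numbers, a small sketch can convert a match record into the Elo gap it implies. This assumes the plain logistic Elo model and treats the match results in isolation; `elo_gap` is a hypothetical helper, not anything from DeepMind or the Korean report.

```python
import math


def elo_gap(wins: int, losses: int) -> float:
    """Elo difference implied by a head-to-head record (logistic Elo model)."""
    if losses == 0:
        # A clean sweep gives no finite estimate, only "a lot stronger".
        return math.inf
    if wins == 0:
        return -math.inf
    return 400.0 * math.log10(wins / losses)


print(round(elo_gap(89, 11)))  # AlphaGo Zero vs AlphaGo Master
print(elo_gap(100, 0))         # AlphaGo Zero vs AlphaGo Lee
```

The 89-11 record works out to about 363 Elo, inside the 300-500 range mentioned above, while 100-0 gives no finite number at all. Either way, the same two results that anchor AlphaGo Zero's rating place Master above Lee, not below.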

As for something that I wouldn't call indisputable but still very likely:

I'm relatively sure that AlphaGoZero is weaker than KataGo, Fine Art, Golaxy, etc. Because these all have the benefit of using AlphaGoZero's own published methods which work pretty much exactly as well as described in their paper, plus many more improvements in the last 4 years that weren't known back then, and have trained long enough at this point that compute isn't an issue either. And while researcher degrees of freedom are a thing, I think Deepmind is pretty good - they certainly aren't going to misrepresent in their official paper what algorithm they used when they claim a particular result, so to the degree that these projects replicated and then surpassed the original algorithm with major improvements (which they have), they're all going to be far stronger at this point.

AlphaGoZero's closest faithful public replication is the fantastic early-standards-setting Leela Zero (both 40 blocks), which trained for about 21 million games instead of 29 million games but otherwise matched AlphaGoZero's algorithm pretty well. Which if we assume Leela Zero roughly matched a similar training curve, would only put Leela Zero perhaps 100-300 Elo behind AlphaGoZero. But even Leela Zero had some later search improvements (notably, better puct scaling, LCB) beyond AlphaGoZero, which would close any gap a little. All the most modern bots are far stronger than Leela Zero, by much more than a mere 100-300 Elo, so this also is a second mild data point that modern bots should be past AlphaGoZero by now.

A third piece is that AlphaGoZero almost certainly didn't have any special guards against Mi Yuting's flying dagger or some ladder issues or a number of other traps that are known now that weren't known back then, which pure-zero bots may sometimes have difficulty with regardless of how much training they do. These kinds of weaknesses don't reflect in a pure Elo model well because they are nontransitive (A and B may be equal in strength to each other except A happens to prefer flying daggers and B doesn't, and then A will do far better against a "zero" bot vulnerable to it than B will), and might also cause issues in a modern playing field. They might even allow mere humans to sometimes win vs something like AlphaGoZero with enough trials to probe for different weaknesses, again in a way that Elo comparisons wouldn't normally capture.
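The nontransitivity point can be made concrete with a toy calculation. The win rates below are invented purely for illustration (they are not measurements of any real bot), and the logistic Elo model is assumed.

```python
import math


def gap_from_winrate(p: float) -> float:
    """Elo gap implied by a win rate p, with 0 < p < 1 (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))


def winrate_from_gap(gap: float) -> float:
    """Win rate implied by an Elo gap (inverse of gap_from_winrate)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


# Hypothetical numbers: A likes flying daggers, Z is a "zero" bot weak to them.
p_a_vs_z = 0.90   # A exploits Z's blind spot
p_b_vs_z = 0.50   # B never steers into it
p_a_vs_b = 0.50   # yet A and B are even against each other

# A single Elo scale forces gap(A, B) = gap(A, Z) - gap(B, Z).
implied_a_vs_b = winrate_from_gap(
    gap_from_winrate(p_a_vs_z) - gap_from_winrate(p_b_vs_z)
)
print(f"Elo-implied A vs B: {implied_a_vs_b:.2f}, observed: {p_a_vs_b:.2f}")
```

With these made-up inputs a one-dimensional rating would predict A beating B 90% of the time, although the two are even by construction. No single number per player can fit all three pairings at once.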

The Leela Chess Zero community has had to deal with people overelevating AlphaZero-Chess too. In their case, they have even more direct evidence that our modern techniques have improved a little beyond the original AlphaZero/AlphaGoZero, because they can test against the known Stockfish version and configuration that AlphaZero itself tested against and compare results. AlphaZero is amazing already for being the foundation for all the modern best agents in these kinds of games; there is no need to weaken that by overreaching to claim something that starts to strain credibility.

Lastly, as for AlphaGo Lee being roughly matched by the very best current human players:

That seems relatively plausible to me. I mean for point of intuition, Leela Zero also started winning against human pros some time back around when it was 10 blocks when run on beefy GPUs and was dominant by the point where it switched to 15, but I would also not confidently bet against human pros nowadays at doing well against some of the 10-block or early 15-block LZ nets even on good hardware.

Yay ratings quibbles...

RobertJasiek · #9

"improvements in the last 4 years that weren't known back then"

What functional improvements are these?

thirdfogie · #10

John, thanks very much for the technical retrospective. It is hard to believe it is nearly 6 years since the Leamington Go Club reviewed AlphaGo Lee's second game on a real board.

It would also be interesting to hear about Asian views on the wider effects of Go AI, such as the prestige and income of professional players and teachers, online cheating and whether amateurs are also improving faster.

I do use KataGo to analyse games, but it matters little in practice whether it is 15.5 or 15.75 stones stronger than me. That is not, of course, a criticism of those who work on and study the AIs.

lightvector · #11

RobertJasiek wrote:

"improvements in the last 4 years that weren't known back then"
What functional improvements are these?

A variety of things, some of them more important, some of them more minor but collectively still making a noticeable difference:
* Root softmax temperature for improving the long-term convergence behavior of AlphaZero (otherwise it tends to become overconfident in the opening and prematurely converge)
* Better neural net architectures - adding squeeze excite or some other global pooling mechanism to the neural net (you see this also in SE nets in image processing or in things like mobilenet v3).
* Adding auxiliary terms to encourage good play beyond just win/loss, and also giving a richer training signal. This includes move-left-head in Chess, and score maximization and ownership prediction in Go.
* To use the LC0 terms for them, "policy focus" and "value focus" (KataGo is successfully using one of these, although LC0 I think is still figuring one of them out) - basically training is more efficient if you train more on positions where the net was more wrong. Similarly you can do various things with blunder correction.
* Deepmind themselves later published that you can improve the PUCT formula they had originally for the MCTS by adding a log scaling term. Leela Zero independently tested this too and found it was an improvement.
* Using LCB for move selection is also a strength boost.

There are at least twice as many improvements as just the ones I've listed here in total, but I'll stop there. As for the closed-source bots FineArt and Golaxy and maybe some other bots, we don't know all the details of what they're doing, but very likely they're doing a number of these and possibly some other improvements that aren't known publicly.
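For anyone looking up the last two items, here is a minimal sketch of what they can look like in code. It is not taken from any particular engine: the `Child` class and function names are hypothetical, the constants 1.25 and 19652 are the ones I believe appear in the published AlphaZero pseudocode (treat them as assumptions), and the LCB uses a simple normal approximation rather than whatever a real engine does.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Child:
    """Per-move search statistics (hypothetical minimal node)."""
    prior: float               # policy prior P(s, a)
    visits: int = 0            # N(s, a)
    value_sum: float = 0.0     # sum of backed-up values
    value_sq_sum: float = 0.0  # sum of squared backed-up values

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def exploration_rate(parent_visits: int,
                     c_init: float = 1.25, c_base: float = 19652.0) -> float:
    """PUCT coefficient that grows logarithmically with parent visits."""
    return c_init + math.log((1 + parent_visits + c_base) / c_base)


def puct_score(child: Child, parent_visits: int) -> float:
    """Q + U, where U uses the log-scaled exploration coefficient."""
    u = (exploration_rate(parent_visits) * child.prior
         * math.sqrt(parent_visits) / (1 + child.visits))
    return child.q + u


def lcb(child: Child, z: float = 1.96) -> float:
    """Lower confidence bound on a move's value (normal approximation)."""
    if child.visits < 2:
        return -math.inf
    mean = child.q
    variance = max(child.value_sq_sum / child.visits - mean * mean, 0.0)
    return mean - z * math.sqrt(variance / child.visits)


def select_move_lcb(children: Dict[str, Child]) -> str:
    """Play the move whose value is most reliably high, not merely highest."""
    return max(children, key=lambda move: lcb(children[move]))


children = {
    "a": Child(prior=0.6, visits=1000, value_sum=550.0, value_sq_sum=342.5),
    "b": Child(prior=0.1, visits=5, value_sum=3.5, value_sq_sum=2.9),
}
print(select_move_lcb(children))
```

In this toy position move "b" has the higher average value but only five visits, so its lower bound is poor and "a" is chosen. That is the general idea behind LCB selection: a well-explored good move beats a barely explored move that merely looks great so far.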

RobertJasiek · #12

Very interesting, thanks! Now I know at least what keywords to look up. Sounds like a lot of successful programming and testing work.
