I am confident that AlphaGoMaster being rated so low is a typo or other mistake. It's unambiguously wrong given the (very limited) data that is published.
Google's own publications consistently put AlphaGoMaster, by their internal measurements, anywhere from 300 to 500 Elo weaker than AlphaGoZero... but easily 1000 Elo stronger than AlphaGoLee. And by the very same numbers that John quotes above, AlphaGoMaster did better against AlphaGoZero than AlphaGoLee did (winning 11 games instead of 0). That and its game series against humans (all of which it won) are all the data we have to go on, since no version of AlphaGo was ever released. Every data point in that collection puts AlphaGoMaster far above AlphaGoLee, so unless they somehow have some nonpublic info and all of DeepMind's publications are massively misleading...
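For a rough sense of what those match results mean: the standard logistic Elo model maps a rating gap D to an expected score E = 1 / (1 + 10^(-D/400)), which inverts to D = 400 · log10(p / (1 - p)) for a score fraction p. The sketch below assumes both matches were 100 games (89-11 for Zero vs. Master, 100-0 for Zero vs. Lee) and that DeepMind's internal scale behaves like plain logistic Elo; both are assumptions, not something stated above.

```python
import math


def expected_score(rating_gap: float) -> float:
    """Expected score for the side that is `rating_gap` Elo stronger (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def elo_gap_from_score(wins: int, games: int) -> float:
    """Rating gap implied by a match result (negative = the scoring side is weaker).

    A shutout (0 wins or all wins) implies an unbounded gap, so it is rejected:
    such a result only says the gap is "large", not how large.
    """
    if not 0 < wins < games:
        raise ValueError("a shutout only bounds the gap; the point estimate is infinite")
    p = wins / games
    return 400.0 * math.log10(p / (1.0 - p))


# Assumed match size: 100 games, Master winning 11 of them against Zero.
master_vs_zero = elo_gap_from_score(11, 100)
print(f"Master vs Zero: {master_vs_zero:.0f} Elo")  # about -363
print(f"Expected score at a 1000 Elo gap: {expected_score(1000):.4f}")
```

Under these assumptions, 11 wins in 100 lands inside the 300-500 range, while the 0-win result for AlphaGoLee can only say its gap to AlphaGoZero is very large.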
As for something that I wouldn't call indisputable but still very likely:
I'm relatively sure that AlphaGoZero is weaker than KataGo, Fine Art, Golaxy, etc., because these all have the benefit of using AlphaGoZero's own published methods (which work pretty much exactly as well as described in the paper), plus many improvements from the last 4 years that weren't known back then, and they have trained long enough by now that compute isn't an issue either. And while researcher degrees of freedom are a thing, I think DeepMind is pretty good about this: they certainly aren't going to misrepresent in their official paper which algorithm they used to get a particular result. So to the degree that these projects replicated and then surpassed the original algorithm with major improvements (which they have), they're all going to be far stronger at this point.
AlphaGoZero's closest faithful public replication is the fantastic, early standard-setting Leela Zero (both at 40 blocks), which trained for about 21 million games instead of 29 million but otherwise matched AlphaGoZero's algorithm pretty well. If we assume Leela Zero followed a roughly similar training curve, that would put it only perhaps 100-300 Elo behind AlphaGoZero. But even Leela Zero had some later search improvements beyond AlphaGoZero (notably better PUCT exploration scaling and LCB, i.e. lower-confidence-bound, move selection), which would close that gap a little. All the most modern bots are far stronger than Leela Zero, by much more than a mere 100-300 Elo, so this is also a second, mild data point that modern bots should be past AlphaGoZero by now.
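For readers unfamiliar with those two search tweaks, here is a minimal sketch of what they look like in an MCTS. The exploration constant uses the slowly growing form from DeepMind's later AlphaZero pseudocode (c_base = 19652, c_init = 1.25); whether Leela Zero uses exactly this form and these constants is an assumption. The LCB step picks the final move by a lower confidence bound on its winrate instead of raw visit count; the normal-approximation `z` and the 10% visit threshold are hypothetical illustration choices, not Leela Zero's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float               # policy-network probability for this move
    visits: int = 0            # playouts through this child
    value_sum: float = 0.0     # sum of backed-up values (winrates in [0, 1])
    value_sq_sum: float = 0.0  # sum of squared values, for a variance estimate

    def mean(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def exploration_constant(parent_visits: int, c_base: float = 19652.0, c_init: float = 1.25) -> float:
    # Grows logarithmically with parent visits, so exploration scales up in long searches.
    return math.log((1 + parent_visits + c_base) / c_base) + c_init


def puct_score(child: Child, parent_visits: int) -> float:
    c = exploration_constant(parent_visits)
    u = c * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.mean() + u


def select_child(children: list[Child]) -> Child:
    # During search: descend into the child with the highest PUCT score.
    parent_visits = sum(ch.visits for ch in children)
    return max(children, key=lambda ch: puct_score(ch, parent_visits))


def lcb(child: Child, z: float = 1.96) -> float:
    # Lower confidence bound on the child's mean value (normal approximation).
    if child.visits < 2:
        return -math.inf
    mean = child.mean()
    var = max(child.value_sq_sum / child.visits - mean * mean, 0.0)
    return mean - z * math.sqrt(var / child.visits)


def pick_move(children: list[Child], min_visit_frac: float = 0.1) -> Child:
    # After search: among reasonably visited moves, play the one with the best LCB;
    # fall back to the most-visited move if no bound is available.
    most = max(children, key=lambda ch: ch.visits)
    eligible = [ch for ch in children if ch.visits >= min_visit_frac * most.visits]
    best = max(eligible, key=lcb)
    return best if lcb(best) > -math.inf else most
```

The idea behind LCB selection is that a move whose high average rests on few or noisy playouts gets penalized relative to a slightly lower but well-established one.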
A third piece of evidence is that AlphaGoZero almost certainly had no special guards against Mi Yuting's flying dagger, certain ladder issues, or a number of other traps that are known now but weren't known back then, which pure-zero bots can have difficulty with regardless of how much training they do. These kinds of weaknesses aren't reflected well in a pure Elo model because they are nontransitive (A and B may be equal in strength to each other except that A happens to prefer flying daggers and B doesn't, and then A will do far better than B against a "zero" bot vulnerable to it). They might also cause problems against a modern field of opponents, and might even allow mere humans to sometimes win against something like AlphaGoZero given enough trials to probe for different weaknesses, again in a way that Elo comparisons wouldn't normally capture.
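To make the nontransitivity point concrete: under a single-number Elo model, the rating gaps implied around any loop of players must sum to zero. The sketch below uses made-up scores (not real data) for the A/B/"zero bot" example: A and B split evenly, but A scores 90% against the zero bot Z while B scores only 60%. The function names are hypothetical helpers for illustration.

```python
import math


def implied_gap(score: float) -> float:
    """Elo gap implied by one pairing's score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def loop_residual(scores: dict[tuple[str, str], float], loop: list[str]) -> float:
    """Sum of implied gaps around a cycle of players; 0 if one Elo scale fits them all."""
    total = 0.0
    for a, b in zip(loop, loop[1:] + loop[:1]):
        if (a, b) in scores:
            total += implied_gap(scores[(a, b)])  # r_a - r_b
        else:
            total -= implied_gap(scores[(b, a)])  # r_a - r_b = -(r_b - r_a)
    return total


# Hypothetical head-to-head scores (first player's score fraction), NOT real results.
scores = {("A", "B"): 0.5, ("A", "Z"): 0.9, ("B", "Z"): 0.6}
print(f"{loop_residual(scores, ['A', 'B', 'Z']):.0f}")  # -311
```

A residual of about -311 Elo means no choice of ratings for A, B and Z reproduces all three results, so any single-number comparison has to mispredict at least one matchup.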
The Leela Chess Zero community has had to deal with people overrating AlphaZero-Chess too. (In their case, they have even more direct evidence that modern techniques have improved a little beyond the original AlphaZero/AlphaGoZero, because they can test against the known Stockfish version and configuration that AlphaZero itself was tested against and compare results. AlphaZero is already amazing for being the foundation of all the best modern agents in these kinds of games; there's no need to weaken that by overreaching to claims that start to strain credibility.)
Lastly, as for AlphaGo Lee being roughly matched with the very best current human players:
That seems relatively plausible to me. As a point of intuition, Leela Zero started winning against human pros back around when it was at 10 blocks, when run on beefy GPUs, and was dominant by the time it switched to 15 blocks. But I also wouldn't confidently bet against today's human pros doing well against some of the 10-block or early 15-block LZ nets, even on good hardware.
Yay ratings quibbles...
