Elf OpenGo paper released

tigerman · Post by **tigerman** » Wed Feb 13, 2019 11:17 am

Thanks dfan! I found the part that you mentioned in the paper:

Ladders create a predictable sequence of moves for each player that can span the entire board. In contrast to humans, Silver et al. (2017) observes that deep RL models learn these tactics very late in training. To gain further insight into this phenomenon, we curate a dataset of 100 ladder scenarios (as described in Appendix B.2) and evaluate the model’s ladder handling abilities throughout training.

In general, ladder play improves with a fixed learning rate and degrades after the learning rate is reduced before once again recovering. While the network improves on ladder moves, it still makes mistakes even late in training. In general, increasing the number of MCTS rollouts from 1,600 to 6,400 improves ladder play, but mistakes are still made.

Bonobo · Post by **Bonobo** » Wed Feb 13, 2019 11:53 am

Just to expand on the above info (without me understanding a word of all that AI blurb) …

Qucheng Gong in the Facebook Group “Go (Baduk, Weiqi) Players on Facebook”:

We have released ELF Opengo's all selfplay data and checkpoint models, a paper, a windows executable, as well as its analysis on all professional games ever played!
https://arxiv.org/abs/1902.04522
https://facebook.ai/developers/tools/elf-opengo

(Emphasis added by me)

Uberdude · Post by **Uberdude** » Wed Feb 13, 2019 12:34 pm

There is indeed a lot more than just that first paper!

- Home page: https://facebook.ai/developers/tools/elf-opengo
- Blog post: https://ai.facebook.com/blog/open-sourc ... -research/
- Online explorer of Elf's view on mistakes, winrate drops etc. across a large corpus of historical pro games from GoGoD: pick a player and a metric, see pretty graphs (note it uses GoGoD's names, so Lee Changho is Yi Ch'ang-ho): https://dl.fbaipublicfiles.com/elfopeng ... index.html
- Readme about their pro game analysis: https://dl.fbaipublicfiles.com/elfopeng ... sis/README
- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD! (Hope JF is ok with this) Big caveat that it assumes 7.5 komi so win% will be off for older games, how valid move suggestions will be is debatable: https://dl.fbaipublicfiles.com/elfopeng ... _sgfs.gzip
- Windows binary (1.3GB!) to play against the real Elf OpenGo rather than LZ converted weights, we can finally see if real Elf suffers from same lack of exploration: https://dl.fbaipublicfiles.com/elfopeng ... ngo_v2.zip

jaca · Post by **jaca** » Wed Feb 13, 2019 1:52 pm

mhlepore wrote:I hope I don't get in trouble for answering in the wrong place:

not with me

we need a little light relief with all this heavy stuff about weights and whatnot that makes my brain hurt

mhlepore wrote:I believe the keima increases black's ability to take sente.

ah, yes, tx, that's what St Mike was saying.

mhlepore wrote:Which seems to be a thing now.

it's always been a thing - sente is like your car keys, lose it and you have to walk back with a heavy load on your back

mhlepore wrote:I also don't think it is a given that white is going to butt heads. I've seen O-02 played as well

yes, yes, it's all coming back to me now, except for one thing: it's O2 (or, even better, o2) - you and Billy boy keep your unnecessary Courier Old monotype dashes and leading zeros to yourselves! Before we know it, you will be insisting that all English words are 8 bits long!! Is this what they mean by New World Order?! Big Brother = Bill?!!! omg....

[edit] just had a look at the moves past 43 - it's an ugly ugly game, black seems to be playing suboptimal moves to start kos wherever he can, and some of white's moves look downright simple-minded - so how come they can beat me with one eye closed standing on one leg balancing a bucket of water on their heads?

Gomoto · Post by **Gomoto** » Wed Feb 13, 2019 2:33 pm

43? 42!

Uberdude · Post by **Uberdude** » Wed Feb 13, 2019 4:19 pm

The Elfv2 weights converted for LZ at least do not want to run out the working ladder in pro game 4 (Elfv1 does) after a few thousand playouts, though there was a brief flash of blue there in Lizzie, so if you are unlucky with your choice of a low playout count (and 1600 is in that region) maybe it will. Some other interesting titbits:
- Elfv2 is back in line with other bots in thinking white is better on the empty board; Elfv1 was unusual in thinking black was better.
- In the parallel 4-4, outside approach opening, Elfv2 no longer thinks the keima answer is a bad -7% like v1 did (with a 3-3 invasion of the other white 4-4 to follow).

John Fairbairn · Post by **John Fairbairn** » Thu Feb 14, 2019 12:50 pm

Uberdude wrote:Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD! (Hope JF is ok with this)

It was all dealt with legitimately and above board, using the Spring 2018 edition, after I was approached by Facebook. I left that as the last stand-alone edition of the database so that it would remain in synch with Facebook's version, which does not include all of the metadata. It took rather longer than I expected (about 6 months) for the Facebook project to complete, but I will leave the Spring 2018 edition up for some time to come for those who want to acquire a matching and fully metadata-ed edition. SmartGo, as mentioned, has the true latest public (but not stand-alone) version, which is about 3% bigger already. My own version is bigger still, with some new Dosaku and Doetsu games just found!

BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.

Bill Spight · Post by **Bill Spight** » Thu Feb 14, 2019 1:49 pm

John Fairbairn wrote:BTW In one of my conversations with a programmer at Facebook, he said that no komi would certainly cause problems about the reliability of evaluations but he felt that for the early part of the game it was not likely to make much difference. I won't say which programmer it was in case he wants to change his mind about that.

Well, human players changed their minds about

with komi versus without.

bernds · Post by **bernds** » Thu Feb 14, 2019 4:57 pm

Uberdude wrote:- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD

... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

If you are on Linux, you can convert a file with the following command into something q5go can understand and produce a winrate graph for:

Code:

cat inputfile.sgf | sed   's,C\[\([.0-9]\+\)$,QLZV\[\1:,'  |tr -d '\n' |sed -e 's,US\[GoGoD,FG[257:]US[GoGoD,' -e 's,QLZV.\([0-9.]\+\):\([0-9.]\+\)],\nQLZV[\2:\1],g' >outputfile.sgf

This isn't perfect; ideally you'd want to mark the variations as figures, but I don't really see a way to do this from the command line. I have some local changes to automatically mark figures and diagrams but that isn't quite ready to be pushed yet...
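For anyone not on Linux, the same pipeline can be approximated in Python. This is only a sketch that mirrors the stages of the command above one by one; the function name `convert_elf_sgf` is made up for illustration, and it assumes (as the sed patterns do) that each Elf comment holds two numbers split across a line break.

```python
import re


def convert_elf_sgf(text: str) -> str:
    """Rewrite Elf's two-number comments as QLZV properties (hypothetical helper)."""
    # Stage 1 (first sed): a comment whose first number ends the line
    # becomes the opening of a QLZV property.
    text = re.sub(r"C\[([.0-9]+)$", r"QLZV[\1:", text, flags=re.MULTILINE)
    # Stage 2 (tr -d '\n'): join everything into a single line.
    text = text.replace("\n", "")
    # Stage 3 (second sed, first expression): insert a figure marker before
    # the first US[GoGoD property only, as sed does without the g flag.
    text = text.replace("US[GoGoD", "FG[257:]US[GoGoD", 1)
    # Stage 4 (second sed, second expression): swap the two numbers and
    # start each QLZV property on its own line.
    text = re.sub(r"QLZV.([0-9.]+):([0-9.]+)\]", r"\nQLZV[\2:\1]", text)
    return text
```

The idea is to read the text of inputfile.sgf, pass it through the function, and write the result to outputfile.sgf. It has not been checked against q5go itself.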

ez4u · Post by **ez4u** » Thu Feb 14, 2019 8:14 pm

The analysis tool is interesting. However, the readme file contains the following statement.
"... Importantly, you can see humanity's improvement in the game in 2016, when Go AIs came onto the scene and taught humans to play at a higher level. Also notice the harm that the large historical event of WWII did to the game..." [emphasis added]
This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm".

hyperpape · Post by **hyperpape** » Fri Feb 15, 2019 7:37 am

This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?

What I mean by "easy moves" is that if a move appears in a fuseki that was played by AlphaGo/LZ/ELF, and I copy it, I have "played better", but who cares? Once we're out of my opening book, I may or may not continue to make the moves the AI will approve of. I think it only makes sense to say I've learned if my moves are better in cases where I'm not just copying.[0]

Filtering the "opening book" would be an easy task, but it's probably not adequate. There are local patterns that can also be copied, and fusekis that differ only minimally from one that is in an opening book. What we are really after is the use of those patterns in cases that require judgment about what the patterns accomplish.

By the way, when I say "just copying", I mean that to be a pretty low bar. I don't mean to say professionals must have an elaborate theory for why a new move works. Just that there has to be that level of judgment--even if the player is saying "where would the AI play?", that has to be a question, rather than coming straight from memory.

Anyway, I think the answer is probably yes. From commentaries, I get the feeling that professionals have changed more than just rote copying of the AI moves. However, I wonder if there's a way to measure it.

[0] Well, if I'm a professional--if I'm me, we know the answer is that I won't.

Kirby · Post by **Kirby** » Fri Feb 15, 2019 11:34 am

hyperpape wrote:This suggests an interesting, albeit vague, question: is there a way to assess whether humans learned something in 2016, filtering out the "easy moves"?

The best I can think of is to measure how similar a given player is in their decision making to that of a particular version of a bot, e.g., by measuring the average and variation of changes in expected winning percentages for that player's moves. This is a heuristic, but not a definite answer, since a future version of a given bot may end up with a different idea of what's good and bad.

There are other problems, too. If a bot says that your move drops the winning percentage by 10%... What does it really mean about what you've learned? Sometimes part of learning is playing worse first, before you can play better. You can learn why the move you're experimenting with is bad, for example.

Probably still the most accurate way to track progress is to measure how often you win against a given level of opponent over time, though that also has its problems. You may get better at winning against 5d player A, but not get better at winning against 5d player B...

Tough stuff... ¯\_(ツ)_/¯
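As a rough illustration of the "average and variation" heuristic above, here is a minimal Python sketch. The input format is an assumption: a list of (before, after) winrate pairs for one player's moves, taken from that player's point of view as reported by one fixed bot version. The function name `drop_stats` is hypothetical.

```python
from statistics import mean, pstdev


def drop_stats(evals):
    """Return (mean drop, standard deviation of drops) for one player.

    evals: list of (before, after) winrates in [0, 1], both from the
    mover's perspective. A positive drop means the bot disliked the move.
    """
    drops = [before - after for before, after in evals]
    return mean(drops), pstdev(drops)


# Made-up numbers for three moves by the same player.
example = [(0.50, 0.48), (0.55, 0.45), (0.40, 0.41)]
average_drop, spread = drop_stats(example)
```

Comparing these two numbers for the same player before and after 2016 would be one crude similarity measure, with all the caveats above still applying.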

Calvin Clark · Post by **Calvin Clark** » Fri Feb 15, 2019 12:31 pm

ez4u wrote: This is hilarious if you look carefully at the graph. The big dip does not coincide with WWII. It is the New Fuseki Era that caused the "harm".

Brilliant!

But I'm also curious what happened in 1980, where there is a spike in "bad moves", "very bad moves"*, etc. even into the third set of 60 moves into the game. Komi change? Or did ELF just dislike Chinese players? Some of these may be artifacts of the kinds of games that were available to collect in GoGoD at the time. I'd be interested in John's view on that phenomenon.

* This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones. The only thing that's really a mistake is going from a winning position to losing one, but that naturally happens when some strong players take the game out into a chaotic street melee as they are wont to do. Third, as Bill Spight has pointed out, the networks are trained to win, not to evaluate.

But it's fun to have the data, so thanks to the ELF OpenGo team for sharing!

Uberdude · Post by **Uberdude** » Fri Feb 15, 2019 12:43 pm

The Elf win % drop and other metrics explorer is interesting, but there are a lot of caveats. For example, here is Elfv2's winrate from a recent tournament game of mine (4d EGF) vs a 1d EGF (who used to be 4d BGA). He made me think with some tough fighting, but according to Elf I made only 1 significant mistake (a winrate drop of over 10%, and Elf gives quite big swings). Once I had a big lead there was no room for changes in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me would take me away from 99% win.

[Image: Simons Wall elfv2.PNG]

For comparison, here's Iyama Yuta 9p vs Yamashita 9p's recent Kisei game winrate from Elf. Loads of mistakes from both all over the place (about 10 drops of more than 10% each). I wouldn't claim this means I played better than them in my game: I had a more mismatched game against an opponent who didn't challenge me so much, whereas they were facing more difficult positions in which to find the best move than I was, and consequently doing worse at it. Also, I expect pro games will tend to be more evenly matched than a 4d vs 1d, but the phenomenon of one player going to 99% fairly quickly (which for Elf might just be a 5 point lead), leaving no space for winrate variations, will still happen.

[Image: Iyama Yamashita elfv2.PNG]

And to avoid the "Japanese players are weak" criticism, here's Shin Jinseo vs Gu Zihao: not as many mistakes as in Iyama's game, but still quite a few.

[Image: Gu vs Shin elfv2.PNG]
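The saturation effect described in this post (no room left for winrate changes once one side is near 99%) can be illustrated with a toy model. Everything here is an assumption made for illustration: Elf does not report a score lead, and the logistic curve and the function names are invented. The only number taken from the post is the guess that 99% might correspond to roughly a 5 point lead.

```python
import math


def toy_winrate(lead, points_for_99=5.0):
    """Toy logistic map from score lead (points) to winrate.

    Calibrated so that a lead of `points_for_99` gives exactly 99%.
    This is not how Elf computes its winrate.
    """
    k = math.log(99) / points_for_99
    return 1.0 / (1.0 + math.exp(-k * lead))


def drop_from_mistake(lead, loss):
    """Winrate drop caused by losing `loss` points from a given lead."""
    return toy_winrate(lead) - toy_winrate(lead - loss)


# The same 3 point mistake in an even game and when 10 points ahead.
even_game_drop = drop_from_mistake(0.0, 3.0)
big_lead_drop = drop_from_mistake(10.0, 3.0)
```

Under this toy curve the same 3 point mistake is a large swing in an even game and almost invisible when already far ahead, which is why counting drops of over 10% favours lopsided games.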

Uberdude · Post by **Uberdude** » Fri Feb 15, 2019 12:47 pm

bernds wrote:
Uberdude wrote:- Containing this link to a 3 GB gzip of Elf's analysis of 100k pro games from GoGoD
... and sadly, since SGF isn't sufficiently standardized, the annotations are only two numbers in the comments.

Trying to open these SGF in Lizzie makes it hang!

Life In 19x19
