Elf OpenGo paper released

dfan · Post by **dfan** » Fri Feb 15, 2019 1:07 pm

Calvin Clark wrote:This definition is tricky. First, a human probability is not the same as an AI one. Second, attempts to do this crudely in chess unfairly punish more tactical players, because go strength is not just about making fewer mistakes but also about provoking your opponent to make bigger ones.

Indeed, when researchers have evaluated historical chess players by having computers rate their moves, Capablanca comes out better than expected (not that he was a slouch in the first place), because his simple style meant that he had fewer opportunities to make mistakes, compared to, say, someone like Kasparov who played in a maximally dynamic style.

Bill Spight · Post by **Bill Spight** » Fri Feb 15, 2019 1:34 pm

Uberdude wrote:once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.

That's why the log of the odds ratio is a more informative measure.

Uberdude · Post by **Uberdude** » Fri Feb 15, 2019 1:52 pm

Bill Spight wrote:
Uberdude wrote:once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
That's why the log of the odds ratio is a more informative measure.

Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs? It just wants a safe win (0.5 points as seen from move 100 could be safe) and can play slack moves, whereas a human might want to keep pressing the advantage for a comfier margin. A better approach would be to add in some dynamic komi to get the winrate back near 50% and then analyse from that modified board state. Unfortunately the Elf converted to LZ weights (at least v0 / v1) don't play nicely with the dynamic komi modified version of the LZ engine.

Uberdude · Post by **Uberdude** » Fri Feb 15, 2019 3:59 pm

What is the story behind this graph? It shows the averaged biggest mistake (in win % drop) for all players over time. There probably wasn't much data back in 1700, and then we have them making bigger mistakes to around 1780, then getting better down to a trough of mistakes around 1860 to 1895. Is this seen as a golden age of Japanese go? You've got the last few years of Shusaku at the start and Shuei at the end, though I presume he individually was a small part of the corpus. Checking the stats for just him, he averaged around 24%, quite a bit lower than the trough at 27%. Then we have mistakes getting bigger again at the turn of the century: is this the collapse of the Go houses? As ez4u mentioned, the peak in mistakes in the 1930s is the Shin Fuseki era BEFORE WW2. And in modern times the reduction in biggest mistakes seems quite nicely correlated with reducing time limits.

Attachment: Biggest mistake al players over time.PNG

John Fairbairn · Post by **John Fairbairn** » Sat Feb 16, 2019 2:59 am

I'm not sure the above graph tells us much. My own background in statistics is not much more than reading books like Freakonomics on long-haul flights, but I have hugely more knowledge of how the database was constructed and what it contains. Both of those things tell me to treat the graph with a great amount of caution.

Among the factors that potentially affect the results are these. This is not a complete list but does run roughly chronologically.

1. The very early games include a large number of Chinese games under ancient rules. Apart from the fact that these usually start with 4 stones on the board, which restricts the style of play somewhat, and are without komi, they also have group tax, which may be a distorting element. But the single biggest distortion is that the dates of these games are usually unknown (and even the dates of the players can be quite unknown). I therefore catalogued the games under the date, or estimated date, of the publications in which I found the games. This means, for example, that there are very many games labelled 1700. That's not when they were played.

2. Old games in general are likewise affected by being with no komi and often with handicaps. These handicaps include not just stones but the series type of handicaps (e.g. taking Black in 2 games out of 3). Since a series handicap was defined by current grades only initially, but could then change between those same two players (not to mention that grades were set largely on the basis of politics) and not others, there must have been many cases where the wrong handicap was used. In general, too, no komi is not just a problem with training bots but I expect it also encourages White to make wilder moves, and thus bigger potential mistakes.

3. At times such as the late Edo/Meiji period in Japan, there were fewer games because of less sponsorship and other external factors. But also there may be gaps in the record. For example, I have not got round to doing the complete games of Shuwa yet.

4. At any period with the older players, the corpus is likely to focus on the star players, via their collected games. This means many games from very early in their careers. Nowadays the proportion of games by weaker players is likely to be much less because there are just so many games by strong players to collect instead.

5. In the 1930s, as has been observed, there was a spike that can be considered to coincide with Shin Fuseki. Intuitively, I suppose we would expect many mistakes then as players started experimenting. But there may be a further factor. That period has been of special interest to me and so over the past 2-3 years I have been adding lots of games from this era. These are generally by weaker players (more mistake prone? More experimental?) and so in this period we get both more data and also data covering a much wider range of players (e.g. the Oteai B Section) than in other historical eras. There is also a trend towards the use of komi in this era, but weird ones such as 2.5 points.

6. As regards the war period, there is actually very little data. Apart from disruption from bombs, and players being sent off to fight, paper was scarce and publication of games was minimal (many now known were reconstructed later from personal records of the players).

7. I don't think time limits have much to do with anything in this graph. For a start, as one example, in the days of 13 hours each, there were distortions at both ends of the scale. On the one hand many players would use the extra time to take a shower, pop out to the shops or have a snooze upstairs. At the other end of the scale, where a player did use all his time, it became apparent that this carried significant health risks, presumably making mistakes likelier, and so time limits were reduced significantly without any external pressure from sponsors or public.

8. Since I get to choose what goes in the database and I don't like games at Mickey Mouse time limits, I have at times tended to ignore these (at any era).

9. If we look at very recent times, several factors leap out at me for initial consideration. One is that the proportion of Chinese and Korean players represented has increased very significantly. Whether you accept Cho Hun-hyeon's view that this has coincided with a horde of programmatically trained clones is up to you, but I think what is beyond question is that the level of play has not just increased but differences in both strength and styles between players have become smaller, so that there are far fewer mismatches (with their potential for bigger mistakes?) than in the past.

10. I suspect there is also a flattening effect in modern times due to increases in komi. And of course Elf is trained on current komi, so is perhaps more likely, on average, to find big mistakes in games which are not at 7.5 komi? (Or even to report as mistakes moves in games at other komis which were not really wrong?)

Bill Spight · Post by **Bill Spight** » Sat Feb 16, 2019 8:41 am

Uberdude wrote:
Bill Spight wrote:
Uberdude wrote:once I had a big lead there's no room for change in winrate to reveal any subsequent big or small mistakes from him, and only huge mistakes from me will take me away from 99% win.
That's why the log of the odds ratio is a more informative measure.
Even then I think the quality and precision of the bot's suggestions is reduced: if it says one move is 99.675 and another is 99.784, can you really believe those sig figs?

Well, I don't really believe that win rates are win rates, anyway.

I have an open mind about that, except for Leela 11. Playing around with Deep Leela, it seems to me that the win rate estimates for the player who is ahead are underestimates, as I expected. As for the stronger bots, I can't say.

But if we take the log odds ratios we get 5.73 and 6.14, respectively, for a difference of 0.41. By comparison the log odds ratio for 60% is 0.41 and the log odds ratio for 50% is, OC, 0. So the play with a win rate of 99.675% instead of 99.784% could be just as bad a mistake as a play with a win rate of 50% instead of 60%. Quien sabe?
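The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `log_odds` is made up for illustration, and it assumes the natural logarithm, which is what reproduces the 5.73, 6.14 and 0.41 figures quoted in the post.

```python
import math

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p), for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Pairs of win rates from the post: (worse move, better move).
pairs = [(0.99675, 0.99784), (0.50, 0.60)]

for worse, better in pairs:
    gap = log_odds(better) - log_odds(worse)
    print(f"{worse:.5f} vs {better:.5f}: "
          f"log odds {log_odds(worse):.2f} and {log_odds(better):.2f}, "
          f"gap {gap:.2f}")
```

Both pairs come out with nearly the same gap in log odds, even though the raw win rate differences are about 0.1% and 10%.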

But, as you say, surely the errors are larger as we approach 100% or 0%.

Edit: Also, the use of changes in win rate estimates between moves instead of comparison of win rate estimates for the same possible moves introduces the complication that the estimates should approach 0 or 1 as the game continues. That's probably a small factor early on, but in the endgame it could be significant.

dfan · Post by **dfan** » Sat Feb 16, 2019 9:20 am

Bill Spight wrote:That's why the log of the odds ratio is a more informative measure.

By the way, the log of the odds ratio is what these networks actually produce under the hood. As people have noted here, you don't want to have to expend lots of energy making your network produce values bounded between 0 and 1 that precisely hit targets like 0.98 or 0.99. So instead you have the network produce an unbounded value, let's call it x, and then run x through the sigmoid function 1/(1 + e^(-x)) to produce a probability p between 0 and 1. Solving for x, you end up with x = log (p/(1-p)), which is the log odds.
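The relationship between x and p described above can be sketched in Python as follows. The function names `sigmoid` and `logit` are chosen here for illustration and are not taken from the Elf or Leela Zero code; the sketch only shows that the two functions are inverses and that equal steps in x turn into ever smaller steps in p near 1.

```python
import math

def sigmoid(x: float) -> float:
    """Map an unbounded value x to a probability p in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    """Inverse of sigmoid: the log odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Round trip: recovering x from p gives back the original value.
for x in [-3.0, 0.0, 2.0, 5.0]:
    p = sigmoid(x)
    assert abs(logit(p) - x) < 1e-9

# Equal steps in x produce shrinking steps in p as p approaches 1.
for x in [0.0, 2.0, 4.0, 6.0]:
    step = sigmoid(x + 1.0) - sigmoid(x)
    print(f"x = {x:.0f} -> p = {sigmoid(x):.5f}, step to x + 1 = {step:.5f}")
```

This is why small differences between win rates close to 100% can correspond to sizeable differences in the unbounded value.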

Bill Spight · Post by **Bill Spight** » Sat Feb 16, 2019 9:50 am

dfan wrote:
Bill Spight wrote:That's why the log of the odds ratio is a more informative measure.
By the way, the log of the odds ratio is what these networks actually produce under the hood.

Great minds think alike.

smartgo · Post by **smartgo** » Tue Feb 19, 2019 9:07 am

If you want to look at ELF’s analysis of pro games but don’t want to download gigs of data, you can now download the annotated SGF for a specific game:
https://smartgo.com/gogod.html

And · Post by **And** » Wed Feb 20, 2019 10:13 am

The ELF OpenGo Windows binary has changed: https://dl.fbaipublicfiles.com/elfopeng ... ngo_v2.zip
I did not find a change log, but the size has decreased to 1 GB and it now includes version 43 of Sabaki. With the previous version, only elf_cpu_full worked on my computer, and it played much weaker than elf_v2 + lz. The new version does not work at all. Has anyone tried the ELF OpenGo Windows binary?

Bill Spight · Post by **Bill Spight** » Tue Mar 26, 2019 7:58 am

Oh, one more thing about Elf and the New Fuseki. Elf likes the New Fuseki. Yes, it found big mistakes then, but it also found a large number of matches in the first 60 moves to its own choices, peaking in the 1930s. Then the number of such early matches dropped until after WWII, then rebounded gradually into the 1970s and 80s. No wonder AlphaGo seemed like the ghost of Go Seigen.

Bill Spight · Post by **Bill Spight** » Tue Mar 26, 2019 8:31 am

BTW, sports fans. I was unable to download the Elf analysis files, here: - Annotated SGFs:

s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip

Safari complains about the s3: prefix.

vier · Post by **vier** » Tue Mar 26, 2019 4:14 pm

Bill Spight wrote:I was unable to download

s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip

Safari complains about the s3: prefix.

For me replacing s3: by https: worked.
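The prefix swap can also be done in a script. A minimal sketch in Python, where the helper name `s3_to_https` is hypothetical; it only rewrites the string and does not check that the resulting address still resolves.

```python
def s3_to_https(url: str) -> str:
    """Rewrite an s3:// address as https://, keeping host and path unchanged."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an s3:// URL: {url}")
    return "https://" + url[len(prefix):]

s3_url = ("s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/"
          "varied_models_commentary_sgfs.gzip")
print(s3_to_https(s3_url))
```

The printed address can then be pasted into a browser or handed to any download tool.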

Bill Spight · Post by **Bill Spight** » Tue Mar 26, 2019 4:26 pm

vier wrote:
Bill Spight wrote:I was unable to download

s3://dl.fbaipublicfiles.com/elfopengo/analysis/data/varied_models_commentary_sgfs.gzip

Safari complains about the s3: prefix.
For me replacing s3: by https: worked.

Many thanks.

Maybe Facebook is hiring.

Bill Spight · Post by **Bill Spight** » Thu Mar 28, 2019 2:30 am

After replacing s3: with https: and unzipping, I got 78 folders, each with the same games from 2011 - 2015.

Edit: Apparently the commentary on the GoGoD files is elsewhere. Attempting download now.

Edit2: Success!

Life In 19x19
